How to feed data to a multithreaded program?

Q139 · 11 Aug 2016, 18:32

Hi,
How should I feed double data to a multithreaded program?
The data is a vector of 100k+ structs that is only read, never written.
Sometimes only 2-3 of the 6 doubles are needed. Access is mostly sequential, stepping forward or backward, and sometimes random.
Currently the data is a vector of structs with 6 doubles each, and each thread has its own separate copy of the vector.
For some reason, giving each thread its own copy increased speed, but it also increases RAM usage.
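
To make the two setups concrete, here is a minimal sketch of both variants. The struct name `Sample`, its field names, and the helpers `sumTwoFields`, `runShared` and `runCopied` are hypothetical, chosen only for illustration; the real workload is not shown in this thread, so a simple sum over two fields stands in for it.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical record: six doubles, as described in the post.
struct Sample {
    double a, b, c, d, e, f;
};

// Stand-in workload (assumption): read two of the six fields of every element.
double sumTwoFields(const std::vector<Sample>& data) {
    double total = 0.0;  // thread-local accumulator, no shared writes in the loop
    for (const Sample& s : data) total += s.a + s.c;
    return total;
}

// Variant 1: all threads read the same vector through a const reference.
std::vector<double> runShared(const std::vector<Sample>& data, unsigned threadCount) {
    std::vector<double> results(threadCount, 0.0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        workers.emplace_back([&data, &results, t] {
            results[t] = sumTwoFields(data);  // one write per thread, at the end
        });
    }
    for (std::thread& w : workers) w.join();
    return results;
}

// Variant 2: each thread owns a private copy of the vector.
std::vector<double> runCopied(const std::vector<Sample>& data, unsigned threadCount) {
    std::vector<double> results(threadCount, 0.0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        // Capturing `data` by value copies the whole vector into the lambda.
        workers.emplace_back([data, &results, t] {
            results[t] = sumTwoFields(data);
        });
    }
    for (std::thread& w : workers) w.join();
    return results;
}
```

Both variants compute the same values; only the memory footprint and the cache behaviour differ. Note that in the second variant the copy is made on the launching thread, so the time spent copying is part of whatever is measured.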

What could cause the speed to increase when each thread has a separate copy? Wouldn't the cache be more likely to already hold the next data if all threads shared one copy?

I read somewhere that if two threads access the same data and one of them writes to it, the cache has to be re-synced with RAM. But if all threads only read, would two threads accessing the same data cause any speed problems?

If I tested with 6 separate vectors instead of a vector of structs, with only one instance of the data shared by all threads, should the algorithm run faster on multiple threads?
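
The "6 separate vectors" idea could look roughly like the sketch below. The names `Columns`, `sumRange` and `parallelSum` are hypothetical, and the sum over two columns is only a placeholder for the real algorithm. Whether this layout is actually faster has to be measured; the sketch only shows that a loop needing 2 of the 6 values touches 2 of the 6 vectors.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical column layout: one vector per field instead of one struct per row.
// All six vectors are assumed to have the same length.
struct Columns {
    std::vector<double> a, b, c, d, e, f;
};

// Sum a[i] + c[i] over [begin, end); only two of the six columns are read.
double sumRange(const Columns& cols, std::size_t begin, std::size_t end) {
    double total = 0.0;
    for (std::size_t i = begin; i < end; ++i) total += cols.a[i] + cols.c[i];
    return total;
}

// One shared, read-only Columns instance; each thread gets a contiguous index range.
double parallelSum(const Columns& cols, unsigned threadCount) {
    if (threadCount == 0) threadCount = 1;
    const std::size_t n = cols.a.size();
    std::vector<double> partial(threadCount, 0.0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        const std::size_t begin = n * t / threadCount;
        const std::size_t end = n * (t + 1) / threadCount;
        workers.emplace_back([&cols, &partial, t, begin, end] {
            partial[t] = sumRange(cols, begin, end);  // single write per thread
        });
    }
    for (std::thread& w : workers) w.join();

    double total = 0.0;
    for (double p : partial) total += p;
    return total;
}
```

Splitting the index range means each thread walks its own part of the shared data, which is one possible way to use several threads; if every thread needs the full data set instead, each would call `sumRange(cols, 0, n)` on the same instance.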

SGaist · 9 Nov 2016, 16:06

Hi,

What hardware are you running your application on ?

There are several factors to take into account: the multiple caches available, the type of structure you are using, how you arrange your data and how you access it.

Can you give more details about your setup and the data structures you are using?

Q139 · 8 Nov 2016, 21:36

@SGaist Currently running on a 5930K CPU with about a 20% overclock.
L1 instruction cache: 6 x 32 KB, 8-way set associative
L1 data cache: 6 x 32 KB, 8-way set associative
L2 cache: 6 x 256 KB, 8-way set associative
L3 cache: 15 MB, 20-way set associative, shared
Cache line size: 64 bytes
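
As a rough aid for reasoning about these numbers, the sketch below works out how large the data is relative to the caches listed above. The struct name `Sample` and the helper names are hypothetical; 100,000 elements is the lower bound given earlier in the thread, and "six copies" (one per core) is an assumption, since the actual thread count is not stated.

```cpp
#include <cstddef>

// Hypothetical record with six doubles, as described in the thread.
struct Sample {
    double a, b, c, d, e, f;
};
// Assumes 8-byte doubles and no padding, which holds on typical x86-64 compilers.
static_assert(sizeof(Sample) == 48, "expected six 8-byte doubles without padding");

// Cache figures taken from the list above.
constexpr std::size_t kCacheLine = 64;
constexpr std::size_t kL2Bytes = 256 * 1024;        // per core
constexpr std::size_t kL3Bytes = 15 * 1024 * 1024;  // shared by all cores

// Total bytes occupied by `copies` copies of `elementCount` structs.
constexpr std::size_t footprintBytes(std::size_t elementCount, std::size_t copies) {
    return elementCount * sizeof(Sample) * copies;
}

// True if element `index` spans two cache lines, assuming the array itself
// starts on a 64-byte boundary (not guaranteed by std::vector).
constexpr bool straddlesCacheLine(std::size_t index) {
    return (index * sizeof(Sample)) / kCacheLine !=
           (index * sizeof(Sample) + sizeof(Sample) - 1) / kCacheLine;
}

// 100,000 structs: 4.8 MB for one copy, 28.8 MB for six copies.
static_assert(footprintBytes(100000, 1) == 4800000, "one copy");
static_assert(footprintBytes(100000, 6) == 28800000, "six copies");
```

Under these assumptions a single shared copy is larger than one L2 cache but smaller than the L3 cache, while six copies together exceed the L3 cache. Because 48 does not divide 64, two of every four consecutive structs cross a cache-line boundary, and a loop that uses 2-3 of the 6 doubles still pulls in the cache lines holding the other fields.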

The data structure contains 5-6 double values.
It is mostly accessed by creating an object of the struct type and copying the needed struct into it.
Another way might be to access each member of the struct directly, without the intermediate object.
Not all members are always needed, mostly 2-3 of the 5-6.
I tested with data shared between threads vs. a separate copy for each thread.
With very little data, separate copies for every thread are faster, but at some point shared becomes faster, and with hundreds of megabytes of data the speed is about the same either way.
It may have something to do with cache misses, or with multiple threads happening to access the same data.
Every last percent of performance does not matter, but coding methods that can improve it are good to know.
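
For comparing the shared and per-thread-copy variants at different data sizes, a small timing helper along these lines might be used. The names `elapsedMs` and `bestOfMs` are hypothetical; taking the best of several runs is just one common way to reduce noise, not necessarily how the numbers above were obtained.

```cpp
#include <algorithm>
#include <chrono>
#include <limits>

// Run `work` once and return the wall-clock time in milliseconds.
template <typename Work>
double elapsedMs(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Run `work` `repeats` times and return the fastest run in milliseconds.
// Returns +infinity if `repeats` is zero or negative.
template <typename Work>
double bestOfMs(int repeats, Work&& work) {
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < repeats; ++i) {
        best = std::min(best, elapsedMs(work));
    }
    return best;
}
```

A call such as `bestOfMs(10, [&] { /* launch threads, join them */ })` would then be repeated for each data size and each variant. The result of the work should be used somewhere (printed or stored), otherwise an optimizing compiler may remove the loop being measured.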