I have written a large and rather tricky piece of code. It processes a data set of 5 million values. In outline, the code goes like this:
1. Outer parfor loop (1: 500K).
2. Nested for loop (1: ~100).
3. Test lots of conditions.
4. Inner for loop (1: K). Assign values to a growing cell array, where K is its current length. Each cell contains a struct, which in turn contains cell arrays (it's a really high-dimensional data set!).
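To make the outline concrete, here is a rough sketch of that loop structure. It is only an illustration: the sizes are shrunk, and `nOuter`, `nMid`, `store`, `results`, the condition test, and the struct fields `id` and `values` are all hypothetical placeholders, not the real code.

```matlab
% Hypothetical skeleton of the four-level structure described above.
nOuter  = 4;                 % stands in for 500K
nMid    = 3;                 % stands in for ~100
results = cell(nOuter, 1);   % one output slot per outer iteration

parfor i = 1:nOuter
    store = {};                              % growing cell array, not pre-allocated
    for j = 1:nMid
        if mod(i + j, 2) == 0                % placeholder for the many condition tests
            K = numel(store);                % K is the current length of the cell array
            for k = 1:K                      % inner loop over the existing cells
                store{k}.values{end+1} = j;  % each cell holds a struct containing a cell array
            end
            store{end+1} = struct('id', j, 'values', {{}});  % append a new cell
        end
    end
    results{i} = store;                      % sliced output of the parfor loop
end
```

Note that `parfor` runs as an ordinary serial loop if no parallel pool is available.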
The problem is that one iteration of the outer loop currently takes 6 seconds, which at 500K iterations works out to roughly 35 days. I need to speed it up dramatically. (A 24-hour run time would be OK; 24 minutes would be better :)).
I have used the profiler extensively, and other than a warning telling me to pre-allocate, it all looks OK.
Stuff I am NOT currently doing:
1. Pre-allocating the cell array. The reason is that I would need to search through it to find which entries are active each time I accessed it. I assume this would take more time than pre-allocation would save.
2. Vectorization. This is what I normally use when possible. However, this is such a complex piece of code, with many loops inside loops, that I don't even know where to start. Any hints?
3. Using the parallel toolbox across more than one machine. I have the toolbox, but only one 64-bit machine with enough RAM to load the data set. Should I be using this? I have not done so before.
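On point 1, one possibility that might avoid the search is to pre-allocate and keep a counter of how many slots are in use, so the active entries are always `1:n`. The sketch below is only an assumption about how that could look: `maxK`, `store`, `n`, the loop bound, and the struct fields are hypothetical, and it only works if entries are appended rather than deleted from the middle.

```matlab
% Hypothetical sketch: pre-allocated cell array with a fill counter.
maxK  = 100;                 % assumed initial guess at the number of cells
store = cell(maxK, 1);       % pre-allocate once
n     = 0;                   % number of slots currently in use

for j = 1:250                % placeholder for the real work
    n = n + 1;
    if n > numel(store)
        store{2 * numel(store)} = [];   % out of room: double the capacity
    end
    store{n} = struct('id', j, 'values', {{}});
end

store = store(1:n);          % trim the unused slots at the end
```

With this layout the active cells are simply `store(1:n)`, so no search for active entries would be needed.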
I am using Windows 7 on a quad-core machine with 16 GB of RAM.
Any sensible comments are welcome. Thank you.