Asked by Jan Simon
on 3 Sep 2012

I have created a multi-threaded version of the C-Mex function of FilterM (you do not need to download it to answer the question). The speed-up scales well with the number of processors for large problems. For small problems, however, a *smart* method is required to choose the optimal number of threads, because starting a thread costs a remarkable amount of time. E.g. starting a 2nd thread for filtering the columns of a [1000 x 2] matrix is 50% slower than performing the job in the single main thread. In my example the columns of the input matrix are distributed to the different threads.

The best choice depends on the length of the signal, the number of channels, the order and FIR/IIR type of the filter, the number of available cores, and the current system load.

Matlab uses magic limits for some multi-threaded functions:

- SUM starts one thread per core for a [1 x n] vector with n >= 89000 (consequently there is a slow-down on a single-core CPU)
- FILTER starts one thread per core for matrices of >= 16 columns

A better strategy is to start nCore-1 threads and to process one chunk of the data in the main thread itself.

I know how to solve a multi-parameter optimization problem, but even with that I would obtain optimal parameters for my own processor only. And solving it on the individual client computer (and for the current processor load...) is clearly overkill.

What are standard and smart methods to choose the number of threads for a specific problem?

*No products are associated with this question.*

Answer by Jan Simon
on 5 Mar 2013

Edited by Jan Simon
on 16 Dec 2014

Although multi-threading is an essential topic in the times of multi-core processors, the obvious lack of answers is a clear sign:

There is no sufficiently general strategy for deciding on the number of threads.

Too many factors influence the efficiency of distributing the work to threads:

- Hyper-threading can increase or decrease the processing time.
- The (dynamically changing) number of not busy cores is unknown.
- Turbo-boost can slow down the processing when more cores are active. (Btw., "turbo-boost" is a funny name for a feature which slows down the processor when all cores are busy. It is amusing that the processor manufacturers chose the opposite view, that it runs *faster* when some cores are sleeping.)
- It cannot be estimated whether the caches are exhausted.
- The time to start a new thread differs widely between different processors.

Therefore the following might be a fair solution:

1. Let N be the number of real (or virtual) processors.
2. Split the work into M independent chunks.
3. While there are unprocessed chunks and the number of started threads <= N:
4. Start a new thread, which fetches new chunks autonomously until all data are processed.
5. Back to 3.
6. Close the threads.

Then starting a new but not needed thread wastes time, but only on a core which is not yet working on the problem. A small problem will be solved before all threads are started.

Of course this is not an optimal strategy either, but I think it is more efficient than any fixed relation between the number of independent chunks and the number of cores.

