Hi, I am helping an organization process a large number of files (several terabytes in total). I have set up a script that runs a `parfor` loop over the files, where each iteration calls a function that processes the file identified by that iteration's index. There should therefore be minimal or no communication between workers, because the only thing passed to each worker is the file number.
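Roughly, the setup looks like this (a minimal sketch; `processFile` and `nFiles` are placeholder names, not the actual identifiers in my script):

```matlab
% Open the default local pool, then farm out one file per iteration.
parpool;
parfor k = 1:nFiles
    processFile(k);   % each worker receives only the file index k
end
```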
My problem is that the code appears to use only half the cores (the physical cores). We have a NUMA server with two 10-core (20 logical) Intel processors.
I am currently trying different manual parpool sizes (14-20) to see whether that makes any difference in how quickly a single file is processed. When the test runs, starting with `parpool(14)`, Windows Resource Monitor shows NUMA node 1 nearly maxed out (both physical and logical cores), while NUMA node 0 sees minimal use (averaging less than 20% overall). Do I need to set up a distributed, albeit local, cluster to get both NUMA nodes working?
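For reference, this is roughly how I am timing the different pool sizes (a sketch only; `processFile` and `testFiles` are placeholder names):

```matlab
% Try each candidate pool size on a small test batch and report wall time.
for nWorkers = 14:20
    delete(gcp('nocreate'));        % tear down any existing pool first
    parpool(nWorkers);
    t = tic;
    parfor k = 1:numel(testFiles)
        processFile(testFiles(k));  % same per-file function as the real run
    end
    fprintf('%2d workers: %.1f s\n', nWorkers, toc(t));
end
```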
I'm not looking simply to push every core to 100%; I just want to process these files in less time.