Compute mean and diff faster
Hello everyone. I am working on a finite-difference (FD) code and need to do a lot of averaging on very large 2D/3D matrices. I want to do the following task; take 1D as an example.
Given a vector A=[a,b,c,d,e,f], I want the average of each pair of neighboring values, so I do A=(A(2:end)+A(1:end-1))/2. I also use diff a lot, e.g. diff(A,1). It is the same in the 2D and 3D cases. But it becomes very slow when I am dealing with a very large matrix, say 1000*1000*1000. Is there any faster way to do this?
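In 3D, for example, the same pattern along the first dimension looks something like this (a small array just for illustration):
% Neighbor average and first difference along the first dimension of a 3D array
A = rand(100, 100, 100);                       % small stand-in for the real data
Aavg  = (A(2:end,:,:) + A(1:end-1,:,:)) / 2;   % average of neighboring values
Adiff = A(2:end,:,:) - A(1:end-1,:,:);         % same result as diff(A,1,1)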
Thank you very much.
Accepted Answer
John D'Errico on 28 Dec 2017 (edited 28 Dec 2017)
Get more memory. Huge problems require more memory, or a cup of coffee. Sit down, relax, take out that old copy of War and Peace, and read away.
If your matrix is 1000x1000x1000, then it has 1e9 elements, each of which requires 8 bytes of RAM. So that matrix alone uses roughly 8 gigabytes of RAM.
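As a quick sanity check, a sketch that only computes the estimate without allocating the array:
bytesPerDouble = 8;            % a double uses 8 bytes
numElements = 1000^3;          % 1e9 elements in a 1000x1000x1000 array
sizeInGB = numElements * bytesPerDouble / 2^30   % roughly 7.5 GiB, call it 8 GB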
Now, when you compute some operation on your array, like diff or the local average between consecutive elements, this creates a NEW array that is almost the same size. So a new array is formed that also requires 8 gigabytes of RAM.
Every copy of that array forces MATLAB to allocate 8 more gigabytes of RAM. How many gigs of RAM does your computer have? For example, mine is now just a bit old, so it has only 8 gigs in total.
What does MATLAB do when it runs out of RAM? It starts swapping things around, using virtual memory. That gets SLOW, real fast, even if it can find the disk space to do so.
So if you want your computations to be faster, you need more memory.
A poor alternative might be to use singles, instead of doubles. Create your matrix as a single array, and it will now require 4 gigabytes of RAM. It is still gonna be a memory hog, but a slightly leaner one. The cost of course is a loss of precision in your computations.
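For example, a minimal sketch, assuming your computations can tolerate single precision (the call below still allocates roughly 4 GB, so only run it if you have the RAM):
% Create the array directly as single rather than converting from double,
% so the 8 GB double version is never allocated.
A = rand(1000, 1000, 1000, 'single');          % ~4 GB instead of ~8 GB
Aavg = (A(2:end,:,:) + A(1:end-1,:,:)) / 2;    % result is also single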
2 Comments
John D'Errico
on 28 Dec 2017
Two copies of an 8 GB array require 16 GB. Don't forget that MATLAB itself consumes some RAM. So it does not matter how the computation is done; you will start having problems as soon as you push that limit. If your computations can tolerate the use of single, go in that direction.
If you have an actual disk drive that uses a spinning platter, the best thing you can do is replace the disk drive with an SSD. That can hugely increase your VM access speed, limited by the bus speeds between memory and your drive. SSDs are not that expensive. Mine was well worth the money spent to keep my computer hopping along happily for a few more years.
More Answers (2)
Benjamin Kraus on 28 Dec 2017
It may be time to start looking into the Parallel Computing Toolbox or some of the newer big data capabilities in MATLAB (such as tall arrays).
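A minimal sketch of the tall-array workflow, assuming the data lives in files on disk (the file pattern and column name below are hypothetical):
% Build a tall array backed by a datastore; data is read lazily in chunks.
ds = datastore('mydata_*.csv');   % hypothetical set of data files
t  = tall(ds);
% Operations are deferred until gather() is called.
colMean = mean(t.Var1);           % 'Var1' is a hypothetical column name
result  = gather(colMean);        % triggers the actual (chunked) computation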
David Santos on 20 Aug 2019
I would recommend putting all your data into a big .mat file and accessing it with matfile (which doesn't load all the data into memory, just what is needed), then processing it in chunks, preferably by columns.
Doing this you can control the amount of data you bring into memory, and you can process very large matrices (> 1 TB).
Tall arrays are OK if you don't need to access all of the data, because once you increase the number of accesses they become slower than matfile.
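A minimal sketch of that approach, assuming the array is already saved as a variable A in a file called bigdata.mat (both names are just for illustration):
m = matfile('bigdata.mat');               % handle to the file; nothing is loaded yet
[n1, n2, n3] = size(m, 'A');
out = matfile('result.mat', 'Writable', true);
for k = 1:n3
    page = m.A(:, :, k);                  % read one 2D slice into memory
    % neighbor average along dimension 1, written back to disk slice by slice
    out.B(1:n1-1, 1:n2, k) = (page(2:end,:) + page(1:end-1,:)) / 2;
end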
All the best