Can I tell Matlab not to use contiguous memory?
Christopher Thomas
on 24 Mar 2021
Commented: Christopher Thomas
on 6 Apr 2021
Matlab eats enormous amounts of memory and rarely or never releases it. As far as I can tell from past questions about this, it's because Matlab stores variables in contiguous physical memory blocks. As the heap becomes fragmented, Matlab asks for new memory when it wants to allocate a variable larger than the largest available contiguous fragment. MathWorks' recommended solution is "exit Matlab and restart". The only defragmentation option at present is "pack", which saves everything to disk, releases everything, and reloads variables from disk. This a) takes a while and b) only saves variables that are 2 GB or smaller.
Is there any way to tell Matlab (via startup switch or other option) to allow variables to be stored in fragmented physical memory?
The only reasons I can think of for asking for contiguous physical memory would be to reduce the number of TLB misses (for code running on CPUs) or to allow hardware acceleration from peripherals that work using DMA (such as GPUs) and that don't support mapping fragments. Like most of the other people who were complaining about this issue, I'd rather have TLB misses and give up GPU acceleration for my tasks and not run out of memory. I understand that for large problems on large machines, these features are important, but it's very strange that there's no way to turn it off.
(Even for our big machines, RAM is more scarce than CPU power or time, so "throw more RAM in the machine" is not viable past the first quarter-terabyte or so.)
Edit: Since there is apparently confusion about what I'm asking:
- Matlab certainly stores variables in contiguous virtual memory. As far as user-mode code is concerned, an array is stored as an unbroken block.
- Normal user-mode memory allocation does not guarantee contiguous physical memory. Pages that are mapped to contiguous virtual addresses may be scattered anywhere in RAM (or on disk).
- Previous forum threads about Matlab's memory usage got responses stating that Matlab does ask for contiguous physical memory, requiring variables to be stored as unbroken blocks in physical RAM (pages that are adjacent in virtual address space also being adjacent in physical address space).
- The claim in those past threads was that Matlab's requirement for contiguous physical memory was responsible for its enormous memory use under certain conditions.
- If that is indeed the case, I wanted to know if there was a way to turn that allocation requirement off.
As of 02 April 2021, I've gotten conflicting responses about whether Matlab does this at all, and have been told that if it does do it there's no way to turn it off and/or that turning it off would do horrible things to performance. I am no longer sure that these responses were to my actual question; hence the clarification.
Edit: As of 06 April 2021, consensus appears to be that Matlab does not ask for contiguous physical memory, making this question moot.
8 Comments
Bruno Luong
on 4 Apr 2021
So after much elaboration, the question now becomes "can I tell MATLAB to use contiguous (physical) memory?".
Accepted Answer
More Answers (4)
Walter Roberson
on 27 Mar 2021
You are mistaken.
>> clearvars
>> pack
>> foo = ones(1,2^35,'uint8');
>> clearvars
I allocated 32 gigabytes of memory on my 32 gigabyte Mac, it took up physical memory and virtual memory, and when I cleared the variable, MATLAB returned the memory to the operating system.
There has been proof posted in the past (but it might be difficult to locate in the mass of postings) that MATLAB returns physical memory for MS Windows.
I do not have information about Linux memory use at the moment.
MATLAB has two memory pools: the small object pool and the large object pool. I do not recall the upper limit on the small object pool at the moment; I think it is 512 bytes. Scalars get recycled a lot in MATLAB.
Historically, it was at least documented (possibly in blogs) that MATLAB did keep hold of all memory it allocated, and so could end up with fragmented memory. But I have never seen any evidence that that was a physical memory effect: you get exactly the same problem if you ask the operating system for memory and it pulls together a bunch of different physical banks and provides you with the physical memory bundled up as consecutive virtual addresses.
At some point, evidence started accumulating that at least on Windows, at least for larger objects, MATLAB was using per-object virtual memory, and returning the entire object when it was done with it, instead of keeping it in a pool. I have not seen anything from Mathworks describing the circumstances under which objects are returned directly to the operating system instead of being kept for the memory pools.
Side note: I have proven that allocation of zeros is treated differently in MATLAB. I have been able to allocate large arrays, and then when I change a single element of the array, been told that the array is too large.
12 Comments
Bruno Luong
on 5 Apr 2021
Edited: Bruno Luong
on 5 Apr 2021
"Not inplace change, no: there are places now where if you index an array, then instead of making a copy of the desired section, that MATLAB instead creates a new header that points "inside" the existing data."
I think we speak about the same thing: I call this "inplace" in the sense that data (RHS) is inplace.
I have submitted such a package on FEX in the past. It worked with some older versions, then it broke, since MATLAB prohibits users from doing that and has kept changing its data management since. I stopped trying to follow them, and I must admit that I can't keep up, for lack of the published information I would need to make such a package work reliably.
But now that this is integrated in the engine, such a package is no longer relevant.
Joss Knight
on 27 Mar 2021
You might want to ask yourself why you need so many variables in your workspace at once and whether you couldn't make better use of the file system and of functions and classes to tidy up temporaries. If you actively need your variables, it's probably because you're using them, which means they're going to need to be stored in contiguous address space for any algorithm to operate on them efficiently. If it's data you're processing sequentially, consider storing as files and using one of MATLAB's file iterators (datastore) or a tall array. If it's results you're accumulating, consider writing to a file or using datastore or tall's write capabilities.
Memory storage efficiency is ultimately the job of the operating system, not applications. If you want to store large arrays as a single variable but in a variety of physical memory locations, talk to your OS provider about that. They in turn are bound by the physics and geography of the hardware.
16 Comments
Bruno Luong
on 6 Apr 2021
I have worked with MATLAB for more than 20 years, on small and large simulations, from simple but long-running tasks (lasting weeks to months) to complex ones (simulating a pair of fully autonomous twin robots working through the day).
AFAIR I have never seen memory increase forever due to memory fragmentation.
In my various experiences, once the simulation is going, the memory state quickly stabilizes.
If it doesn't stabilize, then your simulation must be doing something constantly different over time in memory.
Anyway, up to you to stick with your assumption.
Jan
on 24 Mar 2021
Edited: Jan
on 24 Mar 2021
Is there any way to tell Matlab (via startup switch or other option) to allow variables to be stored in fragmented physical memory?
No, there is absolutely no chance to implement this. All underlying library functions expect the data to be represented as contiguous blocks.
If you need a distributed representation of memory, e.g. storing matrices as list of vectors, you have to develop the operations from scratch. This has severe drawbacks, e.g. processing a row vector is rather expensive (when the matrix is stored as column vectors). But of course it works. You only have to relinquish e.g. BLAS and LAPACK routines.
I've written my first ODE integrator with 1 kB RAM. Therefore I know, how lean the first 250 GB of RAM are. But the rule remains the same: Large problems need large machines.
7 Comments
Jan
on 2 Apr 2021
I've clarified my mistake in my former comment: Matlab does use the pagefile under Windows.
Your test code shows that Matlab uses a copy-on-write method. This is documented. Your assumption that this is done automatically using exceptions for write-protected virtual pages does not match the implementation in Matlab.
Steven Lord
on 25 Mar 2021
You know, I really want to read the newest Brandon Sanderson novel. But I don't have room on that shelf in my bookshelf for the volume. Let me store pages 1 through 20 on this bookshelf upstairs. Pages 21 through 40 would fit nicely in that little table downstairs. Pages 41 through 50 can squeeze into the drawer on my nightstand upstairs. Pages 51 through 80 could get stacked on top of the cereal in the kitchen cupboard downstairs. Pages ...
That's going to make it a lot more time consuming to read the new book. And since Sanderson's newest book is over 1200 pages long, I'm going to wear a path in the carpet on the stairs before I'm finished.
So no, there is no setting to tell MATLAB not to use contiguous memory.
The bits may be physically located in different locations on the chip, but to MATLAB and to the libraries we use they have to appear contiguous. Since in actuality I'm more likely to read that Sanderson novel on my tablet, the pages could be stored as a file split across different locations in the physical hardware of the SD card in my tablet but the reader software handles it so I see the pages in order and I would be annoyed if I had to manually switch to reading a different book every chapter to match the physical location of the data.
6 Comments
John D'Errico
on 27 Mar 2021
Remember that knowing those elements are stored contiguously in memory is a hugely important feature, and storing them contiguously is what lets the BLAS work as fast as it does. And that is a big feature in terms of speed. So while a few people MIGHT want a feature that would slow down MATLAB for everybody else, I doubt most users would be happy to learn that, because one person thinks it important, a matrix multiply is suddenly significantly slower.
It won't happen, nor would I and a lot of other people be happy if it did.