Can I tell Matlab not to use contiguous memory?
Christopher Thomas
on 24 Mar 2021
Commented: Christopher Thomas
on 6 Apr 2021
Matlab eats enormous amounts of memory and rarely or never releases it. As far as I can tell from past questions about this, it's because Matlab stores variables in contiguous physical memory blocks. As the heap becomes fragmented, Matlab asks for new memory when it wants to allocate a variable larger than the largest available contiguous fragment. MathWorks' recommended solution is "exit Matlab and restart". The only defragmentation option at present is "pack", which saves everything to disk, releases everything, and reloads variables from disk. This a) takes a while and b) only saves variables that are 2 GB or smaller.
Is there any way to tell Matlab (via startup switch or other option) to allow variables to be stored in fragmented physical memory?
The only reasons I can think of for asking for contiguous physical memory would be to reduce the number of TLB misses (for code running on CPUs) or to allow hardware acceleration from peripherals that work using DMA (such as GPUs) and that don't support mapping fragments. Like most of the other people who were complaining about this issue, I'd rather have TLB misses and give up GPU acceleration for my tasks and not run out of memory. I understand that for large problems on large machines, these features are important, but it's very strange that there's no way to turn it off.
(Even for our big machines, RAM is more scarce than CPU power or time, so "throw more RAM in the machine" is not viable past the first quarter-terabyte or so.)
Edit: Since there is apparently confusion about what I'm asking:
- Matlab certainly stores variables in contiguous virtual memory. As far as user-mode code is concerned, an array is stored as an unbroken block.
- Normal user-mode memory allocation does not guarantee contiguous physical memory. Pages that are mapped to contiguous virtual addresses may be scattered anywhere in RAM (or on disk).
- Previous forum threads about Matlab's memory usage got responses stating that Matlab does ask for contiguous physical memory, requiring variables to be stored as unbroken blocks in physical RAM (pages that are adjacent in virtual address space also being adjacent in physical address space).
- The claim in those past threads was that Matlab's requirement for contiguous physical memory was responsible for its enormous memory use under certain conditions.
- If that is indeed the case, I wanted to know if there was a way to turn that allocation requirement off.
As of 02 April 2021, I've gotten conflicting responses about whether Matlab does this at all, and have been told that if it does do it there's no way to turn it off and/or that turning it off would do horrible things to performance. I am no longer sure that these responses were to my actual question; hence the clarification.
Edit: As of 06 April 2021, consensus appears to be that Matlab does not ask for contiguous physical memory, making this question moot.
8 Comments
Bruno Luong
on 2 Apr 2021
Do you know if any of MATLAB's competitors (e.g., R, Python, Octave, ???) can handle non-contiguous memory for vectors/arrays?
Christopher Thomas
on 2 Apr 2021
Short answer: I don't know offhand.
It's even possible that Matlab itself isn't asking for contiguous physical memory (which would make this entire thread moot).
I'd seen previous forum responses indicating that it was, but I'm starting to wonder if those were unreliable. I'd have to run Matlab, set up an appropriate memory allocation test, and then manually check the page tables to be certain of that (on my OS of choice that's doable but fiddly, so I'd be spending a day on it; your OS experiences may vary).
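For what it's worth, the "fiddly but doable" page-table check I mention can be done from user space on Linux by reading /proc/self/pagemap. A minimal sketch (Linux-specific; an assumption throughout is a modern kernel, where the physical frame number is zeroed out for non-root readers even though the "present" and "swapped" flag bits remain visible):

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Read the /proc/self/pagemap entry for a virtual address.
 * Bit 63 = page present in RAM, bit 62 = page swapped out,
 * bits 0-54 = physical frame number (zeroed for non-root readers).
 * Returns 0 if pagemap is unavailable (e.g., not on Linux). */
uint64_t pagemap_entry(const void *addr) {
    long pagesize = sysconf(_SC_PAGESIZE);
    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0) return 0;
    uint64_t entry = 0;
    off_t offset = (off_t)((uintptr_t)addr / (uintptr_t)pagesize)
                   * (off_t)sizeof(uint64_t);
    if (pread(fd, &entry, sizeof(entry), offset) != (ssize_t)sizeof(entry))
        entry = 0;
    close(fd);
    return entry;
}

/* Check whether two virtual pages are also adjacent in physical memory.
 * Needs root to see real frame numbers; returns -1 when they are hidden. */
int physically_adjacent(const void *page_a, const void *page_b) {
    uint64_t mask = (1ULL << 55) - 1;
    uint64_t pfn_a = pagemap_entry(page_a) & mask;
    uint64_t pfn_b = pagemap_entry(page_b) & mask;
    if (pfn_a == 0 || pfn_b == 0) return -1;   /* unknown (not root) */
    return pfn_b == pfn_a + 1;
}
```

Running this as root against a freshly malloc'd (and touched) buffer typically shows that consecutive virtual pages are frequently not physically adjacent, which is exactly the point: contiguous virtual addresses do not imply contiguous physical frames.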
I'd looked into Octave, but it has enough missing elements that it would not be a useful replacement for Matlab for what our lab is using it for (this will vary by user; it may be fine for your purposes). It's using decent back-end libraries, but by default those are single-threaded. You can compile it to link against multi-threaded back-end libraries, but forum reports suggest that doing so is a pain. That also won't help you enough on a highly-parallel machine; for that you'd need support for Matlab's "parfor" and similar functions. This does not seem to be in Octave at present (it supports fork(), but the entire point of using Octave at all would be to be Matlab-compatible).
Matlab also has a just-in-time compiler that makes interpreted code run at near-native speed, and if I understand correctly Matlab will vectorize loops if dependencies allow it. Last I checked, Octave doesn't do that, so the only fast operations would be those handled by the back-end libraries. As for whether Octave asks for contiguous pages or allows fragmentation, I'd have to dig into the source code to find out, and I'm not planning to do so any time soon.
The other option that comes up in the forum is writing in Python using NumPy and SciPy. These may work, but would again not have automatic vectorization, and if I'm reading the Python documentation correctly it wouldn't be compilable either. You would also be responsible for your own multi-threading. For our lab, we'd have the added hassle of having to rewrite all of our existing scripts (an enormous effort).
Long story short, Matlab has invested enough time into their tools that - for our lab's purposes - they have a clear advantage over the free alternatives. Whether that's true for you depends on exactly what you're doing.
Bruno Luong
on 2 Apr 2021
Disclaimer: my expertise in memory management is limited. I have decent knowledge of MATLAB, but given that I'm not working for TMW, some of the things I expose here might not be accurate.
To me, when the user asks to allocate a big array, MATLAB simply calls mxMalloc in its API, which is no more than a single C malloc() with the size of the array (presumably close to an OS heap allocation with contiguous addresses), fills this memory block with zeros, and then might use some tracking management system on top for garbage-collection purposes.
It seems the malloc() used by MATLAB is entirely under OS kernel control, as with 99% of apps, and they don't do anything OS-specific or customized. Swapping, when the physical RAM requested is not available, seems to me to be handled by the OS, not by MATLAB.
This method has not been changed by TMW in many years, and so many libraries (BLAS, LAPACK, stock MEX functions, user MEX functions) have been built on this assumption for so long that there is zero chance it can be changed, for obvious backward-compatibility reasons.
I think the contiguous addressing makes the processing really faster. If you claim that only contiguous physical memory can benefit speed, and that contiguity in virtual address space does not matter, then we clearly have different views, and certainly one of us is wrong.
Walter Roberson
on 3 Apr 2021
"It's even possible that Matlab itself isn't asking for contiguous physical memory (which would make this entire thread moot). I'd seen previous forum responses indicating that it was, but I'm starting to wonder if those were unreliable."
Or if they were perhaps talking about old MATLAB releases. Strategies for 32 bit MATLAB were potentially different.
Joss Knight
on 4 Apr 2021
But I don't even know how to ask for contiguous physical memory, in the sense that the asker has described. I'll admit there's a lot I don't know, but I thought I would have known that. You can ask for pinned memory, which can't be swapped, but MATLAB definitely does use swap.
Walter Roberson
on 4 Apr 2021
Allocating contiguous physical memory:
My searches suggest:
Linux: mmap() with MAP_HUGETLB. Requires that the kernel be built with special flags and that a special filesystem be configured -- in other words, not something that programs such as MATLAB can insist on. The idea seems to be that TLB (Translation Lookaside Buffer) entries are in short supply, so you might want to cover a large chunk of physical memory with a single TLB entry and then manage the memory usage yourself.
Windows: AllocateUserPhysicalPages looks like a possibility https://www.w7forums.com/threads/how-to-allocate-physically-contiguous-memory-in-user-mode-on-windows-7.14183/ and there might be others.
MacOS: ??? I have not managed to find any user-mode resources yet, only kernel level.
Bruno Luong
on 4 Apr 2021
So after much elaboration, the question now becomes "can I tell MATLAB to use contiguous (physical) memory?".
Christopher Thomas
on 5 Apr 2021
If Matlab is not presently asking for contiguous physical memory, I'm not particularly interested in getting it to do so; that would be a topic best moved to its own forum post.
I took a look at Linux's kernel functions for memory management last week (as the last time I had to manage physical memory was quite a few years ago). Long story short, it could be done but would be ugly. The situation is probably similar for other OSs. Further details are beyond the scope of this thread.
Accepted Answer
Christopher Thomas
on 6 Apr 2021
Consensus appears to be that Matlab does not ask for contiguous physical memory, making this question moot.
More Answers (4)
Walter Roberson
on 27 Mar 2021
You are mistaken.
>> clearvars
>> pack

>> foo = ones(1,2^35,'uint8');

>> clearvars

I allocated 32 gigabytes of memory on my 32 gigabyte Mac, it took up physical memory and virtual memory, and when I cleared the variable, MATLAB returned the memory to the operating system.
There has been proof posted in the past (but it might be difficult to locate in the mass of postings) that MATLAB returns physical memory for MS Windows.
I do not have information about Linux memory use at the moment.
MATLAB has two memory pools: the small object pool and the large object pool. I do not recall the upper limit on the small object pool at the moment; I think it is 512 bytes. Scalars get recycled a lot in MATLAB.
Historically, it was at least documented (possibly in blogs) that MATLAB did keep hold of all memory it allocated, and so could end up with fragmented memory. But I have never seen any evidence that that was a physical memory effect: you get exactly the same problem if you ask the operating system for memory and it pulls together a bunch of different physical banks and provides you with the physical memory bundled up as consecutive virtual addresses.
At some point, evidence started accumulating that at least on Windows, at least for larger objects, MATLAB was using per-object virtual memory, and returning the entire object when it was done with it, instead of keeping it in a pool. I have not seen anything from Mathworks describing the circumstances under which objects are returned directly to the operating system instead of being kept for the memory pools.
Side note: I have proven that allocation of zeros is treated differently in MATLAB. I have been able to allocate large arrays, and then when I change a single element of the array, been told that the array is too large.
12 Comments
Christopher Thomas
on 29 Mar 2021
I'm afraid Matlab does, in fact, use swap (under Linux; neither my workstation nor our compute server is a Mac). A minimal test script that shows this is attached.
It will quite happily fill up as much swap as I let it (I added a bail-out condition so that it exits gracefully rather than running all of it out).
Walter Roberson
on 30 Mar 2021
Did I say that MATLAB does not use swap? Did I use the word "swap" anywhere in my Answer?
What I said is that you are mistaken. In particular, you started your posting by saying,
"Matlab eats enormous amounts of memory and rarely or never releases it."
and I demonstrated that (at least on Mac) that it releases large objects promptly. I have seen posts in the past that show the same thing for Windows. I have not happened to see any relevant posts about the Linux memory handling.
"As far as I can tell from past questions about this, it's because Matlab stores variables in contiguous physical memory blocks."
I just did some process tracing to be sure... it is not impossible that I missed something, but as far as I could tell, MATLAB made no attempt to allocate physical memory, only virtual memory. Do you have process trace logs showing MATLAB requesting memory that had to be physically contiguous (rather than memory given to it with contiguous virtual addresses, with the physical memory allocated in however many fragments the operating system felt like)?
I just checked my MATLAB installation, and I cannot see any setuid or seteuid binaries in it; without those, user-mode processes cannot specifically request contiguous physical memory.
"or to allow hardware acceleration from peripherals that work using DMA (such as GPUs) and that don't support mapping fragments."
MacOS supports mapping contiguous hardware addresses (such as for PCI) to non-contiguous physical addresses.
The other option is to move your data to the GPU's own memory, but my understanding is that that isn't usually done (GPU memory is instead used as a cache for data fetched from main memory).
MATLAB GPU only supports NVIDIA CUDA devices at the moment. NVIDIA's programming model does permit "page locked host memory" (more commonly known as "pinned" memory) to share I/O space with CUDA devices; the details are discussed at
The important part of the discussion there is that the sharing techniques are expected to be limited, a scarce resource. It is also clear from the discussion of the different kinds of memory that for CUDA devices, GPU memory is not only used as a "cache" for data from main memory: arranging data properly within the GPU is considered important for performance.
64-bit processes use Unified Memory, described at https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-unified-memory-programming-hd . It is designed to make memory access more efficient between host and GPU. If I understand my skimming properly, it does not require contiguous physical pages.
Unified Memory has two basic requirements:
- a GPU with SM architecture 3.0 or higher (Kepler class or newer)
- a 64-bit host application and non-embedded operating system (Linux or Windows)
GPUs with SM architecture 6.x or higher (Pascal class or newer) provide additional Unified Memory features such as on-demand page migration and GPU memory oversubscription that are outlined throughout this document. Note that currently these features are only supported on Linux operating systems. Applications running on Windows (whether in TCC or WDDM mode) will use the basic Unified Memory model as on pre-6.x architectures even when they are running on hardware with compute capability 6.x or higher.
MATLAB's support for cc3.0 was removed as of R2021a, with support for cc3.5 and cc3.7 due to be removed in the next release. I believe we can deduce from that that MATLAB is not using the unified memory interface yet (unless I am misunderstanding the charts and it has been supporting it since R2018a), but perhaps it is on the way. But notice the part about how the advanced unified access is not available for Windows yet.
The document does not mention Mac because Mac is no longer supported by NVIDIA :( The CUDA drivers for Mac did not get further than Kepler.
My point ("and I do have one") is that:
- Shared memory is only one of the ways to communicate with NVIDIA
- NVIDIA never required contiguous physical memory, only that the memory is "host-locked" (pinned) -- in other words, memory that was being prevented from being swapped away
- The unified memory model that is likely coming in (but I don't think is in place yet) will largely remove the need for the pinned memory
- No, the on-device memory is not just acting like a cache
Christopher Thomas
on 30 Mar 2021
This was intended to be a response to Jan's post down the thread, which claimed that Matlab did not use swap; apologies for the confusion.
I agree that Matlab can release memory; it did so with the test program that was attached to my response. I suspect that the situation where it doesn't is when it allocates an object at the end of the heap and objects in the middle of the heap are released. I would have to write another test program to check the conditions for this to happen.
Regarding "allocating physical memory" vs "allocating virtual memory", that would be my point. Others in this thread have claimed that Matlab does not use virtual addressing, and so must allocate pages that are contiguous in physical memory. This is demonstrably mistaken; thank you for providing your own demonstration of that.
Walter Roberson
on 4 Apr 2021
I am unclear as to what problem was originally being encountered.
For everything except possibly some device driver work, or possibly interface to GPU, MATLAB uses malloc() to allocate additional memory. malloc() is OS and library dependent as to exactly how it works, and we do not know at the moment which malloc() is being linked against (but probably the standard one rather than a specialized one.)
On Windows, MacOS, and Linux, if malloc() has to go to the operating system for more space, then the operating system allocates multiple physical memory pages and gathers them into one virtual space and returns the address of the virtual space.
On Windows, MacOS, and Linux, it is not certain that free() will necessarily return the allocated memory to the operating system, or the allocated memory might only be returned under some conditions. Sometimes operating systems provide special forms of allocating and releasing memory that make it easier for the operating system to reclaim the memory; there is at present no evidence that MATLAB is using those special forms.
Does memory released by MATLAB get returned to the operating system? Tests on Windows and MacOS suggest that yes, large enough allocations get returned to the operating system. But that is not necessarily the case for all allocations. Hypothetically there might be a bound below which, instead of being returned to the operating system, freed memory is cached for reuse. That bound might be operating-system dependent, or might depend on the malloc() being used.
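On glibc specifically (an assumption on my part; MATLAB could link a different allocator), such a bound exists and is tunable: allocations above M_MMAP_THRESHOLD (128 KiB by default) are served by mmap() and handed straight back to the OS on free(), while smaller ones come from the heap and may be cached there. A sketch:

```c
#include <malloc.h>   /* glibc-specific: mallopt, M_MMAP_THRESHOLD */

/* Ask glibc to serve every allocation of 1 MiB or more via mmap():
 * such blocks bypass the process heap entirely, and free() returns
 * them to the OS with munmap(), so large arrays never linger in (or
 * fragment) the heap. Returns 1 on success, 0 on failure. */
int use_mmap_above_1mb(void) {
    return mallopt(M_MMAP_THRESHOLD, 1 << 20);
}
```

This mirrors the behaviour described above: large blocks come back to the OS promptly, mid-sized ones may be pooled, and where the cutoff sits depends on the allocator's configuration.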
At some point in the past, I found explicit documentation that MATLAB keeps two pools, one for small fixed-sized blocks, and one for larger objects; that when a large block was released, it was returned to the MATLAB-managed pool for potential re-allocation. However, I am having trouble locating that documentation now... and things might have changed since then.
Considering that [IIRC] MATLAB can crash if you malloc() yourself and put the address in a data-pointer field of an object that MATLAB can later release, it seems likely to me that MATLAB does have its own memory manager that can get confused when asked to release something it did not allocate. If MATLAB did not do any of its own memory management, then it would just free() and there would not be any problem.
So... hypothetically, the situation might be:
- very small blocks, such as descriptors of variables, get allocated. They have room for a fairly small number of memory elements, so scalars and very small vectors or arrays are written in directly instead of needing a separate memory block. These small blocks get actively managed by MATLAB; it uses them a lot, and it makes sense to keep a pool of them instead of malloc()'ing each of them every time
- mid-sized blocks get allocated out of a MATLAB-managed pool, and get returned to the pool if the pool size is below a high-water mark, and otherwise released to the operating system
- (uncertain) large-sized blocks get allocated, possibly not within any pool, and get returned to the operating system when done
That last hypothetical special handling is uncertain. Much the same behaviour could happen if the managed pool had an upper size limit but all non-small-block variables went through the managed pool: returning the memory for a large enough block would exceed the upper bound, so it would just naturally trigger release back to the operating system, with no special handling needed.
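A minimal sketch of that hypothetical two-tier arrangement (purely illustrative; the cutoff, the single size class, and everything else here are my inventions, not MATLAB's actual allocator): small blocks are recycled through a free list, large blocks go to the underlying allocator directly and are returned on free:

```c
#include <stdlib.h>

#define SMALL_MAX 512        /* hypothetical small-object cutoff */

/* Head of a free list of recycled small blocks; a real pool would
 * keep many size classes, not one. The first bytes of a cached block
 * are reused to store the link to the next free block. */
static void *small_free_list = NULL;

void *pool_alloc(size_t n) {
    if (n <= SMALL_MAX) {
        if (small_free_list) {           /* reuse a cached small block */
            void *p = small_free_list;
            small_free_list = *(void **)p;
            return p;
        }
        return malloc(SMALL_MAX);        /* fixed-size small block */
    }
    return malloc(n);                    /* large: straight through */
}

void pool_free(void *p, size_t n) {
    if (n <= SMALL_MAX) {                /* small: back to the pool */
        *(void **)p = small_free_list;
        small_free_list = p;
    } else {
        free(p);                         /* large: back to the OS/allocator */
    }
}
```

Freeing a small block and then requesting another one hands back the same memory without touching the system allocator, which is exactly the reuse pattern that makes pools attractive for frequently-recycled scalars, and also exactly the pattern that can fragment if the pool never shrinks.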
Can such an arrangement lead to the kind of problems that pack() is intended to deal with?
- Yes, the pool of mid-sized blocks can get fragmented
- Even if everything other than the small fixed-sized blocks is handled by the OS instead of managed by MATLAB, virtual address space can get fragmented
MATLAB does not, as far as I know, offer any controls over where in virtual address space that device drivers or DLLs get mapped.
Bruno Luong
on 5 Apr 2021
"Considering that [IIRC] MATLAB can crash if you malloc() yourself and put the address in a data-pointer field of an object that MATLAB can later release,"
That was the case. But now (R2021a) MATLAB crashes instantly at the return statement, NOT later when the array is cleared.
/**************************************************************************
 * mex -R2018a test_AssignDataUsingMalloc.c
 *************************************************************************/
#include "mex.h"
#include "matrix.h"

#define OUT plhs[0]

void mexFunction(int nlhs, mxArray *plhs[],
                 int nrhs, const mxArray *prhs[]) {
    mxDouble *p;
    OUT = mxCreateDoubleScalar(0);
    mxFree(mxGetDoubles(OUT));
    p = mxMalloc(8);
    /* p = malloc(8); <= doing this makes MATLAB crash at the "return"
       statement when this MEX file is called */
    *p = 1234;
    mxSetDoubles(OUT, p);
    return;
}
Bruno Luong
on 5 Apr 2021
My impression is that MATLAB's internal variable data management has changed paradigm lately; one can no longer speak about "copy-on-write", or at least not in the sense that was understood a few years back.
There is a more sophisticated mechanism that handles data, and possibly MATLAB can even bypass the universal mxArray data structure through the JIT in its execution engine. Therefore a command such as format debug can return meaningless output.
In my test MEX code above, it is possible that OUT and its data pointer are cleared and freed during the return statement, resulting in the MATLAB crash.
All of this is evidently highly speculative.
Walter Roberson
on 5 Apr 2021
If I recall correctly, there are now cases in which portions of an array are passed instead of creating a new array containing just the section, but I did not happen to take note of the circumstances under which that can happen.
Does the same problem occur if you have a notably larger array? The scalar case can be held inside a small block, so the rules might be different for such a small size.
Bruno Luong
on 5 Apr 2021
I guess by "the portion" you mean an in-place change, which happens if the array is on the RHS.
In my test code, if the output is not scalar and malloc() is used, it crashes when the variable is cleared.
MATLAB definitely treats scalar data differently.
Attached is the MEX file for you to test. The first argument is the length of the output; if the second argument is provided and > 0, it uses malloc (crashing on purpose) instead of mxMalloc.
Walter Roberson
on 5 Apr 2021
Not inplace change, no: there are places now where if you index an array, then instead of making a copy of the desired section, that MATLAB instead creates a new header that points "inside" the existing data.
"In my test code, if the output is not scalar and malloc() is used, it crashes when the variable is cleared."
My memory is claiming that roughly 10 doubles fit inside a small block, and that thus it might not be strictly scalars that the special behaviour is for. Unfortunately I do not recall where I found the information... it might have been in a header file or inside some random .m file.
Bruno Luong
on 5 Apr 2021
"Not inplace change, no: there are places now where if you index an array, then instead of making a copy of the desired section, that MATLAB instead creates a new header that points "inside" the existing data."
I think we're talking about the same thing: I call this "inplace" in the sense that the data (RHS) is in place.
I have submitted such a package on the FEX in the past; it worked with some older versions, then broke, since MATLAB prohibits users from doing this and has kept changing its data management since. I stopped trying to follow them, and I must admit that I can't, for lack of the published information I would need to make such a package work reliably.
But now that this is integrated in the engine, such a package is no longer relevant.
Christopher Thomas
on 5 Apr 2021
Regarding the original problem being encountered, the issue was that several of the people at our lab were finding that Matlab would ask for far more memory than needed for the variables we thought we were storing, and that this was causing problems with our compute server (it's not handling hitting swap as gracefully as it should; that's its own problem).
I spent a bit of time forum-searching to see why this sort of thing happened and what to do about it, and forum consensus seemed to be what I presented in my original post (that Matlab was asking for contiguous physical memory and having serious problems if the heap became fragmented as a result). If that was the case, there seemed to be a straightforward solution, which I asked about.
If those original forum posts were in error, then I'm back to square one figuring out the conditions under which this occurs and how to prevent it.
Joss Knight
on 27 Mar 2021
You might want to ask yourself why you need so many variables in your workspace at once and whether you couldn't make better use of the file system and of functions and classes to tidy up temporaries. If you actively need your variables, it's probably because you're using them, which means they're going to need to be stored in contiguous address space for any algorithm to operate on them efficiently. If it's data you're processing sequentially, consider storing as files and using one of MATLAB's file iterators (datastore) or a tall array. If it's results you're accumulating, consider writing to a file or using datastore or tall's write capabilities.
Memory storage efficiency is ultimately the job of the operating system, not applications. If you want to store large arrays as a single variable but in a variety of physical memory locations, talk to your OS provider about that. They in turn are bound by the physics and geography of the hardware.
16 Comments
Christopher Thomas
on 29 Mar 2021
Per the multiple explanations in this thread, the physical memory does not need to be contiguous to "operate efficiently". Fragmentation occurs on a page level, not byte-by-byte.
Joss Knight
on 29 Mar 2021
This does seem to get your goat!
What needs to be sequential are addresses. How that maps to physical memory is up to the operating system. Applications that want to process numerical data efficiently can do no better than to ask the operating system for sequential address space, and leave it up to the operating system to decide how to distribute, or redistribute that as it sees fit. If you want this done better, talk to your OS author - or maybe write your own OS!
Christopher Thomas
on 1 Apr 2021
Since there is context you appear to have overlooked, I'll recap it for you:
- You always get sequential virtual addresses, whether the pages are contiguous in physical memory or not.
- Asking for pages to be contiguous in physical memory results in a much larger heap if heap fragmentation occurs. Under many conditions this results in Matlab asking the OS for a heap that's very much larger than the data being stored in the heap.
- This enormous heap growth is a problem, that many users - including myself - have asked for a way to prevent.
- You and others in this thread have asserted that allowing pages to be non-contiguous in physical memory would result in a large performance drop. This is manifestly not the case, for reasons that have already been explained in this thread in detail (addresses within pages are contiguous and most cache hits and misses will be the same under both scenarios).
You also appear to be falling back to one of the patterns of behavior other users have reported ("blame the OS heap manager"). The heap manager is doing exactly what you're asking it to.
What "gets my goat" is repeated assertions by many users in this thread who appear to have been under misconceptions about how virtual and physical addressing, paging, and caches work.
Joss Knight
on 2 Apr 2021
The people in this forum are here voluntarily to help people. If they say things that you think are incorrect they do not do so maliciously. The spirit of this forum should always be one of tolerance and gratitude, even when people are wrong.
I suppose it's worth reiterating before the argument continues that the answer to your question is obviously no. Perhaps that's all you needed confirmed. Some people, when they ask a question like that, really just want to complain about something about MATLAB they don't like. If that's the case here then...duly noted.
I'm interested to know what the C++ standard library function is to allocate non-contiguous memory, and if it's supported by the compilers and operating systems MATLAB supports. Can you tell me? To integrate any new kind of memory allocator obviously has a significant development impact and MathWorks would need to look at the benefit vs the costs.
Christopher Thomas
on 2 Apr 2021
The people with a "staff" logo beside their name are (I would hope) answering in their formal capacity as Mathworks employees, on company time. As a result, I'm holding anyone with that logo to a (slightly) higher standard - one where I would expect that, if a disagreement occurs, they take a moment to doublecheck their position (and if they feel it necessary to either look up auxiliary information or pass the support ticket to someone else).
Regarding "allocating non-contiguous memory", that function is called "malloc()". This returns a buffer that is contiguous in virtual memory that may be fragmented in physical memory. To allocate pages that are contiguous to each other in physical memory, you need to either be in kernel space and call the appropriate OS-specific functions, or call some OS-specific user-space function for getting a physically contiguous buffer (if the OS provides such a function at all).
Your phrasing also suggests potential confusion between "non-contiguous memory" and pages that are not contiguous to each other. Within each page, addresses are always contiguous, which is why the claim elsewhere in this thread that non-contiguous pages have a large performance hit is puzzling.
Regarding "the answer is obviously no", I agree that this thread could have been over a long time ago if I'd gotten a straight answer. One plausible conversation along those lines would have been something like the following:
- "Can I tell Matlab to use non-contiguous pages instead of requiring physically contiguous ones?"
- "We don't offer that feature, sorry."
- "Why not? It seems straightforward to implement and your users have been asking for it for 10-15 years."
- "It may be, but the customers responsible for most of our revenue have asked that we prioritize working on different features, and I'm afraid those requests are our top priority."
- [edit:] (alternatively) "It may be, but I'm afraid I don't have the answer to that question."
If your priority was to close this ticket, anything like that would have worked. Instead, there's been a conversation along the lines of the following:
- "Can I tell Matlab to use non-contiguous pages instead of requiring physically contiguous ones?"
- "That's impossible because (blatantly mistaken statement)."
- "Um, no, I'm afraid you're mistaken about (item)."
- [Cycle repeats for a week and a half and counting.]
I do not understand what is intended to be accomplished by this second conversation pattern. It's wasting your time too, which presumably doesn't benefit you or Mathworks. The fact that other users have been reporting this conversation pattern from Mathworks about this topic over the years is even more puzzling - it suggests that this pattern is a consistent deliberate choice on the part of Mathworks.
If you feel that I'm mistaken about something, by all means politely explain what you feel I'm overlooking, after double-checking.
Joss Knight
on 2 Apr 2021
If you want that kind of conversation, raise a Tech Support query, don't come to MATLAB Answers. This isn't a support ticket, and regardless of what it says next to my name, it is just me answering questions to the best of my knowledge in my own personal time. Hold me to a higher standard if you like, be more irritated with me if I'm wrong if you like, but it's not likely to get you where you want to go any faster. Being difficult isn't likely to make me think it's worthwhile pursuing this with greater depth.
Your answer about malloc just makes me more confused. Have you taken the MATLAB documentation regarding 'contiguous memory' to mean MATLAB uses something other than malloc? Because that really is what MATLAB does. MATLAB doesn't take any special steps to prevent memory from being non-contiguous in physical address space if that's what the OS wants to do. All the documentation is trying to do is point out that the elements of a numeric array are adjacent even when the array is multi-dimensional, while for structures and cell arrays they are not. Indeed, it's perfectly clear that MATLAB doesn't force physical adjacency since it's easy to see that MATLAB uses swap space just by checking the Task Manager or top.
Christopher Thomas
on 2 Apr 2021
From the first paragraph of the original post:
"As far as I can tell from past questions about this, it's because Matlab stores variables in contiguous physical memory blocks." (emphasis added)
This was the rationale given when other users asked about Matlab's memory use in the past.
If those previous answers were mistaken, saying so in your initial reply would have avoided this entire thread.
It puzzles me that you're confused about my responses, because I've been clear about the distinction between virtual and physical addressing in all of my replies (up to and including having to explain the distinction to people).
Is there anything else about my responses that you'd like me to clarify?
Joss Knight
on 2 Apr 2021
MATLAB stores variables in continuous address space. I don't know what past replies have made you think this somehow means that MATLAB prevents the OS from allocating memory in whatever way it normally does. I don't even know how that's possible. All that matters is that MATLAB does not store one array in multiple, independently allocated blocks. That is what causes MATLAB to exhibit certain behaviour like running out of memory or entering swap when there is still theoretically enough memory left for a new array, or needing to perform copies of significant size when arrays are resized. Which is the sort of thing people tend to ask questions about. I was imagining you were going to tell me there was a way to ask the OS to allocate memory in some way that is more efficient with physical memory at the cost of performance, but it seems like you thought the other way round - that MATLAB was doing something special to prevent that happening.
If I seem stupid to you it's because I don't understand this distinction you make between physical RAM and virtual addressing, and I think my original post makes that pretty clear. Only the OS and probably the BIOS decides how addresses map to physical memory. I assumed when you said physical you were making some sort of distinction between memory from a single allocation and an array made up of multiple allocations. Similarly, you used the terms contiguous and fragmented like that is what you were requesting - an array made up from multiple allocations that can therefore better handle fragmented memory.
So I guess this answers your question? That MATLAB already does what you wanted? I certainly can't rule out further limitations on my ability to understand what you want and what you mean, on my knowledge of the way OSs and computer hardware manage memory, or on my precise knowledge of MATLAB's memory management system. I'll do my best to help though. I'm stubborn that way.
Walter Roberson
on 3 Apr 2021
This was the rationale given when other users asked about Matlab's memory use in the past.
Could you provide a few links to posts so we can review exactly what was said?
Christopher Thomas
on 5 Apr 2021
Regarding finding the forum posts that mentioned contiguous physical memory - I would have to redo the search for that from scratch, and it doesn't seem like that would be productive at this point.
My priority now is getting the affected code working, and there are several approaches I can use for that (two out of three users can restructure their access patterns and the third user can buy time on the supercomputer if necessary).
Walter Roberson
on 5 Apr 2021
My priority now is getting the affected code working
?? We are not clear as to what symptoms you are seeing, that you were suspecting might be due to memory allocation issues ??
Christopher Thomas
on 6 Apr 2021
Per the original post, the symptom being seen is that Matlab is often grabbing far more memory than it should need (based on the variable sizes reported by "whos"), and that the memory footprint rarely decreases (instead slowly growing over time even when the amount that "should" be used doesn't).
This causes problems when Matlab's footprint becomes larger than the amount of physical memory on the compute server we use, as a) swapping is slow and b) the compute server can become unstable when sudden large demands on swap space are made (that's a problem with the compute server configuration, which is being followed up on with our server admins).
Per my original post, previous forum threads had stated that this was due to the way that Matlab handled memory allocation - asking for contiguous physical memory - which was something that should be straightforward to change (hopefully straightforward enough that there would be a configuration switch for it). Those forum posts appear to have been in error, meaning that there isn't a straightforward way to get Matlab to use less memory for a given collection of variables.
The next step for me would be to run a lengthy series of tests figuring out exactly what Matlab's memory allocation patterns are (and checking the OS's allocation patterns for Matlab's memory while I'm at it). That would take enough of a time investment that other approaches to mitigating the problem are easier (convincing the worst-affected users to refactor their code, convincing the boss to spend $5k on more RAM for the compute server, convincing people to pay to run their tasks on the supercomputer rather than on the compute server).
My goal is to have the affected users (myself and two others) be able to run the data processing tasks that they need to without the machines they're running the tasks on misbehaving. There are several approaches to achieving that goal, and this particular one ("ask about the previously reported memory allocation strangeness on the Matlab forum") has reached the point of diminishing returns. I'll mark this thread "closed" and move on to other approaches shortly.
(In case you're wondering, my best guess now is that Matlab and the OS are both trying to do slab-based heap management and their attempts are interacting with each other in bad ways under some conditions. Rather than trying to test that guess, I'm going to pursue other approaches to resolving the problem instead.)
Bruno Luong
on 6 Apr 2021
It doesn't sound anything related to MATLAB contiguous memory management.
If the memory footprint doesn't stabilize during a long simulation run, then IMO you probably have a memory leak somewhere (which could be due to a user program bug or a MATLAB bug).
Christopher Thomas
on 6 Apr 2021
Requiring physically-contiguous memory would cause heap growth under many conditions, due to fragmentation. As old objects were destroyed and new objects were created, the heap manager would have a difficult time finding sufficiently large physically-contiguous spans for the new objects, and ask for more physical memory to get that space.
This can happen to a lesser extent without physically-contiguous blocks (only virtually-contiguous blocks), which I suspect is what's happening here. The degree to which this happens or can be prevented from happening depends on how smart the heap manager is and what you're asking it to do.
Using "clear" and "close all" should clean up user-code memory leaks. We're already doing that.
Bruno Luong
on 6 Apr 2021
I have worked with MATLAB for more than 20 years, on simulations small and large, from simple tasks with long runs (lasting weeks to months) to complex tasks (simulating a pair of fully autonomous twin robots working through the day).
AFAIR I have never seen memory increase forever due to memory fragmentation.
In my various experiences, once the simulation is going, the memory state quickly stabilizes.
If it doesn't stabilize, then your simulations must be doing something constantly different over time in the computer's memory.
Anyway, it's up to you whether to stick with your assumption.
Christopher Thomas
on 6 Apr 2021
I've been using Matlab since 2003 and have been coding much longer than that. I acknowledge your expertise, but it's pretty obvious that our tasks have different memory access patterns.
For my tasks and one other user's tasks, intermediate results are frequently allocated and de-allocated. These intermediate results are not necessarily the same size (it depends on the data). If there's a situation where subsequent buffers are allocated that are larger than their predecessors, that's exactly the type of scenario where heap fragmentation might occur (buffer of size N is allocated, smaller objects are allocated that are placed after it, buffer of size N is released freeing up a slot of size N, buffer of size N+K is allocated, won't fit into that slot, and is placed at the end of the heap instead).
I haven't looked at the third user's code, so I can't comment on whether this specific case is driving their memory usage or not.
Yes, the code could be rewritten to change those allocation patterns. There is a different, simpler rewrite that addresses the problem in a different way that I've already suggested to the other user instead. My priority at this point is finding the solutions that involve the smallest programmer time investment, as that is the most scarce resource in our lab at the moment (we're not flush with money either, but time is still harder to come by).
Jan
on 24 Mar 2021
Edited: Jan
on 24 Mar 2021
Is there any way to tell Matlab (via startup switch or other option) to allow variables to be stored in fragmented physical memory?
No, there is absolutely no chance to implement this. All underlying library functions expect the data to be represented as contiguous blocks.
If you need a distributed representation of memory, e.g. storing matrices as list of vectors, you have to develop the operations from scratch. This has severe drawbacks, e.g. processing a row vector is rather expensive (when the matrix is stored as column vectors). But of course it works. You only have to relinquish e.g. BLAS and LAPACK routines.
I wrote my first ODE integrator with 1 kB of RAM. Therefore I know how lean the first 250 GB of RAM are. But the rule remains the same: large problems need large machines.
7 Comments
Christopher Thomas
on 24 Mar 2021
The statement "all of the underlying library functions expect the data to be represented as contiguous blocks" does not make sense. All operations performed in user-mode (rather than kernel-mode) use virtual addressing - they can't tell the difference between pages that map to contiguous physical memory, pages that map to fragmented physical memory, or pages that map to virtual memory that is stored on disk rather than in physical memory.
Hardware devices that use DMA to access physical memory are another matter - but those were discussed in my original post.
Regarding "large problems require large machines" - if the memory footprint Matlab asks for is vastly larger than the amount of data I am dealing with at any one time, the problem is Matlab, not the host machine or host OS. You have had users telling you this continuously for about 10-15 years, based on the last few days of searching.
Jan
on 25 Mar 2021
"All operations performed in user-mode (rather than kernel-mode) use virtual addressing"
But then Matlab would use virtual addressing also. If this is true, memory fragmentation would not be a problem. The BLAS/LAPACK/MKL/etc. libraries are optimized with the memory management of the CPUs in mind. For optimal performance it does matter whether you use virtual or non-virtual memory addressing.
My programs do fit in the RAM of my computers. I would not be happy if all code were remarkably slower because Matlab used virtual addressing for all variables. But it would be useful if a user could decide that some variables are addressed such that they can span non-contiguous RAM pages. Isn't this the case for tall arrays?
It should be easy to implement a class which uses virtual allocation for the data. As far as I can see, all standard operations should work directly, except for allocation and changing the array size. It is not trivial to catch the exceptions for shared data copies.
Christopher Thomas
on 25 Mar 2021
I'll put this more clearly: Matlab already does use virtual addressing for memory access. Your code is not running in ring zero once the machine has finished booting. When you ask the OS for contiguous physical memory, what you get is a set of pages that are adjacent in physical memory rather than wherever the OS decided to put them. Memory access goes through the page table in both cases.
What's changing is your memory access patterns, which affects the number of cache misses and TLB misses (to some extent; cache lines are much smaller than pages, so most of the cache misses will still happen, and the number of pages your data is spread across is the same, so the number of page lookups and your pressure on the TLB are similar).
Since the actual assembly-language instructions used to access memory are identical (only the page table layout changes), there is no downside to allowing non-contiguous memory allocation if the user asks for it. Anyone who wants to use contiguous physical memory (due to using DMA hardware or for some other reason) can still get it, by leaving the configuration switch at the default setting.
Christopher Thomas
on 26 Mar 2021
In case anyone else is under misconceptions about Matlab's use of virtual addressing - using swap memory, which Matlab does, requires virtual addressing.
(Pages that aren't presently in memory are marked as unusable, generating a protection fault when the application tries to access them; the OS shuffles physical pages to and from disk, points the page table entry for the attempted access at the newly-loaded physical page, and returns control to the application).
Matlab also makes extensive use of copy-on-write for passing function arguments, which requires virtual addressing. Function arguments are passed by value but aren't actually copied unless the function changes them.
(Pages in the original copy are marked as read-only, and new page table entries are created with a different virtual address range but mapping to the same physical pages in RAM. These are also marked as read-only. When either the caller or the function try writing to their copies, a protection fault occurs. The OS copies the page that was aliased to a new location, so that the caller and function now have different copies of that portion of the data, and returns control to the application. For structures allocated in contiguous physical memory, all pages are copied when the fault occurs rather than just the page that was written to.)
Jan
on 26 Mar 2021
[EDITED, original]:
I tried it: My Matlab does not use the pagefile. If the RAM is exhausted, it is exhausted and a huge pagefile does not allow Matlab to create larger arrays.
[EDITED, fixed]: This was a mistake from a test in a virtual machine. Matlab does use the pagefile under Windows. Increasing the size of the pagefile makes it possible to create larger arrays.
It was your initial point that "Matlab stores variables in contiguous physical memory blocks". Now you claim that virtual addressing is used instead, which would allow automatic paging. This is a contradiction.
"Matlab also makes extensive use of copy-on-write for passing function arguments, which requires virtual addressing."
No, this does not need virtual addressing. Of course you can implement a copy-on-write strategy via an exception on a write-protected page, but this does not match the behaviour of Matlab: you can write to the memory directly inside a C-mex function, and this destroys the copy-on-write strategy. In this way you can poke into variables which share the memory but are not provided as input:
x = ones(1, 1e6);
y = x;
yourCMexFunction(y);
% ==>
*(mxGetPr(prhs[0]) + 1) = 5;
% <==
x(1:3) % [1 5 1] !!!
There is no automatic detection of a write access. This is a severe problem and has been discussed exhaustively in the Matlab forums for 25 years.
"For structures allocated in contiguous physical memory, all pages are copied when the fault occurs rather than just the page that was written to."
This is exactly what happens in Matlab.
Obviously your conceptions of Matlab's memory management do not match the facts. You sound very convinced when you tell others about their "misconceptions". Unfortunately you do not understand the topic you are talking about.
By the way, you could control the memory manager of Matlab 6.5 with different startup parameters. As far as I know this was not officially documented. You can still find the undocumented mex functions mxSetAllocFcns and mxSetAllocListeners in the libraries. The licence conditions forbid reverse-engineering these functions.
You can simply write some mex functions which allocate memory by malloc and VirtualAlloc, and compare the run times when calling optimized BLAS and LAPACK functions. My conclusion: I do not want this for standard variables. If some data exhausts my computer, I buy a larger computer or use tall arrays and distributed processing in a cluster.
Christopher Thomas
on 30 Mar 2021
Per my post accidentally attached to another user's response, Matlab certainly does use swap. Feel free to run the sample program I attached to test this (if you have access to a Linux system; Matlab doesn't seem to have an OS-independent way to check memory use).
The same test program will demonstrate copy-on-write behavior. Toggle the "modify the input argument" flag "true" or "false" to observe this.
Jan
on 2 Apr 2021
I've clarified my mistake in my former comment: Matlab does use the pagefile under Windows.
Your test code shows that Matlab uses a copy-on-write method. This is documented. Your assumption that this is done automatically, using exceptions on write-protected virtual pages, does not match the implementation in Matlab.
Steven Lord
on 25 Mar 2021
You know, I really want to read the newest Brandon Sanderson novel. But I don't have room on that shelf in my bookshelf for the volume. Let me store pages 1 through 20 on this bookshelf upstairs. Pages 21 through 40 would fit nicely in that little table downstairs. Pages 41 through 50 can squeeze into the drawer on my nightstand upstairs. Pages 51 through 80 could get stacked on top of the cereal in the kitchen cupboard downstairs. Pages ...
That's going to make it a lot more time consuming to read the new book. And since Sanderson's newest book is over 1200 pages long, I'm going to wear a path in the carpet on the stairs before I'm finished.
So no, there is no setting to tell MATLAB not to use contiguous memory.
The bits may be physically located in different locations on the chip, but to MATLAB and to the libraries we use they have to appear contiguous. Since in actuality I'm more likely to read that Sanderson novel on my tablet, the pages could be stored as a file split across different locations in the physical hardware of the SD card in my tablet but the reader software handles it so I see the pages in order and I would be annoyed if I had to manually switch to reading a different book every chapter to match the physical location of the data.
6 Comments
Christopher Thomas
on 25 Mar 2021
No, it won't take "a lot more time to read the book". See my previous post for a detailed discussion of page tables and contiguous vs non-contiguous memory. Memory accesses involve page table lookups in both cases.
Contiguous physical memory is used when you need an external hardware device, such as a GPU, to be able to work on your data. The GPU does not have access to the page table, so it needs the physical addresses of the memory locations it is going to modify. The other option is to move your data to the GPU's own memory, but my understanding is that that isn't usually done (GPU memory is instead used as a cache for data fetched from main memory).
Yes, I have had to work with this sort of thing a few jobs ago.
Steven Lord
on 26 Mar 2021
About the closest thing to the functionality you're requesting that MATLAB provides are some of the large file and big data capabilities like datastore and tall arrays. Distributed arrays in Parallel Computing Toolbox may be another option.
Christopher Thomas
on 26 Mar 2021
I agree that it's possible to partition data that is larger than physical memory. My objection is to being told that this is the only possible approach when the data held in memory (as reported by whos) is vastly smaller than available physical memory, when an apparently-straightforward feature that's been requested for 10-15 years would resolve it.
I'd be happy just getting a plausible answer about why this hasn't been addressed in that time, even if it's "the customers driving 90% of our revenue prefer that we work on different features instead".
Instead, in this thread I've been given a rationale that was demonstrably mistaken and have been told that despite having very much more physical RAM than data, the problem is the machines I'm using, not Matlab's memory handling. Past threads about this (over 10-15 years) report that you often blame the OS's heap manager as well.
If you're worried about push-back from "it's not worth our time to do this", you could easily side-step it by saying "if a group of customers holding annual licenses worth (10 times the estimated implementation cost) ask us to, we'll be happy to prioritize that feature". It would be win/win: either the people asking would go away (with fewer hard feelings) or you'd learn that it really is important enough to your customers to be worthwhile.
Jan
on 27 Mar 2021
"in this thread I've been given a rationale that was demonstrably mistaken" - which one?
There is no cheap standard method to store large objects in a limited space. The problems are equivalent for arrays in RAM and parcels in a post van - except for the number of dimensions. Efficient programming includes proper pre-allocation to avoid memory fragmentation. This problem cannot be solved auto-magically by the memory manager. Of course, virtual memory or an automatic garbage collector are valid attempts to manage the resources efficiently, but they all have severe disadvantages as well.
Obviously the feature you want is not the main problem of other users and of MathWorks, or there is simply no efficient solution.
John D'Errico
on 27 Mar 2021
Remember that knowing those elements are stored contiguously in memory is a hugely important feature, and making them stored in those memory locations improves the way the BLAS works. And that is a big feature in terms of speed. So while a few people MIGHT want to have a feature that would slow down MATLAB for everybody else, I doubt most users would be happy to know that because one person thinks it important, suddenly a matrix multiply is now significantly slower.
It won't happen, nor would I and a lot of other people be happy if it did.
Christopher Thomas
on 29 Mar 2021
Regarding which rationales were demonstrably mistaken:
- The claim that Matlab's computation routines must use single contiguous physical memory segments to store arrays (as opposed to putting pages wherever there's space). The only thing that actually needs this is DMA from devices that don't have access to the page table.
- The claim that Matlab uses physical addressing rather than virtual addressing. Matlab uses copy-on-write and swapping, and runs as user-space code. Direct access to physical memory requires privileged instructions because it bypasses memory protection (which is implemented by the page table).
- The claim that the OS's heap manager is in any way to blame for this (from previous forum threads). It's doing exactly what you're asking it to.
- The claim that I need a machine with more memory. If my dataset is very much smaller than physical memory and Matlab is asking for very much more space than I have physical memory, the machine isn't the problem.
Regarding "elements being stored contiguously in memory being a hugely important feature", all that's needed is that they be contiguous in virtual memory, which makes them contiguous within pages. Remember that page size (2 MB last I checked; 4 kB on very old systems) is very much larger than cache line size (typically anywhere from 32 bytes to 256 bytes). Virtually all of your cache hits and misses will happen the same way whether pages are contiguous with each other or fragmented. Moving to a different page requires a new page table lookup whether that page is contiguous to the old one or not, so it's not obvious to me that making pages contiguous with each other helps cache performance at all.
Per my original post - all I'm looking for is that there be the option to store pages in fragmented memory rather than contiguously with each other. The computing library implementation is unchanged (it literally can't tell the difference); the only changes you'd need to make are to switch to different memory allocation calls when the "allow fragmented memory" flag is set, and to lock out hardware accelerators like GPUs when the flag is set.
This in no way "slows down MATLAB for everyone else". The entire point of a flag is that it's something you have to set. By default memory allocation would still be physically contiguous.