Memory fragmentation
-
Hello!
Qt documentation says:

"Fragmentation is a C++ development issue. If the application developer is not defining any C++ types or plugins, they may safely ignore this section. Over time, an application will allocate large portions of memory, write data to that memory, and subsequently free some portions of it once it has finished using some of the data. This can result in "free" memory being located in non-contiguous chunks, which cannot be returned to the operating system for other applications to use. It also has an impact on the caching and access characteristics of the application, as the "living" data may be spread across many different pages of physical memory. This in turn could force the operating system to swap, which can cause filesystem I/O - which is, comparatively speaking, an extremely slow operation.

Fragmentation can be avoided by utilizing pool allocators (and other contiguous memory allocators), by reducing the amount of memory which is allocated at any one time by carefully managing object lifetimes, by periodically cleansing and rebuilding caches, or by utilizing a memory-managed runtime with garbage collection (such as JavaScript)."
Frankly, I didn't understand either the problem or the solution.
-
What do they mean by allocating a large portion of memory and freeing part of it? As I understand it, whenever I use new/delete or malloc/free I always free the entire memory chunk that was previously allocated. Or did they mean allocating a large portion of memory with several new/malloc calls?
-
So let's say there are a number of those "free" memory chunks that cannot be returned to the OS. If the application tries to allocate new memory, will it first try to reuse one of those chunks (if one of them is big enough, of course)?
-
How do I utilize "pool allocators" in Qt? Are there any pool allocators readily available in Qt? I found a help page for QMallocPool, but it seems to be for Qt 4.4.
-
-
In 99.99999999% of cases you should not worry about it. This is the equivalent of the defrag problem on hard drives. In short:
Imagine you allocate 3 objects that occupy 1 MB each. They will be allocated in contiguous places in memory. Then you delete the middle one. If you or the OS then needs 2 MB of memory, a brand-new memory location has to be used, as the 1 MB you freed can't be reused - it's too small.
Again, in most cases it doesn't matter, but a quick improvement on this front is to use QVector over other containers, and not to sparsely allocate objects that have a relatively short life span. -
In game development for consoles memory management is a big issue. It's much less of an issue on desktop, as the OS helps a lot; on console, not so much. One thing they do in console development is allocate space for objects up front: chunks of memory, like an array of C++ objects. Then, when creating a new object, they don't allocate memory but reference a location in this memory. This allows that memory to be reused over and over again without ever actually calling new/delete. It requires the class to be designed differently. I don't know if Qt has something equivalent, but that may be one way to do it. I could see this strategy being useful in embedded environments, or in long-running apps that frequently allocate memory, especially in small amounts. I would consider a game server to be a long-running app that might benefit from controlling how much memory is new'd/deleted - anything that needs uptime of days or many hours.
-
@VRonin said in Memory fragmentation:

a quick improvement on this front is to use QVector over other containers and not allocate sparsely objects that have a relatively short life span

From your response I gather that if the application wants to allocate a 4th 1 MB object, it will go in where the 2nd object was. Am I a correct gatherer? (For clarity, I understand that 1 MB is most likely big enough to be returned to the OS, but for the sake of the example let's say it cannot be.)

@fcarney Ah, and they probably use in-place constructors/destructors in such systems. I think I'm starting to understand.
-
Heap fragmentation isn't something folks pay much attention to in the desktop app world, but that's only one domain. When dealing with resource-constrained, high-availability systems, it most certainly is an issue. The idea of pools is that you're reserving several pools of different size granularity, and allocating based on a best fit for the needs of the object. To be useful this requires careful analysis of the data flow in the system: often you simply use the default allocator during development to get a handle on the typical memory needs, then, after other debugging is done, substitute a pooled allocator tuned for the most commonly allocated chunk sizes. Deterministic time response of the allocator is a whole other issue that needs to be addressed in time-critical systems. You cannot have an allocation take a few nanoseconds most of the time, but occasionally take hundreds of milliseconds; that can lead to missed real-time deadlines.
-
I know Java is a different animal, but it sure would be interesting to see how memory is used/reused in a modded version of Minecraft. It is common to allocate between 4 and 8 gigabytes of RAM to Java when running. My guess is it allocates this when the program starts, or does it in pieces, then uses/reuses it as it runs. You can watch the usage go from 2 GB to 6 GB or more while running and drop back down as it does its work. That program deals with a massive amount of data and seems to do it very well.
-
@fcarney said in Memory fragmentation:
I know Java is a different animal, but it sure would be interesting to see how the memory is used/reused in a modded version of Minecraft.
There is ample online documentation about the Java VM memory manager. It runs garbage collection in a separate thread, and when certain events occur it attempts to clean up the heap and minimize fragmentation - but that level of complexity comes at a cost. I'm not sure about the current license, but due to the non-validated and non-deterministic nature of the Java memory manager, the original Sun license forbade it from being used in safety-critical applications. Fine for business apps, but don't trust your life on it.
-
@Kent-Dorfman said in Memory fragmentation:
Fine for business apps but don't trust your life on it.
Yeah, I always love the EULAs for programs that say something like:
"Not suitable for any purpose." Then people wonder why I don't think it's a good idea to run control systems on a general-purpose OS rather than a dedicated platform like a PLC.
-
@Kent-Dorfman said in Memory fragmentation:
but that level of complexity comes at a cost
Indeed. I've observed, on occasion, Java deciding to run the GC at the exact moment somebody is calling me. It's annoying as hell, as the phone (Android) goes pretty much unresponsive.
-
I don't know about Java, but C# (.NET) has two heaps: the Small Object Heap (for objects smaller than 85,000 bytes) and the Large Object Heap. Not only is garbage collection performed for both of them, but the Small Object Heap is also compacted to undo fragmentation. (As I understand it, this is possible because references in C# are essentially pointers to pointers.) If this matter of reusing free, non-returned memory blocks in C++ isn't resolved, it would mean the C# memory model is superior to C++'s, from a consumption perspective, in some situations.
-
@casperbear said in Memory fragmentation:
If this matter of reusing free, non-returned memory blocks in C++ isn't resolved, it would mean the C# memory model is superior to C++'s, from a consumption perspective, in some situations.
It's more complicated than that. C and C++ have no heap of their own, strictly speaking; they rely on the runtime's heap manager (which in turn gets its memory from the OS), and that is the correct approach. Without special care on the side of the programmer, memory (de)fragmentation will depend on how that heap manager is implemented. The best way to avoid the problem is not to have it in the first place - i.e. to put everything on the stack. The stack can never get fragmented, ever. From a practical point of view, however, this is bordering on impossible.
C# (and Java) are compiled to intermediate code rather than native assembly and run on a virtual machine, so their memory model follows the memory model of the VM. This is not superior by any conceivable measure. To make it worse, they miss out on stack allocations wherever possible, at least as far as I know. Suppose you need an intermediate array of some small size. You can have that on the stack in C/C++, but it always ends up in the VM's heap in Java (and C#), so they delegate the burden to the heap manager for no apparent reason. Example:
```cpp
static constexpr int size = 10;
int smallArray[size];
someFunctionThatManipulatesTheArray(smallArray, size);
```
vs
```cpp
static constexpr int size = 10;
int *smallArray = new int[size];
someFunctionThatManipulatesTheArray(smallArray, size);
delete[] smallArray;
```
Heap allocation is always much slower - often by an order of magnitude - no matter the usage or the implementation. This derives from the complication that the heap is non-contiguous by definition. So if it seems that C/C++'s memory model isn't simple, well, that's because it isn't. No one glove is going to fit every hand.
On the other hand, the language (C++) allows you to build your own heap manager (via placement new) if that's warranted, so if you're not satisfied with the default implementation you can roll your own. -
@kshegunov said in Memory fragmentation:

On the other hand the language (C++) allows you to build your own heap manager (via placement new) if that's warranted, so if you're not satisfied with the default implementation you can roll your own.

Yeah, you caught me. I'm inclined to use the heap because I'm primarily a C# developer. Sounds like I need to learn move semantics after all, to use the stack approach more effectively.
-
@casperbear said in Memory fragmentation:
Yeah, you caught me. I'm inclined to use the heap because I'm primarily a C# developer. Sounds like I need to learn move semantics after all, to use the stack approach more effectively.
Free store memory management is a whole field of study unto itself, so yes... research, research, research.
@kshegunov -- Well put. You saved me from posting what could have inadvertently turned into a rant. LOL