Malloc and Scalability
The naive approach to implementing malloc is to wrap a single global heap area in a lock and have every concurrent allocation and deallocation contend for that lock. Such a scheme scales poorly: every allocation and deallocation, from every thread, is serialized on that one lock.
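A minimal sketch of that naive scheme, in C++ (chosen because the rest of this article is C++-centric). The names `naive::allocate`, `naive::deallocate` and `naive_stress` are hypothetical, and `std::malloc` merely stands in for the single global heap area; a real allocator of this kind would manage the heap itself. The point is the one mutex that every thread must acquire.

```cpp
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical naive allocator: one global heap (std::malloc stands in for it)
// guarded by one global lock. Every thread must take naive::heap_lock
// for every allocation and every deallocation.
namespace naive {

std::mutex heap_lock;
long long allocations = 0;  // protected by heap_lock

void* allocate(std::size_t size) {
    std::lock_guard<std::mutex> guard(heap_lock);
    ++allocations;
    return std::malloc(size);
}

void deallocate(void* p) {
    std::lock_guard<std::mutex> guard(heap_lock);
    std::free(p);
}

}  // namespace naive

// Runs `threads` workers, each performing `iterations` allocate/free pairs.
// All of them serialize on naive::heap_lock, so extra threads add contention
// rather than throughput. Returns the total number of allocations made.
long long naive_stress(unsigned threads, int iterations) {
    naive::allocations = 0;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t) {
        workers.emplace_back([iterations] {
            for (int i = 0; i < iterations; ++i) {
                void* p = naive::allocate(64);
                naive::deallocate(p);
            }
        });
    }
    for (auto& w : workers) w.join();  // join() makes the counter safe to read
    return naive::allocations;
}
```

Timing `naive_stress(8, n)` against `naive_stress(1, n)` on an eight-core machine should not be expected to show anything like eight times the throughput, since all eight threads queue on the same mutex.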
Issues with the efficiency of dynamic memory management are not new. Doug Lea's venerable dlmalloc was written in response to such concerns over a decade ago. However, programmers rely increasingly heavily on dynamic memory allocators, and these now pose a new challenge as multiprocessors become ubiquitous and affordable. Applications that use modern C++ features, such as the STL, may perform a significant amount of dynamic memory management behind the scenes. Programmers need to be aware not only of the concurrency issues in their own code, but also of those in every third-party library a project uses, and the memory allocation behaviour of such libraries is often far from obvious.
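To make the hidden allocation concrete, here is a small sketch that counts how often a `std::vector` asks for memory. `CountingAllocator`, `g_allocation_calls` and `count_vector_allocations` are hypothetical names for illustration; the sketch assumes a standard library whose vector grows geometrically, which is true of the common implementations, although the growth factor itself is unspecified.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Number of allocate() calls seen so far (single-threaded use only).
std::size_t g_allocation_calls = 0;

// Hypothetical minimal allocator that counts requests and forwards
// them to the global operator new / operator delete.
template <typename T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        ++g_allocation_calls;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

// All CountingAllocators are interchangeable (they hold no state).
template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

// Appends `n` (non-negative) ints to a vector and returns how many heap
// allocations the vector made. With `reserve_first`, capacity is requested
// up front, so the push_back calls never need to reallocate.
std::size_t count_vector_allocations(int n, bool reserve_first) {
    g_allocation_calls = 0;
    std::vector<int, CountingAllocator<int>> v;
    if (reserve_first) v.reserve(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) v.push_back(i);
    return g_allocation_calls;
}
```

On libstdc++ and libc++, `count_vector_allocations(1000, true)` reports a single allocation, while `count_vector_allocations(1000, false)` reports several, one per capacity increase. In a multithreaded program each of those requests goes through the process-wide allocator, even though the source code never mentions memory at all.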
Fortunately, C++ allows the global operator new and operator delete (and their array forms, operator new[] and operator delete[]) to be replaced, so that allocations can be routed to a scalable, concurrent allocator acting as a drop-in replacement for the system allocator. This process is explained in Bruce Eckel's Thinking in C++.
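A minimal sketch of such a replacement, assuming C++14 or later (for the sized delete forms). `replacement_malloc` and `replacement_free` are hypothetical hooks: here they simply forward to `std::malloc`/`std::free`, and in practice they would call the chosen third-party allocator's own entry points. The `g_new_calls` counter exists only so the replacement can be observed. The C++17 `std::align_val_t` overloads are not replaced, so over-aligned types keep using the library's default versions.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical hooks standing in for a third-party allocator.
void* replacement_malloc(std::size_t size) { return std::malloc(size); }
void replacement_free(void* p) noexcept { std::free(p); }

// Constant-initialized, so it is safe to touch even during static init.
std::atomic<std::size_t> g_new_calls{0};

void* operator new(std::size_t size) {
    ++g_new_calls;
    if (size == 0) size = 1;  // new must return a unique, non-null pointer
    for (;;) {
        if (void* p = replacement_malloc(size)) return p;
        // Standard behaviour on failure: call the new_handler, or throw.
        std::new_handler handler = std::get_new_handler();
        if (!handler) throw std::bad_alloc();
        handler();
    }
}

void* operator new[](std::size_t size) { return ::operator new(size); }

void* operator new(std::size_t size, const std::nothrow_t&) noexcept {
    try { return ::operator new(size); } catch (...) { return nullptr; }
}
void* operator new[](std::size_t size, const std::nothrow_t&) noexcept {
    try { return ::operator new(size); } catch (...) { return nullptr; }
}

// Every plain and sized delete form routes back to the same allocator.
void operator delete(void* p) noexcept { replacement_free(p); }
void operator delete[](void* p) noexcept { replacement_free(p); }
void operator delete(void* p, std::size_t) noexcept { replacement_free(p); }
void operator delete[](void* p, std::size_t) noexcept { replacement_free(p); }
```

Because these definitions replace the library's versions for the whole program, every `new`, including those inside the STL and third-party C++ libraries, now goes through `replacement_malloc`. Code that calls `malloc` directly is unaffected; many replacement allocators handle that case by also providing `malloc` and `free` themselves.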
If memory management is becoming a bottleneck in your concurrent application, a significant performance increase can be relatively easy to obtain. In one of our in-house applications, simply replacing the default malloc with a third-party malloc increased performance by up to 80% on an eight-core x86 system.
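Before switching allocators, a small multithreaded stress test can hint at whether allocation is the bottleneck. The sketch below is an assumption-laden illustration, not a rigorous benchmark: `malloc_pairs_per_second` is a hypothetical name, the timing includes thread start-up, and the size mix is arbitrary.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <thread>
#include <vector>

// Hypothetical micro-benchmark: `threads` workers each perform `iterations`
// malloc/free pairs of small blocks. Returns total pairs per second, timed
// from just before the workers start until all of them have finished.
double malloc_pairs_per_second(unsigned threads, int iterations) {
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t) {
        workers.emplace_back([iterations, t] {
            for (int i = 0; i < iterations; ++i) {
                // Vary the size so requests span several small size classes.
                std::size_t size = 16 + static_cast<std::size_t>((i + t) % 8) * 16;
                void* p = std::malloc(size);
                if (p) {
                    // A volatile write stops the compiler eliding the pair.
                    static_cast<volatile char*>(p)[0] = 1;
                }
                std::free(p);
            }
        });
    }
    for (auto& w : workers) w.join();
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return static_cast<double>(threads) * iterations / elapsed.count();
}
```

Comparing `malloc_pairs_per_second(1, n)` with `malloc_pairs_per_second(8, n)` gives a rough picture: if total throughput barely rises, or falls, as threads are added, the allocator is a plausible bottleneck, and the same test can be rerun after linking in a replacement malloc. A synthetic loop says little about a real application's allocation pattern, so profiling the application itself remains the better evidence.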
Links to further resources:
NedMalloc - A Scalable Malloc
PT Malloc
Google's Thread Caching Malloc
Multiprocessor Malloc Comparison