Blog coding and discussion of coding about JavaScript, PHP, CGI, general web building etc.

Wednesday, January 27, 2016

Which is faster: Stack allocation or Heap allocation

Which is faster: Stack allocation or Heap allocation


This question may sound fairly elementary, but this is a debate I had with another developer I work with.

I was taking care to stack allocate things where I could, instead of heap allocating them. He was talking to me and watching over my shoulder and commented that it wasn't necessary because they are the same performance wise.

I was always under the impression that growing the stack was constant time, and heap allocation's performance depended on the current complexity of the heap for both allocation (finding a hole of the proper size) and de-allocating (collapsing holes to reduce fragmentation, as many standard library implementations take time to do this during deletes if I am not mistaken).

This strikes me as something that would probably be very compiler dependent. For this project in particular I am using a Metrowerks compiler for the PPC architecture. Insight on this combination would be most helpful, but in general, for GCC, and MSVC++, what is the case? Is heap allocation not as high performing as stack allocation? Is there no difference? Or are the differences so minute it becomes pointless micro-optimization.

Answer by Chris Jester-Young for Which is faster: Stack allocation or Heap allocation


You can write a special heap allocator for specific sizes of objects that is very performant. However, the general heap allocator is not particularly performant.

Also I agree with Torbjrn Gyllebring about the expected lifetime of objects. Good point!

Answer by Torbjrn Gyllebring for Which is faster: Stack allocation or Heap allocation


Stack allocation is much faster since all it really does is move the stack pointer. Using memory pools, you can get comparable performance out of heap allocation, but that comes with a slight added complexity and its own headaches.

Also, stack vs. heap is not only a performance consideration; it also tells you a lot about the expected lifetime of objects.

Answer by MarkR for Which is faster: Stack allocation or Heap allocation


I don't think stack allocation and heap allocation are generally interchangable. I also hope that the performance of both of them is sufficient for general use.

I'd strongly recommend for small items, whichever one is more suitable to the scope of the allocation. For large items, the heap is probably necessary.

On 32-bit operating systems that have multiple threads, stack is often rather limited (albeit typically to at least a few mb), because the address space needs to be carved up and sooner or later one thread stack will run into another. On single threaded systems (Linux glibc single threaded anyway) the limitation is much less because the stack can just grow and grow.

On 64-bit operating systems there is enough address space to make thread stacks quite large.

Answer by Dan Lenski for Which is faster: Stack allocation or Heap allocation


Stack is much faster. It literally only uses a single instruction on most architectures, in most cases, e.g. on x86:

sub esp, 0x10  

(That moves the stack pointer down by 0x10 bytes and thereby "allocates" those bytes for use by a variable.)

Of course, the stack's size is very, very finite, as you will quickly find out if you overuse stack allocation or try to do infinite recursion :-)

Also, there's little reason to optimize the performance of code that doesn't verifiably need it, such as demonstrated by profiling. "Premature optimization" often causes more problems than it's worth.

My rule of thumb: if I know I'm going to need some data at compile-time, and it's under a few hundred bytes in size, I stack-allocate it. Otherwise I heap-allocate it.

Answer by Windows programmer for Which is faster: Stack allocation or Heap allocation


Usually stack allocation just consists of subtracting from the stack pointer register. This is tons faster than searching a heap.

Sometimes stack allocation requires adding a page(s) of virtual memory. Adding a new page of zeroed memory doesn't require reading a page from disk, so usually this is still going to be tons faster than searching a heap (especially if part of the heap was paged out too). In a rare situation, and you could construct such an example, enough space just happens to be available in part of the heap which is already in RAM, but allocating a new page for the stack has to wait for some other page to get written out to disk. In that rare situation, the heap is faster.

Answer by jakobengblom2 for Which is faster: Stack allocation or Heap allocation


I think the lifetime is crucial, and whether the thing being allocated has to be constructed in a complex way. For example, in transaction-driven modeling, you usually have to fill in and pass in a transaction structure with a bunch of fields to operation functions. Look at the OSCI SystemC TLM-2.0 standard for an example.

Allocating these on the stack close to the call to the operation tends to cause enormous overhead, as the construction is expensive. The good way there is to allocate on the heap and reuse the transaction objects either by pooling or a simple policy like "this module only needs one transaction object ever".

This is many times faster than allocating the object on each operation call.

The reason is simply that the object has an expensive construction and a fairly long useful lifetime.

I would say: try both and see what works best in your case, because it can really depend on the behavior of your code.

Answer by larsivi for Which is faster: Stack allocation or Heap allocation


Probably the biggest problem of heap allocation versus stack allocation, is that heap allocation in the general case is an unbounded operation, and thus you can't use it where timing is an issue.

For other applications where timing isn't an issue, it may not matter as much, but if you heap allocate a lot, this will affect the execution speed. Always try to use the stack for short lived and often allocated memory (for instance in loops), and as long as possible - do heap allocation during application startup.

Answer by yogman for Which is faster: Stack allocation or Heap allocation


A stack has a limited capacity, while a heap is not. The typical stack for a process or thread is around 8K. You cannot change the size once it's allocated.

A stack variable follows the scoping rules, while a heap one doesn't. If your instruction pointer goes beyond a function, all the new variables associated with the function go away.

Most important of all, you can't predict the overall function call chain in advance. So a mere 200 bytes allocation on your part may raise a stack overflow. This is especially important if you're writing a library, not an application.

Answer by Max Lybbert for Which is faster: Stack allocation or Heap allocation


Honestly, it's trivial to write a program to compare the performance:

#include   #include     namespace {      class empty { }; // even empty classes take up 1 byte of space, minimum  }    int main()  {      std::clock_t start = std::clock();      for (int i = 0; i < 100000; ++i)          empty e;      std::clock_t duration = std::clock() - start;      std::cout << "stack allocation took " << duration << " clock ticks\n";      start = std::clock();      for (int i = 0; i < 100000; ++i) {          empty* e = new empty;          delete e;      };      duration = std::clock() - start;      std::cout << "heap allocation took " << duration << " clock ticks\n";  }  

It's said that a foolish consistency is the hobgoblin of little minds. Apparently optimizing compilers are the hobgoblins of many programmers' minds. This discussion used to be at the bottom of the answer, but people apparently can't be bothered to read that far, so I'm moving it up here to avoid getting questions that I've already answered.

An optimizing compiler may notice that this code does nothing, and may optimize it all away. It is the optimizer's job to do stuff like that, and fighting the optimizer is a fool's errand.

I would recommend compiling this code with optimization turned off because there is no good way to fool every optimizer currently in use or that will be in use in the future.

Anybody who turns the optimizer on and then complains about fighting it should be subject to public ridicule.

If I cared about nanosecond precision I wouldn't use std::clock(). If I wanted to publish the results as a doctoral thesis I would make a bigger deal about this, and I would probably compare GCC, Tendra/Ten15, LLVM, Watcom, Borland, Visual C++, Digital Mars, ICC and other compilers. As it is, heap allocation takes hundreds of times longer than stack allocation, and I don't see anything useful about investigating the question any further.

The optimizer has a mission to get rid of the code I'm testing. I don't see any reason to tell the optimizer to run and then try to fool the optimizer into not actually optimizing. But if I saw value in doing that, I would do one or more of the following:

  1. Add a data member to empty, and access that data member in the loop; but if I only ever read from the data member the optimizer can do constant folding and remove the loop; if I only ever write to the data member, the optimizer may skip all but the very last iteration of the loop. Additionally, the question wasn't "stack allocation and data access vs. heap allocation and data access."

  2. Declare e volatile, but volatile is often compiled incorrectly (PDF).

  3. Take the address of e inside the loop (and maybe assign it to a variable that is declared extern and defined in another file). But even in this case, the compiler may notice that -- on the stack at least -- e will always be allocated at the same memory address, and then do constant folding like in (1) above. I get all iterations of the loop, but the object is never actually allocated.

Beyond the obvious, this test is flawed in that it measures both allocation and deallocation, and the original question didn't ask about deallocation. Of course variables allocated on the stack are automatically deallocated at the end of their scope, so not calling delete would (1) skew the numbers (stack deallocation is included in the numbers about stack allocation, so it's only fair to measure heap deallocation) and (2) cause a pretty bad memory leak, unless we keep a reference to the new pointer and call delete after we've got our time measurement.

On my machine, using g++ 3.4.4 on Windows, I get "0 clock ticks" for both stack and heap allocation for anything less than 100000 allocations, and even then I get "0 clock ticks" for stack allocation and "15 clock ticks" for heap allocation. When I measure 10,000,000 allocations, stack allocation takes 31 clock ticks and heap allocation takes 1562 clock ticks.


Yes, an optimizing compiler may elide creating the empty objects. If I understand correctly, it may even elide the whole first loop. When I bumped up the iterations to 10,000,000 stack allocation took 31 clock ticks and heap allocation took 1562 clock ticks. I think it's safe to say that without telling g++ to optimize the executable, g++ did not elide the constructors.


In the years since I wrote this, the preference on Stack Overflow has been to post performance from optimized builds. In general, I think this is correct. However, I still think it's silly to ask the compiler to optimize code when you in fact do not want that code optimized. It strikes me as being very similar to paying extra for valet parking, but refusing to hand over the keys. In this particular case, I don't want the optimizer running.

Using a slightly modified version of the benchmark (to address the valid point that the original program didn't allocate something on the stack each time through the loop) and compiling without optimizations but linking to release libraries (to address the valid point that we don't want to include any slowdown caused by linking to debug libraries):

#include   #include     namespace {      void on_stack()      {          int i;      }        void on_heap()      {          int* i = new int;          delete i;      }  }    int main()  {      auto begin = std::chrono::system_clock::now();      for (int i = 0; i < 1000000000; ++i)          on_stack();      auto end = std::chrono::system_clock::now();        std::printf("on_stack took %f seconds\n", std::chrono::duration(end - begin).count());        begin = std::chrono::system_clock::now();      for (int i = 0; i < 1000000000; ++i)          on_heap();      end = std::chrono::system_clock::now();        std::printf("on_heap took %f seconds\n", std::chrono::duration(end - begin).count());      return 0;  }  

displays:

on_stack took 2.070003 seconds  on_heap took 57.980081 seconds  

on my system when compiled with the command line cl foo.cc /Od /MT /EHsc.

You may not agree with my approach to getting a non-optimized build. That's fine: feel free modify the benchmark as much as you want. When I turn on optimization, I get:

on_stack took 0.000000 seconds  on_heap took 51.608723 seconds  

Not because stack allocation is actually instantaneous but because any half-decent compiler can notice that on_stack doesn't do anything useful and can be optimized away. GCC on my Linux laptop also notices that on_heap doesn't do anything useful, and optimizes it away as well:

on_stack took 0.000003 seconds  on_heap took 0.000002 seconds  

Answer by MSalters for Which is faster: Stack allocation or Heap allocation


It's not jsut stack allocation that's faster. You also win a lot on using stack variables. They have better locality of reference. And finally, deallocation is a lot cheaper too.

Answer by Mike Dunlavey for Which is faster: Stack allocation or Heap allocation


There's a general point to be made about such optimizations.

The optimization you get is proportional to the amount of time the program counter is actually in that code.

If you sample the program counter, you will find out where it spends its time, and that is usually in a tiny part of the code, and often in library routines you have no control over.

Only if you find it spending much time in the heap-allocation of your objects will it be noticeably faster to stack-allocate them.

Answer by Ketan for Which is faster: Stack allocation or Heap allocation


Never do premature assumption as other application code and usage can impact your function. So looking at function is isolation is of no use.

If you are serious with application then VTune it or use any similar profiling tool and look at hotspots.

Ketan

Answer by Nikhil for Which is faster: Stack allocation or Heap allocation


It has been mentioned before that stack allocation is simply moving the stack pointer, that is, a single instruction on most architectures. Compare that to what generally happens in the case of heap allocation.

The operating system maintains portions of free memory as a linked list with the payload data consisting of the pointer to the starting address of the free portion and the size of the free portion. To allocate X bytes of memory, the link list is traversed and each note is visited in sequence, checking to see if its size is at least X. When a portion with size P >= X is found, P is split into two parts with sizes X and P-X. The linked list is updated and the pointer to the first part is returned.

As you can see, heap allocation depends on may factors like how much memory you are requesting, how fragmented the memory is and so on.

Answer by Dan Olson for Which is faster: Stack allocation or Heap allocation


In general, stack allocation is faster than heap allocation as mentioned by almost every answer above. A stack push or pop is O(1), whereas allocating or freeing from a heap could require a walk of previous allocations. However you shouldn't usually be allocating in tight, performance-intensive loops, so the choice will usually come down to other factors.

It might be good to make this distinction: you can use a "stack allocator" on the heap. Strictly speaking, I take stack allocation to mean the actual method of allocation rather than the location of the allocation. If you're allocating a lot of stuff on the actual program stack, that could be bad for a variety of reasons. On the other hand, using a stack method to allocate on the heap when possible is the best choice you can make for an allocation method.

Since you mentioned Metrowerks and PPC, I'm guessing you mean Wii. In this case memory is at a premium, and using a stack allocation method wherever possible guarantees that you don't waste memory on fragments. Of course, doing this requires a lot more care than "normal" heap allocation methods. It's wise to evaluate the tradeoffs for each situation.

Answer by Furious Coder for Which is faster: Stack allocation or Heap allocation


An interesting thing I learned about Stack vs. Heap Allocation on the Xbox 360 Xenon processor, which may also apply to other multicore systems, is that allocating on the Heap causes a Critical Section to be entered to halt all other cores so that the alloc doesn't conflict. Thus, in a tight loop, Stack Allocation was the way to go for fixed sized arrays as it prevented stalls.

This may be another speedup to consider if you're coding for multicore/multiproc, in that your stack allocation will only be viewable by the core running your scoped function, and that will not affect any other cores/CPUs.

Answer by Jay for Which is faster: Stack allocation or Heap allocation


Aside from the orders-of-magnitude performance advantage over heap allocation, stack allocation is preferable for long running server applications. Even the best managed heaps eventually get so fragmented that application performance degrades.

Answer by MSN for Which is faster: Stack allocation or Heap allocation


Stack allocation will almost always be as fast or faster than heap allocation, although it is certainly possible for a heap allocator to simply use a stack based allocation technique.

However, there are larger issues when dealing with the overall performance of stack vs. heap based allocation (or in slightly better terms, local vs. external allocation). Usually, heap (external) allocation is slow because it is dealing with many different kinds of allocations and allocation patterns. Reducing the scope of the allocator you are using (making it local to the algorithm/code) will tend to increase performance without any major changes. Adding better structure to your allocation patterns, for example, forcing a LIFO ordering on allocation and deallocation pairs can also improve your allocator's performance by using the allocator in a simpler and more structured way. Or, you can use or write an allocator tuned for your particular allocation pattern; most programs allocate a few discrete sizes frequently, so a heap that is based on a lookaside buffer of a few fixed (preferably known) sizes will perform extremely well. Windows uses its low-fragmentation-heap for this very reason.

On the other hand, stack-based allocation on a 32-bit memory range is also fraught with peril if you have too many threads. Stacks need a contiguous memory range, so the more threads you have, the more virtual address space you will need for them to run without a stack overflow. This won't be a problem (for now) with 64-bit, but it can certainly wreak havoc in long running programs with lots of threads. Running out of virtual address space due to fragmentation is always a pain to deal with.

Answer by Andrei Pokrovsky for Which is faster: Stack allocation or Heap allocation


Stack allocation is a couple instructions whereas the fastest rtos heap allocator known to me (TLSF) uses on average on the order of 150 instructions. Also stack allocations don't require a lock because they use thread local storage which is another huge performance win. So stack allocations can be 2-3 orders of magnitude faster depending on how heavily multithreaded your environment is.

In general heap allocation is your last resort if you care about performance. A viable in-between option can be a fixed pool allocator which is also only a couple instructions and has very little per-allocation overhead so it's great for small fixed size objects. On the downside it only works with fixed size objects, is not inherently thread safe and has block fragmentation problems.

Answer by Master Yoda for Which is faster: Stack allocation or Heap allocation


stack allocation is much faster.

Answer by wjl for Which is faster: Stack allocation or Heap allocation


As others have said, stack allocation is generally much faster.

However, if your objects are expensive to copy, allocating on the stack may lead to an huge performance hit later when you use the objects if you aren't careful.

For example, if you allocate something on the stack, and then put it into a container, it would have been better to allocate on the heap and store the pointer in the container (e.g. with a std::shared_ptr<>). The same thing is true if you are passing or returning objects by value, and other similar scenarios.

The point is that although stack allocation is usually better than heap allocation in many cases, sometimes if you go out of your way to stack allocate when it doesn't best fit the model of computation, it can cause more problems than it solves.

0 comments:

Post a Comment

Popular Posts

Powered by Blogger.