How much is the overhead of smart pointers compared to normal pointers in C++?

c++ performance c++11 smart-pointers

54,322

Solution 1

std::unique_ptr has memory overhead only if you provide it with some non-trivial deleter.

std::shared_ptr always has memory overhead for reference counter, though it is very small.
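The memory claims above can be checked at compile time. This is a minimal sketch, not part of the original answer: `Value` and `FnDeleter` are hypothetical names, and the exact sizes are what libstdc++, libc++ and MSVC typically produce rather than something the standard guarantees.

```cpp
#include <memory>

// Hypothetical payload type, used only for this illustration.
struct Value { int x; };

// A stateful deleter: a function pointer has to be stored inside the unique_ptr.
using FnDeleter = void (*)(Value*);

// With the default (empty) deleter, unique_ptr is typically pointer-sized.
static_assert(sizeof(std::unique_ptr<Value>) == sizeof(Value*),
              "default deleter adds no storage on common implementations");

// With a function-pointer deleter, the deleter must be stored as well.
static_assert(sizeof(std::unique_ptr<Value, FnDeleter>) > sizeof(Value*),
              "a stateful deleter makes unique_ptr larger");

// shared_ptr typically holds two pointers: one to the object and one to
// the control block (which lives in a separate heap allocation).
static_assert(sizeof(std::shared_ptr<Value>) == 2 * sizeof(Value*),
              "shared_ptr is two pointers wide on common implementations");
```

The control block itself is not included in `sizeof(std::shared_ptr<Value>)`, so the real per-object cost of shared_ptr is higher than the two pointers shown here.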

std::unique_ptr has time overhead only during constructor (if it has to copy the provided deleter and/or null-initialize the pointer) and during destructor (to destroy the owned object).

std::shared_ptr has time overhead in constructor (to create the reference counter), in destructor (to decrement the reference counter and possibly destroy the object) and in assignment operator (to increment the reference counter). Due to thread-safety guarantees of std::shared_ptr, these increments/decrements are atomic, thus adding some more overhead.
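The reference counting described above can be observed through `use_count()`. A small sketch follows; the `Counts` struct and the `observe_counts` function are hypothetical helpers invented for this illustration.

```cpp
#include <cassert>
#include <memory>

// Use counts observed at three points in the life of one shared object.
struct Counts {
    long one_owner;      // right after creation
    long after_copy;     // while a second shared_ptr exists
    long after_release;  // after the second shared_ptr is destroyed
};

Counts observe_counts() {
    auto a = std::make_shared<int>(42);
    Counts c{};
    c.one_owner = a.use_count();
    {
        std::shared_ptr<int> b = a;  // copy: the reference count is incremented
        assert(*b == 42);            // dereferencing does not touch the count
        c.after_copy = a.use_count();
    }                                // b destroyed: the count is decremented
    c.after_release = a.use_count();
    return c;
}
```

Each of those increments and decrements is the work a raw pointer copy would not do; in multithreaded programs they are performed atomically.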

Note that none of them has time overhead in dereferencing (in getting the reference to owned object), while this operation seems to be the most common for pointers.

To sum up, there is some overhead, but it shouldn't make the code slow unless you continuously create and destroy smart pointers.

Solution 2

My answer is different from the others and I really wonder if they ever profiled code.

shared_ptr has a significant overhead for creation because of its memory allocation for the control block (which keeps the reference counters for strong and weak references). It also has a large memory overhead because of this and because std::shared_ptr is always a two-pointer tuple (one pointer to the object, one to the control block).

If you pass a shared_ptr to a function as a value parameter, the call will be at least 10 times slower than a normal call and will generate a lot of extra code in the code segment for stack unwinding. If you pass it by reference you get an additional indirection, which can also be considerably worse in terms of performance.

That's why you should not do this unless the function is really involved in ownership management. Otherwise use "shared_ptr.get()". shared_ptr is not designed to make sure your object isn't killed during a normal function call.
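To make the distinction concrete, here is a rough sketch of the two calling styles. `Node`, `use_count_by_value` and `read_value` are hypothetical names, not taken from any real code base.

```cpp
#include <cassert>
#include <memory>

// Hypothetical node type, used only for this illustration.
struct Node { int value; };

// Takes part in ownership: the parameter is a copy, so the reference
// count is incremented on entry and decremented again on return.
long use_count_by_value(std::shared_ptr<Node> p) {
    return p.use_count();
}

// Only reads the object: no smart pointer is copied and no reference
// count is touched. A raw pointer from get() would work the same way.
int read_value(const Node& n) {
    return n.value;
}

// Shows both styles on one object and returns the count seen afterwards.
long count_after_calls() {
    auto sp = std::make_shared<Node>(Node{7});
    assert(use_count_by_value(sp) == 2);  // the caller's copy plus the parameter
    assert(read_value(*sp) == 7);         // no ownership traffic at all
    return sp.use_count();
}
```

The caller keeps the object alive for the duration of the call in both cases, so the non-owning signature loses nothing unless the callee needs to store the pointer.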

If you go mad and use shared_ptr on small objects like an abstract syntax tree in a compiler or on small nodes in any other graph structure, you will see a huge performance drop and a huge memory increase. I have seen a parser system which was rewritten soon after C++14 hit the market and before the programmer learned to use smart pointers correctly. The rewrite was an order of magnitude slower than the old code.

It is not a silver bullet and raw pointers aren't bad by definition either. Bad programmers are bad and bad design is bad. Design with care, design with clear ownership in mind and try to use the shared_ptr mostly on the subsystem API boundary.

If you want to learn more, you can watch Nicolai M. Josuttis's good talk "The Real Price of Shared Pointers in C++": https://vimeo.com/131189627
It goes deep into the implementation details and CPU architecture: write barriers, atomic locks, etc. Once you have listened to it you will never again talk about this feature being cheap. If you just want proof of the order-of-magnitude slowdown, skip the first 48 minutes and watch him run example code that is up to 180 times slower (compiled with -O3) when shared pointers are used everywhere.

EDITED:

And if you are asking about "std::unique_ptr", then watch the talk "CppCon 2019: Chandler Carruth “There Are No Zero-cost Abstractions”": https://www.youtube.com/watch?v=rHIkrotSwcc

It's just not true that unique_ptr is 100% cost-free.

OFFTOPIC:

For over two decades now I have tried to educate people about the false idea that exceptions that are never thrown have no cost penalty. In this case the cost is in the optimizer and the code size.

Solution 3

As with all code performance, the only really reliable means to obtain hard information is to measure and/or inspect machine code.

That said, simple reasoning says that

You can expect some overhead in debug builds, since e.g. operator-> must be executed as a function call so that you can step into it (this is in turn due to general lack of support for marking classes and functions as non-debug).
For shared_ptr you can expect some overhead in initial creation, since that involves dynamic allocation of a control block, and dynamic allocation is very much slower than any other basic operation in C++ (do use make_shared when practically possible, to minimize that overhead).
Also for shared_ptr there is some minimal overhead in maintaining a reference count, e.g. when passing a shared_ptr by value, but there's no such overhead for unique_ptr.
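The make_shared advice can be checked by counting heap allocations. The following is a sketch under stated assumptions: it replaces the global `operator new` just to count calls, the helper names are hypothetical, and the counts in the comments are what common standard libraries do (the standard only recommends, and does not require, a single allocation for make_shared).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

// Number of calls to the replaced global operator new.
static std::size_t g_allocations = 0;

void* operator new(std::size_t size) {
    ++g_allocations;
    if (void* p = std::malloc(size ? size : 1)) return p;
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Globals keep the objects alive, so the compiler cannot elide the allocations.
static std::shared_ptr<int> g_two_step;
static std::shared_ptr<int> g_one_step;

// Object and control block are allocated separately: typically 2 allocations.
std::size_t allocations_for_new() {
    const std::size_t before = g_allocations;
    g_two_step = std::shared_ptr<int>(new int(1));
    assert(*g_two_step == 1);
    return g_allocations - before;
}

// Object and control block share one block: typically 1 allocation.
std::size_t allocations_for_make_shared() {
    const std::size_t before = g_allocations;
    g_one_step = std::make_shared<int>(1);
    assert(*g_one_step == 1);
    return g_allocations - before;
}
```

Counting allocations says nothing about how long each one takes, so this complements measurement rather than replacing it.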

Keeping the first point above in mind, when you measure, do that both for debug and release builds.

The international C++ standardization committee has published a technical report on performance, but this was in 2006, before unique_ptr and shared_ptr were added to the standard library. Still, smart pointers were old hat at that point, so the report considered also that. Quoting the relevant part:

“if accessing a value through a trivial smart pointer is significantly slower than accessing it through an ordinary pointer, the compiler is inefficiently handling the abstraction. In the past, most compilers had significant abstraction penalties and several current compilers still do. However, at least two compilers have been reported to have abstraction penalties below 1% and another a penalty of 3%, so eliminating this kind of overhead is well within the state of the art”

As an informed guess, the “well within the state of the art” has been achieved with the most popular compilers today, as of early 2014.

Solution 4

In other words, is my code going to be slower if I use smart pointers, and if so, how much slower?

Slower? Most likely not, unless you are creating a huge index using shared_ptrs and you are so short of memory that your computer starts wrinkling, like an old lady being pummeled to the ground by an unbearable force from afar.

What would make your code slower is sluggish searches, unnecessary loop processing, huge copies of data, and a lot of write operations to disk (like hundreds).

The advantages of a smart pointer are all related to management. But is the overhead necessary? This depends on your implementation. Let's say you are iterating over an array of 3 phases, each phase has an array of 1024 elements. Creating a smart_ptr for this process might be overkill, since once the iteration is done you'll know you have to erase it. So you could gain extra memory from not using a smart_ptr...

But do you really want to do that?

A single memory leak could make your product have a point of failure in time (let's say your program leaks 4 megabytes each hour, it would take months to break a computer, nevertheless, it will break, you know it because the leak is there).

It's like saying "your software is guaranteed for 3 months, then, call me for service."

So in the end it really is a matter of... can you handle this risk? Is using a raw pointer to handle your indexing over hundreds of different objects worth losing control of the memory?

If the answer is yes, then use a raw pointer.

If you don't even want to consider it, a smart_ptr is a good, viable, and awesome solution.

Solution 5

Chandler Carruth has a few surprising "discoveries" on unique_ptr in his 2019 Cppcon talk. (Youtube). I can't explain it quite as well.

I hope I understood the two main points right:

Code without unique_ptr will (often incorrectly) not handle cases where ownership is not passed while passing a pointer. Rewriting it to use unique_ptr will add that handling, and that has some overhead.
A unique_ptr is still a C++ object, and objects will be passed on stack when calling a function, unlike pointers, which can be passed in registers.
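The second point can be tied to a property that is checkable in code. This sketch assumes the usual explanation, namely that ABIs such as the Itanium C++ ABI pass class types with a non-trivial destructor or move constructor through memory rather than in registers; the function names are hypothetical and exist only so the two signatures can be compared in a disassembler.

```cpp
#include <memory>
#include <type_traits>

// A raw pointer is trivially copyable, so it can travel in a register.
static_assert(std::is_trivially_copyable<int*>::value,
              "raw pointers are trivially copyable");

// unique_ptr has a user-provided move constructor and destructor, so it
// is not trivially copyable even though it is only pointer-sized.
static_assert(!std::is_trivially_copyable<std::unique_ptr<int>>::value,
              "unique_ptr is not trivially copyable");
static_assert(!std::is_trivially_destructible<std::unique_ptr<int>>::value,
              "unique_ptr has a non-trivial destructor");

// Two signatures to compare in generated machine code.
int read_raw(int* p) { return *p; }
int read_owned(std::unique_ptr<int> p) { return *p; }  // p is destroyed after the call
```

Whether the difference matters in practice depends on inlining; when the call is inlined, the extra memory traffic can disappear.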


Author by

Venemo


Updated on July 15, 2021

Comments

Venemo almost 3 years
How much is the overhead of smart pointers compared to normal pointers in C++11? In other words, is my code going to be slower if I use smart pointers, and if so, how much slower?

Specifically, I'm asking about the C++11 std::shared_ptr and std::unique_ptr.

Obviously, the stuff pushed down the stack is going to be larger (at least I think so), because a smart pointer also needs to store its internal state (reference count, etc), the question really is, how much is this going to affect my performance, if at all?

For example, I return a smart pointer from a function instead of a normal pointer:
```
std::shared_ptr<const Value> getValue();
// versus
const Value *getValue();
```
Or, for example, when one of my functions accepts a smart pointer as a parameter instead of a normal pointer:
```
void setValue(std::shared_ptr<const Value> val);
// versus
void setValue(const Value *val);
```
graywolf over 10 years

ok, but valgrind is good in checking for possible memory leaks, so as long as you use it you should be safe™
Claudiordgz over 10 years

@Paladin Yes, if you can handle your memory, smart_ptr are really useful for large teams
graywolf over 10 years

I use unique_ptr, it simplifies a lot of things, but I don't like shared_ptr; reference counting is not a very efficient GC and it's not perfect either
Claudiordgz over 10 years

@Paladin I try to use raw pointers if I can encapsulate everything. If it is something that I will be passing around all over the place like an argument then maybe I'll consider a smart_ptr. Most of my unique_ptrs are used in the big implementation, like a main or run method
Venemo over 10 years

Could you please include some details in your answer about the cases I added to my question?
R. Martinho Fernandes over 9 years

unique_ptr has no overhead in the destructor. It does exactly the same as you would with a raw pointer.
lisyarus over 9 years

@R.MartinhoFernandes comparing to raw pointer itself, it does have time overhead in destructor, since raw pointer destructor does nothing. Comparing to how a raw pointer would probably be used, it surely has no overhead.
Joe over 8 years

Worth noting that part of the shared_ptr construction/destruction/assignment cost is due to thread safety
Martin Drozdik about 8 years

Also, what about the default constructor of std::unique_ptr? If you construct a std::unique_ptr<int>, the internal int* gets initialized to nullptr whether you like it or not.
lisyarus over 7 years

@Joe Thank you! Added this to the answer.
lisyarus over 7 years

@MartinDrozdik In most situations you'd null-initialize the raw pointer too, to check its nullity later, or something like that. Nevertheless, added this to the answer, thank you.
Venemo over 6 years

Thanks for your answer! Which platform did you profile on? Can you back up your claims with some data?
Lothar over 6 years

I have no numbers to show, but you can find some in Nico Josuttis's talk vimeo.com/131189627
Claudiordgz over 6 years

@Lothar I see you paraphrased one of the things I said in your answer: "That's why you should not do this unless the function is really involved in ownership management"... great answer, thanks, upvoted
Deduplicator over 6 years

Ever heard of std::make_shared()? Also, I find demonstrations of blatant misuse being bad a bit boring...
Lothar over 6 years

All "make_shared" can do is save you one additional allocation and give you a bit more cache locality if the control block is allocated in front of the object. It cannot help at all when you pass the pointer around. This is not the root of the problems.
Byron over 5 years

This might have been true 10 or more years ago, but today, inspecting machine code is not as useful as the person above suggests. How fast the code runs ultimately depends on how instructions are pipelined, vectorized, ... and on how the compiler/processor deals with speculation. Less machine code doesn't necessarily mean faster code. The only way to determine the performance is to profile it. This can change per processor and also per compiler.
Mohan Kumar over 5 years

I have tested the code now, it's only about 10% slower when using the unique pointer.
phuclv about 5 years

never ever benchmark with -O0 or debug code. The output will be extremely inefficient. Always use at least -O2 (or -O3 nowadays because some vectorizations aren't done in -O2)
CygnusX1 about 5 years

Are you certain that std::shared_ptr incurs no overhead when dereferencing the object? To my knowledge, shared_ptr points to a proxy object which holds a pair: {reference count, pointer to the actual object}. Therefore, you need to perform two jumps in the memory, not one to reach your object.
lisyarus about 5 years

@CygnusX1 Yes, I am. A std::shared_ptr has two pointers: the owned pointer and the referenced pointer (see constructor #8 here en.cppreference.com/w/cpp/memory/shared_ptr/shared_ptr). These two pointers usually coincide, but what if you want a shared pointer to a member of a class that is itself stored through the shared pointer? You make a shared pointer that owns the whole class instance, but references the member.
lisyarus about 5 years

@CygnusX1 Implementations usually use the proxy to store the owned pointer + reference count, and store the referenced pointer in the shared_ptr object itself, speeding up access. Here's a dumb verification that sizeof(shared_ptr) == 2 * sizeof(pointer): ideone.com/XFq5Vc
Lothar almost 5 years

If you have time and want a coffee break, use -O4 to get link-time optimization, so that all the tiny abstraction functions get inlined and vanish.
Nathan Doromal over 4 years

An issue I've seen is that, once shared_ptrs are used in a server, the usage of shared_ptrs begins to proliferate, and soon shared_ptrs become the default memory management technique. So now you have repeated 1-3% abstraction penalties which are taken over and over again.
Paul Childs over 4 years

I think benchmarking a debug build is a complete and utter waste of time
RnMss about 4 years

You should include a free call in the malloc test, and delete[] for new (or make variable a static), because the unique_ptrs are calling delete[] under the hood, in their destructors.
imallett almost 3 years

This answer is nice as far as it goes, but the OP explicitly asked for information on std::unique_ptr<...> too. This is just a rant about std::shared_ptr<...>.
Lothar almost 3 years

@imallett No he did not. He asked about shared ptr and his examples all used shared ptr because this is the really important use case. For unique ptr he should watch "CppCon 2019: Chandler Carruth “There Are No Zero-cost Abstractions”" on youtube. I will add this to the answer.
Victor Drouin almost 3 years

I believe you made a mistake saying "and a pointer list to all weak references". As it may be a way of implementing shared/weak pointers, I think most of the time (see msvc & clang implementations for instance) it is done through a double counter (one for strong refs and one for weak ones). Control block (and object block when allocated through allocate_shared) is kept allocated until all strong and weak refs are destroyed.
c z almost 3 years

@R.MartinhoFernandes Looking at the GCC code, this isn't true. During destruction, unique_ptr checks to see if the value is nullptr and always sets itself to nullptr after. This is because (unlike deleting a raw pointer) a custom deleter may not handle nullptr well, and the pointer itself doesn't know whether or not it is going out of scope. I find it unlikely the compiler will optimise this if the deleter cannot be inlined.
c z almost 3 years

@phuclv I disagree, both should be tested. 1. Debug mode misses out optimisations which are often specific to the particular build platform and version, and can give you a "worst case" benchmark. 2. Optimisations are easy for simple scripts but less prevalent in complex software with complex paths. I've seen numerous posters claiming super fast algorithms, only to see them later find the optimiser has just removed the entire test loop upon seeing that the output can be predetermined. 3. Having software that runs considerably slower in debug mode is annoying to developers.
phuclv almost 3 years

@cz who cares about worst case benchmarks? Probably only RTOS applications. For most users only the optimized benchmark is useful. Who is annoyed by slow debug mode? MSVC debug mode may be 10 times slower because lots of STL debug code is injected and yet no one complains apart from you
ljleb over 2 years

@Lothar with more recent versions of llvm, -O4 doesn't include LTO anymore. See this SO question for more info