Why do linked lists use pointers instead of storing nodes inside of nodes

c++ pointers linked-list

15,508

Solution 1

It's not just better, it's the only possible way.

If you stored a Node object inside itself, what would sizeof(Node) be? It would be sizeof(int) + sizeof(Node), which would be equal to sizeof(int) + (sizeof(int) + sizeof(Node)), which would be equal to sizeof(int) + (sizeof(int) + (sizeof(int) + sizeof(Node))), etc. to infinity.

An object like that can't exist. It's impossible.

Solution 2

In Java

Node m_node

stores a pointer to another node. You don't have a choice about it. In C++

Node *m_node

means the same thing. The difference is that in C++ you can actually store the object as opposed to a pointer to it. That's why you have to say you want a pointer. In C++:

Node m_node

means store the node right here (and that clearly can't work for a list - you end up with a recursively defined structure).

Solution 3

C++ is not Java. When you write

Node m_next;

in Java, that is the same as writing

Node* m_next;

in C++. In Java, the pointer is implicit, in C++ it is explicit. If you write

Node m_next;

in C++, you put an instance of Node right there inside the object that you are defining. It is always there and cannot be omitted, it cannot be allocated with new and it cannot be removed. This effect is impossible to achieve in Java, and it is totally different from what Java does with the same syntax.

Solution 4

You use a pointer, otherwise your code:

class Node
{
   //etc
   Node m_next; //non-pointer
};

…would not compile, as the compiler cannot compute the size of Node. This is because it depends on itself — which means the compiler cannot decide how much memory it would consume.

Solution 5

The latter (Node m_next) would have to contain the node. It wouldn't point to it. And there would then be no linking of elements.

View more solutions

15,508

Author by

m0meni

On a scale of one to my account, this is my account.

Updated on June 02, 2022

Comments

m0meni almost 2 years
I've worked with linked lists before extensively in Java, but I'm very new to C++. I was using this node class that was given to me in a project just fine
```
class Node
{
  public:
   Node(int data);

   int m_data;
   Node *m_next;
};
```
but I had one question that wasn't answered very well. Why is it necessary to use
```
Node *m_next;
```
to point to the next node in the list instead of
```
Node m_next;
```
I understand that it is better to use the pointer version; I'm not going to argue facts, but I don't know why it's better. I got a not so clear answer about how the pointer is better for memory allocation, and I was wondering if anyone here could help me understand that better.
Mike Seymour about 9 years

Worse, it would be logically impossible for an object to contain something of the same type.
m0meni about 9 years

Wouldn't there still technically be linking because it would be a node containing a node containing a node and so on?
Mike Seymour about 9 years

@AR7: No, containment means it's literally inside the object, not linked to it.
bitmask about 9 years

Worse, no valid size exists: If k == sizeof(Node) holds and your type has data, it would also have to hold that sizeof(Node) = k + sizeof(Data) = sizeof(Node) + sizeof(Data) and then sizeof(Node) > sizeof(Node).
Carcigenicate about 9 years

*Unless it's evaluated lazily. Infinite lists are possible, just not with strict evaluation.
Carcigenicate about 9 years

@svick It's actually Haskell that I was thinking of. Infinite lists are fun when the language allows them. I've never tried anything like that in Java though, so I can't speak to that.
Peteris about 9 years

@Carcigenicate it's not about evaluating/executing some function on the Node object - it's about the memory layout of every instance of Node, which must be determined at compile time, before any evaluation can occur.
David K about 9 years

@Peteris If we didn't insist on knowing where the allocation of Node ends in virtual memory, as C and C++ do, we could perhaps make at least one list storing one Node inside another. Allocating anything else in a higher address might be problematic, however. :-) But I don't know any language that works that way.
Voo about 9 years

@DavidK It's logically impossible to do this. You need a pointer (well really an indirection) here - sure the language can hide it from you, but in the end, no way around that.
m0meni about 9 years

@SalmanA I already knew this. I just wanted to know why it wouldn't work without a pointer which is what the accepted answer explained much better.
David K about 9 years

@Voo Yes, it is logically impossible, but the point made earlier (even before I commented) is that this is at least partly due to the design of the C and C++ languages. I find it amusing to contemplate an alternative language in which that data structure is possible. To repeat what I said before, however, in reality I don't know any language that works that way.
Voo about 9 years

@David I'm confused. First you agree that it is logically impossible, but then you want to contemplate it? Remove anything of C or C++ - it's impossible in any language you could ever dream up as far as I can see. That structure is by definition an infinite recursion and without some level of indirection we cannot break that.
David K about 9 years

@Voo The topic of the question is C++, and in that context this data structure is impossible. But there are many assumptions in C++ that we tend to carry over to other languages out of of habit rather than logical necessity. See the first comment above: "Unless it's evaluated lazily." That's a concept that's completely foreign to how the C language taught us to think about data structures; but that doesn't imply it's logically impossible in every conceivable language.
Voo about 9 years

@DavidK Give me the mathematical definition of the structure then that doesn't have an infinite recursion in it. If you can do that, you've shown that it's at least theoretically possible to do so (if I have way too much free time tonight I might just prove this is impossible..). Lazy evaluation has nothing to do with this in any case (I've programmed in Haskell in the past, it's not like the concept is foreign to me) - you still allocate space for the thunk at creation (and that's where the indirection comes into place again).
Benjamin Gruenbaum about 9 years

@Voo A node is a tuple containing a value and null or a value and a node. Here's a recursive definition of a node with a stopping condition which is just fine. It's not only possible to define objects like this - it's also extremely common, here's actual (poor) code that runs in Haskell - data List a = Node a (List a) | Nil.
Benjamin Gruenbaum about 9 years

Hint (it works because you don't allocate space at creation, rather - it's lazy, because Haskell is a non-strict language that evaluates things only when it must.
Panzercrisis about 9 years

@AR7 They're both giving kind of the same explanation, just under two different approaches. If you declared it as a "regular" variable, then the first time a constructor were called, it would instantiate that to a new instance. But before it finishes instantiating it - before the the first one's constructor is finished - the member Node's own constructor would be called, which would instantiate another new instance...and you would get endless pseudo-recursion. It's not really so much a size issue in completely strict and literal terms, as it is a performance issue.
Panzercrisis about 9 years

But all you're really wanting is just a way to point to whichever one's next in the list, not a Node that's actually inside the first Node. So you create a pointer, which is essentially how Java handles objects, as opposed to primitives. When you call a method or make a variable, Java doesn't store a copy of the object or even the object itself; it stores a reference to an object, which is essentially a pointer with a little bit of a kiddy glove wrapped around it. This is what both answers are essentially saying.
pm100 about 9 years

its not a size or speed issue - its an impossibility issue. The included Node object would include a Node object that would include a Node object... In fact it is impossible to compile it
Panzercrisis about 9 years

@pm100 That's what I mean. I'm not sure if the C++ specification would block it, but if a language were compiled so that the compiled code kept jumping back to the beginning of the compiled constructor recursively, it would be possible to compile; but it would be an infinite loop if the programmer didn't handle this properly in the constructor.
m0meni about 9 years

@Panzercrisis I'm aware that they're both giving the same explanation. This approach, however, wasn't as helpful to me because it focused on what I already had an understanding of: how pointers work in C++ and how pointers are handled in Java. The accepted answer addressed specifically why not using a pointer would be impossible because the size can't be calculated. On the other hand, this one left it more vaguely as " a recursively defined structure." P.S your explanation that you just wrote explains it better than both :D.
Voo about 9 years

@benjamin I actually pointed out (because I knew otherwise someone would bring this up - well didn't help) that Haskell allocated the thunks at creation time and hence this works because those thunks give us the indirection we need. This is nothing but a pointer with extra data in disguise...
David K about 9 years

Mathematically, the data structure would have infinite recursion, just like one of the tapes of a Turing machine (which after all is one of the classic mathematical models of computing). Is it worthwhile to actually implement such a structure in practice? I doubt it. Is it possible to do so in a way that doesn't use pointers from one node to the next behind the scenes? Yes, but you might not like the resulting performance.
Aaron Dufour about 9 years

@BenjaminGruenbaum You are describing a structure that includes pointers. Just because Haskell doesn't require you to explicitly call it a pointer doesn't mean that its not implemented as such.
Harry Johnston about 9 years

@Voo: In the example given, Node is mathematically equivalent to an infinite int array. You can implement an infinite array by reserving a suitably large section of address space, i.e., larger than the amount of memory you can actually commit, and having the OS commit memory as needed. The only constraint you have to place on the array in order to implement it like that is to insist that it cannot be sparse, which in this context is fine since the only way to reference element n+1 is via element n. You run out of resource before you run out of address space.
Harry Johnston about 9 years

Alternatively, you can assume a processor architecture that allows both multiple address spaces (which is a real thing but is now out of fashion) and arbitrary address lengths (this has never existed as far as I know). :-)
k_g about 9 years

@bitmask no valid size exists in the real numbers. If you allow transinfinites, aleph_0 works. (Just being overly pedantic :-))
Voo about 9 years

@HarryJohnston But then every linked list is fixed size and since the last element has to be different, it's now not an infinitely recursive definition anymore. For the pointer case we'd need an arbitrary length encoding for the pointers to get around this but it'd be possible.
Harry Johnston about 9 years

@Voo: I'm not sure what you mean by fixed size in this context, or why the last element has to be different. It's true that typically a linked list can be terminated by a null pointer rather than needing a separate mechanism to indicate where the end of the data is, but I'd file that objection under "well, if your declaration doesn't use links, then by definition it's not a linked list" rather than "your declaration is inherently impossible to implement".
Thomas about 9 years

@k_g Well, the C/C++ standard mandates that the return value of sizeof is an unsigned integral type, so there goes the hope of transfinite or even real sizes. (being even more pedantic! :p)
PyRulez about 9 years

@Voo, Haskell doesn't need to use thunks; that's just how GHC implements it.
bitmask about 9 years

@Thomas: One might even point out that there go even the Natural Numbers. (Going over the -pedantic top :p )
Sergey Orshanskiy about 9 years

In fact, Node is not even defined before the end of this snippet, so you cannot really use it inside. Allowing one to implicitly forward-declare pointers to an as of yet undeclared class is a little cheat that is allowed by the language in order to make such structures possible, without the need to explicitly cast pointers all the time.
abligh about 9 years

IMHO Java's Node m_node is much more like C++'s Node &m_node, i.e. a reference rather than a pointer.
Falco about 9 years

To get something similar in Java would probably be "extends" if SuperNode extends Node, SuperNodes includes all Attributes of Node and has to reserve all the additional space. So in Java you cannot do "Node extends Node"
cmaster - reinstate monica about 9 years

@Falco True, inheritance is a form of in-place inclusion of the base classes. However, since Java does not allow for multiple inheritance (unlike C++), you can only pull in an instance of a single other preexisting class via inheritance. That's why I wouldn't think of inheritance as a substitute for in-place member inclusion.
emlai about 9 years

@abligh Why so? The Java Node can be null, whereas the C++ Node& can't. On the other hand, Node* could be null, which makes it identical to the Java Node.
abligh about 9 years

@zenith - it's not a perfect match, but consider e.g. node1 = m_node``. In Java that code works similarly to C++ if m_node` is defined as a reference, i.e. nothing is copied.
emlai about 9 years

@abligh Actually, the value of the pointer (the memory address) is copied in both cases.
Brent Bradburn about 7 years

The Node_reference declaration above addresses one of the most interesting language-level differences between Java and C++. In Java, declaring an object of type Node would implicitly use a reference to a separately allocated object. In C++, you have the choice of reference (pointer) vs. direct (stack) allocation -- so you have to handle the distinction explicitly. In most cases you would use direct allocation, although not for list elements.
Brent Bradburn about 4 years

Don't know why I didn't also recommend the possibility of a std::deque.