What exactly is a reference in C#

From what I understand by now, I can say that a reference in C# is a kind of pointer to an object

If by "kind of" you mean "is conceptually similar to", yes. If you mean "could be implemented by", yes. If you mean "has the is-a-kind-of relationship to", as in "a string is a kind of object" then no. The C# type system does not have a subtyping relationship between reference types and pointer types.

which has reference count

Implementations of the CLR are permitted to use reference counting semantics but are not required to do so, and most do not.

and knows about the type compatibility.

I'm not sure what this means. Objects know their own actual type. References have a static type which is compatible with the actual type in verifiable code. Compatibility checking is implemented by the runtime's verifier when the IL is analyzed.

My question is not about how a value type is different than a reference type, but more about how a reference is implemented.

How references are implemented is, not surprisingly, an implementation detail.

Can somebody provide me with a complete, but hopefully simple explanation about what references really are in C#

References are things that act as references are specified to act by the C# language specification. That is:

  • objects (of reference type) have identity independent from the values of their fields
  • any object may have a reference to it
  • such a reference is a value which may be passed around like any other value
  • equality comparison is implemented for those values
  • two references are equal if and only if they refer to the same object; that is, references reify object identity
  • there is a unique null reference which refers to no object and is unequal to any valid reference to an object
  • A static type is always known for any reference value, including the null reference
  • If the reference is non-null then the static type of the reference is always compatible with the actual type of the referent. So for example, if we have a reference to a string, the static type of the reference could be string or object or IEnumerable, but it cannot be Giraffe. (Obviously if the reference is null then there is no referent to have a type.)

There are probably a few rules that I've missed, but that gets across the idea. References are anything that behaves like a reference. That's what you should be concentrating on. References are a useful abstraction because they are the abstraction which enables object identity independent of object value.
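Several of the rules above can be observed directly. The following is only a minimal sketch; the names `Node` and `ReferenceRules` are made up for the illustration and appear nowhere in the specification.

```csharp
using System;
using System.Collections;

// Hypothetical class used only for this illustration.
class Node
{
    public int Value;
}

static class ReferenceRules
{
    public static void Demo()
    {
        var a = new Node { Value = 1 };
        var b = new Node { Value = 1 };   // same field values, different object
        Node c = a;                       // copies the reference, not the object

        // Identity is independent of field values.
        Console.WriteLine(object.ReferenceEquals(a, b)); // False
        Console.WriteLine(object.ReferenceEquals(a, c)); // True

        // a and c refer to one object, so a write through c is visible through a.
        c.Value = 2;
        Console.WriteLine(a.Value); // 2

        // The null reference refers to no object and equals no valid reference.
        Node n = null;
        Console.WriteLine(object.ReferenceEquals(n, a)); // False

        // One string object seen through three compatible static types.
        string s = "hello";
        object o = s;
        IEnumerable e = s;
        Console.WriteLine(object.ReferenceEquals(o, e)); // True
        // Giraffe g = s;   // would not compile: the static type is incompatible
    }
}
```

Nothing in the sketch depends on how the runtime represents a reference; it only exercises the specified behaviour.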

and a bit about how they are implemented?

In practice, objects of reference type in C# are implemented as blocks of memory which begin with a small header that contains information about the object, and references are implemented as pointers to that block. This simple scheme is then made more complicated by the fact that we have a multigenerational mark-and-sweep compacting collector; it must somehow know the graph of references so that it can move objects around in memory when compacting the heap, without losing track of referential identity.

As an exercise you might consider how you would implement such a scheme. It builds character to try to figure out how you would build a system where references are pointers and objects can move in memory. How would you do it?
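One possible starting point for the exercise is a toy model. It is emphatically not how the CLR does it: here the "heap" is an array of slots, a "reference" is an index into that array, and the only places that hold references are the entries of a `roots` array. All names (`ToyCompactor`, `Compact`, `heap`, `live`, `roots`) are assumptions made for the sketch.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: illustrates "move the objects, then patch every reference".
static class ToyCompactor
{
    // Slides live slots to the front of the heap and rewrites each root so
    // that it still designates the same logical object.
    // Assumes every root designates a live slot.
    public static void Compact(string[] heap, bool[] live, int[] roots)
    {
        var forwarding = new Dictionary<int, int>(); // old index -> new index
        int next = 0;

        for (int i = 0; i < heap.Length; i++)
        {
            if (!live[i]) continue;
            forwarding[i] = next;   // remember where the object went
            heap[next] = heap[i];   // move it (next <= i, so nothing live is overwritten)
            next++;
        }

        // Everything past the compacted region is now free space.
        Array.Clear(heap, next, heap.Length - next);
        for (int i = 0; i < live.Length; i++) live[i] = i < next;

        // Patch every known reference; identity survives the move.
        for (int r = 0; r < roots.Length; r++)
            roots[r] = forwarding[roots[r]];
    }
}
```

For example, with `heap = { "A", "dead", "B", "dead", "C" }`, `live = { true, false, true, false, true }` and `roots = { 4, 2 }`, compaction leaves the roots as `{ 2, 1 }`, and `heap[roots[0]]` is still `"C"`. The hard part the toy skips is the one the answer points at: a real collector has to find every location that holds a reference, including fields inside other objects, in order to patch it.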

it is hard for me to understand what really is a reference when I have tried to explain to my colleagues why a parameter sent by reference can not be stored inside a closure

This is tricky. It is important to understand that a reference to a variable -- a ref parameter in C# -- and a reference to an object of reference type are conceptually similar but actually different things.

In C# you can think of a reference to a variable as an alias. That is, when you say

void M()
{
  int x = 123;
  N(ref x);
}
void N(ref int y)
{
  y = 456;
}

Essentially what we are saying is that x and y are different names for the same variable. The ref is an unfortunate choice of syntax because it emphasizes the implementation detail -- that behind the scenes, y is a special "reference to variable" type -- and not the semantics of the operation, which is that logically y is now just another name for x; we have two names for the same variable.
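The aliasing can be made observable by returning the variable after the call. This is a small sketch; the class name `AliasDemo` and the contrasting by-value method `NByValue` are additions for illustration only.

```csharp
using System;

static class AliasDemo
{
    public static int M()
    {
        int x = 123;
        N(ref x);          // y is another name for x during the call
        return x;          // 456
    }

    public static int MByValue()
    {
        int x = 123;
        NByValue(x);       // y is a separate variable holding a copy
        return x;          // 123
    }

    static void N(ref int y)
    {
        y = 456;           // writes to the caller's x
    }

    static void NByValue(int y)
    {
        y = 456;           // writes only to the local copy
    }
}
```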

References to variables and references to objects are not the same thing in C#; you can see this in the fact that they have different semantics. You can compare two references to objects for equality. But there is no way in C# to say:

static bool EqualAliases(ref int y, ref int z)
{
  // pseudocode: return true iff y and z are both aliases for the same variable
}

the way you can with references to objects:

static bool EqualReferences(object x, object y)
{
  return x == y;
}
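The language itself has no operator for comparing aliases. As an aside that goes beyond the answer: on runtimes that ship the `System.Runtime.CompilerServices.Unsafe` class, the library method `Unsafe.AreSame` compares two managed pointers. The sketch below assumes that class is available; `AliasComparison` and `Demo` are made-up names.

```csharp
using System;
using System.Runtime.CompilerServices;

static class AliasComparison
{
    // Library-level comparison of two variable references.
    // This is an API call, not a C# language feature.
    public static bool EqualAliases(ref int y, ref int z)
    {
        return Unsafe.AreSame(ref y, ref z);
    }

    public static bool Demo()
    {
        int a = 1, b = 1;
        bool same = EqualAliases(ref a, ref a);       // both alias a
        bool different = EqualAliases(ref a, ref b);  // equal values, different variables
        return same && !different;
    }
}
```

That the comparison needs a helper named `Unsafe` rather than `==` fits the point being made here: aliases to variables and references to objects support different operations.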

Behind the scenes both references to variables and references to objects are implemented by pointers. The difference is that a reference to a variable might refer to a variable on the short-term storage pool (aka "the stack"), whereas a reference to an object is a pointer to the heap-allocated object header. That's why the CLR restricts you from storing a reference to a variable into long-term storage; it does not know if you are keeping a long-term reference to something that will be dead soon.
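In C# this restriction shows up as a compile-time error when a lambda tries to capture a ref parameter. A minimal sketch, with the hypothetical names `ClosureDemo` and `Snapshot`: the usual workaround is to copy the current value into a local, which gives up the aliasing.

```csharp
using System;

static class ClosureDemo
{
    public static Func<int> Snapshot(ref int y)
    {
        // return () => y;   // does not compile: error CS1628, a ref parameter
        //                   // cannot be used inside a lambda expression

        int copy = y;        // an ordinary local can be hoisted into the closure
        return () => copy;   // captures the value, not the caller's variable
    }

    public static int Demo()
    {
        int x = 123;
        Func<int> f = Snapshot(ref x);
        x = 456;             // the closure does not see this change
        return f();          // 123
    }
}
```

The delegate can outlive the call to `Snapshot`, and therefore the variable that `y` aliased, which is exactly the long-term storage the paragraph above describes.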

Your best bet to understand how both kinds of references are implemented as pointers is to take a step down from the C# type system into the CLI type system which underlies it. Chapter 8 of Partition I of the CLI specification should prove interesting reading; it describes different kinds of managed pointers and what each is used for.

Author by

meJustAndrew

Updated on June 04, 2022

Comments

  • meJustAndrew
    meJustAndrew almost 2 years

    From what I understand by now, I can say that a reference in C# is a kind of pointer to an object which has reference count and knows about the type compatibility. My question is not about how a value type is different than a reference type, but more about how a reference is implemented.

    I have read this post about what differences are between references and pointers, but that does not cover much about what a reference is; it describes more its properties compared with a pointer in C++. I also understand the differences between passing by reference and passing by value (as in C# objects are by default passed by value, even references), but it is hard for me to understand what really is a reference when I have tried to explain to my colleagues why a parameter sent by reference can not be stored inside a closure as in the Eric Lippert blog entry about the stack as an implementation detail.

    Can somebody provide me with a complete, but hopefully simple explanation about what references really are in C# and a bit about how they are implemented?

    Edit: this is not a duplicate, because Reference type in C# explains how a reference works and how it differs from a value, but what I am asking is how a reference is defined at a low level.

    • techvice
      techvice over 7 years
      Possible duplicate of Reference type in C#
    • Ben Voigt
      Ben Voigt over 7 years
      If you think there's anything going on with reference counting, you don't understand.
    • Ben Voigt
      Ben Voigt over 7 years
      Concerning "when I have tried to explain to my colleagues why a parameter sent by reference can not be stored inside a closure" the important difference between a variable of reference type and a byref parameter is that the first one influences the lifetime of what it points to and the second one does not. There's lots of ways they are similar (they hold an address, they will be automatically adjusted if the GC performs a heap compaction) but those aren't so important for your particular point.
    • Admin
      Admin over 7 years
      @BenVoigt It's possible for a ref parameter to reference a boxed value type, is it not? If that happens, it's possible for that ref parameter to be the only remaining reference, and if that doesn't cause the lifetime to be extended, things are going to behave very badly.
    • Ben Voigt
      Ben Voigt over 7 years
      @hvd: No, it isn't. The byref parameter references the variable in (or known to, it might be a class member) the caller which I suppose has type object (a reference type). That variable keeps its target object alive. If the variable was a member of a heap object, the caller must have had a reference to that object, keeping it (and its member variable) alive. The byref parameter itself doesn't keep anything alive.
    • Admin
      Admin over 7 years
      @BenVoigt The caller must have had a reference to that object, but that reference may be cleared during the call. Consider pastebin.com/ak2baTvh. Try it in release mode. With the Console.WriteLine(s_); commented, the WeakReference shows that s got garbage collected. With Console.WriteLine(s_); uncommented, the WeakReference shows that s remains alive longer.
    • Eric Lippert
      Eric Lippert over 7 years
      I would be careful to not conflate ref variables and references to objects. They are conceptually quite different. In C# a reference to an object refers to an object as a whole, and a ref variable is an alias for another variable. You can tell they are different conceptually because C# permits different operations on them. You can take two references to object and call ReferenceEquals on them to determine if they are referring to the same or different object. But there is no way in C# to determine if two ref params refer to the same variable.
    • meJustAndrew
      meJustAndrew over 7 years
      @Eric Lippert thanks for the comment, I never thought this way about ref parameters, that they cannot be compared for equality; I find this pretty interesting. I want to thank you for the great answer. Regarding the question at the end of your answer, I am sorry to say that after thinking about it I cannot provide an answer, as I would not be able to create this kind of system, but it helped me get an idea of how complex the entire mechanism of references and garbage collection must be.
    • meJustAndrew
      meJustAndrew over 7 years
      also @Eric Lippert, just to clarify, the ref and references to objects are not conceptually the same thing, but technically, as an implementation detail, are they the same?
    • Ben Voigt
      Ben Voigt over 7 years
      @hvd: Ahh, you're making a byref parameter to the value type in the box, and it's the last thing keeping the boxed value alive. I was thinking you meant a byref parameter to the reference to the boxed value. Anyway, you're right, the caller's reference doesn't keep it alive, the byref parameter does, and I'm right, boxing is not a special case, just a particular case of byref to variable inside a gc-heap object. Consider rextester.com/DTAQ3945 which exhibits the exact same behavior with neither boxing nor value types.
    • Admin
      Admin over 7 years
      @BenVoigt Right, I realised that afterwards too and thought of a ref to a class's field as another simpler example. I think we're in agreement now.
    • Eric Lippert
      Eric Lippert over 7 years
      @meJustAndrew: That's complicated; too complicated for a comment. My advice to you is that if you want to understand this stuff, you approach it by going down one level of abstraction at a time. The C# type system is layered on top of the Common Type System of the CLI, which makes the relationships between object, interface, managed pointer and unmanaged pointer types very clear; many of these concepts are abstracted away in the C# type system. Chapter 8 of Partition I of the CLI spec should prove interesting reading for you.
  • Ben Voigt
    Ben Voigt over 7 years
    Side note: the CLR garbage collector is capable of updating both references and pointers to (or into) gc-heap objects that get moved. The C# language doesn't support tracking pointers (so targets must be pinned), but other .NET languages do.
  • Ben Voigt
    Ben Voigt over 7 years
    C# references aren't immutable! (Big difference from C++ references, that) And they can be null without violating any invariants (another big difference from C++).
  • Malachi
    Malachi over 7 years
    Thanks Ben! Updated my response to reflect your correction
  • Eric Lippert
    Eric Lippert over 7 years
    @BenVoigt: It's a bit confusing, because C++ has in many ways a different approach to references and memory management than C# does, obviously. The right analogy to make I think is that a C++ reference is less like a reference to an object in C# than it is like a ref variable in C#. And those are immutable in C#; when you say M(ref x) and we have void M(ref int y) then y becomes an alias for x, and there is no way to change that inside M, to make y an alias for something else.
  • Ben Voigt
    Ben Voigt over 7 years
    @EricLippert: like a ref variable in C# 7, yes. Before return types and locals were added, not so much. Although beware that in C++ a reference, like a pointer, is an alias to a location and not any particular object. It's perfectly legal (given some conditions) to replace the referred-to object with a different one of the same type in the same location.
  • Ben Voigt
    Ben Voigt over 7 years
    @EricLippert, you probably already know this but our readers may not: Your simple statement suggests the right conclusion, but in C++ where a distinction is made between initialization and assignment, and object lifetime is eager and deterministic, the journey to reach that conclusion is much longer.