Best explanation for languages without null


Solution 1

I think the succinct summary of why null is undesirable is that meaningless states should not be representable.

Suppose I'm modeling a door. It can be in one of three states: open, shut but unlocked, and shut and locked. Now I could model it along the lines of

class Door
    private bool isShut
    private bool isLocked

and it is clear how to map my three states into these two boolean variables. But this leaves a fourth, undesired state available: isShut==false && isLocked==true. Because the types I have selected as my representation admit this state, I must expend mental effort to ensure that the class never gets into this state (perhaps by explicitly coding an invariant). In contrast, if I were using a language with algebraic data types or checked enumerations that lets me define

type DoorState =
    | Open | ShutAndUnlocked | ShutAndLocked

then I could define

class Door
    private DoorState state

and there are no more worries. The type system will ensure that there are only three possible states for an instance of class Door to be in. This is what type systems are good at - explicitly ruling out a whole class of errors at compile-time.
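As a rough illustration of the same idea in Java, the sketch below uses an enum so that only the three meaningful states can be stored. The names DoorState and Door are made up for this example, and because Java references are themselves nullable, the constructor still has to reject null once at the boundary.

```java
import java.util.Objects;

// Illustrative names only: DoorState and Door are not from any library.
enum DoorState { OPEN, SHUT_AND_UNLOCKED, SHUT_AND_LOCKED }

final class Door {
    private DoorState state;

    Door(DoorState initial) {
        // The enum cannot hold a fourth value, but a Java reference can still
        // be null, so that one extra state is rejected here.
        this.state = Objects.requireNonNull(initial);
    }

    DoorState state() {
        return state;
    }

    // Locking only makes sense for a shut door; "open and locked" has no
    // representation, so there is nothing to guard against afterwards.
    void lock() {
        if (state == DoorState.OPEN) {
            throw new IllegalStateException("cannot lock an open door");
        }
        state = DoorState.SHUT_AND_LOCKED;
    }
}
```

With the two-boolean version, every method has to preserve the invariant; here the representation does that work.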

The problem with null is that every reference type gets this extra state in its space that is typically undesired. A string variable could be any sequence of characters, or it could be this crazy extra null value that doesn't map into my problem domain. A Triangle object has three Points, which themselves have X and Y values, but unfortunately the Points or the Triangle itself might be this crazy null value that is meaningless to the graphing domain I'm working in. Etc.

When you do intend to model a possibly-non-existent value, then you should opt into it explicitly. If the way I intend to model people is that every Person has a FirstName and a LastName, but only some people have MiddleNames, then I would like to say something like

class Person
    private string FirstName
    private Option<string> MiddleName
    private string LastName

where string here is assumed to be a non-nullable type. Then there are no tricky invariants to establish and no unexpected NullReferenceExceptions when trying to compute the length of someone's name. The type system ensures that any code dealing with the MiddleName accounts for the possibility of it being None, whereas any code dealing with the FirstName can safely assume there is a value there.

So for example, using the type above, we could author this silly function:

let TotalNumCharsInPersonsName(p:Person) =
    let middleLen = match p.MiddleName with
                    | None -> 0
                    | Some(s) -> s.Length
    p.FirstName.Length + middleLen + p.LastName.Length

with no worries. In contrast, in a language where types like string are nullable references, assuming

class Person
    private string FirstName
    private string MiddleName
    private string LastName

you end up authoring stuff like

let TotalNumCharsInPersonsName(p:Person) =
    p.FirstName.Length + p.MiddleName.Length + p.LastName.Length

which blows up if the incoming Person object does not have the invariant of everything being non-null, or

let TotalNumCharsInPersonsName(p:Person) =
    (if p.FirstName=null then 0 else p.FirstName.Length)
    + (if p.MiddleName=null then 0 else p.MiddleName.Length)
    + (if p.LastName=null then 0 else p.LastName.Length)

or maybe

let TotalNumCharsInPersonsName(p:Person) =
    p.FirstName.Length
    + (if p.MiddleName=null then 0 else p.MiddleName.Length)
    + p.LastName.Length

assuming that p ensures first/last are there but middle can be null, or maybe you do checks that throw different types of exceptions, or who knows what. All these crazy implementation choices and things to think about crop up because there's this stupid representable-value that you don't want or need.
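For comparison, here is a minimal Java sketch of the Option-based Person from above, using java.util.Optional. The Person class and its field names are hypothetical. Java cannot make String non-nullable in the type system, so the sketch assumes the constructor check is the single place where null is rejected.

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical Person type; only java.util.Optional and Objects are real APIs.
final class Person {
    final String firstName;
    final Optional<String> middleName;
    final String lastName;

    Person(String firstName, Optional<String> middleName, String lastName) {
        // Checked once here instead of at every use site.
        this.firstName = Objects.requireNonNull(firstName);
        this.middleName = Objects.requireNonNull(middleName);
        this.lastName = Objects.requireNonNull(lastName);
    }

    static int totalNumCharsInPersonsName(Person p) {
        // The Optional type makes the "no middle name" case explicit.
        int middleLen = p.middleName.map(String::length).orElse(0);
        return p.firstName.length() + middleLen + p.lastName.length();
    }
}
```

Only the field that is genuinely optional carries the extra case; the other two are used directly.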

Null typically adds needless complexity. Complexity is the enemy of all software, and you should strive to reduce complexity whenever reasonable.

(Note well that there is more complexity to even these simple examples. Even if a FirstName cannot be null, a string can represent "" (the empty string), which is probably also not a person name that we intend to model. As such, even with non-nullable strings, it still might be the case that we are "representing meaningless values". Again, you could choose to battle this either via invariants and conditional code at runtime, or by using the type system (e.g. to have a NonEmptyString type). The latter is perhaps ill-advised ("good" types are often "closed" over a set of common operations, and e.g. NonEmptyString is not closed over .SubString(0,0)), but it demonstrates more points in the design space. At the end of the day, in any given type system, there is some complexity it will be very good at getting rid of, and other complexity that is just intrinsically harder to get rid of. The key for this topic is that in nearly every type system, the change from "nullable references by default" to "non-nullable references by default" is nearly always a simple change that makes the type system a great deal better at battling complexity and ruling out certain types of errors and meaningless states. So it is pretty crazy that so many languages keep repeating this error again and again.)

Solution 2

The nice thing about option types isn't that they're optional. It is that all other types aren't.

Sometimes, we need to be able to represent a kind of "null" state. Sometimes we have to represent a "no value" option as well as the other possible values a variable may take. So a language that flat out disallows this is going to be a bit crippled.

But often, we don't need it, and allowing such a "null" state only leads to ambiguity and confusion: every time I access a reference type variable in .NET, I have to consider that it might be null.

Often, it will never actually be null, because the programmer structures the code so that it can never happen. But the compiler can't verify that, and every single time you see it, you have to ask yourself "can this be null? Do I need to check for null here?"

Ideally, in the many cases where null doesn't make sense, it shouldn't be allowed.

That's tricky to achieve in .NET, where nearly everything can be null. You have to rely on the author of the code you're calling to be 100% disciplined and consistent and have clearly documented what can and cannot be null, or you have to be paranoid and check everything.

However, if types aren't nullable by default, then you don't need to check whether or not they're null. You know they can never be null, because the compiler/type checker enforces that for you.

And then we just need a back door for the rare cases where we do need to handle a null state. Then an "option" type can be used. Then we allow null in the cases where we've made a conscious decision that we need to be able to represent the "no value" case, and in every other case, we know that the value will never be null.

As others have mentioned, in C# or Java for example, null can mean one of two things:

  1. the variable is uninitialized. This should, ideally, never happen. A variable shouldn't exist unless it is initialized.
  2. the variable contains some "optional" data: it needs to be able to represent the case where there is no data. This is sometimes necessary. Perhaps you're trying to find an object in a list, and you don't know in advance whether or not it's there. Then we need to be able to represent that "no object was found".

The second meaning has to be preserved, but the first one should be eliminated entirely. And even the second meaning should not be the default. It's something we can opt in to if and when we need it. But when we don't need something to be optional, we want the type checker to guarantee that it will never be null.
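The "object may not be in the list" case can be sketched in Java with java.util.Optional as the return type, so that "not found" is visible in the signature. Lookup and findByPrefix are illustrative names, and the sketch assumes the list contains no null elements.

```java
import java.util.List;
import java.util.Optional;

final class Lookup {
    // Returns the first name starting with the given prefix, or
    // Optional.empty() when nothing matches.
    static Optional<String> findByPrefix(List<String> names, String prefix) {
        return names.stream()
                    .filter(n -> n.startsWith(prefix))
                    .findFirst();
    }
}
```

A caller cannot get at the String without first deciding what to do in the empty case, for example with orElse or ifPresent.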

Solution 3

All of the answers so far focus on why null is a bad thing, and how it's kinda handy if a language can guarantee that certain values will never be null.

They then go on to suggest that it would be a pretty neat idea if you enforce non-nullability for all values, which can be done if you add a concept like Option or Maybe to represent types that may not always have a defined value. This is the approach taken by Haskell.

It's all good stuff! But it doesn't preclude the use of explicitly nullable / non-null types to achieve the same effect. Why, then, is Option still a good thing? After all, Scala supports nullable values (it has to, so it can work with Java libraries) but supports Options as well.

Q. So what are the benefits beyond being able to remove nulls from a language entirely?

A. Composition

If you make a naive translation from null-aware code

def fullNameLength(p:Person) = {
  val middleLen =
    if (null != p.middleName)
      p.middleName.length
    else
      0
  p.firstName.length + middleLen + p.lastName.length
}

to option-aware code

def fullNameLength(p:Person) = {
  val middleLen = p.middleName match {
    case Some(x) => x.length
    case _ => 0
  }
  p.firstName.length + middleLen + p.lastName.length
}

there's not much difference! But it's also a terrible way to use Options... This approach is much cleaner:

def fullNameLength(p:Person) = {
  val middleLen = p.middleName map {_.length} getOrElse 0
  p.firstName.length + middleLen + p.lastName.length
}

Or even:

def fullNameLength(p:Person) =
  p.firstName.length +
  p.middleName.map{_.length}.getOrElse(0) +
  p.lastName.length

When you start dealing with List of Options, it gets even better. Imagine that the List people is itself optional:

people flatMap(_ find (_.firstName == "joe")) map (fullNameLength)

How does this work?

//convert an Option[List[Person]] to an Option[S]
//where the function f takes a List[Person] and returns an S
people map f

//find a person named "Joe" in a List[Person].
//returns Some[Person], or None if "Joe" isn't in the list
validPeopleList find (_.firstName == "joe")

//returns None if people is None
//Some(None) if people is valid but doesn't contain Joe
//Some[Some[Person]] if Joe is found
people map (_ find (_.firstName == "joe")) 

//flatten it to return None if people is None or Joe isn't found
//Some[Person] if Joe is found
people flatMap (_ find (_.firstName == "joe")) 

//return Some(length) if the list isn't None and Joe is found
//otherwise return None
people flatMap (_ find (_.firstName == "joe")) map (fullNameLength)

The corresponding code with null checks (or even elvis ?: operators) would be painfully long. The real trick here is the flatMap operation, which allows for the nested comprehension of Options and collections in a way that nullable values can never achieve.
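The same flatMap/map chain can be approximated in Java with java.util.Optional, as a sketch. NamedPerson, Composition and joesNameLength are hypothetical names introduced for this example.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical record-like type for the example.
final class NamedPerson {
    final String firstName;
    final Optional<String> middleName;
    final String lastName;

    NamedPerson(String firstName, Optional<String> middleName, String lastName) {
        this.firstName = firstName;
        this.middleName = middleName;
        this.lastName = lastName;
    }
}

final class Composition {
    static int fullNameLength(NamedPerson p) {
        return p.firstName.length()
             + p.middleName.map(String::length).orElse(0)
             + p.lastName.length();
    }

    // Empty if the list itself is absent or if nobody is named "joe";
    // otherwise the length of Joe's full name.
    static Optional<Integer> joesNameLength(Optional<List<NamedPerson>> people) {
        return people
            .flatMap(list -> list.stream()
                                 .filter(p -> p.firstName.equals("joe"))
                                 .findFirst())
            .map(Composition::fullNameLength);
    }
}
```

Both "no list" and "no Joe" collapse into a single empty result without any explicit branching.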

Solution 4

Since people seem to be missing it: null is ambiguous.

Alice's date-of-birth is null. What does it mean?

Bob's date-of-death is null. What does that mean?

A "reasonable" interpretation might be that Alice's date-of-birth exists but is unknown, whereas Bob's date-of-death does not exist (Bob is still alive). But why did we get to different answers?


Another problem: null is an edge case.

  • Is null = null?
  • Is nan = nan?
  • Is inf = inf?
  • Is +0 = -0?
  • Is +0/0 = -0/0?

The answers are usually "yes", "no", "yes", "yes", "no" respectively. Crazy "mathematicians" call NaN "nullity" and say it compares equal to itself. SQL treats nulls as not equal to anything (so they behave like NaNs). One wonders what happens when you try to store ±∞, ±0, and NaNs into the same database column (there are about 2^53 NaNs, half of which are "negative").
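The comparisons in the list above can be checked in Java, where == on doubles follows IEEE 754. This is only a sketch with made-up method names. It also shows that the boxed Double.equals disagrees with == for NaN, which is one more edge case to remember.

```java
final class EdgeCases {
    static boolean nullEqualsNull() {
        Object a = null;
        Object b = null;
        return a == b;                      // reference comparison of two nulls
    }

    static boolean nanEqualsNan() {
        double nan = Double.NaN;
        return nan == nan;                  // IEEE 754: NaN is unequal to itself
    }

    static boolean infEqualsInf() {
        double inf = Double.POSITIVE_INFINITY;
        return inf == inf;
    }

    static boolean zerosEqual() {
        return 0.0 == -0.0;                 // signed zeros compare equal with ==
    }

    static boolean zeroOverZeroEqual() {
        double zero = 0.0;
        return (zero / zero) == (-zero / zero);   // both sides are NaN
    }

    static boolean boxedNanEquals() {
        // Double.equals compares bit patterns, so it treats NaN as equal to NaN.
        return Double.valueOf(Double.NaN).equals(Double.NaN);
    }
}
```

In this sketch the first five methods return true, false, true, true and false, while boxedNanEquals returns true.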

To make matters worse, databases differ in how they treat NULL, and most of them aren't consistent (see NULL Handling in SQLite for an overview). It's pretty horrible.


And now for the obligatory story:

I recently designed a (sqlite3) database table with five columns a NOT NULL, b, id_a, id_b NOT NULL, timestamp. Because it's a generic schema designed to solve a generic problem for fairly arbitrary apps, there are two uniqueness constraints:

UNIQUE(a, b, id_a)
UNIQUE(a, b, id_b)

id_a only exists for compatibility with an existing app design (partly because I haven't come up with a better solution), and is not used in the new app. Because of the way NULL works in SQL, I can insert (1, 2, NULL, 3, t) and (1, 2, NULL, 4, t) and not violate the first uniqueness constraint (because (1, 2, NULL) != (1, 2, NULL)).

This works specifically because of how NULL works in a uniqueness constraint on most databases (presumably so it's easier to model "real-world" situations, e.g. no two people can have the same Social Security Number, but not all people have one).


FWIW, without first invoking undefined behaviour, C++ references cannot "point to" null, and it's not possible to construct a class with uninitialized reference member variables (if an exception is thrown, construction fails).

Sidenote: Occasionally you might want mutually-exclusive pointers (i.e. only one of them can be non-NULL), e.g. in a hypothetical iOS type DialogState = NotShown | ShowingActionSheet UIActionSheet | ShowingAlertView UIAlertView | Dismissed. Instead, I'm forced to do stuff like assert((bool)actionSheet + (bool)alertView == 1).

Solution 5

The undesirability of having references/pointers be nullable by default.

I don't think this is the main issue with nulls. The main issue is that they can mean two things:

  1. The reference/pointer is uninitialized: the problem here is the same as mutability in general. For one, it makes it more difficult to analyze your code.
  2. The variable being null actually means something: this is the case which Option types actually formalize.

Languages which support Option types typically also forbid or discourage the use of uninitialized variables as well.

How option types work including strategies to ease checking null cases such as pattern matching.

In order to be effective, Option types need to be supported directly in the language. Otherwise it takes a lot of boilerplate code to simulate them. Pattern matching and type inference are two key language features that make Option types easy to work with. For example:

In F#:

//first we create the option list, and then filter out all None Option types and 
//map all Some Option types to their values.  See how type-inference shines.
let optionList = [Some(1); Some(2); None; Some(3); None]
optionList |> List.choose id //evaluates to [1;2;3]

//here is a simple pattern-matching example
//which prints "1;2;None;3;None;".
//notice how value is extracted from op during the match
optionList 
|> List.iter (function Some(value) -> printf "%i;" value | None -> printf "None;")

However, in a language like Java without direct support for Option types, we'd have something like:

//here we perform the same filter/map operation as in the F# example.
//Option, Some and None are assumed to be user-defined classes (not part of the JDK),
//with Some exposing getValue().
List<Option<Integer>> optionList = Arrays.asList(new Some<Integer>(1),new Some<Integer>(2),new None<Integer>(),new Some<Integer>(3),new None<Integer>());
List<Integer> filteredList = new ArrayList<Integer>();
for(Option<Integer> op : optionList)
    if(op instanceof Some)
        filteredList.add(((Some<Integer>)op).getValue());

Alternative solution such as message eating nil

Objective-C's "message eating nil" is not so much a solution as an attempt to lighten the headache of null checking. Basically, instead of throwing a runtime exception when trying to invoke a method on a null object, the expression instead evaluates to null itself. Suspending disbelief, it's as if each instance method begins with if (this == null) return null;. But then there is information loss: you don't know whether the method returned null because that is a valid return value, or because the object is actually null. It's a lot like exception swallowing, and doesn't make any progress addressing the issues with null outlined before.


Author: Roman A. Taycher


Updated on February 16, 2020

Comments

  • Roman A. Taycher
    Roman A. Taycher about 4 years

    Every so often when programmers are complaining about null errors/exceptions someone asks what we do without null.

    I have some basic idea of the coolness of option types, but I don't have the knowledge or languages skill to best express it. What is a great explanation of the following written in a way approachable to the average programmer that we could point that person towards?

    • The undesirability of having references/pointers be nullable by default
    • How option types work including strategies to ease checking null cases such as
      • pattern matching and
      • monadic comprehensions
    • Alternative solution such as message eating nil
    • (other aspects I missed)
    • Stephen Swensen
      Stephen Swensen over 13 years
      If you add tags to this question for functional-programming or F# you are bound to get some fantastic answers.
    • Roman A. Taycher
      Roman A. Taycher over 13 years
      I added functional programming tag since the option-type did come from the ml world. I'd rather not mark it F#(too specific). BTW someone with taxonomy powers needs to add a maybe-type or option-type tags.
    • josesuero
      josesuero over 13 years
      there's little need for such specific tags, I suspect. The tags are mainly to allow people to find relevant questions (for example, "questions I know a lot about, and will be able to answer", and "functional-programming" is very helpful there. But something like "null" or "option-type" are much less useful. Few people are likely to monitor an "option-type" tag looking for questions they can answer. ;)
    • stevendesu
      stevendesu over 13 years
      Let's not forget that one of the main reasons for null is that computers evolved strongly tied to set theory. Null is one of the most important sets in all of set theory. Without it entire algorithms would break down. For instance- perform a merge sort. This involves breaking a list in half several times. What if the list is 7 items long? First you split it into 4 and 3. Then 2, 2, 2, and 1. Then 1, 1, 1, 1, 1, 1, 1, and.... null! Null has a purpose, just one that you don't see practically. It exists more for the theoretical realm.
    • stusmith
      stusmith over 13 years
      @steven_desu - I disagree. In 'nullable' languages, you can have a reference to an empty list [], and also a null list reference. This question relates to the confusion between the two.
  • Roman A. Taycher
    Roman A. Taycher over 13 years
    This is a pet peeve but c# is hardly a c-like language.
  • Stephen Swensen
    Stephen Swensen over 13 years
    I was going for Java here, since C# would probably have a nicer solution... but I appreciate your peeve, what people really mean is "a language with c-inspired syntax". I went ahead and replaced the "c-like" statement.
  • Roman A. Taycher
    Roman A. Taycher over 13 years
    With linq, right. I was thinking of c# and didn't notice that.
  • Roman A. Taycher
    Roman A. Taycher over 13 years
    Yes with c inspired syntax mostly, but I think I have also heard of imperative programming languages like python/ruby with very little in the way of c like syntax referred to as c-like by functional programmers.
  • Roman A. Taycher
    Roman A. Taycher over 13 years
    I've never tried it but en.wikipedia.org/wiki/Cyclone_%28programming_language%29 claims to allow non-null pointers for c.
  • Stephen Swensen
    Stephen Swensen over 13 years
    I disagree with your statement that nobody is interested in the first case. Many people, especially those in the functional language communities, are extremely interested in this and either discourage or completely forbid the use of uninitialized variables.
  • Admin
    Admin over 13 years
    I believe NULL as in "reference that may not point to anything" was invented for some Algol language (Wikipedia agrees, see en.wikipedia.org/wiki/Null_pointer#Null_pointer). But of course it's likely that assembly programmers initialized their pointers to an invalid address (read: Null = 0).
  • bltxd
    bltxd over 13 years
    @Stephen: We probably meant the same thing. To me they discourage or forbid the use of uninitialized things precisely because there is no point discussing undefined things as we can't do anything sane or useful with them. It would have no interest whatsoever.
  • Stephen Swensen
    Stephen Swensen over 13 years
    @bltxd: cool, I suspected I was not quite understanding what you were trying to convey.
  • tc.
    tc. over 13 years
    -1. NULL is not "directly inherited from assembly"; there's generally nothing special about virtual address 0, and often nothing special about physical address 0. IIRC, it's even traditional to load your program to address 0, to the extent that many learn-C-in-21-days books specifically said that your program was loaded to address 0.
  • tc.
    tc. over 13 years
    +1, but note that not all people have last names ("given name" and "family name" seem to be more accurate anyway); I've heard of some people that have only one name (this means they don't fit into most data models). Arabic names can be especially complicated. Not all credit cards have 16 digits either.
  • josesuero
    josesuero over 13 years
    as @tc. says, null has nothing to do with assembly. In assembly, types are generally not nullable. A value loaded into a general-purpose register might be zero or it might be some non-zero integer. But it can never be null. Even if you load a memory address into a register, on most common architectures, there is no separate representation of the "null pointer". That's a concept introduced in higher-level languages, like C.
  • Prakash
    Prakash over 13 years
    Re: names - Indeed. And maybe you do care about modeling a door that is hanging open but with the lock deadbolt sticking out, preventing the door from shutting. There is lots of complexity in the world. The key is not to add more complexity when implementing the mapping between "world states" and "program states" in your software.
  • Prakash
    Prakash over 13 years
    By the way, for good reading on the topic of representation in software, I suggest the out-of-print "Abstraction and Specification in Program Development" (by Liskov, using the CLU language).
  • Stephen Swensen
    Stephen Swensen over 13 years
    @Brian, thanks for fulfilling my promise to OP that someone from the functional programming / F# community would come through with a great answer!
  • bltxd
    bltxd over 13 years
    @tc & jalf: rethinking about the issue, you're absolutely right... I guess it's a good time to rephrase the post :)
  • tc.
    tc. over 13 years
    My point is that you're making assumptions about names that you should never have made. What's wrong with String name, preferredName? If you have multiple middle names, do you concatenate them with spaces? Why do spaces in the middle name count as a "character", while the spaces between names don't? What about Unicode combining characters?
  • vpalmu
    vpalmu over 13 years
    What, you've never locked doors open?
  • akaphenom
    akaphenom over 13 years
    I don't understand why folks get worked up over the semantics of a particular domain. Brian represented the flaws with null in a concise and simple manner, yes he simplified the problem domain in his example by saying everyone has first and last names. The question was answered to a 'T', Brian - if you're ever in boston I owe you a beer for all the posting you do here!
  • Prakash
    Prakash over 13 years
    @akaphenom: thanks, but note that not all people drink beer (I am a non-drinker). But I appreciate that you are just using a simplified model of the world in order to communicate gratitude, so I won't quibble more about the flawed assumptions of your world-model. :P (So much complexity in the real world! :) )
  • Dave Griffith
    Dave Griffith over 13 years
    @delnan - Null was added in Algol. John Backus, inventor of Algol, refers to it as his "billion dollar mistake"
  • comonad
    comonad over 13 years
    Strangely, there are 3-state-doors in this world! They are used in some hotels as toilet-doors. A push-button acts as a key from the inside, that locks the door from the outside. It is automatically unlocked, as soon as the latch bolt moves.
  • Stephen Swensen
    Stephen Swensen over 13 years
    Hi @Jon, It's a bit hard following you here. I finally realized that by "special/weird" values you probably mean something like Javascript's 'undefined' or IEEE's 'NaN'. But besides that, you don't really address any of the questions the OP asked. And the statement that "Null is probably the most useful notion for checking if something is absent" is almost certainly wrong. Option types are a well-regarded, type-safe alternative to null.
  • Jon
    Jon over 13 years
    @Stephen - Actually looking back over my message, i think the whole 2nd half should be moved to a yet-to-be-asked question. But I still say null is very useful for checking to see if something is absent.
  • C. A. McCann
    C. A. McCann over 13 years
    +1, this is a good point to emphasize. One addendum: over in Haskell-land, flatMap would be called (>>=), that is, the "bind" operator for monads. That's right, Haskellers like flatMapping things so much that we put it in our language's logo.
  • Admin
    Admin over 13 years
    +1 Hopefully an expression of Option<T> would never, ever be null. Sadly, Scala is uhh, still linked to Java :-) (On the other hand, if Scala didn't play nice with Java, who would use it? O.o)
  • Kevin Wright
    Kevin Wright over 13 years
    Easy enough to do: 'List(null).headOption'. Note that this means a very different thing than a return value of 'None'
  • Roman A. Taycher
    Roman A. Taycher over 13 years
    I think I understand what you are talking about but could you list some examples? Especially of applying multiple functions to a possibly null value?
  • Roman A. Taycher
    Roman A. Taycher over 13 years
    I gave you bounty since I really like what you said about composition, that other people didn't seem to mention.
  • vpalmu
    vpalmu over 13 years
    Well applying a vector transform to an empty vector results in another empty vector. FYI, SQL is mostly a vector language.
  • vpalmu
    vpalmu over 13 years
    OK I better clarify that. SQL is a vector language for rows and a value language for columns.
  • duggi
    duggi over 13 years
    ... and what about revolving doors? they can lock, but are they ever really opened or closed?
  • ChenZhou
    ChenZhou over 12 years
    Excellent answer with great examples!
  • Noldorin
    Noldorin over 11 years
    Actual mathematicians do not use the concept of "NaN" though, rest assured.
  • I. J. Kennedy
    I. J. Kennedy over 11 years
    @Noldorin: They do, but they use the term "indeterminate form".
  • Noldorin
    Noldorin over 11 years
    @I.J.Kennedy: That's a different college, which I know quite well thank you. Some 'NaN's may represent indeterminate form, but since FPA doesn't do symbolic reasoning, equating it with indeterminate form is quite misleading!
  • nawfal
    nawfal about 11 years
    or it could be this crazy extra null value that doesn't map into my problem domain. well said, accepted..
  • David Conrad
    David Conrad almost 10 years
    @DaveGriffith It was Tony Hoare, not John Backus.
  • ARF
    ARF about 9 years
    'Null typically adds needless complexity.' This brings my biggest gripe with Javascript, it has both null and undefined values. Coffeescript even has a special operator ( '?' ) to check for both at the same time.
  • Nathan C. Tresch
    Nathan C. Tresch about 9 years
    First of all, your assertion that null is meaningless is false: It can be said to represent a wave function before its collapse, or an indeterminate state. From there, it follows that null plays a valuable part in data modelling and representing things in our code.
  • cat
    cat almost 8 years
    What's wrong with assert(actionSheet ^ alertView)? Or can't your language XOR bools?
  • Ohad Schneider
    Ohad Schneider over 6 years
    And in the second meaning, we want the compiler to warn (stop?) us if we try to access such variables without checking for nullity first. Here's a great article about the upcoming null/non-null C# feature (finally!) blogs.msdn.microsoft.com/dotnet/2017/11/15/…