Expression trees for dummies?

c# .net linq expression-trees

25,806

Solution 1

The best explanation about expression trees I ever read is this article by Charlie Calvert.

To sum it up;

An expression tree represents what you want to do, not how you want to do it.

Consider the following very simple lambda expression:
Func<int, int, int> function = (a, b) => a + b;

This statement consists of three sections:

A declaration: Func<int, int, int> function

An equals operator: =

A lambda expression: (a, b) => a + b;

The variable function points at raw executable code that knows how to add two numbers.

This is the most important difference between delegates and expressions. You call function (a Func<int, int, int>) without ever knowing what it will do with the two integers you passed. It takes two and returns one, that's the most your code can know.

In the previous section, you saw how to declare a variable that points at raw executable code. Expression trees are not executable code, they are a form of data structure.

Now, unlike delegates, your code can know what an expression tree is meant to do.

LINQ provides a simple syntax for translating code into a data structure called an expression tree. The first step is to add a using statement to introduce the Linq.Expressions namespace:

using System.Linq.Expressions;

Now we can create an expression tree:
Expression<Func<int, int, int>> expression = (a, b) => a + b;

The identical lambda expression shown in the previous example is converted into an expression tree declared to be of type Expression<T>. The identifier expression is not executable code; it is a data structure called an expression tree.

That means you can't just invoke an expression tree like you could invoke a delegate, but you can analyze it. So what can your code understand by analyzing the variable expression?

// `expression.NodeType` returns NodeType.Lambda.
// `expression.Type` returns Func<int, int, int>.
// `expression.ReturnType` returns Int32.

var body = expression.Body;
// `body.NodeType` returns ExpressionType.Add.
// `body.Type` returns System.Int32.

var parameters = expression.Parameters;
// `parameters.Count` returns 2.

var firstParam = parameters[0];
// `firstParam.Name` returns "a".
// `firstParam.Type` returns System.Int32.

var secondParam = parameters[1].
// `secondParam.Name` returns "b".
// `secondParam.Type` returns System.Int32.

Here we see that there is a great deal of information we can get from an expression.

But why would we need that?

You have learned that an expression tree is a data structure that represents executable code. But so far we have not answered the central question of why one would want to make such a conversion. This is the question we asked at the beginning of this post, and it is now time to answer it.

A LINQ to SQL query is not executed inside your C# program. Instead, it is translated into SQL, sent across a wire, and executed on a database server. In other words, the following code is never actually executed inside your program:
var query = from c in db.Customers where c.City == "Nantes" select new { c.City, c.CompanyName };

It is first translated into the following SQL statement and then executed on a server:
SELECT [t0].[City], [t0].[CompanyName] FROM [dbo].[Customers] AS [t0] WHERE [t0].[City] = @p0

The code found in a query expression has to be translated into a SQL query that can be sent to another process as a string. In this case that process happens to be a SQL server database. It is obviously going to be much easier to translate a data structure such as an expression tree into SQL than it is to translate raw IL or executable code into SQL. To exaggerate the difficulty of the problem somewhat, just imagine trying to translate a series of zeros and ones into SQL!

When it is time to translate your query expression into SQL, the expression tree representing your query is taken apart and analyzed, just as we took apart our simple lambda expression tree in the previous section. Granted, the algorithm for parsing the LINQ to SQL expression tree is much more sophisticated than the one we used, but the principle is the same. Once it has analyzed the parts of the expression tree, then LINQ mulls them over and decides the best way to write a SQL statement that will return the requested data.

Expression trees were created in order to make the task of converting code such as a query expression into a string that can be passed to some other process and executed there. It is that simple. There is no great mystery here, no magic wand that needs to be waved. One simply takes code, converts it into data, and then analyzes the data to find the constituent parts that will be translated into a string that can be passed to another process.

Because the query comes to the compiler encapsulated in such an abstract data structure, the compiler is free to interpret it in almost any way it wants. It is not forced to execute the query in a particular order, or in a particular way. Instead, it can analyze the expression tree, discover what you want done, and then decide how to do it. At least in theory, it has the freedom to consider any number of factors, such as the current network traffic, the load on the database, the current results sets it has available, etc. In practice LINQ to SQL does not consider all these factors, but it is free in theory to do pretty much what it wants. Furthermore, one could pass this expression tree to some custom code you write by hand which could analyze it and translate it into something very different from what is produced by LINQ to SQL.

Once again, we see that the expression trees allow us to represent (express?) what we want to do. And we use translators that decide how our expressions are getting used.

Solution 2

An expression tree is a mechanism to translate executable code into data. Using an expression tree, you can produce a data structure that represents your program.

In C#, you can work with the expression tree produced by lambda expressions by using the Expression<T> class.

In a traditional program, you write code like this:

double hypotenuse = Math.Sqrt(a*a + b*b);

This code causes the compiler to generate an assignment, and that's it. In most cases, that's all you care about.

With conventional code, your application can't go retroactively back and look at hypotenuse to determine that it was produced by performing a Math.Sqrt() call; this information is simply not part of what is included.

Now, consider a lambda expression like the following:

Func<int, int, double> hypotenuse = (a, b) => Math.Sqrt(a*a + b*b);

This is a little different than before. Now hypotenuse is actually a reference to a block of executable code. If you call

hypotenuse(3, 4);

you will get the value 5 returned.

We can use expression trees to explore the block of executable code that was produced. Try this instead:

Expression<Func<int, int, int>> addTwoNumbersExpression = (x, y) => x + y;
BinaryExpression body = (BinaryExpression) addTwoNumbersExpression.Body;
Console.WriteLine(body);

This produces:

(x + y)

More advanced techniques and manipulations are possible with expression trees.

Solution 3

Expression trees are an in-memory representation of an expression, e.g. an arithmetic or boolean expression. For example, consider the arithmetic expression

a + b*2

Since * has a higher operator precedence than +, the expression tree is built like that:

    [+]
  /    \
 a     [*]
      /   \
     b     2

Having this tree, it can be evaluated for any values of a and b. Additionally, you can transform it into other expression trees, for example to derive the expression.

When you implement an expression tree, I would suggest to create a base class Expression. Derived from that, the class BinaryExpression would be used for all binary expressions, such as + and * . Then you could introduce a VariableReferenceExpression to reference variables (such as a and b), and another class ConstantExpression (for the 2 from the example).

The expression tree is in many cases built as the result of parsing an input (from the user directly, or from a file). For evaluating the expression tree, I would suggest to use the Visitor pattern.

Solution 4

Short answer: It's nice to be able to write the same kind of LINQ query and point it at any data source. You couldn't have a "Language Integrated" query without it.

Long answer: As you probably know, when you compile source code, you're transforming it from one language to another. Usually from a high level language (C#) to a lower lever on (IL).

There are basically two ways you can do this:

You can translate the code using find and replace
You parse the code and get a parse tree.

The latter is what all the programs we know as 'compilers' do.

Once you have a parse tree you can easily translate it into any other language and this is what expression trees allow us to do. Since the code is stored as data you can do anything you want to it but probably you'll just want to translate it into some other language.

Now, in LINQ to SQL the expression trees get turned into a SQL command and then are sent over the wire to the database server. As far as I know they don't do anything really fancy when translating the code but they could. For instance, the query provider could create different SQL code depending on the network conditions.

Solution 5

IIUC, an expression tree is similar to an Abstract Syntax Tree, but an expression usually yiels a single value, whereas an AST can represent an entire program (with classes, packages, function, statements, etc.)

Anyway, for an the expression (2 + 3) * 5, the tree is:

Evaluate each node recursively (bottom-up) to get the value at the root node, i.e. the value of the expression.

You can of course have unary (negation) or trinary (if-then-else) operators too, and functions (n-ary, i.e. any number of ops) if your expression language allows it.

Evaluating types and doing type-control is done over similar trees.

View more solutions

25,806

Admin

Updated on November 16, 2020

Comments

Admin over 3 years

I am the dummy in this scenario.

I've tried to read on Google what these are but I just don't get it. Can someone give me a simple explanation of what they are and why they're useful?

edit: I'm talking about the LINQ feature in .Net.
- Jim Schubert over 14 years
  
  I know this post is rather old, but I've been looking into Expression Trees lately. I became interested after I began using Fluent NHibernate. James Gregory extensively uses what is known as static reflection and he has an intro: jagregory.com/writings/introduction-to-static-reflection To see static reflection and expression trees in action, check out the Fluent NHibernate source code (fluentnhibernate.org). It is very clean, and a very cool concept.
Admin about 15 years

OK I was with you right til the end but I still don't really get why this is a big deal. I'm having a hard time thinking of applications.
Pierreten about 14 years

He was using a simplified example; the true power lies in the fact that your code that explores the expression tree, can also be made responsible for interpreting it and applying semantic meaning to the expression.
Richard almost 11 years

Yeah, I know I'm late to the game, but I wanted to write this answer as a way to understand it myself. (This question came up high in my internet search.)
Paul Matthews over 9 years

Yes, this answer would have been better if he/she explained why (x + y) was actually useful to us. Why would we want to explore (x + y) and how do we do that?
yoel halb over 8 years

Well while it is true that an Expression Tree that the OP refers to works similar and with the same underlying concept as a parse tree, it is done dynamically at run-time with code, however note with he introduction of the Roslyn compiler the line of division between the two got really blurry if not completely removed.
Rich Bryant about 8 years

Nice job. It's a good answer.
Yarik over 7 years

The "var" keyword has nothing to do with DLR. You are confusing it with the "dynamic".
johnny almost 6 years

This is a good, small answer on var here, which shows Yarik is correct. Thankful for the rest of the answer, however. quora.com/…
johnny over 5 years

One of the better answers.
Yan D almost 5 years

excellent answer. one little aspect to add to this brilliant explanation is - another use of expression trees is that you can modify expression tree on the fly at run time as you may see fit before you feed it to be executed which at times is extremely useful.
Şafak Gür almost 5 years

This is all wrong. var is a compile-time syntactic sugar - it has nothing to do with expression trees, DLR or the runtime. var i = 0 gets compiled as if you wrote int i = 0, so you can't use var to represent a type that is not known in the compile-time. Expression trees are not "an addition to support DLR", they are introduced in .NET 3.5 to allow LINQ. DLR on the other hand is introduced in .NET 4.0 to allow dynamic languages (like IronRuby) and the dynamic keyword. Expression trees are actually used by the DLR to provide interop, it's not the other way around.
stanimirsp over 4 years

You don't have to explore it, you do it just to see what is your query and what will be translated into some other language in that case to SQL