What does the buffered parameter do in Dapper dot net?


Solution 1

"but as far as I can tell the only thing it does is cast the result to a list before returning it"

You aren't missing anything. That is the key difference. Except it isn't a cast as such: the actual returned object is very different. Basically, there are two ways of reading data:

  • in a streaming API each element is yielded individually; this is very memory efficient, but if you do lots of subsequent processing per item, it means your connection / command could be "active" for an extended time
  • in a buffered API all the rows are read before anything is yielded (the sketch below mimics the difference)
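
To make the distinction concrete, here is a minimal self-contained sketch; no database is involved, the iterator just mimics a reader yielding rows, and the buffered ternary mirrors the line from Dapper's source quoted in the comments below:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class BufferingDemo
    {
        // Stand-in for a data reader: yields one "row" per MoveNext.
        static IEnumerable<int> ReadRows()
        {
            for (int i = 1; i <= 3; i++)
            {
                Console.WriteLine($"  reading row {i}");
                yield return i;
            }
        }

        // The essence of the buffered flag: eagerly drain the iterator
        // into a list, or hand back the lazy sequence untouched.
        static IEnumerable<int> Query(bool buffered)
        {
            IEnumerable<int> results = ReadRows();
            return buffered ? results.ToList() : results;
        }

        static void Main()
        {
            Console.WriteLine("unbuffered:");
            IEnumerable<int> lazy = Query(buffered: false);
            Console.WriteLine("  (Query returned; nothing read yet)");
            foreach (int row in lazy) { } // rows are read only here

            Console.WriteLine("buffered:");
            IEnumerable<int> eager = Query(buffered: true); // rows read inside Query
            Console.WriteLine("  (Query returned; all rows already read)");
        }
    }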

If you are reading a very large amount of data (thousands to millions of rows), a non-buffered API may be preferable: otherwise a lot of memory is used, and there may be noticeable latency before even the first row is available. However, in most common scenarios the amount of data read is within reasonable limits, so it is reasonable to push it into a list before handing it to the caller. That means the command / reader etc. has completed before it returns.
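
Concretely, the switch is just an optional parameter on Query. A rough sketch of both modes; the connection, the Orders table, and the Order type are placeholders:

    using System.Collections.Generic;
    using System.Data.SqlClient;
    using Dapper;

    public class Order   // hypothetical POCO for illustration
    {
        public int Id { get; set; }
        public decimal Total { get; set; }
    }

    class QueryModes
    {
        static void Run(SqlConnection conn)
        {
            // Buffered (the default): every row is materialized into a
            // List<Order> before Query returns, so the command/reader has
            // already completed at this point.
            IEnumerable<Order> buffered = conn.Query<Order>(
                "select Id, Total from Orders");

            // Unbuffered: rows are yielded as you enumerate; the connection
            // stays busy until the foreach below finishes.
            IEnumerable<Order> streamed = conn.Query<Order>(
                "select Id, Total from Orders", buffered: false);

            foreach (Order o in streamed)
            {
                // Slow per-item work here keeps the reader open the whole time.
            }
        }
    }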

As a side note, buffered mode also avoids the oh-so-common "there is already an open reader on the connection" error (or whatever the exact phrasing is).
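
For illustration, this is the shape of code that typically trips that error with buffered: false. It reuses the placeholder types from the sketch above; the exact exception text varies by provider, and enabling MARS on SQL Server changes the picture:

    // With buffered: false the outer reader is still open while you enumerate,
    // so this inner query on the same connection typically throws an
    // InvalidOperationException along the lines of "There is already an open
    // DataReader associated with this Connection which must be closed first."
    foreach (Order o in conn.Query<Order>(
        "select Id, Total from Orders", buffered: false))
    {
        var lines = conn.Query<OrderLine>(   // OrderLine: hypothetical POCO
            "select * from OrderLines where OrderId = @id", new { id = o.Id });
    }

    // With the default buffered: true, the outer reader is already closed by
    // the time the loop body runs, so the nested query is safe.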

Solution 2

I have to disagree with @chris-marisic on this... I ran into multiple "Out Of Memory" exceptions at that exact line (data.ToList()) when using buffered: true. It was not a "zillion rows X bazillion columns" query, just a regular 5-6k-row SQL result with about 30 columns.

It really depends on your configuration: e.g. whether your SQL Server and IIS run on the same physical machine, how much memory is installed on the IIS machine, what the page file settings are, and so on. If the web server has 2 GB or less, consider setting buffered: false for super-heavy reports.
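
If you do go the buffered: false route for a heavy report, the usual pattern is to stream each row out as it arrives so the full result set never sits in memory at once. A sketch, with ReportRow, the SQL, and the output path as placeholders:

    using System.Data.SqlClient;
    using System.IO;
    using Dapper;

    public class ReportRow   // hypothetical shape of one report line
    {
        public int Id { get; set; }
        public decimal Amount { get; set; }
    }

    class HeavyReport
    {
        static void Export(SqlConnection conn, string reportSql, string path)
        {
            using var writer = new StreamWriter(path);

            // buffered: false means each row is written out and then becomes
            // eligible for collection; memory use stays flat regardless of
            // how many rows the report returns.
            foreach (ReportRow row in conn.Query<ReportRow>(reportSql, buffered: false))
            {
                writer.WriteLine($"{row.Id},{row.Amount}");
            }
        }
    }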

Solution 3

In practice it is better to never use buffered: false.

I've found that, even when reading many millions of rows, it is both faster and more memory efficient to use buffered results than unbuffered. Perhaps there is a crossover point if your tables have 500 columns and you're reading tens of millions or hundreds of millions of rows.

If your result sets are smaller than many billions of values, it is not worth using buffered: false for any reason.

I was shocked during actual analysis that reading gigabytes of data from SQL Server was both faster (2-6x) and more memory efficient in standard buffered mode. The performance increase holds even for the most minute operation possible: adding an object by index to a pre-sized sparse array that never resizes. With a multi-gigabyte sparse array, switching from unbuffered to buffered gave the 2x improvement in load time. Writing to a dictionary saw the 6x improvement in load time when inserting millions of records (the dictionary used the table's int PK as its key, so the hashcode calculation was as basic as possible).
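
The comments below spell out the shape of that dictionary measurement; roughly, with Record, sql, conn, and expectedCount as placeholders:

    // Buffered (the default): materialize the full result, then build the
    // dictionary. In the measurements described above this was the ~6x winner.
    Dictionary<int, Record> byId =
        conn.Query<Record>(sql).ToDictionary(r => r.Id);

    // Unbuffered: insert row-by-row while the reader is still streaming.
    // This was ~6x slower even with a properly pre-sized capacity constructor.
    var byIdStreaming = new Dictionary<int, Record>(expectedCount);
    foreach (Record r in conn.Query<Record>(sql, buffered: false))
        byIdStreaming.Add(r.Id, r);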

As with everything regarding performance, you must always measure. However, I can tell you with a very high level of certainty: always start with Dapper's default buffered behavior.



Author: smdrager

I work as a web development manager for an e-commerce and mail-catalog company in Bethlehem, PA. My main experience is with .NET, C#, MVC, and MSSQL. I am also proficient in PHP, MySQL, and front-end technologies: HTML5, jQuery, and LESS/CSS.

Updated on September 16, 2022

Comments

  • smdrager
    smdrager almost 2 years

    Dapper dot net has a buffered parameter (a bool), but as far as I can tell the only thing it does is cast the result to a list before returning it.

    As per the documentation:

    Dapper's default behavior is to execute your sql and buffer the entire reader on return. This is ideal in most cases as it minimizes shared locks in the db and cuts down on db network time.

    However when executing huge queries you may need to minimize memory footprint and only load objects as needed. To do so, pass buffered: false into the Query method.

    I'm not sure how casting the result to a list accomplishes this. Am I missing something? My only idea is that it is supposed to set the CommandBehavior for the ExecuteReader to CommandBehavior.SequentialAccess (but it doesn't).

    • Groo
      Groo over 9 years
      Duplicate of Explanation of dapper buffer/cache, which has a slightly longer, more detailed answer by @Marc (although, admittedly, it was actually asked after this question).
    • kristianp
      kristianp over 5 years
      @Groo, if the other question was asked after this one, then the other question is the duplicate, not this one.
  • Rich
    Rich over 8 years
    Chris, when you talk about populating the array or writing to a dictionary, are you talking about how you're handling results that Dapper has already returned? So technically that's profiling your own post-buffered-read code? Or are you talking about processing results from SQL Server at a lower level?
  • Chris Marisic
    Chris Marisic over 8 years
    @Rich Using foreach (var item in unbuffered) dictionary.Add(item.Id, item) was 6x slower than bufferedQuery.ToDictionary(x => x.Id). There is a definitive latency cost for using unbuffered. (I'm reading sets of millions and hundreds of millions; note I also used the properly sized capacity constructor for the dictionary.)
  • Rich
    Rich over 8 years
    So you are indeed talking about what I thought. Awesome. Very useful info, thanks.
  • frankhommers
    frankhommers over 7 years
    To elaborate on that: data.ToList() creates a list fully in memory anyway. So if you are using buffered: true, you will probably get the list in memory twice.
  • jazzcat
    jazzcat about 7 years
    By data.ToList() I was referencing an actual line from dapper's source code.
  • Kasey Speakman
    Kasey Speakman about 6 years
    Dunno what kind of workflow you are doing, but using buffered = true on millions of rows is very careless with resources. Deployed to limited-resource machines, it may be very slow (due to paging) or just crash, for typical business-data row sizes.
  • Chris Marisic
    Chris Marisic about 6 years
    @KaseySpeakman and what if the business case is load a billion records into memory?
  • Kasey Speakman
    Kasey Speakman about 6 years
    @ChrisMarisic We use unbuffered for exporting data to CSV from low memory/cpu AWS server nodes. It could be a lot of data (wide rows), but the process still uses little memory. It is a great fit for that.
  • Chris Marisic
    Chris Marisic about 6 years
    @KaseySpeakman isn't there a sql server native way to export data to CSV?
  • Kasey Speakman
    Kasey Speakman about 6 years
    @ChrisMarisic Perhaps, but I'm using Postgres and the file gets saved directly to S3.
  • Chris Marisic
    Chris Marisic about 6 years
    @KaseySpeakman if you're writing the file to S3 aren't you buffering 100% of everything into memory already? I'm pretty sure S3 doesn't support block writing.
  • Kasey Speakman
    Kasey Speakman about 6 years
    @ChrisMarisic We are using S3's Multipart Upload support. Where we upload pieces at a time and S3 combines them upon completion. Suffice it to say there are good use cases for both unbuffered and buffered Dapper queries. We use buffered for most API calls but we also limit result counts to avoid resource problems.
  • Kasey Speakman
    Kasey Speakman about 6 years
    @ChrisMarisic Ah, "my way or the highway." No worries. Onlookers can make their own judgements.
  • Daniel Earwicker
    Daniel Earwicker over 4 years
    Maybe things have changed since this answer was written, but the current Dapper code says: return buffered ? results.ToList() : results; And results is yield return-ing rows as they come in from the connection. What's your theory for how d.ToList().ToDictionary() is 6x faster than d.ToDictionary()?
  • Chris Marisic
    Chris Marisic over 4 years
    @DanielEarwicker It likely creates lots of intermediary collections over and over again as the unknown-length collection is being hydrated. It's also possible that there are a lot more context switches, as opposed to ToList trying to get the I/O threads closed as soon as they finish.
  • SpiritBob
    SpiritBob about 4 years
    @ChrisMarisic I'm pretty sure that if you have data warehousing, or need to actually return billions of records multiple times at the same time, your only option is to use unbuffered, unless of course you have 1 TB of RAM. Everything exists for a reason; it's not something you just sign off as useless or consider bad practice to use. Obviously if you can buffer the data in memory, it will be much faster than not buffering it, but sometimes you simply don't have the resources to do that, or it's considered foolish to do so. Everything can be abused.
  • Chris Marisic
    Chris Marisic about 4 years
    @SpiritBob For reference, this was all measured against reading about 4 billion rows in several minutes, every few hours. And yes, it was for a reporting engine that needed access to all data in memory. If your data isn't large random strings, that's only a dozen or two dozen gigabytes of memory, not 100 or 1,000.