Why is writing speed for RAM less than reading speed? And how do caches come into the picture?

Solution 1

Memory must store its bits in two states which have a large energy barrier between them, or else the smallest influence would change the bit. But when writing to that memory, we must actively overcome that energy barrier.

Overcoming the energy barrier in RAM requires waiting while energy is moved around. Simply looking to see what the bit is set to takes less time.

For more detail, see MSalters' excellent answer to a somewhat similar question.

I'm not certain enough of the details of how caching interacts with RAM to answer that part of the question with any authority, so I'll leave it to someone else.

Solution 2

Write Case: If you have something to write to memory and you have a good memory controller, ignoring all caching, all you have to do is send a transaction to the memory controller with the data you want written. Because of memory ordering rules, as soon as the transaction leaves the core you can move on to the next instruction, because you can assume the hardware is taking care of the write to memory. This means that, from the core's perspective, a write takes virtually no time at all.
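
As a minimal sketch (the function and names below are my own illustration, not any particular architecture's API), a store doesn't create a dependency that stalls the work behind it:

    /* Sketch: the store below doesn't block the independent work that
     * follows it. The core hands the write off (a "posted" transaction
     * towards the memory controller) and keeps executing. */
    void log_and_continue(int *log_slot, int value, int *acc)
    {
        *log_slot = value;  /* enters the store buffer; the core does not
                               wait for it to reach DRAM */
        *acc += 1;          /* independent instruction: executes right away */
    }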

Read Case: A read, on the other hand, is an entirely different operation, and it is greatly assisted by caching. If you need to read data, you can't go on to the next step in your program until you actually have the data in hand. That means you have to check the caches first and then memory to find where the data is, and depending on where it is, your latency suffers accordingly. In a non-threaded, non-pipelined, non-prefetching core, you're just burning cycles waiting for the data to come back so you can move on to the next step. Cache and memory are orders of magnitude slower than core speed/register space. This is why reading is so much slower than writing.
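
To make that latency concrete, here's a hedged pointer-chasing sketch in C (the sizes, the shuffle, and the POSIX timing are my own illustrative choices): every load's address depends on the previous load, so the misses can't be overlapped and each step pays the full round-trip.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (16u * 1024 * 1024)   /* 16 Mi entries * 8 B = 128 MiB, past L3 */

    int main(void)
    {
        size_t *next = malloc((size_t)N * sizeof *next);
        if (!next) return 1;

        /* Build one big random cycle (Sattolo's algorithm) so the
           hardware prefetchers can't guess the next address. */
        for (size_t i = 0; i < N; i++)
            next[i] = i;
        srand(1);
        for (size_t i = N - 1; i > 0; i--) {
            size_t j = (size_t)rand() % i;   /* j < i -> a single cycle */
            size_t t = next[i]; next[i] = next[j]; next[j] = t;
        }

        struct timespec t0, t1;
        size_t p = 0;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t n = 0; n < N; n++)
            p = next[p];             /* serialized: full latency per step */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = ((t1.tv_sec - t0.tv_sec) * 1e9 +
                     (t1.tv_nsec - t0.tv_nsec)) / (double)N;
        printf("~%.1f ns per dependent load (p=%zu)\n", ns, p);
        free(next);
        return 0;
    }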

Going back to the write transaction, the only speed issue you may run into is doing a read right after a write to the same address. In that case, your architecture needs to ensure that your read doesn't hop over your write; if it does, you'll get stale data back. In a really smart architecture, as that write propagates out towards memory, a read arriving at the same address can be answered by the hardware long before the write ever reaches memory. Even in this read-after-write case, it's not the write that takes a while from the core's perspective; it's the read.
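
In code, that read-after-write pattern looks like the sketch below; on most cores the hardware forwards the value straight out of the store buffer (store-to-load forwarding) long before the write reaches memory:

    /* Sketch: a load right behind a store to the same address. The
     * hardware must not let the load "hop over" the store; typically it
     * forwards the value straight out of the store buffer. */
    int read_after_write(int *slot, int v)
    {
        *slot = v;      /* store enters the store buffer */
        return *slot;   /* load sees the in-flight store via forwarding,
                           not by waiting for cache/DRAM */
    }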

From a RAM perspective: Even if we're not talking about a core and only about the RAM/memory controller, a write to the MC results in the MC storing it in a buffer and sending back a response saying the transaction is complete (even though it isn't yet). Thanks to those buffers, we don't have to worry about actual DIMM/RAM write speeds, because the MC takes care of that. The only exception is when you do large blocks of writes and exceed the capacity of the MC's buffer. In that case, you do have to start worrying about RAM write speed, and that's what the linked article is referring to. At that point you run into the physical limitations of read versus write speed that David's answer touches on. Usually that's a dumb thing for a core to do anyway; that's why DMA was invented. But that's a whole other topic.
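
A hedged way to see the buffering effect (the sizes and repetition counts below are my own guesses, not measured limits of any particular MC): repeatedly rewriting a small block gets absorbed by the caches and buffers, while a sustained pass over a huge block has to drain all the way out to DRAM.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    /* Time `reps` passes of pure write traffic and report GB/s. */
    static double gb_per_s(char *buf, size_t bytes, int reps)
    {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int r = 0; r < reps; r++)
            memset(buf, r, bytes);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double s = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        return (double)bytes * reps / s / 1e9;
    }

    int main(void)
    {
        size_t big = 512u << 20, small = 16u << 10;  /* 512 MiB vs 16 KiB */
        char *buf = malloc(big);
        if (!buf) return 1;
        /* Small block rewritten many times: absorbed by caches/buffers. */
        printf("small block, rewritten: %.1f GB/s\n",
               gb_per_s(buf, small, 1 << 16));
        /* One huge block: the writes must drain to DRAM. */
        printf("large block, sustained: %.1f GB/s\n",
               gb_per_s(buf, big, 4));
        free(buf);
        return 0;
    }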


Comments

  • user2898278
    user2898278 over 1 year

    Firstly, this is true, right? I feel that reads will always be faster than writes; this guy here also does some experiments to "prove" it. He doesn't explain why, just mentions "caching issues" (and his experiments don't seem to account for prefetching).

    But I don't understand why. If it matters, let's assume we're talking about the Nehalem architecture (like the i7), which has per-core L1 and L2 caches and a shared inclusive L3 cache.

    Probably this is because I don't correctly understand how reads and writes work, so I'll write my understanding. Please tell me if something is wrong.

    If I read some memory, the following steps should happen: (assume all caches miss)
    
        1. Check if already in L1 cache, miss
        2. Check if in L2 cache, miss
        3. Check if in L3 cache, miss
        4. Fetch from memory into (L1?) cache
    

    Not sure about the last step. Does data percolate down the caches, meaning that on a cache miss, memory is read into L3/L2/L1 first and then read from there? Or can it "bypass" all caches, with the caching happening in parallel for later? (reading = access all caches + fetch from RAM to cache + read from cache?)
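
    One experiment I could run to probe this is a working-set sweep (sizes and stride below are guesses, and I realize hardware prefetch will flatten the steps for a sequential walk like this): latency per access should step up each time the set spills out of L1, L2, and L3 into RAM.

        /* Sketch: touch one byte per cache line over working sets of
         * increasing size; ns/access should step up as the set outgrows
         * L1, L2, L3 and finally spills to RAM. Sequential walking lets
         * the prefetchers help, so the DRAM step looks smaller than the
         * true miss latency. */
        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>

        static double walk_ns(volatile char *buf, size_t bytes, size_t touches)
        {
            struct timespec t0, t1;
            size_t i = 0;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            for (size_t n = 0; n < touches; n++) {
                buf[i] += 1;               /* one access per 64-byte line */
                i = (i + 64) % bytes;
            }
            clock_gettime(CLOCK_MONOTONIC, &t1);
            return ((t1.tv_sec - t0.tv_sec) * 1e9 +
                    (t1.tv_nsec - t0.tv_nsec)) / (double)touches;
        }

        int main(void)
        {
            size_t max = 256u << 20;       /* 256 MiB, past any L3 */
            char *buf = calloc(max, 1);
            if (!buf) return 1;
            for (size_t sz = 16u << 10; sz <= max; sz <<= 2)
                printf("%8zu KiB: %6.2f ns/access\n", sz >> 10,
                       walk_ns(buf, sz, (size_t)1 << 24));
            free(buf);
            return 0;
        }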

    Then write:
    
        1. All caches have to be checked (read) in this case too
        2. If there's a hit, write there, and since Nehalem has write-through caches, write to memory immediately and in parallel
        3. If all caches miss, write to memory directly?
    

    Again, not sure about the last step. Can a write be done "bypassing" all caches, or does writing always involve reading into the cache first, modifying the cached copy, and letting the write-through hardware actually write to the memory location in RAM? (writing = read all caches + fetch from RAM to cache + write to cache, with the write to RAM happening in parallel ==> writing is almost a superset of reading?)
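
    Related sketch for the write side: x86 has non-temporal store instructions that explicitly bypass the caches, so comparing them against ordinary stores should expose the write-allocate (read-line-first) behavior I'm asking about. This assumes SSE2, a C11 compiler, and a POSIX timer (e.g. gcc -O2 -msse2):

        /* Sketch: ordinary stores normally trigger a write-allocate, i.e.
         * the cache line is fetched from memory before being overwritten;
         * non-temporal stores (_mm_stream_si128) bypass the caches and
         * skip that read. */
        #include <emmintrin.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>

        #define BYTES (256u << 20)         /* 256 MiB, larger than L3 */

        static double secs(struct timespec a, struct timespec b)
        {
            return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
        }

        int main(void)
        {
            char *buf = aligned_alloc(16, BYTES);
            if (!buf) return 1;
            struct timespec t0, t1;
            __m128i v = _mm_set1_epi8(1);

            clock_gettime(CLOCK_MONOTONIC, &t0);
            for (size_t i = 0; i < BYTES; i += 16)
                _mm_store_si128((__m128i *)(buf + i), v);  /* cached store:
                                               line fetched, then written */
            clock_gettime(CLOCK_MONOTONIC, &t1);
            printf("cached stores:       %.3f s\n", secs(t0, t1));

            clock_gettime(CLOCK_MONOTONIC, &t0);
            for (size_t i = 0; i < BYTES; i += 16)
                _mm_stream_si128((__m128i *)(buf + i), v); /* bypasses caches */
            _mm_sfence();                  /* drain the streamed writes */
            clock_gettime(CLOCK_MONOTONIC, &t1);
            printf("non-temporal stores: %.3f s\n", secs(t0, t1));

            free(buf);
            return 0;
        }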

    • Ƭᴇcʜιᴇ007
      Ƭᴇcʜιᴇ007 over 10 years
      Please don't cross-post between SE sites. Either flag a mod to request and/or wait for a mod to migrate your other question here. If you want it here and not there, since you've already posted both places, please consider going and deleting it from SO.
    • Ƭᴇcʜιᴇ007
      Ƭᴇcʜιᴇ007 over 10 years
      Reading something is passive, writing (changing) something is active. Activity is almost always harder than passivity. ;)
    • Ramhound
      Ramhound over 10 years
      @user2898278 - Do you have any possible sources more reliable than a random blog?
    • M.Bennett
      M.Bennett over 10 years
      You have got something elementary wrong here. Every bit of data is addressed; there's no trickling down cache levels looking for data as if you were guessing.
    • user2898278
      user2898278 over 10 years
      @techie007, thank you for your response. I'll remove it from SO. As I said, I intuitively understand why "pure write" should be somewhat slower than a "pure read" and had read that electronics thread before posting this, but I want to know about "actual" read and write times including all effects due to caching, not just moving data from RAM to/from cache ("pure" read/write).
    • user2898278
      user2898278 over 10 years
      @Ramhound, I also tested this with a tool called lmbench. I got write speeds consistently slower than read speeds, by about 1.5 times, although I'm not sure about the accuracy of the program on modern processors. Data all over the internet suggests that writes are much slower, like this. But there is also some data that suggests otherwise, so I don't know what's happening.
    • user2898278
      user2898278 over 10 years
      @M.Bennett, every bit is addressable inside the RAM, of course, but reads and writes are done in multiples of the cache line size. When you say "every bit is addressed", do you mean the same thing as "bypassing caches" in my question? I'm not sure whether the CPU can read/write directly from/to memory in any situation, or whether these operations must always go through the cache (in the former case there would have to be a separate physical connection between RAM and the processor, I think; is that the case?)
    • Ramhound
      Ramhound over 10 years
      Add your data to your question please.
    • Ƭᴇcʜιᴇ007
      Ƭᴇcʜιᴇ007 over 10 years
      @user2898278 This isn't a discussion forum. If you want to discuss this at length, you'll probably be better off hitting the chat, or an actual discussion forum.
  • user2898278
    user2898278 over 10 years
    Thank you for this. I now better understand why "pure" writes would be slower than pure reads. But how much difference do the electronic factors make? I mean, would the difference purely due to electronic factors be around 1.5 times between read and write bandwidth? Any ideas? (By "pure" I mean excluding caches.)