Concept of "block size" in a cache

47,941

Solution 1

I typed up this email for someone to explain caches, but I think you might find it useful as well.

You have 32-bit addresses that can refer to bytes in RAM. You want to be able to cache the data that you access, to use them later.

Let's say you want a 1-MiB (2^20-byte) cache.

What do you do?

You have 2 restrictions you need to meet:

  1. Caching should be as uniform as possible across all addresses. i.e. you don't want to bias toward any particular kind of address.
    • How do you do this? Use remainder! With mod, you can evenly distribute any integer over whatever range you want.
  2. You want to help minimize bookkeeping costs. That means, e.g., if you're caching in blocks of 1 byte, you don't want to store 4 bytes of data just to keep track of where 1 byte belongs.
    • How do you do that? You store blocks that are bigger than just 1 byte.
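The mod trick in restriction 1 is easy to check numerically. Here's a minimal Python sketch (using the toy numbers from later in this answer: a 1-MiB cache with 16-byte blocks, i.e. 2^16 cache blocks); it just confirms that mod spreads memory blocks evenly over the cache blocks:

```python
from collections import Counter

NUM_BLOCKS = 2**16   # 65,536 cache blocks (1 MiB / 16 bytes)
BLOCK_SIZE = 16      # bytes per block

# Map every memory block in a 4-MiB region to a cache block via mod.
counts = Counter()
for addr in range(0, 2**22, BLOCK_SIZE):
    counts[(addr // BLOCK_SIZE) % NUM_BLOCKS] += 1

# Every cache block receives exactly the same number of memory blocks (4).
assert set(counts.values()) == {4}
```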

Let's say you choose 16-byte (2^4-byte) blocks. That means you can cache 2^20 / 2^4 = 2^16 = 65,536 blocks of data.

You now have a few options:

  • You can design the cache so that data from any memory block could be stored in any of the cache blocks. This would be called a fully-associative cache.
    • The benefit is that it's the "fairest" kind of cache: all blocks are treated completely equally.
    • The tradeoff is speed: to find where to put the memory block, you have to search every cache block for a free space. This is really slow.
  • You can design the cache so that data from any memory block could only be stored in a single cache block. This would be called a direct-mapped cache.
    • The benefit is that it's the fastest kind of cache: you do only 1 check to see if the item is in the cache or not.
    • The tradeoff is that, if you happen to have a bad memory access pattern, you can have 2 blocks kicking each other out repeatedly, while unused blocks still remain in the cache.
  • You can do a mixture of both: map a single memory block into any of a small set of cache blocks. This is what real processors do -- they have N-way set-associative caches.

Direct-mapped cache:

Now you have 65,536 blocks of data, each block being of 16 bytes.
You store it as 65,536 "rows" inside your cache, with each "row" consisting of the data itself, along with the metadata (regarding where the block belongs, whether it's valid, whether it's been written to, etc.).
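One such "row" can be sketched as a small structure. This is purely illustrative Python (real hardware keeps these fields in SRAM, not objects), using the field sizes from this answer's toy cache:

```python
from dataclasses import dataclass

@dataclass
class CacheRow:
    valid: bool = False       # does this row hold real data yet?
    dirty: bool = False       # has the block been written to?
    tag: int = 0              # records which region of memory the block came from
    data: bytes = bytes(16)   # the 16-byte block itself

# 65,536 rows x 16 bytes = 1 MiB of cached data (plus metadata).
cache = [CacheRow() for _ in range(2**16)]
```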

Question: How does each block in memory get mapped to each block in the cache?

Answer: Well, you're using a direct-mapped cache, using mod. That means addresses 0 to 15 will be mapped to block 0 in the cache; 16-31 get mapped to block 1, etc... and it wraps around as you reach the 1-MiB mark.

So, given memory address M, how do you find the row number N? Easy: N = (M mod 2^20) / 2^4.
But that only tells you where to store the data, not how to retrieve it. Once you've stored it, and try to access it again, you have to know which 1-MiB portion of memory was stored here, right?
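With explicit parentheses, the formula is N = (M mod 2^20) / 2^4: the mod gives you the byte's position within the 1-MiB range, and the division by the block size turns that into a row number. A quick Python sketch with the toy numbers from above:

```python
CACHE_SIZE = 2**20   # 1 MiB
BLOCK_SIZE = 2**4    # 16 bytes

def row_number(m):
    """Direct-mapped row for memory address m: N = (M mod 2^20) / 2^4."""
    return (m % CACHE_SIZE) // BLOCK_SIZE

assert row_number(0) == 0            # addresses 0-15  -> row 0
assert row_number(17) == 1           # addresses 16-31 -> row 1
assert row_number(CACHE_SIZE) == 0   # wraps around at the 1-MiB mark
```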

So that's one piece of metadata: the tag bits. If the data is in row N, all you need to know is the quotient from the mod operation, i.e. M / 2^20. For a 32-bit address, that quotient is 12 bits big (since the remainder is 20 bits).

So your tag becomes 12 bits long -- specifically, the topmost 12 bits of any memory address.
And you already knew that the lowermost 4 bits are used for the offset within a block (since memory is byte-addressed, and a block is 16 bytes).
That leaves 16 bits for the "index" bits of a memory address, which can be used to find which row the address belongs to. (It's just a division + remainder operation, but in binary.)
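Putting the three fields together, the 12/16/4 split can be checked with a few shifts and masks. A Python sketch (the example address is arbitrary):

```python
def split_address(addr):
    """Split a 32-bit address into (tag, index, offset): 12 + 16 + 4 bits."""
    offset = addr & 0xF             # lowermost 4 bits: byte within the block
    index = (addr >> 4) & 0xFFFF    # middle 16 bits: which row
    tag = (addr >> 20) & 0xFFF      # topmost 12 bits: which 1-MiB region
    return tag, index, offset

tag, index, offset = split_address(0xDEADBEEF)
# Reassembling the three fields gives back the original address.
assert (tag << 20) | (index << 4) | offset == 0xDEADBEEF
```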

You also need other bits: e.g. you need to know whether a block is in fact valid or not, because when the CPU is turned on, it contains invalid data. So you add 1 bit of metadata: the Valid bit.

There are other bits you'll learn about, used for optimization, synchronization, etc... but these are the basic ones. :)

Solution 2

I'm assuming you know the basics of tag, index, and offset, but here's a short explanation as I learned it in my computer architecture class.

Blocks are replaced in 64-byte units, so every time a new block is brought into the cache it replaces all 64 bytes, even if you only need one byte. That's why, when addressing the cache, there is an offset that specifies the byte you want to get from the block.

Take your example: if only a 16-bit integer is being loaded, the cache will look up the block by its index, check the tag to make sure it's the right data, and then get the bytes according to the offset. Now if you load another 16-bit value, say with the same index but a different tag, it will replace the entire 64-byte block with the new block and read the value from the specified offset (assuming a direct-mapped cache).
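The lookup described here can be sketched for the asker's parameters: a 32KB cache with 64-byte blocks and 512 frames gives 6 offset bits, 9 index bits, and 17 tag bits on a 32-bit address. A hypothetical Python sketch, with the cache modeled as a dict of frames:

```python
BLOCK_BITS = 6   # 64-byte blocks -> 6 offset bits
INDEX_BITS = 9   # 512 frames     -> 9 index bits

def lookup(addr, frames):
    """Return the 2 bytes at addr on a hit, or None on a miss."""
    offset = addr & ((1 << BLOCK_BITS) - 1)
    index = (addr >> BLOCK_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (BLOCK_BITS + INDEX_BITS)
    entry = frames.get(index)
    if entry is None or entry["tag"] != tag:
        return None   # miss: the whole 64-byte block would be (re)filled
    return entry["data"][offset:offset + 2]   # hit: 16-bit value at the offset

# One frame at index 3, holding a 64-byte block whose tag is 7.
frames = {3: {"tag": 7, "data": bytes(range(64))}}
addr = (7 << 15) | (3 << 6) | 10   # tag 7, index 3, offset 10
assert lookup(addr, frames) == bytes([10, 11])
assert lookup((8 << 15) | (3 << 6), frames) is None   # same index, different tag
```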

I hope this helps! If you need more info or this is still fuzzy let me know, I know a couple of good sites that do a good job of teaching this.

Author: hektor

Updated on July 09, 2022

Comments

  • hektor
    hektor almost 2 years

    I am just beginning to learn the concept of Direct mapped and Set Associative Caches. I have some very elementary doubts . Here goes.

    Supposing addresses are 32 bits long, and i have a 32KB cache with 64Byte block size and 512 frames, how much data is actually stored inside the "block"? If i have an instruction which loads from a value from a memory location and if that value is a 16-bit integer, is it that one of the 64Byte blocks now stores only a 16 bit(2Bytes) integer value. What of the other 62 bytes within the block? If i now have another load instruction which also loads a 16bit integer value, this value now goes into another block of another frame depending on the load address(If the address maps to the same frame of the previous instruction, then the previous value is evicted and the block again stores only 2bytes in 64 bytes). Correct?

    Please forgive me if this seems like a very stupid doubt, its just that i want to get my concepts correctly.

  • hektor
    hektor over 12 years
    yes it helps.. but do please give me the other sites as well for additional reading.. thanks anyway
  • GSilva
    GSilva over 12 years
    @hektor Check out this link, it's really good for memory and different replacement techniques and the one that helped me most. It even has interactive java modules.
  • Faizan
    Faizan almost 11 years
    which is better in terms of efficiency , having a bigger block size or smaller block size?
  • user541686
    user541686 almost 11 years
    @Faizan: It depends on a lot of factors, including your memory bandwidth. Smaller block sizes give you finer granularity (which is better) at a higher cost ($) and at potentially getting less throughput from the RAM. There isn't a single answer.
  • Joaquim Ferrer
    Joaquim Ferrer almost 8 years
    @Mehrdad When you say that to discover the cache block where the memory block will be mapped you say that is M(memory address) mod S(size of cache) but that way address 0 will be mapped to block 0 and address 1 will be mapped to block 1, etc. Besides There will be in your example 2^20 blocks in cache which is wrong. Or am I missing something?
  • user541686
    user541686 almost 8 years
    @JoaquimFerrer: Wow, 5 years and I can't believe no one else has pointed out the typo until now. I think I fixed it, could you double-check it to make sure? Thanks for letting me know!
  • Joaquim Ferrer
    Joaquim Ferrer almost 8 years
    @Mehrad I'm new to this, but I think you're still wrong, because that way the cache will have 2^4 blocks. I think it's M / block size mod number of blocks. M / block size will give you a block number, and the mod puts it within the accepted range.
  • Joaquim Ferrer
    Joaquim Ferrer almost 8 years
    @Mehrad By the way it's the magic of stackoverlfow! Even after 5 years your answer was incredible helpful.
  • user541686
    user541686 almost 8 years
    @JoaquimFerrer: Haha thanks! But hmm I think this is right? M % 2^20 gives you the index of the byte in the cache. So if you divide that by 2^4, that should give you the index of the block, right? For example if M % 2^20 gives you 17 then that means you need to look at block 17 / 16 = 1. Or am I missing something?
  • Joaquim Ferrer
    Joaquim Ferrer almost 8 years
    @Mehrad Yes you are right! Thank you too for the answer!
  • nj2237
    nj2237 almost 6 years
    Hi @Mehrdad really good explanation, thanks a lot. But in your second point, you say "you don't want to store 4 bytes of data just to keep track of where 1 byte belongs to". Cache just operates on 4 byte addresses right? It only stores data, and not addresses right? And btw, on the RAM, aren't we using 4 bytes of data (32 bit addresses) to store 1 byte of data?
  • user541686
    user541686 almost 6 years
    @nj2237: The cache has to store the memory address too, otherwise how would it know whether memory contents at a particular address are in the cache? There's not enough cache to map every memory block to a different cache block. Regarding RAM, well yes, but you don't necessarily store those addresses for every single byte of RAM. You only store the addresses you need (e.g. the beginning of a large array).
  • nj2237
    nj2237 almost 6 years
    Oh yeah, that makes sense.. thanks! So that would be the metadata, right? The process is, say, in DMC, when a CPU tries to fetch data from storage, it will take the index part of the memory address (the middle 16 bits) to see which "row" or block it belongs to, and then it compares the tag (the top 12 bits) in the metadata of the data (in cache) to the tag part of the memory address and if they are equal, then it's a cache hit, correct? And good point on the RAM part too, I was not aware. Thanks!
  • user541686
    user541686 almost 6 years
    @nj2237: Yeah, sounds like you got it.
  • Kindred
    Kindred over 5 years
    Does the "metadata+block" has an official name? My current understand block only means for the data part, and each cache set (the "row" in your answer) may contain many blocks each with its own metadata, but I really want to know how to call the "metadata+block"...
  • Kindred
    Kindred over 5 years
    I heard people saying "cache block" meaning the "metadata + block", but when talking about the "size of a cache block", I heard some other people say it's the "size of the block"; this is so vague...
  • ryanwebjackson
    ryanwebjackson about 4 years
    Does this mean that the size of the tag depends on the value (not size) of the memory address?
  • user541686
    user541686 about 4 years
    @ryanwebjackson: Not really, although the size of the memory address and its value are related by a logarithmic factor, so I suppose you could say "yes" and still be correct. But I think you're probably confused about something -- which part that you're referring to is "this" specifically?
  • ryanwebjackson
    ryanwebjackson about 4 years
    My current understanding is that each cache block is made up of index, tag, and cached data. I'm not sure how to determine the size of the tag. I see you're determining the size in the example above, but I'm not sure how.
  • user541686
    user541686 about 4 years
    @ryanwebjackson: Oh I see. The bits in the tag are extra bits of information you need to determine where the value came from. Maybe it's easy to see this if you start off assuming the tag didn't exist at all: You have a value in row 5; which memory address did it come from? Well, it could've come from address 5, address 5 + 2^20, address 5 + 2 × 2^20, address 5 + 3 × 2^20, ..., all the way to address 5 + 2^(12-1) × 2^20. The unknown here is the coefficient of 2^20. How many bits do you need to represent that unknown? 12 bits, which correspond to the address's top 12 bits. Does that make sense?
  • ryanwebjackson
    ryanwebjackson about 4 years
    Unfortunately not to me, yet. Do you know of a good reference for the formulas? I want to understand, but I don't want to keep commenting here. Thanks.
  • user541686
    user541686 about 4 years
    @ryanwebjackson: These slides might be helpful? It's hard to say though, I don't personally know of any sources (I just Googled to find this one). You're welcome to keep asking me questions if you'd like though, I don't mind.
  • ryanwebjackson
    ryanwebjackson about 4 years
    Thank you. I'll review them and try to come up with a more descriptive question, if I am still not understanding the concepts and formulas.
  • user541686
    user541686 about 4 years
    @ryanwebjackson: Okay sure!