What's the difference between a word and a byte?

assembly terminology cpu-architecture byte word

188,595

Solution 1

Byte: Today, a byte is almost always 8 bits. However, that wasn't always the case, and historically no "standard" dictated this. Since 8 bits is a convenient number to work with, it became the de facto standard.

Word: The natural size with which a processor handles data (the register size). The most common word sizes encountered today are 8, 16, 32 and 64 bits, but other sizes are possible. For example, there were a few 36-bit machines, and even 12-bit machines.

The byte is the smallest addressable unit for a CPU. If you want to set/clear single bits, you first need to fetch the corresponding byte from memory, mess with the bits and then write the byte back to memory.

By contrast, one definition for word is the biggest chunk of bits with which a processor can do processing (like addition and subtraction) at a time – typically the width of an integer register. That definition is a bit fuzzy, as some processors might have different register sizes for different tasks (integer vs. floating point processing for example) or are able to access fractions of a register. The word size is the maximum register size that the majority of operations work with.

There are also a few processors which have a different pointer size: for example, the 8086 is a 16-bit processor, which means its registers are 16 bits wide. But its pointers (addresses) are 20 bits wide and were calculated by combining two 16-bit registers in a certain way.

In some manuals and APIs, the term "word" may be "stuck" on a legacy size and differ from the processor's actual, current word size, because the platform later evolved to support larger registers. For example, the Intel and AMD x86 manuals still use "word" to mean 16 bits, with DWORD (double-word, 32 bits) and QWORD (quad-word, 64 bits) as larger sizes. This is then reflected in some APIs, like Microsoft's WinAPI.

Solution 2

What I don't understand is what's the point of having a byte? Why not say 8 bits?

Apart from the technical point that a byte isn't necessarily 8 bits, the reason for having a term is simple human nature:

- economy of effort (aka laziness): it is easier to say "byte" than "eight bits"
- tribalism: groups of people like to use jargon / a private language to set themselves apart from others.

Just go with the flow. You are not going to change 50+ years of accumulated IT terminology and cultural baggage by complaining about it.

FWIW - the correct term to use when you mean "8 bits independent of the hardware architecture" is "octet".

Solution 3

BYTE

I am trying to answer this question from C++ perspective.

The C++ standard defines ‘byte’ as “Addressable unit of data large enough to hold any member of the basic character set of the execution environment.”

What this means is that the byte consists of at least enough adjacent bits to accommodate the basic character set for the implementation. That is, the number of possible values must equal or exceed the number of distinct characters. In the United States, the basic character sets are usually the ASCII and EBCDIC sets, each of which can be accommodated by 8 bits. Hence it is guaranteed that a byte will have at least 8 bits.

In other words, a byte is the amount of memory required to store a single character.

If you want to verify the number of bits per byte in your C++ implementation, check the header ‘limits.h’ (available in C++ as <climits>). It should have an entry like the one below.

#define CHAR_BIT      8         /* number of bits in a char */

WORD

A word is defined as a specific number of bits which can be processed together (i.e. in one attempt) by the machine/system. Alternatively, we can say that the word defines the amount of data that can be transferred between CPU and RAM in a single operation.

The hardware registers in a machine are typically word-sized. The word size usually also determines the largest possible memory address (each memory address points to a byte-sized unit of memory).

Note: In C++ programs, memory addresses point to a byte of memory and not to a word.

Solution 4

Why not say 8 bits?

Because not all machines have 8-bit bytes. Since you tagged this C, look up CHAR_BIT in limits.h.

Solution 5

A word is the size of the registers in the processor. This means processor instructions like add, mul, etc. operate on word-sized inputs.

But most modern architectures have memory that is addressable in 8-bit chunks, so it is convenient to use the word "byte".

Peter Cordes

Updated on May 13, 2022

Comments

Peter Cordes almost 2 years

I've done some research. A byte is 8 bits and a word is the smallest unit that can be addressed in memory. The exact length of a word varies. What I don't understand is what's the point of having a byte? Why not say 8 bits?

I asked a prof this question and he said most machines these days are byte-addressable, but what would that make a word?
- starblue over 12 years
  
  It is best to avoid the term "word" because of its ambiguity. Or make it precise by saying 16-bit word, 32-bit word, ...
- Admin over 12 years
  
  Is it advantageous to have a word be larger or smaller?
- VoidStar over 12 years
  
  @quest4knoledge a larger word allows for larger pointers (a.k.a. more RAM), and allows for bigger numbers to be processed quickly. It may also allow some operations like memset to be faster, by working in larger blocks. However, processors with a larger word require more transistors in the processor and may consume a bit more energy.
- Admin over 12 years
  
  @VoidStar and a larger word would mean smaller address space, or am I confused?
- Fred about 5 years
  
  To answer the question "what is the point of having a byte" - it's history. CPUs did not start out being able to handle anything bigger than a "byte" (earlier processors handled only nybbles (4 bits) but the term never really caught on). The first CPU of any note was the Intel 8086/8088. It was designed to deal with instructions built around "bytes"; this is also why we still refer to memory in terms of xBytes, e.g. GigaBytes, because the basic unit of addressable memory was the byte. K is a reference to KiloBytes, of which the first PCs had 16, expandable to 64 - woo hoo!
cnicutar over 12 years

A group of 8 bits is called an octet.
tolitius over 12 years

correct: The term octet was defined to explicitly denote a sequence of 8 bits because of the ambiguity associated with the term byte. But I like the sound of byte better :)
Admin over 12 years

So in a sense the term "byte" is just used for convenience?
Joachim Sauer over 12 years

@tolitius: +1 for "But I like the sound of byte better": I strongly suspect you're not alone in this and, save for a few niche systems, the "confusion" of a byte possibly being a size other than 8 bits is no longer relevant these days.
VoidStar over 12 years

Yes, "byte" was especially convenient when the term was invented. Like many conventions, once they set in they persist. I'm not sure if byte-based terminology really makes computers any easier to understand in the big picture anymore, but it's the dominant convention and isn't likely to change any time soon.
Alexey Frunze over 12 years

Yep. The minimum addressable unit of memory on TMS320C54xx (one of Texas Instruments' DSPs) is 16 bits long, which is also the smallest size of its general-purpose registers. And the TI C compiler defines char=short=int=16 bits on it.
Ross Patterson over 12 years

That's entirely dependent on the CPU type. As you point out, on 32-bit non-IA32 machines, a "word" is typically 32 bites.
Ross Patterson over 12 years

Excellent answer. I'd only quibble with "[t]he word by contrast is biggest chunk of bits with which a processor can do processing ... at a time". It is in fact the most-common chunk of bits etc. Lots of architectures that have evolved over time have a word size that isn't their widest, but they are often limited in what they can do with their widest values.
Ross Patterson over 12 years

For extra credit, a "nibble" is a common term for half a byte. It arose during the early microcomputer CPU era (e.g., the Intel 8080), and was always understood to be 4 bits, because by then the byte had settled down to 8 bits.
starblue over 12 years

Byte is the term used for a unit that was used as a character in text. Historically there were bytes with sizes from 6 to 9 bits.
Admin over 12 years

@starblue how is it possible that a char takes up less room than a word?
VoidStar over 12 years

@quest4knoledge: because memory is stored in smaller chunks than words. A word is 32 bits (or 64 bits on newer machines). In an algorithm that processes individual chars 1-by-1, they DO take up a whole word only when inside the CPU, and when placed back in RAM, they are packed more tightly.
Engineer over 12 years

@RossPatterson That's entirely dependent on whether you're developing software or eating dinner.
Abdelouahab Pp about 11 years

I thought the octet was just the French translation of the byte, thank you ;)
DarkDust over 10 years

Nope, these sizes are only valid on a 16-bit machine. You're probably used to Windows programming which still uses these macros as it's a legacy from its 16-bit days and MS hasn't bothered to correct this.
DarkDust over 10 years

BTW, because the size of a word (and really even a byte) can vary, ISO-C has the int<X>_t and uint<X>_t types (plus more) which should be used if you want a variable/parameter of a specific bit size.
johnfound over 10 years

@DarkDust we are talking about assembly language here. C standards are not relevant. BTW, I have been programming assembly since 1980 and the same names were in use. (well, maybe except qword)
DarkDust over 10 years

However, I did find an exception: in GNU as, the .word may be 32 bits (for example for Sparc).
johnfound over 10 years

Sorry, AS is not an assembler. It is an ugly, crippled, miserable mutant, created with the only goal of being a back end for the HLL compilers.
Antonio Rizzo over 9 years

Today an 8-bit byte is a standard; see IEC 80000-13:2008.
Puck over 8 years

Is there such a name for a 2-Word length memory?
barlop over 8 years

@DarkDust and @RossPatterson So if we use the old definition of byte, how big is it on an 8086? 8088? And suppose there is a particular CPU that is only 1 byte long; would byte be the size of that one CPU?
DarkDust over 8 years

@barlop: Which "old definition of byte" do you mean? On both the 8086 and 8088 the byte is 8 bits wide (as was already the de facto standard back then) and the word size is 16 bits. The major difference between the two is that the data bus is 16 bits wide for the 8086 and 8 bits wide for the 8088. A CPU where most registers are 8 bits wide does have a word size of 8 bits (or 1 byte). Famous examples would be the Intel 8080 or Z80 (even though both have limited support for 16-bit operations; these work by combining two 8-bit registers).
barlop over 8 years

Unusual, yes (we know that). An example is the Texas Instruments C54x: Google "texas instruments c54x byte". ti.com/lit/ug/spru393/spru393.pdf "The ’C55x instructions are variable byte lengths ranging in size from 8 bits to 48 bits." stackoverflow.com/questions/2098149/…
barlop over 8 years

@DarkDust I'm not totally clear on what the old definition of byte was, but it meant that a byte isn't necessarily 8 bits. The 8086 and 8088 both have a 16-bit word size, so 16-bit general-purpose registers. Memory locations are no doubt 8 bits. Given that, is the old definition of byte "memory location size" or, more accurately, and for the rare case of variable-length memory locations, the smallest possible memory location size?
DarkDust over 8 years

@barlop: There is no "old" definition: since its introduction it's simply the smallest unit in a computer architecture, apart from the bit. Most often, that means the smallest addressable unit (or memory location size).
barlop over 8 years

@DarkDust when you say memory location, are you considering cpu registers to be memory locations? so if a cpu register was smaller than a ram-memory location, then would you say byte was the size of that small addressable cpu register?
DarkDust over 8 years

@barlop: No, a register is not a memory location. And a CPU with a register size smaller than the smallest addressable memory size doesn't make sense, does it? I'm afraid this is degenerating into a chat, so if things are still unclear, please post a new question instead.
DarkDust over 7 years

@DebanjanDhar: Yes, they're unrelated. The only relation is that a page is (AFAIK) always a multiple of the word size.
Stephen C over 6 years

It doesn't come from there at all. The term was actually coined by W. Buchholz at IBM in the late 1950s. Source: bobbemer.com/BYTE.HTM. According to Bob Bemer, the spelling "byte" was chosen in preference to "bite" to avoid confusion (with "bit") due to typos. He would know. He was there!
Stephen C over 6 years

(Only 30 years? You are a mere whipper-snapper. I learned to program on systems where the natural "byte" size was not 8 bits :-) )
Peter Cordes about 6 years

No, in CPUs with 32-bit words and 8-bit bytes (e.g. MIPS or ARM), half a word is 2 bytes.
Peter Cordes about 6 years

No, most RISC machines have 32-bit words, but can address single bytes. On MIPS for example, word definitely means 32 bits, but there's an lb (load byte) instruction which loads 8 bits.
Peter Cordes about 6 years

x86 (as usual) makes things complicated: In Intel terminology, a word is 16 bits, even on modern x86 CPUs where the default operand size is 32 bits (dword), and the integer register width is 64 bits (qword). And xmm registers are 128 bits wide (movdqa = move double-quad). The memory bus is at least 64 bits wide (and transfers in bursts of 64 bytes = a cache line), and execution-unit to cache paths are at least 128 bits wide, or 256 or even 512 bits wide. Whatever the native machine-word size of modern x86 is, it's not 16 bits, but modern x86 still uses 8086 terminology.
Peter Cordes about 6 years

ARM / MIPS / other mainstream RISC architectures have 32-bit words. It's the register width (on the 32-bit version of those ISAs) and the instruction width. 16 bits is a half-word, thus ARM instructions like ldrh to load 16 bits and zero-extend it into a 32-bit register. Or ldrsh to load and sign-extend 16 bits.
Peter Cordes over 5 years

@Puck: double-word is the standard term. e.g. DWCAS (double-word compare-and-swap) in architecture-neutral lockless programming terminology. And most architectures that have been extended at least once have "double-word" versions of instructions. e.g. MIPS64 daddu (double-word add unsigned). On x86, a dword is 32 bits, half of the max register width, because x86 started with 16-bit 8086 and now has 64-bit wide quad-word registers and memory operand-sizes mov qword [rdi], 12345. (An x86 qword is the same width as a MIPS / PowerPC doubleword, because of terminology, see prev comment).
Crystina over 3 years

largest chunk of bits with which a processor can do processing -> does that mean word == the size of a register?
DarkDust over 3 years

@Crystina: Yes, more specifically: it's usually the size of the general purpose registers (registers for floating point may have a different size, for example).
Peter Cordes almost 2 years

Re: my earlier comments. Another way to look at it is that "word size" isn't a useful term for modern x86. It implies that a bunch of things all care about the same specific size, but there isn't a single natural "machine word" on x86-64. The register width is also (usually) the pointer width, but that's about all. (64-bit RISC ISAs also usually keep their terminology of a word being 32-bit, and still the instruction size, so e.g. MIPS has ld (load double-word) and dsll to double-word left-shift.)
Peter Cordes almost 2 years

I made some edits to this, trying to preserve the simplistic definitions of the original (register width vs. "most common" operation size) while adding some caveats / context. (And correcting mistakes like saying that FP registers might have a different word size, rather than saying they might be multiple words). Definitely some of the origin of the term "word" comes from machine-code instruction formats which on classic RISCs are also single words. Some ISAs are more heavily word-oriented than others, e.g. x86 isn't, with byte-stream machine code and supporting any power of 2 operand-size.
DarkDust almost 2 years

@PeterCordes: Thanks, but I feel it added too many (unnecessary) details which made the answer harder to understand so I rolled it back. But you had a good point about the "word" in manuals/APIs and I've added a brief addendum about that. If you still feel that the answer in its current form is wrong or missing an important clarification, please feel free to edit again.
Peter Cordes almost 2 years

Like I said in the changelog and comment, "different word sizes for different tasks (integer vs. floating point processing for example)" is just plain wrong. I don't think that's standard terminology at all. (Although it would arguably apply with your made-up definition of "word size" which doesn't match normal usage that I'm familiar with.) Many of the other changes you rolled back were important, not just the one you added a footnote about. It's going to be hard for me to edit your answer in a way you like if we seriously disagree about what "word" means, or that it's always definable.
Peter Cordes almost 2 years

Also I find it weird that you wrote "especially Microsoft" when the question is tagged [assembly] and [cpu-architecture]. (And formerly [hardware] before I retagged to [word] and [byte].) Intel's and AMD's manuals use those terms to document the assembly language and architecture. Microsoft does actually use those as type names in their WinAPI C headers, but that's a downstream consequence of them originally caring about the API from an assembly-language perspective back in the day, I assume (early Windows 1.0).
DarkDust almost 2 years

@PeterCordes: I guess I understand where some of the confusion came from: I used "word" in a context that should've read "register" and tried to further improve the answer using some of the information you've provided before. I agree that "word" is not always well-definable, and it's sometimes even just a marketing term. My goal here is to give a useful definition without getting lost in too many details. Since you know a lot more about x86 details in particular (I haven't done assembly-level programming in over a decade) and other interesting stuff, why not add another answer?