Why is x86 ugly? Why is it considered inferior when compared to others?


Solution 1

A couple of possible reasons:

  1. x86 is a relatively old ISA (its progenitor was the 8086, after all)
  2. x86 has evolved significantly several times, but hardware is required to maintain backwards compatibility with old binaries. For example, modern x86 hardware still contains support for running 16 bit code natively. Additionally, several memory-addressing models exist to allow older code to inter-operate on the same processor, such as real mode, protected mode, virtual 8086 mode, and (amd64) long mode. This can be confusing to some.
  3. x86 is a CISC machine. For a long time this meant it was slower than RISC machines like MIPS or ARM, because data interdependencies between instructions and flag side effects make most forms of instruction-level parallelism difficult to implement. Modern implementations translate the x86 instructions into RISC-like instructions called "micro-ops" under the covers to make these kinds of optimizations practical to implement in hardware.
  4. In some respects, the x86 isn't inferior, it's just different. For example, input/output is handled as memory mapping on the vast majority of architectures, but not on the x86. (NB: Modern x86 machines typically have some form of DMA support, and communicate with other hardware through memory mapping; but the ISA still has I/O instructions like IN and OUT)
  5. The x86 ISA has very few architectural registers, which can force programs to round-trip through memory more frequently than would otherwise be necessary. The extra instructions needed to do this take execution resources that could be spent on useful work, although efficient store-forwarding keeps the latency low. Modern implementations with register renaming onto a large physical register file can keep many instructions in flight, but the lack of architectural registers was still a significant weakness for 32-bit x86. x86-64's increase from 8 to 16 integer and vector registers is one of the biggest factors in 64-bit code being faster than 32-bit (along with the more efficient register-call ABI), not the increased width of each register. A further increase from 16 to 32 integer registers would help some, but not as much. (AVX-512 does increase to 32 vector registers, though, because floating-point code has higher latency and often needs more constants.) (see comment)
  6. x86 assembly code is complicated because x86 is a complicated architecture with many features. An instruction listing for a typical MIPS machine fits on a single letter-sized piece of paper. The equivalent listing for x86 fills several pages, and the instructions just do more, so you often need a bigger explanation of what they do than a listing can provide. For example, the MOVSB instruction needs a relatively large block of C code to describe what it does:

    if (DF==0)                      /* direction flag clear: copy forward  */
      *(byte*)DI++ = *(byte*)SI++;  /* one byte from DS:SI to ES:DI        */
    else                            /* direction flag set: copy backward   */
      *(byte*)DI-- = *(byte*)SI--;
    

    That's a single instruction doing a load, a store, and two adds or subtracts (controlled by a flag input), each of which would be separate instructions on a RISC machine.

    While the simplicity of MIPS (and similar architectures) doesn't necessarily make them superior, it makes sense to start with a simpler ISA when teaching an introductory assembler class. Some assembly classes teach an ultra-simplified subset of x86 called y86, which is simplified to the point of not being useful for real work (e.g. it has no shift instructions), and others teach just the basic x86 instructions.

  7. The x86 uses variable-length opcodes, which add hardware complexity with respect to the parsing of instructions. In the modern era this cost is becoming vanishingly small as CPUs become more limited by memory bandwidth than by raw computation, but many "x86 bashing" articles and attitudes come from an era when this cost was comparatively much larger. (The sketch after this list gives a feel for how much the encoded lengths vary.)
    Update 2016: Anandtech has posted a discussion regarding opcode sizes under x64 and AArch64.
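
To give a feel for how uneven the encodings are, here is a sketch of a few 32-bit x86 instructions with their approximate encoded lengths (quoted from memory, so double-check against the manuals before relying on them); a fixed-length RISC spends 4 bytes on every instruction:

    ret                              ; 1 byte
    push eax                         ; 1 byte
    add  eax, ebx                    ; 2 bytes  (opcode + ModRM)
    mov  eax, 12345678h              ; 5 bytes  (opcode + 32-bit immediate)
    mov  eax, [esp + edi*4 + 1Ch]    ; 4 bytes  (opcode + ModRM + SIB + disp8)
    mov  dword ptr [12345678h], 1    ; 10 bytes (opcode + ModRM + disp32 + imm32)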

EDIT: This is not supposed to be a bash the x86! party. I had little choice but to do some amount of bashing given the way the question's worded. But with the exception of (1), all these things were done for good reasons (see comments). Intel designers aren't stupid -- they wanted to achieve some things with their architecture, and these are some of the taxes they had to pay to make those things a reality.

Solution 2

The main knock against x86 in my mind is its CISC origins - the instruction set contains a lot of implicit interdependencies. These interdependencies make it difficult to do things like instruction reordering on the chip, because the artifacts and semantics of those interdependencies must be preserved for each instruction.

For example, most x86 integer add & subtract instructions modify the flags register. After performing an add or subtract, the next operation is often to look at the flags register to check for overflow, sign bit, etc. If there's another add after that, it's very difficult to tell whether it's safe to begin execution of the 2nd add before the outcome of the 1st add is known.
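
A minimal sketch of that hazard (the register choices are arbitrary):

    add eax, ebx     ; writes eax and, as a side effect, CF/OF/SF/ZF
    jo  overflow     ; reads the OF flag the first add just produced
    add ecx, edx     ; writes the very same flags register, so the
                     ; hardware must ensure the jo still sees the
                     ; first add's flags, not this one's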

On a RISC architecture, the add instruction would specify the input operands and the output register(s), and everything about the operation would take place using only those registers. This makes it much easier to decouple add operations that are near each other because there's no bloomin' flags register forcing everything to line up and execute single file.

The DEC Alpha AXP chip, a MIPS-style RISC design, was painfully spartan in the instructions available, but the instruction set was designed to avoid inter-instruction implicit register dependencies. There was no hardware-defined stack register. There was no hardware-defined flags register. Even the return-address convention was OS defined - if you wanted to return to the caller, you had to work out how the caller was going to let you know what address to return to. This was usually defined by the OS calling convention. On the x86, though, it's defined by the chip hardware.
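
For example (a sketch; the MIPS-flavoured mnemonics in the comments are only illustrative, not actual Alpha code), on x86 the call/return mechanism is baked into the hardware, while on a MIPS- or Alpha-style machine it is just a register-use convention:

    ; x86: call/ret and the stack pointer are defined by the hardware
            call subroutine     ; pushes the return address at [esp]
            ...
    subroutine:
            ret                 ; pops the return address back into eip

    ; MIPS-style sketch: the hardware only deposits the return address
    ; in a link register; "returning" is an ordinary indirect jump,
    ; and which register to use is pure calling convention:
    ;       jal  subroutine     ; $ra = address of the next instruction
    ;       ...
    ;       jr   $ra            ; jump back through the link register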

Anyway, over 3 or 4 generations of Alpha AXP chip designs, the hardware went from being a literal implementation of the spartan instruction set with 32 int registers and 32 float registers to a massively out of order execution engine with 80 internal registers, register renaming, result forwarding (where the result of a previous instruction is forwarded to a later instruction that is dependent on the value) and all sorts of wild and crazy performance boosters. And with all of those bells and whistles, the AXP chip die was still considerably smaller than the comparable Pentium chip die of that time, and the AXP was a hell of a lot faster.

You don't see those kinds of bursts of performance-boosting things in the x86 family tree largely because the x86 instruction set's complexity makes many kinds of execution optimizations prohibitively expensive, if not impossible. Intel's stroke of genius was giving up on implementing the x86 instruction set directly in hardware - all modern x86 chips are actually RISC cores that, to a certain degree, interpret the x86 instructions, translating them into internal micro-ops which preserve all the semantics of the original x86 instruction but allow a bit of that RISC-style out-of-order execution and other optimizations over the micro-ops.
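
As a rough illustration of that translation (the micro-op names below are made up; the real internal formats are undocumented), a read-modify-write x86 instruction might be split something like this:

    add dword ptr [ebx+8], eax   ; one architectural instruction

    ; plausible internal micro-ops (illustrative only):
    ;   load    tmp  <- [ebx+8]        ; memory read
    ;   add     tmp  <- tmp + eax      ; ALU op, updates flags
    ;   store   [ebx+8] <- tmp         ; memory write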

I've written a lot of x86 assembler and can fully appreciate the convenience of its CISC roots. But I didn't fully appreciate just how complicated x86 was until I spent some time writing Alpha AXP assembler. I was gobsmacked by AXP's simplicity and uniformity. The differences are enormous, and profound.

Solution 3

The x86 architecture dates from the design of the 8008 microprocessor and relatives. These CPUs were designed in a time when memory was slow and if you could do it on the CPU die, it was often a lot faster. However, CPU die-space was also expensive. These two reasons are why there are only a small number of registers that tend to have special purposes, and a complicated instruction set with all sorts of gotchas and limitations.

Other processors from the same era (e.g. the 6502 family) also have similar limitations and quirks. Interestingly, both the 8008 series and the 6502 series were intended as embedded controllers. Even back then, embedded controllers were expected to be programmed in assembler and in many ways catered to the assembly programmer rather than the compiler writer. (Look at the VAX chip for what happens when you cater to the compiler writer.) The designers didn't expect them to become general-purpose computing platforms; that's what things like the predecessors of the POWER architecture were for. The Home Computer revolution changed that, of course.

Solution 4

I think this question has a false assumption. It's mainly just RISC-obsessed academics who call x86 ugly. In reality, the x86 ISA can do in a single instruction operations which would take 5-6 instructions on RISC ISAs. RISC fans may counter that modern x86 CPUs break these "complex" instructions down into microops; however:

  1. In many cases that's only partially true or not true at all. The most useful "complex" instructions in x86 are things like mov %eax, 0x1c(%esp,%edi,4), i.e. addressing modes, and these are not broken down (see the sketch after this list).
  2. What's often more important on modern machines is not the number of cycles spent (because most tasks are not CPU-bound) but the instruction cache impact of code. 5-6 fixed-size (usually 32-bit) instructions will impact the cache a lot more than one complex instruction that's rarely more than 5 bytes.
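
Here is that addressing-mode example written out in Intel syntax, next to a hedged load/store-style expansion (the RISC mnemonics and register names are invented for illustration):

    mov [esp + edi*4 + 1Ch], eax   ; x86: one store, address formed by the hardware

    ; load/store sketch of the same store (three instructions):
    ;   sll  t0, t1, 2        ; t0 = index * 4
    ;   add  t0, t0, t2       ; t0 = base + scaled index
    ;   sw   t3, 0x1c(t0)     ; store the value with a small immediate offset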

x86 really absorbed all the good aspects of RISC about 10-15 years ago, and the remaining qualities of RISC (actually the defining one - the minimal instruction set) are harmful and undesirable.

Aside from the cost and complexity of manufacturing CPUs and their energy requirements, x86 is the best ISA. Anyone who tells you otherwise is letting ideology or agenda get in the way of their reasoning.

On the other hand, if you are targeting embedded devices where the cost of the CPU counts, or embedded/mobile devices where energy consumption is a top concern, ARM or MIPS probably makes more sense. Keep in mind though you'll still have to deal with the extra RAM and binary size needed to handle code that's easily 3-4 times larger, and you won't be able to get near the performance. Whether this matters depends a lot on what you'll be running on it.

Solution 5

I have a few additional aspects here:

Consider the operation a = b/c. x86 would implement this as:

  mov eax,b         ; load the dividend
  xor edx,edx       ; clear edx so edx:eax holds the (unsigned) dividend
  div dword ptr c   ; divide edx:eax by c: quotient in eax, remainder in edx
  mov a,eax         ; store the quotient

As an additional bonus of the div instruction, edx will contain the remainder.

A RISC processor would first have to load the addresses of b and c, load b and c from memory into registers, do the division, load the address of a and then store the result. Dst,src syntax:

  mov r5,addr b     ; r5 = address of b
  mov r5,[r5]       ; r5 = b
  mov r6,addr c     ; r6 = address of c
  mov r6,[r6]       ; r6 = c
  div r7,r5,r6      ; r7 = r5 / r6
  mov r5,addr a     ; r5 = address of a
  mov [r5],r7       ; a = quotient

Here there typically won't be a remainder produced as a by-product.

If any variables are to be loaded through pointers, both sequences may become longer, though this is less likely for the RISC because it may have one or more pointers already loaded in another register. x86 has fewer registers, so the likelihood of the pointer being in one of them is smaller.

Pros and cons:

The RISC instructions may be mixed with surrounding code to improve instruction scheduling; this is less of a possibility with x86, which instead does this work (more or less well depending on the sequence) inside the CPU itself. The RISC sequence above will typically be 28 bytes long (7 instructions of 32-bit/4-byte width each) on a 32-bit architecture. This will cause the off-chip memory to work more when fetching the instructions (seven fetches). The denser x86 sequence contains fewer instructions, and though their widths vary you're probably looking at an average of 4-5 bytes/instruction there too. Even if you have instruction caches to speed this up, seven fetches means that you will have roughly three extra fetches to make up for elsewhere compared to the x86.
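
A rough byte count of the x86 sequence above (encodings quoted from memory, so treat the exact numbers as approximate):

  mov eax,b         ; ~5 bytes (opcode + 32-bit address)
  xor edx,edx       ;  2 bytes
  div dword ptr c   ; ~6 bytes (opcode + ModRM + 32-bit address)
  mov a,eax         ; ~5 bytes
                    ; total ~18 bytes, vs 28 bytes for the seven
                    ; fixed-width RISC instructions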

Since x86 has fewer registers to save/restore, it will probably do thread switches and handle interrupts faster than RISC. More registers to save and restore requires more temporary RAM stack space to do interrupts and more permanent stack space to store thread states. These aspects should make x86 a better candidate for running a pure RTOS.

On a more personal note I find it more difficult to write RISC assembly than x86. I solve this by writing the RISC routine in C, compiling and modifying the generated code. This is more efficient from a code production standpoint and probably less efficient from an execution standpoint. All those 32 registers to keep track of. With x86 it is the other way around: 6-8 registers with "real" names makes the problem more manageable and instills more confidence that the code produced will work as expected.

Ugly? That's in the eye of the beholder. I prefer "different."


Comments

  • claws
    claws almost 2 years

    I've been reading some SO archives and encountered statements against the x86 architecture, and many more comments along the same lines.

    I tried searching but didn't find any reasons. I don't find x86 bad, probably because this is the only architecture I'm familiar with.

    Can someone kindly give me reasons for considering x86 ugly/bad/inferior compared to others?

    • dmckee --- ex-moderator kitten
      dmckee --- ex-moderator kitten about 14 years
      I'm going with S&A on the basis of the answers so far, but I'll note in passing that CISC isn't a problem for the m68k instruction set. x86 is what it is, and you can keep it.
    • claws
      claws about 14 years
      what is "S&A"? " CISC isn't a problem for the m68k instruction set." -- Why not?
    • dmckee --- ex-moderator kitten
      dmckee --- ex-moderator kitten about 14 years
      The Motorola 68000 series chips have a highly CISC architecture, but they have a uniform, fairly orthogonal, and very easy instruction set. Why the difference from x86? I don't know. But take note that there is a big difference between complexity in the chip and complexity in the instruction set (i.e. in the interface that an assembly programmer sees).
    • Juliet
      Juliet about 14 years
      +1: for a provocative title, but with an actual interesting question :)
    • Turing Complete
      Turing Complete almost 14 years
      +1 for a very interesting question.
    • v.oddou
      v.oddou about 10 years
      It's because it is simply common knowledge; you don't need proof or articles about it. (lol)
    • Admin
      Admin over 9 years
      Recent study on energy efficiency of different processors found here, with a good discussion of what drove CISC & RISC designs. extremetech.com/extreme/…
    • ebrohman
      ebrohman over 8 years
      Your second link is broken
  • Joey Adams
    Joey Adams about 14 years
    I see variable-length opcodes as a source of strength, as x86 machine code tends to take less space than PowerPC code, for instance. I may be wrong.
  • Billy ONeal
    Billy ONeal about 14 years
    It's a tradeoff. It's a strength in that the binary size might be smaller, but it's a weakness in that you need to have very complicated hardware to implement a parser for these instructions. The vast majority of instructions are the same size anyway -- most of the reason for variable length opcodes on x86 is for when they decided to add features and found they couldn't represent what they wanted in the number of bits they had to work with. The vast majority of people aren't concerned with binary size nearly as much as hardware complexity or power consumption.
  • Billy ONeal
    Billy ONeal about 14 years
    @Joey Adams: Contrast the x86's variable length instructions with the ARM's Thumb Mode ( en.wikipedia.org/wiki/ARM_architecture#Thumb ). Thumb Mode results in significantly smaller object code for the ARM because the shorter instructions map directly to normal instructions. But since there is a 1:1 mapping between the larger instructions and the smaller ones, the parsing hardware is simple to implement. The x86's variable length instructions don't have these benefits because they weren't designed that way in the first place.
  • Chris K
    Chris K about 14 years
    (7) The x86 uses variable length opcodes, which add hardware complexity with respect to the parsing of instructions - is more of a problem for compiler writers and those writing self-modifying code. The hardware doesn't give a shit.
  • Chris K
    Chris K about 14 years
    (6) Not every op-code needs to be used by every program, but dammit, when I need SSE3, I'm glad I have it.
  • Billy ONeal
    Billy ONeal about 14 years
    @claws: It's not entirely meant to be read that way -- most everything I've listed above (except for #1) are tradeoffs. For example, #2 and #7 are the way they are to maintain backward compatibility with existing code. That's a definite +. #3 is just different, no win or loss. #4 means that for somebody who knows what they're doing, writing hand coded assembly can be significantly easier. #5 is a consequence of being a CISC, and doesn't matter much in practice nowadays. #6 means that the processor does more work for the assembly programmer -- again, a trade off. Can't have everything.
  • Dietrich Epp
    Dietrich Epp about 14 years
    RISC was not designed with human developers in mind. One of the ideas behind RISC was to offload some of the complexity of the chip onto whoever wrote the assembly, ideally the compiler. More registers meant less memory usage and fewer dependencies between instructions, allowing deeper pipelines and higher performance. Note that x86-64 has twice as many general registers as x86, and this alone is responsible for significant performance gains. And instructions on most x86 chips are decoded before they are cached, not after (so size doesn't matter here).
  • Billy ONeal
    Billy ONeal about 14 years
    @Chris Kaminski: How does that not affect the hardware? Sure, on a modern full sized computer nobody's going to care, but if I'm making something like a cell phone, I care more about power consumption than almost anything else. The variable length opcodes don't increase execution time but the decode hardware still requires power to operate.
  • Billy ONeal
    Billy ONeal about 14 years
    @Dietrich Epp: That's not entirely true. The x86-64 does have more registers visible in the ISA, but modern x86 implementations usually have a RISC style register file which is mapped to the ISA's registers on demand to speed up execution.
  • Dietrich Epp
    Dietrich Epp about 14 years
    I believe the "enter" versus "push/mov" issue arose because on some processors, "push/mov" is faster. On some processors, "enter" is faster. C’est la vie.
  • Blessed Geek
    Blessed Geek about 14 years
    I've never had the chance to design ICs professionally, but had the opportunity to concoct routines to ensure processors do work. I have yet to understand how much more hardware complexity variable-length instruction-sets add up. I always thought the complexity was more significant on the designer's task and the amount of cell modularity you have to forego. I've never thought such complexity added much to how much pain a processor would suffer.
  • claws
    claws about 14 years
    "I have heard from nVidia that one of Intel's mistakes was that they kept the binary formats close to the hardware." -- I didn't get this and the CUDA's PTX part.
  • Chris K
    Chris K about 14 years
    @Billy ONeal: I'm not saying it can't be better - just that the people bitching about how horrid the x86 is are compiler writers. The ones (I think) with a legitimate beef were the OS authors in the early 90's and the whole Protected Mode (and then PAE) mess. I'm glad AMD finally forced Intel to see the 64-bit light.
  • dmckee --- ex-moderator kitten
    dmckee --- ex-moderator kitten about 14 years
    I'll listen to no bashing of CISC per se unless and until you can explain m68k.
  • claws
    claws about 14 years
    @dmckee : I'm the OP. I don't know anything about m68k. But can you explain why these things don't hold for m68k?
  • Billy ONeal
    Billy ONeal about 14 years
    @Chris: Agreed. I'm not trying to bash up on the x86. Just citing some of the things people typically bash it for.
  • Chris K
    Chris K about 14 years
    @Billy ONeal: And you missed the biggest one of all - All the damn memory models!
  • Chris K
    Chris K about 14 years
    (3) Doesn't DMA change this (as well as introduce its own share of NEW headaches)?
  • Billy ONeal
    Billy ONeal about 14 years
    @Chris: Regarding memory models, that's part of what I meant by (2). The memory models were not designed that way in the first place; they are a result of changing significant things (like native word size) and a desire to maintain backwards compatibility with code using the old model. Regarding DMA, yes, that changes things (like it changes things for every CPU), but that's the same for RISCs as well. Not all hardware (i.e. the keyboard) uses DMA though, so the x86's I/O differences are still alive and well.
  • dthorpe
    dthorpe about 14 years
    I'm not familiar with the m68k, so I can't critique it.
  • Billy ONeal
    Billy ONeal about 14 years
    I don't think this answer is bad enough to downvote, but I do think the whole "RISC is smaller and faster than CISC" argument isn't really relevant in the modern era. Sure, the AXP might have been a hell of a lot faster for its time, but the fact of the matter is that modern RISCs and modern CISCs are about the same when it comes to performance. As I said in my answer, the slight power penalty for x86 decode is a reason not to use x86 for something like a mobile phone, but that's little argument for a full-sized desktop or notebook.
  • Billy ONeal
    Billy ONeal about 14 years
    +1 for the only answer here from someone who actually seems to have historical background on the issue.
  • Billy ONeal
    Billy ONeal about 14 years
    Hmm.. when I think of x86 competitors, I don't think of MIPS. ARM or PowerPC maybe....
  • Nathan Fellman
    Nathan Fellman about 14 years
    @Dietrech Epp: "And instructions on most x86 chips are decoded before they are cached, not after" That's not true. They are cached before they are decoded. I believe the Pentium 4 had an additional trace cache that cached after decode, but that's been discontinued.
  • dthorpe
    dthorpe about 14 years
    @Billy: size is more than just code size or instruction size. Intel pays quite a penalty in chip surface area to implement the hardware logic for all those special instructions, RISC microcode core under the hood or not. Size of the die directly impacts cost to manufacture, so it's still a valid concern with modern system designs.
  • Billy ONeal
    Billy ONeal about 14 years
    @dthorpe: Really? I'd like to see that statement backed up with some actual data regarding the die area spent on x86 decode. Otherwise I have no choice but to discount it as FUD. Looking at recent Intel chips, over half the chip is cache. Somehow I don't think x86 decode is a significant portion of die area.
  • Shannon Severance
    Shannon Severance almost 14 years
    @Billy: x86 has been around near forever. At one time MIPS was an x86 competitor. As I remember x86 had its work cut out to get to a level where it was competitive with MIPS. (Back when MIPS and SPARC were fighting it out in the workstation arena.)
  • Billy ONeal
    Billy ONeal almost 14 years
    @Shannon Severance: Just because something once was does not mean something that is.
  • ShinTakezou
    ShinTakezou almost 14 years
    breaking compatibility is sometimes the only way towards real enhancements and improvements; the only reason not to do that is something I can call to be brief "marketing", and often it is not a good thing (growing the pocket of some people apart, of course, but not to make hardware better for real)
  • ShinTakezou
    ShinTakezou almost 14 years
    there are other CISC processors coming out of the 8-bit era (m68k can be considered the descendant of the 6800; the Z8000 or similar from the Z80...) that evolved into "better" CISC, so it's not a good excuse. Extinction is the only path to real evolution, and trying to be backward compatible is a defect and limitation, not a feature. The Home Computer status is late if you think about the Home Computer revolution's promises. And I believe part of the blame lies with the "backward compatibility" issue, which is about marketing, not technology.
  • ShinTakezou
    ShinTakezou almost 14 years
    When I was forced to an x86-based machine and started to take a look at it (having an m68k background), I started to find asm programming frustrating... as if I had learned programming with a language like C, and then been forced to get in touch with asm... you "feel" you lose power of expression, ease, clarity, "coherence", "intuitiveness". I am sure that if I had started asm programming with x86, I would have thought it is not so bad... maybe... I also did MMIX and MIPS, and their "asm lang" is far better than x86 (if this is the right PoV for the Q, but maybe it is not).
  • ShinTakezou
    ShinTakezou almost 14 years
    m68k is a lot more fun, and far nicer to humans than x86 (which can't seem so "human" to many m68k programmers), if the right PoV is the way humans can write code in those assemblies.
  • Billy ONeal
    Billy ONeal almost 14 years
    @ShinTakezou: Frankly, I agree with you. However, x86 is one case where I'm quite happy they left things as they are. I like the ability to run software that came out 15 years ago without modification and without keeping old hardware around. The amount of software written for x86 is vast enough to make a backwards-incompatible change a problem.
  • Deva
    Deva almost 14 years
    Yes, I didn't mention Marketing. This was a hugely powerful force in the lifeline of the x86 architecture and I don't know why I missed it.
  • Deva
    Deva almost 14 years
    I'd like to add that RISC-like orthogonality in a CISC-like design is possible: Texas Instruments did it decades ago with their 990 architecture. The 9900 microprocessor was a wonderfully clean design. I've written assembly for Z80, 8086/88, 6502 and 9900 chips and the 9900 is far-and-away the best design IMO. However, the x86 was where I was getting paid.
  • ShinTakezou
    ShinTakezou almost 14 years
    @Billy luckily some evolution exists and hardware emulation can be done in software (with the help of some features of modern processors too, thinking of "virtualization"), so 15-year-old software can run, and sometimes faster than on the original hardware; I think that having cheap memory, powerful processors (still more powerful if backward compatibility were broken), innovative hardware design (more asynchronous architectures?), we can run the old sw without modification, on a "hosted virtual hardware" that has even better performance than the real old one.
  • Billy ONeal
    Billy ONeal almost 14 years
    @ShinTakezou: Good for you. Problem is that the proposed hardware does not exist. The only competing architectures with x86 nowadays are RISC architectures, and none offers better practical performance than x86 does anymore. If you want innovative hardware design, you should be looking at GPU computing. No general purpose computer offers better performance than existing architectures.
  • ShinTakezou
    ShinTakezou almost 14 years
    @Billy ONeal, we don't need async archs in particular: current "bad" hw is able to run emulations smoothly enough! Why do they insist on "compatibility" that very few people (or none) are interested in? I am not in the debate x86 (CISC?) vs RISC; I am in the debate that more promising and "better" processors/archs existed and could exist (Intel or whatever, it is not important for the final user, but unluckily it is for Intel, which "drives" the market, and I suspect that, as often happens, slowing down innovation is a way of maximizing profits, thus the promises of '70s magazines "strangely" were betrayed).
  • Billy ONeal
    Billy ONeal almost 14 years
    @ShinTakezou: Your assertion that current hardware is "bad" is simply incorrect. If Intel hardware is so bad, then I ask why no other architecture offers significantly better performance. ARM and friends certainly have an incentive to produce such a platform, and they produce as many if not more chips than Intel, for devices such as the iPod, iPhone, and iPad, and other types of embedded devices, such as television sets.
  • ShinTakezou
    ShinTakezou almost 14 years
    I put " around bad purposely to avoid such a unsenseful talk;but I can also state it is bad, I've heard a lot of technical people saying that and trying to explain technical points I am not able to reproduce or fully understand,but I am more interested in logic in this case(to be "precise",we should enlarge the touched topics):the fact that no other archs compete on the consumer market in"performance",can't be used as an argument to prove "PC" current hardware is not bad.(Note:I stress the usage of generic "hw"/arch against "x86 arch" where people may think I talk about x86 internals)
  • Billy ONeal
    Billy ONeal almost 14 years
    @ShinTakezou: What specifically is bad? What architecture specifically exists that does not have the same "badness"?
  • ShinTakezou
    ShinTakezou almost 14 years
    once there were horses and someone talked about moving machines, people asked what's wrong with horses, until prototypes started to become widespread; other hardware can be reality, simply there is not enough "enterprise" to make it widespread (at the consumer level), so as already said, the current hw is decided by the market (and its saturation to maximize profits), not by the possibilities of known technology (moldering in R&D labs, I think). From a "historical" PoV, current PC hardware is basically the same IBM PC ('80s): faster clocks, larger buses etc. are not innovations (and real innovations are very few, if any).
  • Billy ONeal
    Billy ONeal almost 14 years
    @ShinTakezou: When something does not change, that generally indicates that something was done correctly in the first place. If you can't point to a case where such a change resulted in increased performance, then your argument has no leg to stand on. More to the point, there have been considerable changes in PC operation. Look at GPU computing -- that's a significant model departure. Look at recent chips from Intel and AMD: They don't have the traditional Northbridge/Southbridge pair that was standard for many years. What sort of innovation are you thinking of?
  • ShinTakezou
    ShinTakezou almost 14 years
    Logic again; "generally" means nothing; 10 million people saying a lie don't make it a truth. No need for more legs than you, since your only argument is "it works"/"it is so", basically. GPU usage is a false innovation, the idea of using coprocessors is old (& already exploited before), it will go further, still it comes from the mists of the '70s/'80s. If you reread my speech, you notice I am saying that "innovation" is always (X years) late, and not for technical reasons; so no surprise things change and become a bit better... just not so fast as they can. I cited async hw; a very old idea, but only recently reconsidered, since >>>>
  • ShinTakezou
    ShinTakezou almost 14 years
    <<<< (continuing) since we are reaching the limits of the possibilities of the current approach... we knew it would happen, but we continued anyway on the less innovative path, because it was considered for some reason easier, cheaper, whatever... anyway, as already said, for economic/market reasons, not for strictly technical impossibilities. I have old magazines promising incredible things in a few years... before the "market" became so important, slowing down the pace. So, as already said, these possible promises were betrayed, and we are, to give an arbitrary number, 10 years late.
  • Billy ONeal
    Billy ONeal almost 14 years
    @ShinTakezou: Name such a "promise", please.
  • ShinTakezou
    ShinTakezou almost 14 years
    <<< ... and the reason for this is the maximization of profits: if they had sold technologies that reach the current results 10 years ago, what could they have sold in those 10 years...? They would have had to invest more in R&D, without being "supported" by selling anything new (but for malfunctions or hw becoming old, even though still powerful enough) as new, forcing people to upgrade. Things work this way, to be brief (!), but it does not mean that this produces the best technology humans currently can.
  • ShinTakezou
    ShinTakezou almost 14 years
    to be brief, without searching: imagine that they talked about what we can do today with computers as possible in a few years (mags from 1980 or so), say 1985; maybe optimistic, so let us imagine they said it would be 2000. They talked of what computers do now as possible in 1985-1990. It was a projection of the trend, I think. That was what technology was promising. Then things started to slow down, from the consumer PoV. Innovative machines (or machines trying to be innovative) were simply "ruled out" by the market/-ing... save taking from them years later and selling the result as "innovative".
  • Billy ONeal
    Billy ONeal almost 14 years
    @ShinTakezou: Name such a "promise", please. I still have yet to see you write one.
  • ShinTakezou
    ShinTakezou almost 14 years
    A promise is something that you say can happen soon, & then it happens late (or never). The "promises" of some magazines, if you want an example, were that I could run a raytracer on a complex scene and have a result in 1h within 5 years or so of the reading; I still can't obtain that result on a P4 2.8 GHz 15 years later. Were the "promises" (projections) so wrong? My point is no: optimistic maybe, but not so wrong. Simply, the market dictated a different pace, enlarging the gap between technological enhancements in order to grow profits. And since x86 is the market-dominant arch, ...conclusions left to you.
  • Billy ONeal
    Billy ONeal almost 14 years
    @ShinTakezou: Yes, the projection was wrong. They thought they were going to be making chips in the 20GHZ range too, until they started to push things and found there were physical limits to switching speed.
  • ShinTakezou
    ShinTakezou almost 14 years
    no, physical limits were well enough known. They were wrong since they thought that if a working tech is ready in a lab, the day after they could start producing it for the masses. This is currently wrong.
  • Billy ONeal
    Billy ONeal almost 14 years
    @ShinTakezou: "physical limits were well enough known" <-- Really? You expect me to believe they were pushing tens of GHZ 15 years ago? What other architecture exists that has this "working tech ready in a lab"? There are plenty of thriving architectures in use, most notably PPC (IBM's POWER Architecture, most notably) and ARM (and various children from ARM Holdings). Just because Intel's architecture took over the desktop market does not mean that Intel is the only game in town. If there existed the tech co accomplish what you propose, rest assured someone like IBM or ARM would now sell it.
  • ShinTakezou
    ShinTakezou almost 14 years
    the limits of how fast you can dissipate heat "produced" inside a small area by any physical process could be estimated since before the advent of chips; "they" knew that miniaturization and increasing switching speed impose limits (adjustable, but not beyond the physical threshold, again knowable, and nobody wants a liquid-He cooler in his desktop, right?); the very same tech you have today, likely, is what was in labs before; I am not talking about mysterious things/techs, but simply about the gap between their working existence and their commercialization; it makes a big difference >>>
  • ShinTakezou
    ShinTakezou almost 14 years
    Since artificially enlarging the gap creates an opportunity to increase profits, all these tech companies (Intel, Motorola, IBM or whatever) do their R&D and then distill it; not that they do so because they are perverse, likely they also have no choice... because of the market... so again, my previous statement: the market, its mechanisms and self-referentiality slow down innovation (and increase profits, so nobody is really interested in changing those mechanisms); so what's on the market now could have appeared 5, 10, maybe 15 years ago. So, if there's no market for a thing, nobody >>>
  • ShinTakezou
    ShinTakezou almost 14 years
    >>> tries to sell it, even though it is better than what is currently being sold. --- Let's stop with these long comments: democratically I am right, you are optimistically wrong :D (just because this is not a comfortable place in which to express opinions about the matter at length)
  • Turing Complete
    Turing Complete almost 14 years
    The segment:offset addressing was an attempt to stay compatible to some extent with the CP/M world. One of the worst decisions ever.
  • Olof Forshell
    Olof Forshell over 13 years
    Memory has always been slow. It is possibly (relatively speaking) slower today than it was when I began with Z80s and CP/M in 1982. Extinction is not the only path of evolution because with extinction that particular evolutionary direction stops. I would say the x86 has adapted well in its 28-year (so far) existence.
  • Olof Forshell
    Olof Forshell over 13 years
    Look around and you'll see CISCs dominating despite their supposedly "inferior" instruction set and architecture. RISC has managed to corner a piece of the market or create a new one. To describe this piece I will say that it is significant but not very significant. The CISCs are competent technology products whether you love them, hate them or whatever. So are RISCs. It probably won't be that much different in five years. Maybe in ten. Who knows?
  • Olof Forshell
    Olof Forshell over 13 years
    The early eighties was an era when minicomputers were 16-bit PDPs from DEC. The VAX was just being introduced and with it the age of Super-minis: suddenly minicomputer processors were 32 bits, the same size as IBM's mainframes. So I'll venture to say that 1 MB of memory was HUGE back then. The fact that it could be done for a 16-bit processor was neat. "Clunky" is for 20-20 vision hindsighters. The 8086's memory wasn't filled immediately either. I think it worked quite (or why not very) well in its historical context.
  • Deva
    Deva over 13 years
    Memory speeds briefly hit near parity with CPUs around the time of the 8086. The 9900 from Texas Instruments has a design that only works because this happened. But then the CPU raced ahead again and has stayed there. Only now, there are caches to help manage this.
  • Olof Forshell
    Olof Forshell over 13 years
    a, b and c in my examples should be viewed as memory-based variables and not as immediate values.
  • Olof Forshell
    Olof Forshell over 13 years
    ... "dword ptr" is used to specifiy the size of a variable whose size is not known if, for instance, it is simply declared as external or if you've been lazy.
  • David Thornley
    David Thornley over 13 years
    The 8080 was an expansion of the 8008. The 8086 was modelled on the 8080, and indeed was assembler-compatible. That gave the early IBM PCs a lot of software immediately, since a lot of 8080 software could simply be reassembled (at a cost in performance). There have been a lot of expansions in memory addressability and processor capabilities: the 8008 addressed 16K, the 8080 addressed 64K, the 8086 addressed 1M (MS-DOS reserved much of this, leaving 640K), and eventually 4M. Keeping binary compatibility with families of chips starting with one source-compatible with the 8008, well....
  • Olof Forshell
    Olof Forshell over 13 years
    @Turing Complete: segment:offset was NOT primarily an attempt to stay compatible with the CP/M world. What it was was a very successful attempt to allow a 16 bit processor to address more than 64 KBytes by placing code, data, stack and other memory areas in different segments.
  • Sebastiaan
    Sebastiaan over 13 years
    Hell, I'd say the origin is the 8080, not the 8086 -- the 8086 has always struck me as an 8080 writ 16-bit.
  • Billy ONeal
    Billy ONeal over 13 years
    @Chris: I don't believe modern x86's maintain compatibility with 8080 code.
  • Sebastiaan
    Sebastiaan over 13 years
    @Billy: No, but the instruction set got its start there. You'd be amazed at how similar they are, even if there's no 8-bit compatibility. At the very least, the 8080 is godfather to the x86. (Also, superfluous apostrophe.)
  • Billy ONeal
    Billy ONeal over 13 years
    @Chris: Ah, I see what you mean. I only mentioned 8086 because modern x86 chips still contain hardware to run 8086 code (but not 8080 code).
  • Olof Forshell
    Olof Forshell about 13 years
    @David: you say that the 8086 was assembler compatible with the 8080. Do you have an example to share with us? That's no mean feat considering the rich instruction set of the 8086 in comparison with the meager set of the 8080.
  • David Thornley
    David Thornley about 13 years
    @Olof Forshell: It was assembler compatible in that 8080 assembly code could translate into 8086 code. From that point of view, it was 8080 plus extensions, much like you could view 8080 as 8008 plus extensions.
  • Olof Forshell
    Olof Forshell about 13 years
    @Chris: give me an example of how similar the instruction sets are. I've programmed the 8080, Z80 and 8086 (PDP/8, PDP/11, VAX, 370 and 68000) in assembly and the only likeness I can come up with has to do with the string instructions in the Z80 which I think intel had a look at before implementing their own on the 8086.
  • Olof Forshell
    Olof Forshell about 13 years
    @David: I think you've confused the word compatible with the words translatable or emulatable. What you are speaking of does not have anything to do with compatible. If it did you could just as well say the 8080 is compatible with the 360 architecture if you translate it.
  • David Thornley
    David Thornley about 13 years
    @Olof Forshell: Except that the 8086 was designed for that to happen. It was an extension of the 8080, and most (possibly all) 8080 instructions mapped one-to-one, with obviously similar semantics. That isn't true of the IBM 360 architecture, no matter which way you want to push it.
  • Olof Forshell
    Olof Forshell about 13 years
    @David: according to Wikipedia: "Marketed as source compatible, the 8086 was designed so that assembly language for the 8008, 8080, or 8085 could be automatically converted into equivalent (sub-optimal) 8086 source code, with little or no hand-editing." The first word is "marketed" which means more or less true. The remainder means that some sort of assembler source translator would do an instruction by instruction conversion from 8080 assembly language source to 8086 ditto, which then needed assembling and linking. I think you're reading a lot more into this than is reasonable for sales talk.
  • David Thornley
    David Thornley about 13 years
    @Olof Forshell: Unfortunately, all my sources are long since disposed of, but I was somewhat following what was going on at the time, and am not getting information from Wikipedia. I had a reasonable familiarity with the 8080 and a lesser familiarity with the 8086, so it looked pretty much true to me. I did read that the original BASIC was slow, because it was converted from 8080.
  • David Thornley
    David Thornley about 13 years
    @Olof Forshell: Just because Wikipedia says "Marketed as...." doesn't mean it was in the least wrong. The marketing material would be easier to find and quote than anything technical. Moreover, as long as we're performing Wikipedia exegesis, it says "the 8086 was designed", implying that the processor was designed to facilitate ASM-level compatibility (although not binary compatibility).
  • Olof Forshell
    Olof Forshell about 13 years
    @David: there were 7 byte registers on the 8080: a(ccumulator), b, c, d, e, h & l. Then there was the flags register. Flags and a made up the psw (program status word). b&c, d&e and h&l could be used as 16-bit entries such as in "ldax b", and were then addressed with the first register in the pair, or "lhld mem"; it wasn't entirely thought through. b/c, d/e and h/l could be used for indirect addressing such as "ldax d" or "mov a,m" (m being the h/l pair memory). Right off the bat I see several problems with your argumentation. The psw does not exist on the 8086 so it has lahf/sahf ...
  • Olof Forshell
    Olof Forshell about 13 years
    ... to compensate. The 8080 has three split registers for memory addressing and the 8086 has one, bx (bl/bh). Instead si and di would have to be used with a lot of xchg-ing in between bx and them. Compatible? NO! Not even at the source level.
  • Olof Forshell
    Olof Forshell about 13 years
    I forgot the pc and stack pointer. Here's an architecture description cpu-world.com/Arch/8080.html . Marketing and advertising allow for messages whose content ranges from truths to lies though most fall on the upper part of the truth half of the playing field. In my mind the issue of assembler compatibility falls right on the border between truth and lie: not really true but not a complete lie either. Is the glass half-full or half-empty? To me it's half-empty and I have explained why, Wikipedia or no Wikipedia.
  • Olof Forshell
    Olof Forshell about 13 years
    @Chris: every instruction set got its start somewhere. The 8086's string instructions for example are rumored to come from the Z80. I programmed 8080, 8085, Z80 and 8086 at the assembly level and I am certainly not "amazed" at how similar the 8080 and 8086 instruction sets are simply because they aren't. After reading this cpu-world.com/Arch/8080.html I don't think anyone else will be either. Cheers!
  • Deva
    Deva about 13 years
    I used to play a bit with an 8080 development board many many years ago and I've been thinking about this "assembly-compatible" since this thread got resurrected. I seem to recall a "translator" was available to do the minimal translation from 8080 assembly to 8086 assembly. This would not have been a difficult task, given the similar register arrangement.
  • Olof Forshell
    Olof Forshell about 13 years
    My point is that "compatible" and "compatible after conversion" are two entirely different concepts. Equating them is a redefinition of the word compatible to mean practically anything. In the early days an IBM-compatible PC required a candidate PC to be able to perform exactly the same on all levels software, OS, BIOS and hardware. Most used an 8088 @ 4.77MHz but Compaq showed that a PC with an 8086 or 80286 (protected mode BIOS) at a higher frequency could be compatible too. Compatibility has since been taken over by various committees but compatibile is a word that has a precise meaning ...
  • Olof Forshell
    Olof Forshell about 13 years
    ... and shouldn't be de-valued.
  • Quonux
    Quonux about 13 years
    that is not true, the newest "Sandy Bridge" processors use a kind of trace cache (like that of the Pentium 4, oh that old boy :D ), so technologies go away and come back...
  • ninjalj
    ninjalj almost 13 years
    The 8086 ISA was explicitly designed to make it easy to write translators from 8080 to 8086 assembly. Wikipedia says the 8080 was source-compatible with the 8008 (does that mean real source compatibility or does it need a translator?), so the x86 instruction set can be traced back to at least the 8008 in 1972.
  • ninjalj
    ninjalj almost 13 years
    Which is one of the things that make the x86 instruction set so ugly, since it can't decide if it's an accumulator or a register-file based architecture (though this was mostly fixed with the 386, which made the instruction set much more orthogonal, regardless of whatever the 68k fans tell you).
  • ninjalj
    ninjalj almost 13 years
    There was an article by Ars Technica's Jon Stokes that said that the number of transistors used for x86-RISC translation has remained mostly constant, which means that its relative size compared to the total number of transistors in the die has shrunk: arstechnica.com/old/content/2004/07/pentium-1.ars/2
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE almost 13 years
    In reality placing data and stack in different segments was utterly useless for C; it was only usable for asm. In C, a pointer can point to data with static, automatic, or dynamically allocated storage duration, so there's no way to elide the segment. Maybe it was useful for Pascal or Fortran or something, but not for C, which was already the dominant language at the time...
  • Olof Forshell
    Olof Forshell over 12 years
    @R..: "data and stack in different segments was utterly useless for C" - is about as far from the truth as you can get. In multi-threaded (especially with multiple cores/processors) applications it is essential that each thread have its own stack (containing call information, local variables etc). Sharing the same data segments allows multiple threads to work simultaneously on the same data. It is well and thriving on multi-threaded apps for NT-engine applications and, I suspect, Linux apps.
  • Olof Forshell
    Olof Forshell over 12 years
    @ninjalj: please give some actual instruction examples showing that "the 8086 ISA was explicitely designed to make easy to write translators from 8080 to 8086 assembly."
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE over 12 years
    @Olof: In a multithreaded program, the entire address space is shared. A pointer to a local variable on thread A's stack must be valid in thread B. This is not possible if you used segments to perform the magic of giving each thread its own stack. It's easy if you simply have a different stack pointer for each thread.
  • rabada123
    rabada123 over 12 years
    @R.. OTOH, segmentation is exactly what allows the x86 implementation of thread-local storage: FS and GS are used to access it. By fiddling with the LDT, the kernel can set threads up so that their FS and GS descriptors have a different segment base, but use the same selector value.
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE over 12 years
    @Bernd: The reason fs/gs were chosen for thread-local storage is not that segment registers are good for this. It's just that x86 is seriously starved for registers, and the segment registers were unused. A general-purpose register pointing to the thread structure would have worked just as well, and in fact many RISC systems with more registers use one as a thread pointer.
  • Nathan Fellman
    Nathan Fellman almost 11 years
    @dthorpe: I disagree with most if not all of what you wrote. Ever since the 8086, you didn't have to worry if it was safe to execute an add after another add. The rules are clear. There also is no need for you to deal with instruction reordering. Ever since the Pentium Pro in the mid-90s, the CPU does that for you. What you're mentioning may have been a problem 20 years ago, but I don't see any reason to hold it against the x86 architecture nowadays.
  • supercat
    supercat almost 11 years
    If Intel had included FS and GS in the 8086, allowed direct loads of segment registers, and possibly added a couple of "normalize effective address" instructions, I'd regard the segment+offset style as vastly superior to any other approach I've seen for letting a 16-bit CPU access 1MB of address space. Actually, even without those features, I still think it's better than anything else I've seen since [other than, obviously, moving to 32-bit registers]. Further, within an object-oriented framework, linear addressing isn't really all that great.
  • supercat
    supercat almost 11 years
    Few applications need more than two billion distinct objects, or need individual objects to grow larger than two gigabytes. Most applications which could need more than two billion objects would probably work better if it used a smaller number of larger objects; most applications which would need an object over 2 gigs would probably benefit if they used multiple smaller objects. None of the 8x86 segmentation modes I know of are designed to work with more than 65,536 segments, but an object-oriented framework which used a segment for each object could use 32-bit object IDs rather than 64.
  • supercat
    supercat almost 11 years
    @OlofForshell: I've yet to find a 16-bit processor that can access 128K of data as cleanly as the 8086 could access 1MB.
  • Olof Forshell
    Olof Forshell almost 11 years
    @supercat: what people in the era of the flat x86-32 memory model tend to forget is that 16 bits means 64k of memory (anyone who bothers doing the math will understand that magic isn't possible, that the 8086 wasn't a nasty punishment for unsuspecting programmers). There are few ways to get around 64k but the 8086 solution was a good compromise.
  • Olof Forshell
    Olof Forshell almost 11 years
    @R..: yes in a multi-threaded program the entire address space is shared including the area used by the stacks. I don't understand the reasoning when you talk about a pointer on the stack of one thread being accessible by other threads. This is local storage in the sense that every thread has a LIFO of its own. If threads start messing with the contents of each other's stacks the result is usually chaos.
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE almost 11 years
    @OlofForshell: Normally, especially if the caller won't return in the parent thread before the child thread(s) finish (e.g. a multi-threaded sort function), the arguments/data for the child threads will be located on the parent thread's stack. There is nothing "chaotic" about this. Moreover, if a library function is implemented internally with threads and uses pointers, the threads must be able to dereference any valid pointer passed by the caller. Otherwise the caller would have to be aware that only pointers to static or dynamic storage objects could be passed.
  • supercat
    supercat almost 11 years
    @OlofForshell: I think many people bemoaned the fact that the 8086 wasn't as nice as the 68000 (which had a 16MB linear addressing space and a clear path to 4 gigs). Certainly going to a 32-bit processor will make it easier to access more than 64K, but the 8086 is a 16-bit architecture which was designed to be a step up from the 8-bit 8080. I see no reason Intel should have leapt directly from an 8-bit to a 32-bit one.
  • supercat
    supercat almost 11 years
    @OlofForshell: Incidentally, the Macintosh ran on the 68000, but its OS imposed a lot of 32K limitations in cases which on the 8086 would have been 64K limitations. Such limitations were the result of reasonable design decisions (e.g. it was better to save two bytes per line in text-edit fields than to allow one text field to use up a quarter of the machine's memory), but indicate that using a 32-bit processor is no panacea.
  • Olof Forshell
    Olof Forshell almost 11 years
    @R..: the 8086 was introduced in the late seventies. C was not the dominant language at the time. Intel assumed their processors would be used in embedded systems and programmed in assembly. Intel also had a high-level language called PL/M and an RTOS called RMX. C and UNIX had barely made it out of Bell Labs.
  • Shahbaz
    Shahbaz over 10 years
    where energy consumption is a top concern, ARM or MIPS probably makes more sense... so, if there is at least one aspect where ARM or MIPS make more sense, doesn't it make x86 not necessarily the best ISA?
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE over 10 years
    That's why I qualified "the best" with "aside from the cost...and their energy requirements".
  • supercat
    supercat about 10 years
    The only major problems with the 8086 segmented architecture was that there was only one non-dedicated segment register (ES), and that programming languages were not designed to work with it effectively. The style of scaled addressing it uses would work very well in an object-oriented language which does not expect objects to be able to start at arbitrary addresses (if one aligns objects on paragraph boundaries, object references will only need to be two bytes rather than four). If one compares early Macintosh code to PC code, the 8086 actually looks pretty good compared to 68000.
  • The Mask
    The Mask about 10 years
    "Nowadays the x86 is translated into RISC-style instructions before it's executed anyway, " By who? the processor itself?
  • ctrl-alt-delor
    ctrl-alt-delor almost 10 years
    It is translated to RISC by a load of hot transistors that are not a cache, or anything else useful.
  • Joe Plante
    Joe Plante over 9 years
    That isn't the first time I've heard the suggestion to write it in C first, and then distill it into assembler. That definitely helps.
  • Admin
    Admin over 9 years
    In the early days all processors were RISC. CISC came about as a mitigation strategy for ferrite-core memory systems that were VERY slow; thus CISC, with fewer, more powerful instructions, put less stress on the memory subsystem and made better use of bandwidth. Likewise, registers were originally thought of as on-chip, in-CPU memory locations for doing accumulations. The last time I seriously benchmarked a RISC machine was 1993 - SPARC and HP PRISM. SPARC was horrible across the board. PRISM was up to 20x as fast as a 486 on add/sub/mul but sucked on transcendentals. CISC is better.
  • Admin
    Admin over 9 years
    I think Intel's throttling down of CPU speed and smaller die sizes have largely eliminated the power differential. The new dual-core 64-bit Celeron with 64k L1 and 1MB L2 caches is a 7.5 watt chip. It's my "Starbucks" hangout machine, and the battery life is ridiculously long and will run rings around a P6 machine. As a guy doing mostly floating point computations I gave up on RISC a long time ago. It just crawls. SPARC in particular was atrociously glacial. The perfect example of why RISC sucks was the Intel i860 CPU. Intel never went THERE again.
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE over 9 years
    @RocketRoy: 7.5 watt isn't really acceptable for a device that's powered 24/7 (and not performing useful computations the whole time) or running off a 3.7v/2000mAh battery.
  • Admin
    Admin over 9 years
    You size the CPU to the task at hand and resources available. That doesn't rule out a CISC or x86 architecture, it just means it may be overkill for some apps.
  • Admin
    Admin over 9 years
    I just looked up the list of MIPS processors, and anything in the last 10yrs is 15-30 watts, so Celeron at 7.5 is looking pretty svelte, especially since it can throttle, sleep and hibernate. Surprising to me.
  • fuz
    fuz almost 9 years
    The addressing mode problem was rectified in the 80386. Only 16-bit code has limited addressing modes; 32-bit code is much better. You can get the 32-bit addressing modes in 16-bit code using a special prefix, and vice versa.
  • cHao
    cHao almost 9 years
    @FUZxxl: Yeah... I probably should have mentioned that the ugliness is mostly limited to 16-bit code. Fixed (I think). :)
  • fuz
    fuz almost 9 years
    The perceived inelegance mostly comes from the misconception that the registers of an 8086 are general purpose registers; that's incorrect. Each of them has a special purpose and if you don't stick to their purposes, you are going to have a bad time.
  • cHao
    cHao almost 9 years
    @FUZxxl: Yep. But you're generally going to have a bad time anyway. :) That was one of the worst things about the 8086 -- there really weren't any purely general-purpose registers at all -- and it's likely to shock the hell out of someone coming at it fresh from a RISC perspective. Even the 386 didn't entirely fix it, but did at least make it a lot less restrictive.
  • supercat
    supercat over 8 years
    @OlofForshell: The 8080 has three pairs of 8-bit registers (HL, BC, and DE) which some instructions will use as 16-bit registers. Those register pairs map pretty well to BX, CX, DX. The instruction sequence lahf / push ax is roughly equivalent to the 8080's push af, and pop ax / sahf is roughly equivalent to pop af. In practice, very little code wants to save AL and F without also wanting to save AH, but the 8080 had no equivalent to AH.
  • Olof Forshell
    Olof Forshell over 8 years
    @supercat: "roughly equivalent" proves little. BC and DE could be used as address registers for indirectly loading values from memory. Neither CX nor DX allow this, you would instead need to use SI and DI which are not divided into two 8-bit registers as CX (CH/CL) and DX (DH/DL) are. I guess the classification of 8080-origins depends on personal belief, what a person wants to see or not see and choosing examples more than anything else.
  • supercat
    supercat over 8 years
    @OlofForshell: Since code written for the 8080 wouldn't use anything equivalent to SI and DI, one could replace what in Z80 notation would be "LD A,(DE)" [that's an 8080 instruction, but I forget the 8080 notation] with "MOV SI,DX / MOV AL,[SI]". I'm not sure that mechanical translation would be likely to yield good code, but the data books Intel printed for the 8088 promoted the ease of migrating 8080 code. Whether or not such a claim was really true, Intel did promote it.
  • Peter Cordes
    Peter Cordes over 8 years
    re: register renaming: Even implementations of RISC ISAs with many architectural registers (e.g. 32) still use register renaming to sustain large out-of-order windows with lots of instructions in flight. Where the scarcity of architectural registers hurts is in extra instructions to store/reload from memory when there aren't enough architectural registers to keep all useful values in regs. Many vs. few architectural registers is a different thing from a small or large physical register file with register renaming. You'd still use register renaming for any ISA in a high-performance implementation.
  • Billy ONeal
    Billy ONeal over 8 years
    @Peter: Yeah, that's what I tried to say. Namely, that there's a perception of fewer physical registers because the ISA has a smaller number of registers. Edited to say that more clearly.
  • Peter Cordes
    Peter Cordes over 8 years
    @BillyONeal: I think I obscured my point. You still miss pointing out that lack of architectural registers is a problem in and of itself. Lots of physical regs helps keep lots of insns in flight at once, but more architectural regs reduces the number of instructions needed to do the same task. x86 code, esp. 32bit, often has to repeatedly load from memory (or worse, store and later reload), or re-compute something, because there aren't enough architectural registers to keep it around across loop iterations. AMD/Intel's uop micro-fusion helps some, but it still takes a load port cycle.
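    (A minimal C sketch of the register-pressure effect described above, purely illustrative and with a hypothetical function name: with only eight architectural GPRs on 32-bit x86, of which ESP and usually EBP are reserved, a compiler cannot keep all of the values below in registers across the loop, so some accumulators get spilled to the stack and reloaded each iteration; on x86-64 or a 32-register RISC they typically all stay register-resident.)

        /* Hypothetical example: the eight accumulators plus the pointer, index
           and bound exceed the ~7 usable 32-bit x86 registers, forcing
           store/reload traffic that extra architectural registers would avoid.
           (Leftover tail elements are ignored for brevity.) */
        long sum8(const long *a, long n)
        {
            long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
            long s4 = 0, s5 = 0, s6 = 0, s7 = 0;
            for (long i = 0; i + 8 <= n; i += 8) {
                s0 += a[i + 0];  s1 += a[i + 1];
                s2 += a[i + 2];  s3 += a[i + 3];
                s4 += a[i + 4];  s5 += a[i + 5];
                s6 += a[i + 6];  s7 += a[i + 7];
            }
            return s0 + s1 + s2 + s3 + s4 + s5 + s6 + s7;
        }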
  • Peter Cordes
    Peter Cordes over 8 years
    Simulations have shown that the gains from going from 16 to 32 architectural registers would be smaller than the gains from 8 to 16, for some kinds of code. IIRC, the numbers were something like a 15% speedup for 8->16, and a 5% speedup for 16->32. I don't remember the context, or if those are accurate, but it was in some kind of paper about typical code in general, not for some specific algorithm. The point is, register renaming doesn't fix everything. There's a reason AVX512 increases the number of vector registers to 32.
  • Billy ONeal
    Billy ONeal over 8 years
    @Peter: The store and later reload doesn't get aliased away?
  • Peter Cordes
    Peter Cordes over 8 years
    @BillyONeal: Store-forwarding keeps the latency down to something like 5 cycles (Intel SnB-family). Is that what you meant by "aliased away"? A stand-alone mov load is still a separate instruction that has to be decoded, takes a spot in the uop cache, and takes up an issue slot in the 4-ops-per-cycle out-of-order core. It even occupies a load port (or the store port) for a cycle, competing for L1 cache bandwidth with other memory ops. Agner Fog's microarch guide documents the internals for all recent CPUs.
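    (For concreteness, a tiny illustrative C sketch of the spill-then-reload pattern store-forwarding serves; the function name is hypothetical, and volatile is used only to force the compiler to emit a real store and a real reload. The reload is still a separate instruction consuming decode, issue and load-port resources even though forwarding keeps its latency low.)

        /* Hypothetical illustration: a store followed shortly by a reload of
           the same stack slot, the case store-forwarding accelerates. */
        int spill_then_reload(int x)
        {
            volatile int slot;     /* forces an actual store and an actual load */
            slot = x + 1;          /* store to the stack slot */
            return slot * 2;       /* reload from the same slot */
        }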
  • Peter Cordes
    Peter Cordes over 8 years
    I just went ahead and put in what I would say, rather than playing back-and-forth in comments. I love that SO works that way. :)
  • Alex Zhukovskiy
    Alex Zhukovskiy about 8 years
    @OlofForshell You say there typically won't be a remainder, but the wiki says that MIPS has one: en.wikipedia.org/wiki/MIPS_instruction_set#Integer
  • Olof Forshell
    Olof Forshell about 8 years
    @Alex Zhukovsky: I wasn't aware that MIPS is a typical RISC processor. "Typical" to me is ARM or PowerPC.
  • Olof Forshell
    Olof Forshell about 8 years
    @supercat: actually, the es register WAS dedicated to something, namely to those string instructions that required storing (movs, stos) or scanning (cmps and scas). Given 64KiB addressing from every segment register, es also provided the "missing link" to memory other than code, data and stack memory (cs, ds, ss). The segment registers provided a sort of memory protection scheme in that you could not address outside the registers' 64KiB memory blocks. What better solution do you propose given that the x86 was a 16-bit architecture and the lithography constraints of the day?
  • supercat
    supercat about 8 years
    @OlofForshell: ES was used for string instructions, but could be used as an uncommitted register for code not using them. A way to ease the seg-reg bottleneck without requiring too much opcode space would be to have an "rseg" prefix which would specify that for the following r/m-format instruction the "r" field would select from CS/SS/DS/ES/FS/GS/??/?? instead of AX/BX/CX/DX/SI/DI/SP/BP, and to have prefixes for FS/GS and instructions for LFS and LGS (like LDS and LES). I don't know how the micro-architecture for the 8086 was laid out, but I would think something like that could have worked.
  • Olof Forshell
    Olof Forshell about 8 years
    @supercat: as I wrote, "register es also provided the missing link to memory other than ..." Fs and gs didn't arrive until the 386, as I recall.
  • supercat
    supercat about 8 years
    @OlofForshell: They didn't, which made the 80286 architecture even worse than the 8086 architecture in most regards. My point was that adding a couple more segment registers (or even one, for that matter) would have made the 8086 architecture a lot more useful, and the instruction set could have been cleaner and more useful if segment registers could be accessed much like the other ones.
  • Olof Forshell
    Olof Forshell about 8 years
    @supercat: we are discussing something that went down thirty-five years ago with the birth of the world's most successful computer architecture. I wrote some really well-performing software for the IBM PC that mixed assembly and PL/M-86 and I really don't know how the 8086 or 80286 would have become "more useful" had they had an extra segment register. I was perfectly content that I had more than 64KiB to work with. Lots of programmers had issues with the 8086 segmentation (I think they were fighting it instead of just accepting it and getting on with the job at hand) - I wasn't one of them.
  • supercat
    supercat about 8 years
    @OlofForshell: For many applications, the scarcity of segment registers was a major bottleneck; when using machine code one could work around it by putting translation tables in the code segment and having some functions take pointers to data that was required to be on the stack, and performance was generally "good enough", but additional segment registers would have helped a lot. Actually, my big gripe with the 80386 is that they kept the segment registers 16 bits, which makes it impossible to identify objects using the segment alone.
  • supercat
    supercat about 8 years
    Forming addresses by using a segment+offset can allow extremely efficient object-oriented frameworks if the segment part of an address is sufficient to uniquely identify objects. Using 32-bit object references would allow better cache efficiency than using 64-bit ones, and having the 32 bits identify a segment and scaled offset would make it possible to support objects totaling well over 4GB.
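    (A short illustrative sketch of the arithmetic behind that claim, assuming a hypothetical 16-byte object alignment: scaling a 32-bit reference by a 16-byte granule reaches 2^32 * 16 bytes, i.e. 64 GiB of object storage, while each reference stays half the size of a 64-bit pointer.)

        #include <stdint.h>

        /* Hypothetical scheme: a 32-bit object reference names a 16-byte-aligned
           base, so the reachable object space is 2^32 * 16 bytes = 64 GiB. */
        static uint64_t ref_to_base(uint32_t ref)
        {
            return (uint64_t)ref << 4;           /* scale by the 16-byte granule */
        }

        static uint64_t ref_to_field(uint32_t ref, uint32_t offset)
        {
            return ref_to_base(ref) + offset;    /* base + in-object offset */
        }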
  • Olof Forshell
    Olof Forshell about 8 years
    @supercat: you jump back and forth between x86-16 and -32, it's really difficult to follow. One minute I thought we were discussing the pros and cons of the original (real mode) x86-16 - the roots of today's PC architecture - and all of a sudden you are discussing 4GB and cache efficiency. Please be more concise and don't mix incompatible subjects.
  • supercat
    supercat about 8 years
    @OlofForshell: I think the 8086 concept of having segment registers which represent scaled base addresses for pointers, without having to have separate descriptors for each such address, was a good one which worked uniquely well on the 8086; the segmentation scheme on the 80286 was far less useful, and on the 80386 segments were largely ignored, even though, if implemented in a fashion more closely analogous to the 8086, they could have allowed a lot more to be done with 32-bit applications.
  • Seng Cheong
    Seng Cheong over 7 years
    @RocketRoy "Intel i860 CPU. Intel never went THERE again." After a little research, the i860 sounds a lot like Itanium: VLIW, compiler-ordered instruction parallelism....
  • Admin
    Admin over 7 years
    Just some perspective on CPU power requirements. My Garmin 305 on my bike has a max device draw of 75 milliwatts, so again, get the requirements down and then pick the CPU. As a practical matter, it's easier to develop, or at least prototype until the system is well defined and understood, on a BIG Intel box and then port than struggle with min hardware the whole dev cycle. Especially true if the target environment is some version of Linux. Looks like the new replacement for 4 F-16 fighter computers is a single i7 Intel box saving hundreds of pounds and watts.
  • Peter Cordes
    Peter Cordes over 6 years
    Renaming EFLAGS avoids write-after-write dependencies. Even in-order Pentium P5 could run add with 2 per clock throughput, and it doesn't decode to uops internally. Modern x86 like Haswell/Skylake have 4 per clock throughput for add and other simple ALU ops that write but don't read flags. There are many warts, and slowdowns, but they are more subtle than that; e.g. the partial-flag updating of inc is harder to handle in hardware (and P4 tried to avoid handling it). Also, leaving flags unmodified for shift counts of zero sucks a lot.
  • user253751
    user253751 about 6 years
    @NathanFellman The programmers don't have to worry about that but the chip does!
  • Hadi Brais
    Hadi Brais about 6 years
    This answer is OK, but I wouldn't say it's a good answer. All the stated points are superficial and vague. There are two parts to the question: "why x86 is ugly" and "why x86 is inferior". This answer looks like an unorganized collection of statements with unclear relationships to each other or to the questions. I think a good answer should specify (and explain why) exactly which features of x86 make it "ugly", which features make it "inferior", which features make it "superior", and why all of these features were added to x86. The other answers aren't any better.
  • user2284570
    user2284570 over 3 years
    @BillyONeal For the last point, as far as I'm aware, this allows more code from RAM to fit in the same cache size, since many commonly used instructions are still one byte long, thus offsetting the required CPU complexity in terms of performance.
  • user2284570
    user2284570 over 3 years
    @BillyONeal I would also add en.wikipedia.org/wiki/Orthogonal_instruction_set. Though I have doubts, since I see compiled code putting constants into registers even for a single operation, which might mean most instructions don't support using a constant as an operand (unlike what was retained in RISC).
  • user2284570
    user2284570 over 3 years
    @dmckee---ex-moderatorkitten Considering the number of registers, I would say that both instruction sets were designed at a time when RAM was faster than the CPU, hence the smaller number of registers and maybe the higher frequency of cache misses.
  • user2284570
    user2284570 over 3 years
    @staticsan Wrong! CPUs only became faster than RAM starting in 1980, hence en.wikipedia.org/wiki/Orthogonal_instruction_set. This was done because RAM was very small and expensive.
  • user2284570
    user2284570 over 3 years
    @OlofForshell It depends. Unlike x86, RISC will typically have constants encoded inside the instructions directly, instead of having to use them from RAM or a register.
  • user2284570
    user2284570 over 3 years
    @R..GitHubSTOPHELPINGICE No. I would still say that x86 favors caches over registers by exposing a lower number of general-purpose registers, while RISC focuses speed on what matters most. That's why, for example, even on modern x86 it's often not a good idea latency-wise to use the DIV instruction for dividing two numbers (I mean that it's often faster to use several more x86 instructions instead).
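    (A hedged illustration of the DIV point, not from the comment itself: when the divisor is a compile-time constant, compilers routinely replace the high-latency integer DIV with a multiply and a shift. The constant below is the standard reciprocal for unsigned division by 10; the function name is hypothetical.)

        #include <stdint.h>

        /* Hypothetical example of the strength reduction compilers perform:
           x / 10 computed without DIV, via multiply-high and shift.  The
           identity holds for every 32-bit unsigned x. */
        static uint32_t div10(uint32_t x)
        {
            return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
        }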
  • user2284570
    user2284570 over 3 years
    @BillyONeal Except it's wrong.
  • Olof Forshell
    Olof Forshell over 3 years
    @user2284570 I am not sure what you mean. x86-32 mov immediate allows you to include the constant as part of an instruction for loading an 8-, 16- or 32-bit register. In PowerPC you load immediate constants into a 32-bit register in 16-bit halves, one at a time, somewhat oversimplified. There are register-to-register instructions for both architectures but of course the source register should contain something meaningful to begin with, perhaps as a result of a prior memory load or an immediate constant load. Or am I missing something?
  • user2284570
    user2284570 over 3 years
    @OlofForshell That's the point: with mov being the x86 instruction used for load/store, you typically move the constant into a register instead of performing an addition with the constant encoded inside the instruction. And based on en.wikipedia.org/wiki/Orthogonal_instruction_set#RISC and my knowledge of MIPS, I can tell that constants can be used directly with any instruction.
  • user2284570
    user2284570 over 3 years
    @TheMask Having seen hacked, RSA-decrypted Intel microcode, I can tell that on 64-bit parts the micro-ops are all 48 bits long. And they can be cached. Some manufacturers even allow accessing the real instruction set directly: en.wikipedia.org/wiki/Elbrus_2000 en.wikipedia.org/wiki/Alternate_Instruction_Set.
  • user2284570
    user2284570 over 3 years
    @OlofForshell And it seems PowerPC can also deal with small constants directly.
  • Olof Forshell
    Olof Forshell over 3 years
    @user2284570 you wrote "Unlike x86, RISC will typically have constants encoded inside the instructions directly, instead of having to use them from RAM or a register", which to my knowledge is not a true representation of how an x86 compiler chooses to load constants into registers. There are any number of ways to utilize an immediate value in arithmetic, logic and move instructions. The specific instructions chosen depend very much on the processor family/.../revision, the chosen optimization and the compiler's knowledge of the processor's stall characteristics (to name but a few factors) ...
  • Olof Forshell
    Olof Forshell over 3 years
    @user2284570 ... in relation to the position of the instruction in the execution stream being pieced together. There are other ways to load immediate values, but using comments for a discussion of all possible methods of supplying an immediate value does not seem meaningful, due to space restrictions and because the only people who know which ones should be used when are the masters of the alchemy of writing optimizers. I am not such a person.