Difference between: Opcode, byte code, mnemonics, machine code and assembly


Solution 1

OPCODE: A number, interpreted by your machine (virtual or silicon), that represents the operation to perform.

BYTECODE: Same as machine code, except that it's mostly used by a software-based interpreter (like the Java virtual machine or the CLR).

MNEMONIC: The English word "mnemonic" means "a device such as a pattern of letters, ideas, or associations that assists in remembering something". Assembly language programmers use mnemonics to remember the operations a machine can perform, like "ADD", "MUL" and "MOV". Mnemonics are assembler specific.

MACHINE CODE: The sequence of numbers that flip the switches in the computer on and off to perform a certain job, such as adding numbers, branching, or multiplying. It is purely machine specific and well documented by the implementers of the processor.

ASSEMBLY: There are two "assemblies". The first is an assembly program: a sequence of mnemonics and operands that is fed to an "assembler", which "assembles" them into executable machine code. Optionally, a "linker" links the assembled output and produces an executable file.

The second "assembly", in CLR-based (.NET) languages, is a sequence of CLR code infused with metadata: a sort of library of executable code that is not directly executable.

Solution 2

Aniket did a good job, but I'll have a go too.

First, understand that at the lowest level, computer programs and all data are just numbers (sometimes called words), in memory of some kind. Most commonly these words are multiples of 8 bits (1's and 0's) (such as 32 and 64) but not necessarily, and in some processors each word is considerably larger. Regardless though, it's just numbers that are represented as a series of 1's and 0's, or on's and off's if you like. What the numbers mean is up to what/who-ever is reading them, and in the processor's case, it reads memory one word at a time, and based on the number (instruction) it sees, takes some action. Such actions might include reading a value from memory, writing a value to memory, modifying a value it had read, jumping to somewhere else in memory to read instructions from.

In the very early days a programmer would literally flick switches on and off to make changes to memory, with lights on or off to read out the 1's and 0's, as there were no keyboards, screens and so on. As time progressed, memory got larger, processors became more complex, display devices and keyboards for input were conceived, and with that, easier ways to program.

Paraphrasing Aniket:

The OPCODE is part of an instruction word that is interpreted by the processor as representing the operation to perform, such as read, write, jump, add. Many instructions will also have OPERANDS that affect how the instruction performs, such as saying from where in memory to read or write, or where to jump to. So if instructions are 32 bits in size for example, a processor may use 8 bits for the opcode, and 12 bits for each of two operands.
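
The 32-bit example above can be sketched in a few lines of Python. The field layout (opcode in the top 8 bits, followed by two 12-bit operands) and the opcode value 0x03 are assumptions chosen for illustration; they do not correspond to any real processor.

```python
# Hypothetical 32-bit instruction layout (an assumption, not a real ISA):
#   bits 31..24  opcode (8 bits)
#   bits 23..12  operand A (12 bits)
#   bits 11..0   operand B (12 bits)

def encode(opcode, a, b):
    """Pack an opcode and two operands into one 32-bit instruction word."""
    assert 0 <= opcode < 2**8 and 0 <= a < 2**12 and 0 <= b < 2**12
    return (opcode << 24) | (a << 12) | b

def decode(word):
    """Split a 32-bit instruction word back into (opcode, a, b)."""
    opcode = (word >> 24) & 0xFF
    a = (word >> 12) & 0xFFF
    b = word & 0xFFF
    return opcode, a, b

word = encode(0x03, 0x010, 0x020)
print(f"{word:08X}")   # prints 03010020
print(decode(word))    # prints (3, 16, 32)
```

The processor does the equivalent of `decode` in hardware: it looks at the opcode bits to decide what to do, and at the operand bits to decide what to do it with.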

A step up from toggling switches, code might be entered into a machine using a program called a "monitor". The programmer would use simple commands to say what memory they want to modify, and enter MACHINE CODE numerically, e.g. in base 16 (hex) using 0 to 9 and A to F for digits.

Though better than toggling switches, entering machine code is still slow and error prone. A step up from that is ASSEMBLY CODE, which uses more easily remembered MNEMONICS in place of the actual number that represents an instruction. The job of the ASSEMBLER is primarily to transform the mnemonic form of the program to the corresponding machine code. This makes programming easier, particularly for jump instructions, where part of the instruction is a memory address to jump to or a number of words to skip. Programming in machine code requires painstaking calculations to formulate the correct instruction, and if some code is added or removed, jump instructions may need to be recalculated. The assembler handles this for the programmer.
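
To make the assembler's job concrete, here is a minimal two-pass sketch in Python. The instruction set is invented for illustration (four mnemonics, each instruction encoded as one opcode byte plus one operand byte), so the opcode numbers and the `assemble` function are assumptions, not any real assembler. The first pass records the address of each label; the second pass emits bytes and fills in jump targets, which is the calculation a machine code programmer would otherwise do by hand.

```python
# Hypothetical instruction set: every instruction is 2 bytes (opcode, operand).
OPCODES = {"HALT": 0x00, "LOAD": 0x01, "ADD": 0x02, "JMP": 0x03}

def assemble(source):
    # Pass 1: strip comments, collect instructions, record label addresses.
    labels, instructions = {}, []
    for line in source.splitlines():
        line = line.split(";")[0].strip()
        if not line:
            continue
        if line.endswith(":"):
            labels[line[:-1]] = 2 * len(instructions)  # address of next instruction
            continue
        instructions.append(line.split())

    # Pass 2: translate each mnemonic and resolve label operands to addresses.
    code = bytearray()
    for mnemonic, *operands in instructions:
        operand = operands[0] if operands else "0"
        value = labels[operand] if operand in labels else int(operand, 0)
        code += bytes([OPCODES[mnemonic], value])
    return bytes(code)

program = """
start:
    LOAD 5      ; address 0
    ADD 0x10    ; address 2
    JMP end     ; address 4
    HALT        ; address 6 (skipped by the jump)
end:
    JMP start   ; address 8
"""
print(assemble(program).hex(" "))  # prints 01 05 02 10 03 08 00 00 03 00
```

If an instruction is inserted before `end:`, the label's address changes and every `JMP end` is re-encoded automatically on the next run.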

This leaves BYTECODE, which is fundamentally the same as machine code, in that it describes low level operations such as reading and writing memory, and basic calculations. Bytecode is typically conceived to be produced when COMPILING a higher level language, for example PHP or Java, and unlike machine code for many hardware based processors, may have operations to support specific features of the higher level language. A key difference is that the processor of bytecode is usually a program, though processors have been created for interpreting some bytecode specifications, e.g. a processor called SOAR (Smalltalk On A RISC) for Smalltalk bytecode. While you wouldn't typically call native machine code bytecode, for some types of processors such as CISC and EISC (e.g. Linn Rekursiv, from the people who made record players), the processor itself contains a program that is interpreting the machine instructions, so there are parallels.
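
The idea that the "processor" of bytecode is usually a program can be sketched as a tiny stack-based interpreter in Python. The five opcodes and their numeric values are made up for this example and do not match the JVM, the CLR, or any other real bytecode format.

```python
# Hypothetical bytecode: one-byte opcodes; PUSH is followed by a one-byte operand.
HALT, PUSH, ADD, MUL, PRINT = 0x00, 0x01, 0x02, 0x03, 0x04

def run(bytecode):
    """Interpret the bytecode and return the list of printed values."""
    stack, output, pc = [], [], 0
    while True:
        op = bytecode[pc]          # fetch
        pc += 1
        if op == PUSH:             # decode and execute
            stack.append(bytecode[pc])
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == PRINT:
            output.append(stack.pop())
        elif op == HALT:
            return output
        else:
            raise ValueError(f"unknown opcode {op:#04x} at {pc - 1}")

# Computes (2 + 3) * 4
program = bytes([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, PRINT, HALT])
print(run(program))  # prints [20]
```

The loop has the same fetch, decode, execute shape as a hardware processor; the difference is only that it is implemented in software.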

Solution 3

The following line is disassembled x86 code.

68 73 9D 00 01       PUSH 0x01009D73

68 is the opcode. Together with the following four bytes, it forms a PUSH instruction in x86 assembly language, which pushes 4 bytes (32 bits) of data onto the stack. The operand bytes 73 9D 00 01 are the value 0x01009D73 stored in little-endian order (least significant byte first). The word PUSH is just a mnemonic that represents opcode 68. All five bytes, 68 73 9D 00 01, are machine code.
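
As a small check of the byte layout, the sketch below splits those five bytes into opcode and operand in Python. The helper name `decode_push_imm32` is made up for this example; `struct.unpack` with the `"<I"` format reads a little-endian unsigned 32-bit integer.

```python
import struct

def decode_push_imm32(raw):
    """Decode a 5-byte x86 'PUSH imm32' instruction (opcode 0x68)."""
    opcode = raw[0]
    if opcode != 0x68 or len(raw) != 5:
        raise ValueError("not a 5-byte PUSH imm32 instruction")
    (imm32,) = struct.unpack("<I", raw[1:5])  # little-endian 32-bit operand
    return opcode, imm32

opcode, operand = decode_push_imm32(bytes.fromhex("68 73 9D 00 01"))
print(f"opcode={opcode:#04x} operand={operand:#010x}")
# prints opcode=0x68 operand=0x01009d73
```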

Machine code is for real machines (CPUs), whereas bytecode is pseudo machine code for virtual machines.

When you write Java code, the Java compiler compiles it and generates bytecode (a .class file), which you can execute on any platform without changes.

                     JAVA CODE
                         |
                         |
                     BYTE CODE
         ________________|_______________
         |               |               |
      x86 JVM        SPARC JVM        ARM JVM
         |               |               |
         |               |               |
        x86            SPARC            ARM
   MACHINE CODE     MACHINE CODE    MACHINE CODE

Solution 4

"Assembly" originates from the very early code "assemblers" which would "assemble" programs from multiple files (what we would now call "include" files). (Though note the "files" were often card decks.) The use of the term "assembly language" to refer to a mnemonic representation of the code is a back-formation from "assembler", and somewhat imprecise, since a number of "assemblers" do not support include files and hence do not "assemble".

It's interesting to note that "assemblers" were invented to support "subroutines". Originally there were "internal" and "external" subroutines. "Internal" subroutines were what we would now call "inline", whereas "external" ones were reached via a primitive "call" mechanism. There was much controversy at the time as to whether "external" subroutines were a good idea or not.

"Mnemonic" is related to the name of Mnemosyne, the Greek goddess of memory. Anything that helps you remember stuff is a "mnemonic device".

Solution 5

I recently read a good article on this, Difference between Opcode and Bytecode, and would like to share it with anyone looking for a good explanation of the topic. All credit goes to the original author.

Opcode vs Bytecode

  • Opcode:

    Opcode is short for operation code. As its name suggests, the opcode is a type of code that tells the machine what to do, i.e. what operation to perform. It is the part of a machine language instruction that names the operation.

  • Bytecode:

    Bytecode is similar to opcode in nature, as it also tells the machine what to do. However, bytecode is not designed to be executed by the processor directly, but rather by another program.
    It is most commonly used by a software-based interpreter such as the Java virtual machine or the CLR. The interpreter converts each generalized machine instruction into a specific machine instruction or instructions that the computer's processor understands.
    In fact, the name bytecode comes from instruction sets that have one-byte opcodes followed by optional parameters.
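
The "one-byte opcode followed by optional parameters" layout can be illustrated with a small disassembler in Python, which maps each opcode back to its mnemonic. The opcode table is hypothetical (the numbers and operand sizes are assumptions, not taken from any real bytecode format); the point is that the opcode alone tells the reader how many parameter bytes follow.

```python
# Hypothetical table: opcode -> (mnemonic, number of operand bytes that follow)
TABLE = {
    0x00: ("HALT", 0),
    0x01: ("PUSH", 1),
    0x02: ("ADD", 0),
    0x10: ("JMP", 2),
}

def disassemble(bytecode):
    """Return one 'offset  MNEMONIC operand' line per instruction."""
    lines, pc = [], 0
    while pc < len(bytecode):
        mnemonic, size = TABLE[bytecode[pc]]
        operand = bytecode[pc + 1 : pc + 1 + size]
        text = mnemonic
        if size:
            text += f" {int.from_bytes(operand, 'little')}"
        lines.append(f"{pc:04X}  {text}")
        pc += 1 + size  # skip the opcode and its parameters
    return lines

code = bytes([0x01, 0x07, 0x01, 0x02, 0x02, 0x10, 0x00, 0x00, 0x00])
for line in disassemble(code):
    print(line)
# prints:
# 0000  PUSH 7
# 0002  PUSH 2
# 0004  ADD
# 0005  JMP 0
# 0008  HALT
```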

Author: Ahmed Taher

Updated on January 30, 2020

Comments

  • Ahmed Taher, over 4 years ago

    I am quite new to this. I tried to understand the difference between the mentioned terms in a clear fashion, however, I am still confused. Here is what I have found:

    • In computer assembler (or assembly) language, a mnemonic is an abbreviation for an operation. It is entered in the operation code field of each assembler program instruction. For example, AND AC,37 means "AND the AC register with 37". So AND, SUB and MUL are mnemonics. They get translated by the assembler.

    • Instructions (statements) in assembly language are generally very simple, unlike those in high-level programming languages. Generally, a mnemonic is a symbolic name for a single executable machine language instruction (an opcode), and there is at least one opcode mnemonic defined for each machine language instruction. Each instruction typically consists of an operation or opcode, plus zero or more operands.