Printing Hexadecimal Digits with Assembly


Solution 1

First write a simple routine which takes a nybble value (0..15) as input and outputs a hex character ('0'..'9','A'..'F').

Next write a routine which takes a byte value as input and then calls the above routine twice to output 2 hex characters, i.e. one for each nybble.

Finally, for an N byte integer you need a routine which calls this second routine N times, once for each byte.

You might find it helpful to express this in pseudo code or an HLL such as C first, then think about how to translate this into asm, e.g.

#include <stdint.h>
#include <stdio.h>

void print_nybble(uint8_t n)
{
    if (n < 10) // handle '0' .. '9'
        putchar(n + '0');
    else // handle 'A'..'F'
        putchar(n - 10 + 'A');
}

void print_byte(uint8_t n)
{
    print_nybble(n >> 4); // print hi nybble
    print_nybble(n & 15); // print lo nybble
}

void print_int16(uint16_t n)
{
    print_byte(n >> 8); // print hi byte
    print_byte(n & 255); // print lo byte
}
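
The third step, a routine for an N-byte integer, is only shown above for 16 bits. A minimal sketch of the general case follows, assuming the value fits in a uint32_t and that writing into a caller-supplied buffer is acceptable in place of putchar(). The names nybble_to_char and uint_to_hex are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: map a nybble (0..15) to '0'..'9','A'..'F'. */
static char nybble_to_char(uint8_t n)
{
    assert(n < 16);
    return (char)(n < 10 ? n + '0' : n - 10 + 'A');
}

/* Write the hex form of the low nbytes bytes of value into out,
   most significant byte first. out must hold 2*nbytes + 1 chars. */
static char *uint_to_hex(uint32_t value, size_t nbytes, char *out)
{
    assert(nbytes >= 1 && nbytes <= sizeof value);
    size_t pos = 0;
    for (size_t i = nbytes; i-- > 0; ) {
        uint8_t byte = (uint8_t)(value >> (8 * i));
        out[pos++] = nybble_to_char(byte >> 4);   /* hi nybble */
        out[pos++] = nybble_to_char(byte & 15);   /* lo nybble */
    }
    out[pos] = '\0';
    return out;
}
```

For example, uint_to_hex(0x1234ABCD, 4, buf) leaves "1234ABCD" in buf, and passing 2 as nbytes converts only the low two bytes.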

Solution 2

Is this a homework assignment?

Bits is bits. Bit, byte, word, double word: these are hardware terms, something instruction sets and assemblers are going to reference. Hex, decimal, octal, unsigned, signed, string, character, etc. are manifestations of programming languages. Likewise .text, .bss, .data, etc. are manifestations of software tools; the instruction set doesn't care about one address being .data and one being .text, it is the same instruction either way. There are reasons why all of these programming language things exist, very good reasons sometimes, but don't let them confuse you when trying to solve this problem.

To convert from bits to human-readable ASCII, you first need to know your ASCII table and the bitwise operations: AND, OR, logical shift, arithmetic shift, etc., plus load and store and other things.

Think mathematically about what it takes to get from some number in a register/memory into ASCII hex. Say 0x1234, which is 0b0001001000110100. For a human to read it, yes, you need to get it into a string for lack of a better term, but you don't necessarily need to store four characters plus a null in adjacent memory locations in order to do something with it. It depends on your output function. Normally character-based output boils down to a single output_char() of some sort called many times.

You could convert to a string, but that is more work. Instead, as you compute each ASCII character, call some sort of single-character output function right then. putchar() is an example of such a function.
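
To illustrate emitting characters as they are computed rather than building a string first, here is a rough sketch. The output_fn type, emit_hex16, and the capture_char sink are hypothetical names; capture_char merely stands in for whatever single-character output routine the platform provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Type of a single-character output routine, in the spirit of putchar(). */
typedef void (*output_fn)(char c);

/* Emit a 16-bit value as four hex characters, one call per character,
   without ever storing them as a string. */
static void emit_hex16(uint16_t value, output_fn out)
{
    for (int shift = 12; shift >= 0; shift -= 4) {
        unsigned nybble = (value >> shift) & 0xFu;
        out((char)(nybble < 10 ? '0' + nybble : 'A' + (nybble - 10)));
    }
}

/* Hypothetical sink used only for illustration: records what was emitted. */
static char captured[16];
static size_t captured_len;

static void capture_char(char c)
{
    assert(captured_len < sizeof captured);
    captured[captured_len++] = c;
}
```

Calling emit_hex16(0x1234, capture_char) results in four calls to the sink, with '1', '2', '3' and '4' in that order.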

So for binary you want to examine one bit at a time and create a 0x30 or 0x31. For octal, 3 bits at a time and create 0x30 to 0x37. Hex is based on 4 bits at a time.

Hex has the problem that the 16 characters we want to use are not found adjacent to each other in the ascii table. So you use 0x30 to 0x39 for 0 to 9 but 0x41 to 0x46 or 0x61 to 0x66 for A to F depending on your preference or requirements. So for each nybble you might AND with 0xF, compare with 9 and ADD 0x30 or 0x37 (10+0x37 = 0x41, 11+0x37 = 0x42, etc).
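
The compare-and-add step can be written out as below. This is only a sketch: the function name and the uppercase flag are assumptions for illustration, and 0x57 is the lowercase counterpart of 0x37 (10 + 0x57 = 0x61, which is 'a').

```c
#include <stdbool.h>
#include <stdint.h>

/* AND with 0xF, compare with 9, then add the matching offset. */
static char hex_digit(uint8_t value, bool uppercase)
{
    uint8_t nybble = value & 0xF;
    if (nybble <= 9)
        return (char)(nybble + 0x30);                    /* '0'..'9' */
    return (char)(nybble + (uppercase ? 0x37 : 0x57));   /* 'A'..'F' or 'a'..'f' */
}
```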

Converting from bits in a register to an ASCII representation of binary: if the bit in memory was a 1, show a 1 (0x31 in ASCII); if the bit was a 0, show a 0 (0x30 in ASCII).

void showbin ( unsigned char x )
{
    unsigned char ra;

    for(ra=0x80;ra;ra>>=1)
    {
        if(ra&x) output_char(0x31); else output_char(0x30);
    }
}

It may seem logical to use unsigned char above, but unsigned int, depending on the target processor, could produce much better (cleaner/faster) code. But that is another topic.

The above could look something like this in assembler (intentionally NOT using x86):

 ...
 mov r4,r0
 mov r5,#0x80
top:
 tst r4,r5
 moveq r0,#0x30
 movne r0,#0x31
 bl output_char
 mov r5,r5, lsr #1
 cmp r5,#0
 bne top
 ...

Unrolled is easier to write and is going to be a bit faster; the tradeoff is more memory used:

 ...
 tst    r4, #0x80
 moveq  r0, #0x30
 movne  r0, #0x31
 bl output_char
 tst    r4, #0x40
 moveq  r0, #0x30
 movne  r0, #0x31
 bl output_char
 tst    r4, #0x20
 moveq  r0, #0x30
 movne  r0, #0x31
 bl output_char
 ...

Say you had 9 bit numbers and wanted to convert to octal. Take three bits at a time (remember humans read left to right so start with the upper bits) and add 0x30 to get 0x30 to 0x37.

...
mov r4,r0
mov r0,r4,lsr #6
and r0,r0,#0x7
add r0,r0,#0x30
bl output_char
mov r0,r4,lsr #3
and r0,r0,#0x7
add r0,r0,#0x30
bl output_char
and r0,r4,#0x7
add r0,r0,#0x30
bl output_char
...
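
The same idea in C, generalized slightly so the digit width is a parameter, might look like the sketch below. The name to_digits and its buffer-based interface are assumptions for illustration; it only covers widths of 1 to 3 bits, where every digit is simply 0x30 plus the value.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Write ndigits digits of value, taking `bits` bits per digit
   (1 = binary, 3 = octal), most significant digit first.
   out must hold ndigits + 1 chars. */
static char *to_digits(unsigned value, unsigned bits, size_t ndigits, char *out)
{
    assert(bits >= 1 && bits <= 3);
    assert(bits * ndigits <= sizeof value * CHAR_BIT);
    unsigned mask = (1u << bits) - 1u;
    for (size_t i = 0; i < ndigits; i++) {
        unsigned shift = bits * (unsigned)(ndigits - 1 - i);
        out[i] = (char)(0x30 + ((value >> shift) & mask));
    }
    out[ndigits] = '\0';
    return out;
}
```

With bits set to 3 and ndigits set to 3 this mirrors the 9-bit octal sequence: shift by 6, then by 3, then by 0, masking with 0x7 each time.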

A single (8 bit) byte in hex might look like:

...
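mov r4,r0            ; keep the byte in r4 across the calls
mov r0,r4,lsr #4     ; high nybble first
and r0,r0,#0xF
cmp r0,#9
addhi r0,r0,#0x37    ; 10..15 -> 'A'..'F'
addls r0,r0,#0x30    ; 0..9 -> '0'..'9'
bl output_char
and r0,r4,#0xF       ; then the low nybble
cmp r0,#9
addhi r0,r0,#0x37
addls r0,r0,#0x30
bl output_char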
...
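
An alternative to the compare-and-add sequence is a 16-entry lookup table indexed by the nybble, which avoids the conditional step altogether. The sketch below shows the idea in C; hex_table and byte_to_hex are illustrative names, not part of the answer.

```c
#include <stdint.h>

/* The 16 hex digits in order, so the nybble value is the index. */
static const char hex_table[] = "0123456789ABCDEF";

/* Store the two hex characters for value in out[0] (hi) and out[1] (lo). */
static void byte_to_hex(uint8_t value, char out[2])
{
    out[0] = hex_table[value >> 4];
    out[1] = hex_table[value & 0xF];
}
```

The table costs 16 bytes of data (17 with the terminator here) in exchange for removing the comparison.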

Making a loop from 1 to N storing that value in memory and reading it from memory (.data), output in hex:

...
mov r4,#1
str r4,my_variable
...
top:
ldr r4,my_variable
mov r0,r4,lsr #4
and r0,r0,#0xF
cmp r0,#9
addhi r0,r0,#0x37
addls r0,r0,#0x30
bl output_char
and r0,r4,#0xF
cmp r0,#9
addhi r0,r0,#0x37
addls r0,r0,#0x30
bl output_char
...
ldr r4,my_variable
add r4,r4,#1
str r4,my_variable
cmp r4,#7 ;say N is 7: the loop ends once the counter reaches 7
bne top
...
my_variable .word 0

Saving to RAM is a bit of a waste if you have enough registers, although with x86 you can operate directly on memory and don't have to go through registers.

x86 isn't the same as the above (ARM) assembler, so it is left as an exercise for the reader to work out the equivalent. The point is that it is the shifting, ANDing, and adding that matter; break it down into elementary steps and the instructions fall out naturally from there.
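
For reference, the counting loop can be sketched in C with the counter held in a local variable instead of being stored to and reloaded from memory. It writes the digits into a caller-supplied buffer; count_hex is a hypothetical name, and n is assumed to be at most 255 so that each value fits in two hex digits.

```c
#include <assert.h>
#include <stddef.h>

/* Append the two hex characters for each counter value 1..n to out.
   out must hold 2*n + 1 chars. */
static char *count_hex(unsigned n, char *out)
{
    assert(n <= 255);
    size_t pos = 0;
    for (unsigned counter = 1; counter <= n; counter++) {
        unsigned hi = (counter >> 4) & 0xFu;   /* shift, then AND */
        unsigned lo = counter & 0xFu;
        out[pos++] = (char)(hi > 9 ? hi + 0x37 : hi + 0x30);
        out[pos++] = (char)(lo > 9 ? lo + 0x37 : lo + 0x30);
    }
    out[pos] = '\0';
    return out;
}
```

Each step of the loop body maps onto one or two instructions (shift, AND, compare, add), which is the breakdown to carry over to any instruction set.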

Solution 3

Quick and dirty GAS macro

.altmacro

/*
Convert a byte to its two-character hex ASCII representation.
c: r/m8 byte to be converted
Output: two ASCII characters, the high nibble in %al and the low nibble in %bl
*/
.macro HEX c
    mov \c, %al
    mov \c, %bl
    shr $4, %al
    HEX_NIBBLE al
    and $0x0F, %bl
    HEX_NIBBLE bl
.endm

/*
Convert the low nibble of an r8 register to its ASCII hex digit, in place.
The high nibble of the register must already be zero.
reg: r8 to be converted
Output: stored in reg itself.
*/
.macro HEX_NIBBLE reg
    LOCAL letter, end
    cmp $10, %\reg
    jae letter
    /* 0x30 == '0' */
    add $0x30, %\reg
    jmp end
letter:
    /* 0x37 == 'A' - 10 */
    add $0x37, %\reg
end:
.endm

Usage:

mov $0x1A, %al
HEX <%al>

<> are used because of .altmacro: Gas altmacro macro with a percent sign in a default parameter fails with "% operator needs absolute expression"

Outcome:

  • %al contains 0x31, which is '1' in ASCII
  • %bl contains 0x41, which is 'A' in ASCII

Now you can do whatever you want with %al and %bl, e.g.:

  • loop over multiple bytes and copy them to memory (make sure to allocate twice as much memory as there are bytes)
  • print them with system or BIOS calls
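
The first option, copying the converted characters to memory, could look roughly like this in C. The function name and signature are assumptions; the point is that n input bytes produce exactly 2*n output characters, hence the doubled allocation.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert n bytes from src into 2*n hex characters in dst.
   dst must have room for 2*n characters; no terminator is written. */
static void bytes_to_hex(const uint8_t *src, size_t n, char *dst)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t hi = src[i] >> 4;
        uint8_t lo = src[i] & 0x0F;
        dst[2 * i]     = (char)(hi < 10 ? hi + '0' : hi - 10 + 'A');
        dst[2 * i + 1] = (char)(lo < 10 ? lo + '0' : lo - 10 + 'A');
    }
}
```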
Author: BSchlinker

Updated on June 12, 2022

Comments

  • BSchlinker
    BSchlinker almost 2 years

    I'm trying to learn NASM assembly, but I seem to be struggling with what seems so simple in high-level languages.

    All of the textbooks which I am using discuss using strings -- in fact, that seems to be one of their favorite things. Printing hello world, changing from uppercase to lowercase, etc.

    However, I'm trying to understand how to increment and print hexadecimal digits in NASM assembly and don't know how to proceed. For instance, if I want to print the numbers 1 to n in hex, how would I do so without the use of C libraries (which all the references I have been able to find use)?

    My main idea would be to have a variable in the .data section which I would continue to increment. But how do I extract the hexadecimal value from this location? I seem to need to convert it to a string first...?

    Any advice or sample code would be appreciated.

  • adrian
    adrian over 6 years
    The C code given above produces "(" when given any value above 0x80 (e.g. if given 0x8F it gives "(F").
  • Paul R
    Paul R over 6 years
    @AdrianZhang: It works for me - did you change something ? Can you provide a minimal reproducible example that shows the problem ?
  • adrian
    adrian over 6 years
    Never mind, it seems that changing from a uint8_t to a char produces the wrong result. Funny how that happened.
  • adrian
    adrian over 6 years
    Scratch that, it was not an unsigned char. Silly me.
  • Paul R
    Paul R over 6 years
    Yes, it’s implementation-defined as to whether plain char is treated as signed or unsigned, so it is best avoided when you need unsigned.
  • Peter Cordes
    Peter Cordes over 3 years
    Use a lookup table like a normal person for mapping a contiguous input range to various outputs, not a chain of branches!
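
The "(F" result reported in the comments is consistent with a signed char: on a platform where char is signed and right shift of a negative value is arithmetic, 0x8F is read as -113, -113 >> 4 is -8, and -8 + '0' is 40, the code for '('. A small sketch of a version that tolerates a plain char argument follows. The function name is made up for illustration, and the behaviour described for the unfixed case is implementation-defined rather than guaranteed.

```c
#include <stdint.h>

/* Return the hex character for the high nybble of c.
   Converting to uint8_t first keeps the shift from seeing a negative value. */
static char hi_nybble_char(char c)
{
    uint8_t n = (uint8_t)c;      /* 0x8F becomes 143, never -113 */
    uint8_t hi = n >> 4;
    return (char)(hi < 10 ? hi + '0' : hi - 10 + 'A');
}
```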