Bit-reverse a byte on 68HC12
Solution 1
If you can spare 256 bytes of extra code size, a lookup table is probably the most efficient way to reverse a byte on a 68HCS12. But I am pretty sure this is not what your instructor is expecting.
For the "normal" solution, consider the data bits individually. Rotates and shifts allow you to move bits around. For a first solution, isolate the eight bits (with bitwise "and" operations), move them to their destination positions (shifts, rotates...), then combine them together again (with bitwise "or" operations). This will not be the most efficient or simplest implementation, but you should first concentrate on getting a correct result -- optimization can wait.
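The isolate, move, combine recipe above can be sketched in C before it is translated to HCS12 assembly. This is only an illustration of the idea, not the expected assignment solution; the function name reverse_byte_masks is made up for this example.

```c
#include <stdint.h>

/* Reverse a byte one bit at a time: isolate each bit with AND,
   move it to its mirrored position with a shift, merge with OR.
   reverse_byte_masks is a hypothetical name used for illustration. */
uint8_t reverse_byte_masks(uint8_t x)
{
    uint8_t r = 0;
    r |= (uint8_t)((x & 0x01) << 7);  /* bit 0 -> bit 7 */
    r |= (uint8_t)((x & 0x02) << 5);  /* bit 1 -> bit 6 */
    r |= (uint8_t)((x & 0x04) << 3);  /* bit 2 -> bit 5 */
    r |= (uint8_t)((x & 0x08) << 1);  /* bit 3 -> bit 4 */
    r |= (uint8_t)((x & 0x10) >> 1);  /* bit 4 -> bit 3 */
    r |= (uint8_t)((x & 0x20) >> 3);  /* bit 5 -> bit 2 */
    r |= (uint8_t)((x & 0x40) >> 5);  /* bit 6 -> bit 1 */
    r |= (uint8_t)((x & 0x80) >> 7);  /* bit 7 -> bit 0 */
    return r;
}
```

With the inputs from the question, reverse_byte_masks(0x01) gives 0x80 and reverse_byte_masks(0x2B) gives 0xD4 (00101011 becomes 11010100).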
Solution 2
Hints: If you do a shift, one bit gets shifted out and a zero (probably) gets shifted in. Where does that shifted-out bit go? You need to shift that bit into the opposite end of the destination register or memory location.
I'm sure that 25 years ago I could do this in Z80 machine code without an assembler :)
Solution 3
Consider two registers as stacks of bits. What happens if you move one bit at a time from one to another?
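One way to picture the two-stacks idea is the C sketch below, where each loop iteration pops the lowest bit off the source and pushes it onto the bottom of the destination. The names reverse_byte_loop, src and dst are assumptions for illustration; in assembly the popped bit would typically travel through the carry flag rather than through an AND.

```c
#include <stdint.h>

/* Treat src and dst as stacks of bits: pop the low bit of src,
   push it onto the low end of dst, eight times.
   The first bit popped ends up as the top bit of dst. */
uint8_t reverse_byte_loop(uint8_t src)
{
    uint8_t dst = 0;
    for (int i = 0; i < 8; i++) {
        dst = (uint8_t)((dst << 1) | (src & 1)); /* push onto dst */
        src >>= 1;                               /* pop from src  */
    }
    return dst;
}
```

Note that src is used up (it ends as zero) by the time the loop finishes, which mirrors what a plain shift does to the source register.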
Solution 4
When you do a right shift, what was the least significant bit goes into the carry flag.
When you do a rotate, the carry flag is used to fill in the vacated bit of the result (LSB for a ROL, MSB for a ROR).
Solution 5
First of all, work out the algorithm for what you need to do. Express it as pseudocode, C, plain English, diagrams, or whatever you are comfortable with. Once you have cleared this conceptual hurdle, the actual implementation should be quite simple.
Your CPU probably has instructions which let you shift and/or rotate a register, perhaps including the carry flag as an additional bit. These instructions will be very useful.
dohlfhauldhagen
Updated on April 10, 2020
Comments
-
dohlfhauldhagen about 4 years
I'm in a microprocessors class and we are using assembly language in Freescale CodeWarrior to program a 68HCS12 microcontroller. Our assignment this week is to reverse a byte, so if the byte was 00000001, the output would be 10000000, or 00101011 would become 11010100. We have to use assembly language, and were told we could use rotates and shifts (but are not limited to them!) to accomplish this task. I'm really at a loss as to where I should start.
-
Spacedman about 13 years
There are hackier methods: graphics.stanford.edu/~seander/bithacks.html#BitReverseObvious (in C, but could be done in assembly...)
-
dohlfhauldhagen about 13 years
Ok, I figured out how to do it using shifts and rotates. However, we can get extra credit for having the most efficient code. He doesn't care how we do it. I honestly don't know how to do a lookup table. I was kind of reading about them when researching an answer to this problem before, but I didn't really understand how to implement them.
-
Spacedman about 13 years
Suppose your program in its lifetime is going to be doing zillions of bit reversals. When you start the program you just do all 256 possibilities by bit rotates and store the results in 256 bytes of contiguous memory, starting at, say, BASE. Now, whenever you need to flip the bits of a register, you just look at (BASE+value). That's a lookup table (LUT). You can even hardcode the LUT into the assembly as a chunk of 256 constants if you can precompute them. Then there's no init needed. Win.
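A minimal C sketch of the lookup table described in this comment, assuming the table is filled once at startup. The names rev_table, rev_table_init and reverse_byte_lut are hypothetical; rev_table plays the role of BASE.

```c
#include <stdint.h>

/* 256-byte table; rev_table[v] holds the bit-reversed value of v.
   Hypothetical name, standing in for the BASE address in the comment. */
static uint8_t rev_table[256];

/* Fill the table once, reversing each value with a shift loop. */
void rev_table_init(void)
{
    for (int v = 0; v < 256; v++) {
        uint8_t src = (uint8_t)v;
        uint8_t dst = 0;
        for (int i = 0; i < 8; i++) {
            dst = (uint8_t)((dst << 1) | (src & 1));
            src >>= 1;
        }
        rev_table[v] = dst;
    }
}

/* After init, a reversal is a single indexed load: BASE + value. */
uint8_t reverse_byte_lut(uint8_t x)
{
    return rev_table[x];
}
```

Call rev_table_init() once before the first lookup. Hardcoding the 256 constants instead, as the comment suggests, removes the init step at the cost of 256 bytes of constant data.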
-
Olof Forshell about 13 years
To save space over the 256 byte table you can have a 16 byte table containing the values for four bits (nibbles) at a time. The algorithm then would be "revval=revdigit[inval&0x0f]<<4|revdigit[inval>>4]". If I were a prof I'd like the two parts where one shift is in the indexing and the other outside.
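The nibble-table expression from this comment can be written out as a small C function. The identifiers revdigit and inval come from the comment; the function name reverse_byte_nibbles and the table contents (each 4-bit value reversed within 4 bits) are filled in here as an illustration.

```c
#include <stdint.h>

/* revdigit[n] is the 4-bit value n with its bits reversed,
   e.g. 0x1 (0001) -> 0x8 (1000), 0xB (1011) -> 0xD (1101). */
static const uint8_t revdigit[16] = {
    0x0, 0x8, 0x4, 0xC, 0x2, 0xA, 0x6, 0xE,
    0x1, 0x9, 0x5, 0xD, 0x3, 0xB, 0x7, 0xF
};

/* The reversed low nibble becomes the new high nibble, and the
   reversed high nibble becomes the new low nibble. */
uint8_t reverse_byte_nibbles(uint8_t inval)
{
    return (uint8_t)((revdigit[inval & 0x0F] << 4) | revdigit[inval >> 4]);
}
```

For 0x2B, the low nibble 0xB reverses to 0xD and moves up, the high nibble 0x2 reverses to 0x4 and moves down, giving 0xD4.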
-
Peter Cordes over 4 years
It's actually easier in assembly than C because you have rotate-through-carry. Shift a bit into the carry flag then do something like adc same,same to shift the carry flag into the bottom of a register.
Peter Cordes about 4 years
You don't need a slow rcl; you can use adc ax, ax. Both are 2 bytes (in 16-bit mode) and adc is faster on modern CPUs. Your breakdown in terms of shl eax,1 and adc al,0 is wrong: it would be shl ax,1 and adc ax,0, both with the same operand-size as the rcl, and shifting before adding in the carry. Also, dec cl is 2 bytes; maybe you were thinking of dec cx? The one-byte inc/dec opcodes are only for 16 or 32-bit operand-size, not 8-bit. (If you were really optimizing for size over speed, you'd use the slow loop instruction. But don't do that.)
Peter Cordes about 4 years
Also you could make it non-destructive by using ror di, 1 instead of shr. Oh also, you only set CX=8, but DI and AX are 16-bit registers.
Antonin GAVREL about 4 years
By the way, I am puzzled by your comment about the slow loop instruction; could you elaborate? I also checked ror vs shr at stackoverflow.com/questions/4976636/…, but what is the point of making it non-destructive?
-
Peter Cordes about 4 years
See "Why is the loop instruction slow? Couldn't Intel have implemented it efficiently?". And re: non-destructive: if you used ror di,1 16 times in a loop, the final value of DI would be the same as the initial. But your shr loop leaves DI=0 (if you did 16 iterations). So you have a choice of which one is more useful: a zeroed register or the original value.
Antonin GAVREL about 4 years
Thank you, very interesting. I edited the answer! Also, in this very specific situation, do you see any other reason to prefer ror over shr, since we do not really care about keeping the value of di?
-
Peter Cordes about 4 years
How do you know whether someone using this code wants the original value as well? Sometimes you do want to keep the original for later use when bit-reversing. But anyway, why did you waste an instruction by using shl and adc ax,0 separately instead of adc ax,ax like I initially recommended? Adding a number to itself is a left-shift by 1, and using ADC shifts in CF. So it's identical to RCL except for avoiding partial-flag stuff (adc writes all the flags).
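The claim that adding a register to itself with carry behaves like a rotate-left-through-carry by one can be modelled in C. This is a sketch under simplifying assumptions: only the 16-bit value and the carry flag are modelled, and flagged16, rcl1, adc_self and reverse16 are hypothetical names introduced for illustration.

```c
#include <stdint.h>

/* A 16-bit register value together with the carry flag (hypothetical type). */
typedef struct {
    uint16_t value;
    unsigned carry;
} flagged16;

/* Model of "rcl r,1": old CF enters bit 0, old bit 15 becomes the new CF. */
flagged16 rcl1(uint16_t r, unsigned cf)
{
    flagged16 out;
    out.carry = (r >> 15) & 1u;
    out.value = (uint16_t)((r << 1) | (cf & 1u));
    return out;
}

/* Model of "adc r,r": computes r + r + CF; the carry out of bit 15 is the new CF. */
flagged16 adc_self(uint16_t r, unsigned cf)
{
    uint32_t sum = (uint32_t)r + r + (cf & 1u);
    flagged16 out;
    out.value = (uint16_t)sum;
    out.carry = (unsigned)((sum >> 16) & 1u);
    return out;
}

/* 16-bit reversal in the style discussed: shift a bit out of src into
   the carry, then adc dst,dst to shift it into the bottom of dst. */
uint16_t reverse16(uint16_t src)
{
    uint16_t dst = 0;
    for (int i = 0; i < 16; i++) {
        unsigned cf = src & 1u;   /* what "shr src,1" would leave in CF */
        src >>= 1;
        dst = adc_self(dst, cf).value;
    }
    return dst;
}
```

In this model rcl1 and adc_self return the same value and carry for every input, which is the equivalence the comment relies on; the differences on real hardware are in how the remaining flags are written and in instruction cost.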
Antonin GAVREL about 4 years
Good point, and sorry I missed that; edited. Also, about the loop instruction: it seems it is not worthwhile to use at all?
-
Peter Cordes about 4 years
Also, you broke your explanation in the last paragraph: it's shift then adc so the new bit is in the bottom position. And rcl ax, 1 is a 2-byte instruction while adc ax, 0 is 3. RCL by 1 is often less efficient than adc ax,ax, but on some CPUs it's break-even with separate shift and adc. See uops.info/…. On AMD those are all 1-uop instructions so separate shift-and-add are worse than a single RCL by 1.
Antonin GAVREL about 4 years
Let us continue this discussion in chat.