MIPS Assembly - String (ASCII) Instructions


Solution 1

ASCII text is not converted to machine code. Each character is stored as its one-byte ASCII code, as listed in the ASCII code chart on Wikipedia.

MIPS uses this format to store ASCII strings. As for .asciiz in particular, it is the string plus the NUL character. So, according to the chart, A is 41 in hexadecimal, which is 0100 0001 in binary. But don't forget the NUL character, which is a full byte of zeros, so .asciiz "A" is stored as 0100 0001 0000 0000.
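As a minimal sketch of that rule in C (the language the asker's assembler is written in), the function below copies the characters and appends one NUL byte. The name `emit_asciiz` and the buffer-based interface are hypothetical, and the sketch assumes the assembler itself runs on a host whose character set is ASCII.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes the bytes for `.asciiz "text"` into `out`.
 * Each character becomes one byte, followed by a single NUL (0x00) byte.
 * Returns the number of bytes written, or 0 if `out` is too small. */
size_t emit_asciiz(const char *text, uint8_t *out, size_t out_cap)
{
    size_t len = strlen(text);
    if (len + 1 > out_cap) {
        return 0;
    }
    memcpy(out, text, len);  /* raw character codes, no translation */
    out[len] = 0x00;         /* the terminator that .asciiz adds */
    return len + 1;
}
```

For `.asciiz "A"` this yields the two bytes 0x41 0x00.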

When storing the string, I'd borrow the approach of the MARS MIPS simulator: start the data section at a known address in memory, and resolve every reference to the label message to the string's location there.
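One way to sketch that bookkeeping in C is a small symbol table that records each label at the base address plus the number of data bytes emitted so far. Everything here is an assumption for illustration: the struct and function names are hypothetical, the fixed table sizes are arbitrary, and `DATA_BASE` is set to 0x10010000, which is the default .data address in MARS.

```c
#include <stdint.h>
#include <string.h>

#define DATA_BASE   0x10010000u  /* assumed base; MARS's default .data address */
#define MAX_SYMBOLS 64           /* arbitrary limit for this sketch */

struct symbol {
    char     name[32];
    uint32_t address;
};

struct data_section {
    struct symbol symbols[MAX_SYMBOLS];
    int           symbol_count;
    uint32_t      offset;        /* data bytes emitted so far */
};

/* Records `name` at the current position in the data section.
 * Returns 0 on success, -1 if the table is full or the name is too long. */
int define_label(struct data_section *ds, const char *name)
{
    if (ds->symbol_count >= MAX_SYMBOLS ||
        strlen(name) >= sizeof ds->symbols[0].name) {
        return -1;
    }
    struct symbol *s = &ds->symbols[ds->symbol_count++];
    strcpy(s->name, name);
    s->address = DATA_BASE + ds->offset;
    return 0;
}

/* Looks up a label; stores its address and returns 0, or returns -1. */
int lookup_label(const struct data_section *ds, const char *name,
                 uint32_t *address)
{
    for (int i = 0; i < ds->symbol_count; i++) {
        if (strcmp(ds->symbols[i].name, name) == 0) {
            *address = ds->symbols[i].address;
            return 0;
        }
    }
    return -1;
}
```

The assembler would advance `offset` by the size of each directive's output (for example, the string length plus one for .asciiz), so the next label lands right after the previous data.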

Please note that everything in the data section is neither R-type, I-type, nor J-type. It is just raw data.

Solution 2

Data is not executable and should not be converted to machine code. It should be encoded in the proper binary representation of the data type for your target.

Solution 3

As other answers have noted, the ASCII contained in a .ascii "string" directive is encoded in its raw binary format in the data segment of the object file. What happens from there depends on the binary format the assembler is encoding into. Ordinarily data is not encoded into machine code; however, GNU as will happily assemble this:

.text
message:
  .ascii "Hello, world"
  addi $t1, $zero, 0x1
end:

If you disassemble the output with objdump (I'm using the mips-img-elf toolchain here), you'll see this:

Disassembly of section .text:

00000000 <message>:
   0:   48656c6c    0x48656c6c
   4:   6f2c2077    0x6f2c2077
   8:   6f726c64    0x6f726c64
   c:   20090001    addi    t1,zero,1

The hexadecimal sequence 48 65 6c 6c 6f 2c 20 77 6f 72 6c 64 spells out "Hello, world". I came here while looking for an answer as to why GAS behaves like this. MARS won't assemble the above program, giving an error that data directives can't be used in the text segment. Does anyone have any insight here?
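To check how the string bytes line up with the 32-bit words in the dump, here is a small C sketch that reads four consecutive bytes as one word. The function name is hypothetical, and the big-endian byte order is an assumption inferred from the dump itself, where the first character 'H' (0x48) appears in the most significant byte.

```c
#include <stdint.h>

/* Hypothetical helper: reads four bytes starting at `bytes` as one
 * big-endian 32-bit word (first byte is the most significant). */
uint32_t word_from_bytes_be(const uint8_t *bytes)
{
    return ((uint32_t)bytes[0] << 24) |
           ((uint32_t)bytes[1] << 16) |
           ((uint32_t)bytes[2] << 8)  |
            (uint32_t)bytes[3];
}
```

Applied to "Hello, world" at offsets 0, 4 and 8, this gives 0x48656c6c, 0x6f2c2077 and 0x6f726c64, matching the three words objdump prints before the addi instruction.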

Author: darksky


Updated on June 04, 2022

Comments

  • darksky
    darksky almost 2 years

    I am writing an assembler in C for MIPS assembly (so it converts MIPS assembly to machine code).

    Now MIPS has three different instruction formats: R-Type, I-Type and J-Type. However, in the .data section, we might have something like message: .asciiz "hello world". In this case, how would we convert an ASCII string into machine code for MIPS?

    Thanks

  • darksky
    darksky over 12 years
    Yes I am aware of that. My .data can only have .word or .asciiz. If it is .word then I just convert the number to its 32-bit representation. But how would you represent .asciiz as a machine code instruction? I need to convert it to machine code. So array: .word 0:10 would create 10 instructions of this: 000000000000000000000000000001010
  • Jason LeBrun
    Jason LeBrun over 12 years
    .asciiz is not a machine code instruction, it's an assembler directive. It tells the assembler that it should store this data in a certain format in the final binary file. In other words, the assembler is in charge of converting your representation of the data into the correct binary format, and storing it that way in the executable file.
  • Variable Length Coder
    Variable Length Coder over 12 years
    You would not represent .asciiz as machine code instructions. Assuming you're implementing a fairly standard ABI, you would store it as a sequence of bytes, each byte containing the ASCII value of one letter, followed by a NUL terminator.
  • darksky
    darksky over 12 years
    Ah right. I'm sorry for using "machine code instruction". I meant a "sequence of bytes". Thank you.
  • Peter Cordes
    Peter Cordes over 5 years
    Insight into what? Why MARS has training wheels on its assembler, and doesn't let you assemble arbitrary bytes where the asm source asks for them? Normally you'd put strings in .section .rodata, where they'll be linked as part of the text segment, but putting them in the text section somewhere they won't be executed is totally fine. Or manually encoding an instruction with .byte 0x20, 0x09, 0x00, 0x01 or something. (Usually no reason to do so, but you can if you want.)
  • Peter Cordes
    Peter Cordes over 5 years
    But if you don't understand what you're doing, it's easy to put data where execution will fall into it, and that can be confusing for beginners, hence the training wheels / nerf padding in MARS. I think its emulator/simulator does run your program from MIPS machine code, though, so I don't think MARS is "assembling" straight into emulator internals, and restricting the .text section to only asm instructions it can parse from text.
  • Peter Cordes
    Peter Cordes over 5 years
    And assuming the input character set is also ASCII (or maybe UTF-8), the assembler should thus simply copy the bytes from the source to the output file (at the current output position), up to the end of quoted string. Although you do need to process C-style escape sequences like \n = 0xa (LF = linefeed).
  • ajxs
    ajxs over 5 years
    Hi Peter, thanks for your response. I'm not concerned with the behavior of MARS here, I was interested in why GAS allows you to use .ascii directives to encode raw bytes into the .text section. I think you've just answered that question though in saying that you can place arbitrary binary data wherever you like using these directives, potentially for manually encoding an instruction. I looked again and yes, objdump will attempt to interpret the binary data you inserted with these directives as instructions.
  • Peter Cordes
    Peter Cordes over 5 years
    Right, objdump can't tell how the bytes got there. It's all just bytes in the assembler's output file. MARS is definitely the exception; most assemblers are like GAS and will happily assemble a line of asm source into bytes in whatever the current section is. It's up to the programmer to make sure that's useful.
  • ajxs
    ajxs over 5 years
    Thank you for clearing this up. This way of looking at things makes perfect sense, and explains why GAS allows this.
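The comment above about copying the quoted string's bytes while processing C-style escapes could be sketched in C roughly as follows. The function name is hypothetical, and the set of escapes handled (\n, \t, \0, \\, \") is an assumption chosen for illustration rather than the full list any particular assembler accepts.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decodes the body of a quoted string (without the
 * surrounding quotes) into raw bytes. Returns the byte count, or -1 on an
 * unknown escape, a trailing backslash, or an output buffer that is too
 * small. */
long decode_string_body(const char *src, uint8_t *out, size_t out_cap)
{
    size_t n = 0;
    while (*src != '\0') {
        uint8_t byte;
        if (*src == '\\') {
            src++;  /* look at the character after the backslash */
            switch (*src) {
            case 'n':  byte = 0x0a; break;  /* LF */
            case 't':  byte = 0x09; break;  /* TAB */
            case '0':  byte = 0x00; break;  /* NUL */
            case '\\': byte = 0x5c; break;
            case '"':  byte = 0x22; break;
            default:   return -1;           /* unknown or missing escape */
            }
        } else {
            byte = (uint8_t)*src;           /* ordinary character: copy as-is */
        }
        if (n >= out_cap) {
            return -1;
        }
        out[n++] = byte;
        src++;
    }
    return (long)n;
}
```

For .asciiz, the assembler would append the NUL terminator after these decoded bytes.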