Is it possible to store the address of a label in a variable and use goto to jump to it?

36,734

Solution 1

The C and C++ standards do not support this feature. However, the GNU Compiler Collection (GCC) includes a non-standard extension for doing this, as described in this article. Essentially, it adds a unary operator "&&" that yields the address of a label as type "void*". See the article for details.

P.S. In other words, just use "&&" instead of "&" in your example, and it will work on GCC.
P.P.S. I know you don't want me to say it, but I'll say it anyway,... DON'T DO THAT!!!
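
As a minimal sketch of what that looks like in practice (assuming a compiler that implements the GNU C labels-as-values extension, such as GCC or Clang; the function name count_jumps is made up for illustration):

```c
/* GNU C extension: this is not standard C and will not build with MSVC. */
/* count_jumps is a hypothetical name used only for this sketch. */
int count_jumps(int n)
{
    int jumps = 0;
    void *target = &&again;   /* unary && yields the label's address as void* */

again:
    if (n-- > 0) {
        jumps++;
        goto *target;         /* computed goto through the stored pointer */
    }
    return jumps;             /* number of times the computed goto was taken */
}
```

Calling count_jumps(3) takes the computed goto three times and then falls through to the return.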

Solution 2

I know the feeling: when everybody says it shouldn't be done, it just has to be done. In GNU C use &&the_label; to take the address of a label. (https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html) The syntax you guessed, goto *ptr on a void*, is actually what GNU C uses.

Or if you want to use inline assembly for some reason, here's how to do it with GNU C asm goto:

// unsafe: this needs to use  asm goto so the compiler knows
// execution might not come out the other side
#define unsafe_jumpto(a) asm("jmp *%0"::"r"(a):)

// target pointer, possible targets
#define jumpto(a, ...) asm goto("jmp *%0" : : "r"(a) : : __VA_ARGS__)

int main (void)
{
  int i=1;
  void* the_label_pointer;

  the_label:
  the_label_pointer = &&the_label;

label2:

  if( i-- )
    jumpto(the_label_pointer, the_label, label2, label3);

label3:
  return 0;
}

The list of labels must include every possible value for the_label_pointer.

The macro expansion will be something like

asm goto("jmp *%0" : : "r"(the_label_pointer) : : the_label, label2, label3);

This compiles with gcc 4.5 and later, and with recent clang, which gained asm goto support some time after clang 8.0. https://godbolt.org/z/BzhckE. The resulting asm for GCC 9.1 looks like this: it optimized away the "loop" of i=1 / i-- and just put the_label after the jumpto. So the jump still runs exactly once, like in the C source.

# gcc9.1 -O3 -fpie
main:
    leaq    .L2(%rip), %rax     # ptr = &&label
    jmp     *%rax               # from inline asm
.L2:
    xorl    %eax, %eax          # return 0
    ret

But clang didn't do that optimization and still has the loop:

# clang -O3 -fpie
main:
    movl    $1, %eax
    leaq    .Ltmp1(%rip), %rcx
.Ltmp1:                                 # Block address taken
    subl    $1, %eax
    jb      .LBB0_4                  # jump over the JMP if i was < 1 (unsigned) before SUB.  i.e. skip the backwards jump if i wrapped
    jmpq    *%rcx                   # from inline asm
.LBB0_4:
    xorl    %eax, %eax              # return 0
    retq

The label address operator && only works with compilers that implement this GNU C extension (GCC, and also Clang; Visual C++ does not have it). And obviously the jumpto assembly macro needs to be implemented specifically for each processor (this one works with both 32-bit and 64-bit x86).

Also keep in mind that (without asm goto) there would be no guarantee that the state of the stack is the same at two different points in the same function. And at least with some optimization turned on, it's possible that the compiler assumes some registers contain some value at the point after the label. These kinds of things can easily get screwed up when doing crazy shit the compiler doesn't expect. Be sure to proofread the compiled code.

This is why asm goto is necessary to make it safe: it lets the compiler know where you will / might jump, so you get consistent code-gen for the jump and the destination.
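
For comparison, the plain labels-as-values form needs no inline assembly, and the compiler can see every label whose address is taken. Below is a rough sketch of the dispatch-table pattern often used in interpreter loops. The opcode names and run_program are hypothetical, it assumes GCC or Clang, and it assumes the program contains only valid opcodes and ends with OP_HALT (there is no bounds checking):

```c
#include <stddef.h>

/* Hypothetical opcodes for a toy accumulator machine. */
enum { OP_INC, OP_DEC, OP_DOUBLE, OP_HALT };

/* Runs a byte-code program and returns the final accumulator value. */
int run_program(const unsigned char *code)
{
    /* Table of label addresses, indexed by opcode (GNU C extension). */
    static void *const dispatch[] = { &&op_inc, &&op_dec, &&op_double, &&op_halt };
    int acc = 0;
    size_t pc = 0;

/* Fetch the next opcode and jump straight to its handler. */
#define NEXT() goto *dispatch[code[pc++]]

    NEXT();
op_inc:    acc++;    NEXT();
op_dec:    acc--;    NEXT();
op_double: acc *= 2; NEXT();
op_halt:   return acc;

#undef NEXT
}
```

Every jump target here is a label inside the same function, which is the only use the GNU C documentation treats as valid.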

Solution 3

You can do something similar with setjmp/longjmp.

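#include <setjmp.h>

int main (void)
{
    jmp_buf buf;
    // volatile: a non-volatile local modified after setjmp has an
    // indeterminate value after longjmp, which could loop forever
    volatile int i=1;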

    // this acts sort of like a dynamic label
    setjmp(buf);

    if( i-- )
        // and this effectively does a goto to the dynamic label
        longjmp(buf, 1);

    return 0;
}
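
A slightly fuller sketch of the same idea, using the return value of setjmp to tell the first pass from a jump back. The function name count_longjmps is invented for illustration. The locals are volatile because the C standard leaves non-volatile automatic variables indeterminate after longjmp if they were changed after the setjmp call:

```c
#include <setjmp.h>

/* Hypothetical helper: jumps back to the setjmp point n times
   and returns how many times it arrived there via longjmp. */
int count_longjmps(int n)
{
    jmp_buf buf;
    volatile int remaining = n;   /* must survive longjmp, hence volatile */
    volatile int jumps = 0;

    /* setjmp returns 0 when called directly, and the value passed
       to longjmp (here 1) when control comes back through a jump. */
    if (setjmp(buf) != 0)
        jumps++;

    if (remaining > 0) {
        remaining--;
        longjmp(buf, 1);          /* acts like a goto to the setjmp point */
    }
    return jumps;
}
```

Note that longjmp may only target a setjmp whose enclosing function is still active, so this is closer to a non-local return than to a general label pointer.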

Solution 4

According to the C99 standard, § 6.8.6, the syntax for a goto is:

    goto identifier ;

So, even if you could take the address of a label, you couldn't use it with goto.

You could combine a goto with a switch, which is like a computed goto, for a similar effect:

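#include <stdio.h>   // puts, printf
#include <stdlib.h>  // exit

// returns 0, 1, 2, ... on successive calls
int foo(void) {
    static int i=0;
    return i++;
}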

int main(void) {
    enum {
        skip=-1,
        run,
        jump,
        scamper
    } label = skip; 

#define STATE(lbl) case lbl: puts(#lbl); break
    computeGoto:
    switch (label) {
    case skip: break;
        STATE(run);
        STATE(jump);
        STATE(scamper);
    default:
        printf("Unknown state: %d\n", label);
        exit(0);
    }
#undef STATE
    label = foo();
    goto computeGoto;
}

If you use this for anything other than an obfuscated C contest, I will hunt you down and hurt you.
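
A less obfuscated take on the same switch idea, as a sketch: the "stored label" is an enum value, and each case picks the next target. The state names and run_states are hypothetical, chosen to echo the example above:

```c
#include <stddef.h>

/* Hypothetical states; ST_DONE ends the machine. */
enum state { ST_RUN, ST_JUMP, ST_SCAMPER, ST_DONE };

/* Writes one letter per visited state into out (at most cap letters)
   and returns how many letters were written. */
size_t run_states(char *out, size_t cap)
{
    enum state next = ST_RUN;   /* plays the role of the stored label */
    size_t n = 0;

    while (next != ST_DONE && n < cap) {
        switch (next) {         /* the portable stand-in for goto *ptr */
        case ST_RUN:     out[n++] = 'r'; next = ST_JUMP;    break;
        case ST_JUMP:    out[n++] = 'j'; next = ST_SCAMPER; break;
        case ST_SCAMPER: out[n++] = 's'; next = ST_DONE;    break;
        default:         next = ST_DONE;                    break;
        }
    }
    return n;
}
```

This is standard C, so it also builds with compilers that lack the GNU extension. Whether the switch compiles to a jump table or a chain of comparisons is up to the compiler.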

Solution 5

In a very, very, very old version of the C language (think of the time dinosaurs roamed the Earth), known as the "C Reference Manual" version (which refers to a document written by Dennis Ritchie), labels formally had type "array of int" (strange, but true), meaning that you could declare an int * variable

int *target;

and assign the address of a label to that variable

target = label; /* where `label` is some label */

Later you could use that variable as the operand of a goto statement

goto target; /* jumps to label `label` */

However, in ANSI C this feature was thrown out. In standard modern C you cannot take the address of a label and you cannot do a "parametrized" goto. This behavior is supposed to be simulated with switch statements, pointers to functions, and other methods. Actually, even the "C Reference Manual" itself said that "Label variables are a bad idea in general; the switch statement makes them almost always unnecessary" (see "14.4 Labels").
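
As a small sketch of the pointer-to-function alternative in standard C (all names here are made up for illustration): the variable holds a function address instead of a label address, and "jumping" becomes a call through it.

```c
/* Hypothetical handlers standing in for code that would sit after labels. */
typedef int (*handler_fn)(int);

static int add_one(int x) { return x + 1; }
static int twice(int x)   { return x * 2; }

/* Applies add_one, then twice, selecting each through a stored pointer. */
int apply_handlers(int x)
{
    handler_fn target = add_one;  /* like storing a label in a variable */
    x = target(x);                /* like a parametrized goto, but it returns */
    target = twice;
    return target(x);
}
```

Unlike a label pointer, each handler has its own stack frame, so it cannot see the caller's locals unless they are passed in.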

Author by

CanadianGirl827x

Updated on July 05, 2022

Comments

  • CanadianGirl827x
    CanadianGirl827x almost 2 years

    I know everyone hates gotos. In my code, for reasons I have considered and am comfortable with, they provide an effective solution (ie I'm not looking for "don't do that" as an answer, I understand your reservations, and understand why I am using them anyway).

    So far they have been fantastic, but I want to expand the functionality in such a way that requires me to essentially be able to store pointers to the labels, then go to them later.

    If this code worked, it would represent the type of functionality that I need. But it doesn't work, and 30 min of googling hasn't revealed anything. Does anyone have any ideas?

    int main (void)
    {
      int i=1;
      void* the_label_pointer;
    
      the_label:
    
      the_label_pointer = &the_label;
    
      if( i-- )
        goto *the_label_pointer;
    
      return 0;
    }
    
  • mrduclaw
    mrduclaw over 14 years
    +1 for just doing it in assembly, that's how I solved a similar issue previously.
  • RickNZ
    RickNZ over 14 years
    Just a caution that setjmp/longjmp can be slow, since they save and restore much more than just the program counter.
  • Ahmed
    Ahmed over 14 years
    What is the difference between puts(#lbl) and puts(lbl)?
  • outis
    outis over 14 years
    The # is the preprocessor stringizing operator (en.wikipedia.org/wiki/C_preprocessor#Quoting_macro_arguments). It converts identifiers into strings. puts(lbl) won't compile because lbl isn't a char *.
  • outis
    outis over 14 years
    Rather, it will compile with warnings and crash if you run it.
  • EvilTeach
    EvilTeach about 14 years
    +1 for evil thinking and use of macros above and beyond the call of duty.
  • sam hocevar
    sam hocevar over 12 years
    There is no guarantee that the switch/case will be implemented as a computed goto. Quite often it is compiled as if it was a series of if/else if/else if/... and the generated assembly will test for each value rather than compute a single address to jump to.
  • Brian Campbell
    Brian Campbell over 12 years
    @SamHocevar Sure, you can't depend on how it will be implemented (though cases like this, in which you are using a small range with no holes, are much more likely to be optimized this way). But regardless of whether the optimization is applied, it is semantically equivalent to a goto that is conditional on the value that you pass in, due to the fall-through behavior. The behavior is the same; the implementation only affects the performance. And it seems to be a relevant answer to the OP's question, since he's looking to build a state machine using gotos, for which switch would do the trick.
  • Justin Dennahower
    Justin Dennahower over 10 years
    goto label address is great for writing an interpreter.
  • Calmarius
    Calmarius over 9 years
    Can't you just lea eax, label; mov label_ptr, eax (intel syntax), to store the pointer in a variable?
  • Fabel
    Fabel over 9 years
    There is no doubt it can be implemented in assembly (which maybe could be considered better in this case). One benefit of implementing it in C is that the compiler can do some optimizations.
  • Dwayne Robinson
    Dwayne Robinson over 9 years
    I'd like to know why in the world they used double ampersands (logical and), when the existing get-the-address-of-an-identifier '&' would have made the most sense. The only reason why I can figure is that label identifiers appear to exist in a parallel but separate scope as variable identifiers, and thus there could be ambiguity between getting the address of a label vs variable if both were named the same (arguably though that's just bad practice to declare an int foo and foo: in the same function). If this ever gets into the standard, I'd hope for '&', not '&&'.
  • kungfooman
    kungfooman about 8 years
    One of the best answers here, thanks very much, helped me out in a reverse engineering project.
  • Pietro Braione
    Pietro Braione over 7 years
    Note that this is not standard C++, rather an extension provided by the GNU C++ compiler (see gcc.gnu.org/onlinedocs/gcc-6.2.0/gcc/…). Clang also has this extension, while Visual C++ does not (see stackoverflow.com/questions/6421433/address-of-labels-msvc).
  • Kariddi
    Kariddi over 6 years
    Totally do it. If you are writing an interpreter loop that's the way to do it.
  • chqrlie
    chqrlie almost 5 years
    Your implementation of Duff's device is broken: the case 0: should be moved to the end of the do body and followed by an empty statement. As coded, sending 0 bytes incorrectly sends 8 bytes.
  • chqrlie
    chqrlie almost 5 years
    This does not work: depending on whether i is stored in a register or on the stack, its original value (1) will be restored by longjmp() or not, hence potentially causing an infinite loop.
  • HelloWorld
    HelloWorld over 4 years
    The benefit of an address label is also having access to the stack, not just the (faster) function call. But indeed might be one of the few solutions for MSVC
  • Peter Cordes
    Peter Cordes about 2 years
    Oh interesting, so the GNU C labels as values extension is just reintroducing a historical C feature, with somewhat different syntax (void *target = &&label and goto *target).
  • glades
    glades about 2 years
    I think this is actually a very useful feature for border cases where you can't do infinite recursion because it would blow your stack frame and you need to track context without branching every time before you jump. Sad that it's only implemented in gcc :(
  • Fabel
    Fabel about 2 years
    @glades The same thing can be achieved with a switch statement, since the labels need to belong to a predefined set anyway. If you place all functions in one switch you can both call and goto any label in perfectly portable C. Yes, case labels can go anywhere in the code, even inside if blocks or whatever. (This is true for the answer rewritten by Peter Cordes; my original answer allowed jumping between code in different object files in a less limited and less secure way.)
  • glades
    glades about 2 years
    @Fabel I'm considering that but how would you do it if your code jumps into a label from multiple places and then has to return to the section it jumped from? It can't be a function for some reason, how would you do that with a switch statement?
  • Fabel
    Fabel about 2 years
    @glades Let the function have two arguments: the variable used in the switch statement and a pointer to a struct containing the actual arguments for the "function" (which is just one of the cases). This way the single function can be called recursively just like if it's a different function. If the "functions" need to return different kinds of values the struct can be used for that too. Of course each "function" can use a different struct (or a union if preferred). It's perfectly safe since the caller and the "function" agree on it.
  • glades
    glades about 2 years
    @Fabel: That would be a possibility if I could use functions. The problem is that within switch statements I need to recursively call another code section that might itself call this code section again. As nobody knows how many times this will happen I run the risk of a stack overflow.
  • glades
    glades about 2 years
    @chqrlie I guess OP copied the example from wikipedia where it's stated that "This code assumes that initial count > 0." On another note I don't think this kind of loop unrolling makes sense now as the compiler will unroll the loop if it makes sense and even if it doesn't ALU pipelining will forward calculate the exit conditions of the loop for many iterations so that this kind of manual trickery is irrelevant on modern processors.
  • glades
    glades about 2 years
    @PietroBraione It should be in the C standard, it makes sense in some edge cases when you don't want to dive down to assembly just for doing that and for portability reasons.
  • Fabel
    Fabel almost 2 years
    @glades You can both jump to another case label in the switch statement (as a common state machine), but then not return to the previous state if you haven't saved it in some way. Or you can call the single function recursively and be able to return (as with a normal function) but risk a stack overflow. Those methods can be mixed safely. And you can store a previous state in any way you like (like with a pointer to a label). I fail to see any limitations, except for the finite set of states/functions/labels (you can not add additional "states" in another object file and jump between).