Armed with a text editor

mu's views on program and recipe! design

Python format string vulnerabilities (1) Posted 2005.11.30 13:45 PST

One doesn't normally think of Python as vulnerable to format string attacks. And it's not, at least in the security sense. But after getting odd reports of failures adding tags to MP3 files with Quod Libet, particularly one that pointed me straight at mmap, and then tracing it to a format string problem in mmap_move_method (already reported as and), I was concerned.

The iii:move format string on line 5 of mmap_move_method means to pull three values from the tuple and store them as integers in the variables passed into the variadic scanf-style function. Written and thoroughly tested on your standard 32-bit platform, all is well. However, since these variables are really longs, there's a problem. On the rising AMD64 platform, a long is eight bytes, no longer the same size as a four-byte int.

Furthermore, thanks to uninitialized local storage, chances are really good that /dest/, /src/, and /count/ start out as some weird value like 0xFEDCBA0987654321 instead of a pretty 0x0000000000000000. When the successful parse writes a four-byte int into the eight-byte long's storage, only half of the garbage is overwritten, and we now have something like 0xFEDCBA090000000A when we wanted 0xA. And that's on a little-endian system; on a big-endian system we'd end up with the equally absurd 0x0000000A87654321.
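
To make the byte arithmetic concrete, here is a small Python sketch that simulates the bad store with the struct module. It is not the C code in question; write_int_into_long is a made-up helper name, and the garbage value is just the example from the paragraph above.

```python
import struct


def write_int_into_long(garbage: int, value: int, byteorder: str) -> int:
    """Simulate storing a 4-byte int at the address of an 8-byte long.

    `garbage` is the uninitialized 64-bit content of the long; `value`
    is the int the parser writes; `byteorder` is "little" or "big".
    """
    prefix = "<" if byteorder == "little" else ">"
    # Lay the 8-byte long out in memory the way the platform would.
    slot = bytearray(struct.pack(prefix + "Q", garbage))
    # The parser believes it has an int*, so it writes only 4 bytes
    # at the start of the slot and leaves the other 4 untouched.
    struct.pack_into(prefix + "i", slot, 0, value)
    # Read the whole slot back as the long the caller thinks it is.
    return struct.unpack(prefix + "Q", slot)[0]


if __name__ == "__main__":
    for order in ("little", "big"):
        result = write_int_into_long(0xFEDCBA0987654321, 0xA, order)
        print(f"{order:>6}-endian: 0x{result:016X}")
```

With a zeroed slot the same store happens to produce the intended 0xA on little-endian, which is one way a bug like this can hide for a long time.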

There's one bright light to this story: while Python code idiomatically favors catching exceptions, its C implementation is necessarily Look Before You Leap. In this case the looking catches the absurd scenario (lines 10-14) before it causes any harm in the memmove (line 18), so there is no segfault. Unfortunately, since my code wasn't expecting this error, it exited unexpectedly and left the files it was modifying in a corrupted state. I have since fixed my calling code to work around this Python bug, but I'm hoping for a day when I no longer need this code.
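
As a rough illustration of that Look Before You Leap check, here is a Python sketch of the idea. checked_move and mmap_rejects are hypothetical names of my own, and the exact conditions are an assumption about the shape of the check, not a transcription of the C source. The second helper only confirms that a real mmap object refuses an out-of-range move with a ValueError.

```python
import mmap


def checked_move(buf: bytearray, dest: int, src: int, count: int) -> None:
    """Validate the ranges first, then copy (the LBYL pattern)."""
    size = len(buf)
    if (dest < 0 or src < 0 or count < 0
            or src + count > size or dest + count > size):
        # Garbage such as a half-overwritten long ends up here
        # instead of reaching the copy.
        raise ValueError("source, destination, or count out of range")
    # The right-hand slice is copied before assignment, so
    # overlapping ranges behave like memmove.
    buf[dest:dest + count] = buf[src:src + count]


def mmap_rejects(dest: int, src: int, count: int, size: int = 16) -> bool:
    """Return True if mmap.move raises ValueError for these arguments."""
    with mmap.mmap(-1, size) as m:  # anonymous map, no file needed
        try:
            m.move(dest, src, count)
        except ValueError:
            return True
        return False
```

Calling code that modifies files in place would presumably want to treat that ValueError as an expected failure and leave the file untouched, rather than dying halfway through.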

This is the backstory. Next time I'll talk about what I did after finding this bad format string.


Joe Wreschnig @ 2005.11.30 19:23:

Actually, there are two things called "format string vulnerabilities". The first is trusting user input to be part of the format string itself, the classic printf(input, ...). Python's not vulnerable to that because you get a ValueError rather than stack corruption.
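
A quick sketch of what that looks like in practice, with a hypothetical helper name (format_outcome). Depending on how the format string is broken you may see a TypeError rather than a ValueError, but either way it is an ordinary exception and not a wild read or write.

```python
def format_outcome(fmt: str, args: tuple) -> str:
    """Apply %-formatting and report the exception name on failure."""
    try:
        return fmt % args
    except (ValueError, TypeError) as exc:
        return type(exc).__name__


if __name__ == "__main__":
    # An unsupported conversion character is rejected outright.
    print(format_outcome("%y", (1,)))
    # Asking for more arguments than were supplied is also an error.
    print(format_outcome("%s and %s", ("only one",)))
    # A well-formed string formats normally.
    print(format_outcome("%s", ("ok",)))
```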

The second kind is not properly sanitizing input for things like os.system("ls %s" % input), which is still a problem in any language.
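
For the second kind, a minimal sketch of the usual mitigations in Python, assuming a POSIX shell. The function names are invented for illustration; shlex.quote is the standard library call doing the work.

```python
import shlex


def unsafe_ls_command(user_input: str) -> str:
    """The pattern to avoid: input is pasted into the shell string."""
    return "ls %s" % user_input


def quoted_ls_command(user_input: str) -> str:
    """Quote the input so the shell sees it as a single argument."""
    return "ls %s" % shlex.quote(user_input)


def ls_argv(user_input: str) -> list:
    """Skip the shell entirely: build an argument vector that could be
    handed to subprocess.run. The "--" keeps the input from being read
    as an option on tools that follow that convention."""
    return ["ls", "--", user_input]


if __name__ == "__main__":
    hostile = "foo; rm -rf ~"
    print(unsafe_ls_command(hostile))
    print(quoted_ls_command(hostile))
    print(ls_argv(hostile))
```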

You've discovered a third kind, which probably only languages like Python are "vulnerable" to - I don't doubt someone could create memory corruption through misuse of this, if Python had fewer sanity checks on the C end.

Michael Urman @ 2005.11.30 19:23:

The separation of format and print methods is more important to Python's limited vulnerability to the first. If the base print had the formatting built in, we'd see a lot more ValueErrors.