Armed with a text editor

mu's views on program and recipe! design

December 2005

Python format string vulnerabilities (2) Posted 2005.12.01 21:41 PST

After finding Python's mmap implementation fails on 64-bit systems due to a simple format string problem, I was distressed. This is code that works on the systems it was developed on. It's code that compiles without error on systems it fails on. It's due to a subtle format string inaccuracy that doesn't even receive the help of the compiler checking for invalid association. Yet the consequences are anything but subtle.

Unit tests are a great way to catch these sorts of bugs. If the mmap module had been developed with Test First Programming, chances are very good that this behavior would have been covered with a test and caught on the first supported 64-bit platform with mmap. I hope when Python fixes the mmap bug that my sample reproduction code is turned into a unit test. Then if things blow up on some future 128-bit system we'll find out sooner than a bug report.

But that's not enough. I want my programs to be bullet-proof. I want something I write in Python and test on 32-bit systems to work on 64-bit systems. Always. Or at least to blow up early at import. To proactively ensure this I crafted and now present one ugly program to check format strings in PyArg_ParseTuple, PyArg_ParseTupleAndKeywords, or (poorly) Py_BuildValue. The results (below) are surprisingly comforting.

There are very few errors that are obviously in this 32-bit/64-bit clash. There are a lot of unclear cases. Is a size_t the size of an int or a long? Is it always that size, or should it be transferred through a variable of known size such as an unsigned long? There are also plenty of false mismatches from my format checker, but since the numbers aren't insanely large it's good to err in that direction.

What is the best way to approach Python parsing format strings in the future? I offer two alternatives, and seek others. Do you think the current code needs to change?

Format Check Results

When both unknown and unsigned-mismatches are filtered, these are the results for trunk/Python and trunk/Modules as of r39818. Note bltinmodule is the only pure long vs int conflict outside of mmapmodule (although the size_t vs int/long goes one way in mmapmodule and another in bsddbmodule). The conflict in bltinmodule isn't currently a problem, as the parsed value is not used—it appears to exist solely for cleaner error reporting. Anything too weird is a limitation of my checker.

mactoolboxglue.c:143                      `err' type `OSErr' >< i: int
pythonrun.c:864                   `filename' type `PyObject' >< z: char
pythonrun.c:864                            `text' type `int' >< z: char
bltinmodule.c:1910                     `reverse' type `long' >< i: int
socketmodule.c:1590                `buflen' type `socklen_t' >< i: int
socketmodule.c:3071                     `fd' type `SOCKET_T' >< i: int
mmapmodule.c:465                        `size' type `size_t' >< l: long
mmapmodule.c:538                 `dest' type `unsigned long' >< i: int
mmapmodule.c:538                  `src' type `unsigned long' >< i: int
mmapmodule.c:538                `count' type `unsigned long' >< i: int
mmapmodule.c:867                 `access' type `access_mode' >< i: int
mmapmodule.c:955                 `access' type `access_mode' >< i: int
xxmodule.c:207                               `b' type `long' >< #: int
_bsddb.c:2583                       `flags' type `u_int32_t' >< i: int
_bsddb.c:3751                       `flags' type `u_int32_t' >< i: int
_bsddb.c:3778                       `flags' type `u_int32_t' >< i: int
_bsddb.c:3802                       `flags' type `u_int32_t' >< i: int
_bsddb.c:3825                     `timeout' type `u_int32_t' >< i: int
_bsddb.c:3825                       `flags' type `u_int32_t' >< i: int
_bsddb.c:4256                       `flags' type `u_int32_t' >< i: int
_bsddb.c:4371                       `flags' type `u_int32_t' >< i: int
operator.c:23              `a1' type `PyObject *a1; int a2;' >< O: PyObject
operator.c:23              `a2' type `PyObject *a1; int a2;' >< i: int
zlibmodule.c:126                         `input' type `Byte' >< s: char
zlibmodule.c:203                         `input' type `Byte' >< s: char
zlibmodule.c:399                         `input' type `Byte' >< s: char
zlibmodule.c:469                         `input' type `Byte' >< s: char
zlibmodule.c:781                           `buf' type `Byte' >< s: char
zlibmodule.c:781                   `adler32val' type `uLong' >< l: long
zlibmodule.c:799                           `buf' type `Byte' >< s: char
zlibmodule.c:799                     `crc32val' type `uLong' >< l: long
bsddbmodule.c:473                     `recno' type `recno_t' >< i: int
bsddbmodule.c:512                     `recno' type `recno_t' >< i: int
bsddbmodule.c:794                     `reclen' type `size_t' >< i: int
clmodule.c:886                             `x' type `int x;' >< i: int
_cursesmodule.c:379                     `attr' type `attr_t' >< l: long
_cursesmodule.c:388                     `attr' type `attr_t' >< l: long
_cursesmodule.c:426                     `attr' type `attr_t' >< l: long
_cursesmodule.c:436                     `attr' type `attr_t' >< l: long
_cursesmodule.c:472                     `attr' type `attr_t' >< l: long
_cursesmodule.c:482                     `attr' type `attr_t' >< l: long
_cursesmodule.c:517                     `attr' type `attr_t' >< l: long
_cursesmodule.c:546                     `attr' type `attr_t' >< l: long
_cursesmodule.c:603                      `ch1' type `chtype' >< l: long
_cursesmodule.c:603                      `ch2' type `chtype' >< l: long
_cursesmodule.c:692                     `attr' type `attr_t' >< l: long
_cursesmodule.c:863                     `attr' type `attr_t' >< l: long
_cursesmodule.c:872                     `attr' type `attr_t' >< l: long
_cursesmodule.c:907                     `attr' type `attr_t' >< l: long
_cursesmodule.c:916                     `attr' type `attr_t' >< l: long
_cursesmodule.c:1010                    `attr' type `attr_t' >< l: long
_cursesmodule.c:1020                    `attr' type `attr_t' >< l: long
_cursesmodule.c:1056                    `attr' type `attr_t' >< l: long
_cursesmodule.c:1066                    `attr' type `attr_t' >< l: long
_cursesmodule.c:1390                    `attr' type `attr_t' >< l: long
_cursesmodule.c:1399                    `attr' type `attr_t' >< l: long
_cursesmodule.c:1677      `(int *) &event.bstate' type `int' >< l: long
fcntlmodule.c:174                          `arg' type `char' >< i: int
skipping ../python/trunk/Modules/_hashopenssl.c:405 pop from empty list

See also unfiltered ParseTuple and unfiltered BuildValue logs.

(0 Comments ) (0 Trackbacks)