Have you heard about fuzzing but are not sure what it is or if you should do it? This guide should quickly get you up to speed on what it’s all about.
What is fuzzing?
Fuzzing is a way of discovering bugs in software by providing randomised inputs to programs to find test cases that cause a crash. Fuzzing your programs can give you a quick view on their overall robustness and help you find and fix critical bugs.
Fuzzing is ultimately a black box technique, requiring no access to source code, but it can still be used against software for which you do have source code, because it will potentially find bugs more quickly and avoid the need to review lots of code. Once a crash is detected, if you have the source code, it should become much easier to fix.
Pros and cons of fuzzing
Fuzzing can be very useful, but it is no silver bullet. Here are some of the pros and cons of fuzzing:
- Can provide results with little effort: once a fuzzer is up and running, it can be left for hours, days or months to look for bugs with no interaction
- Can reveal bugs that were missed in a manual audit
- Provides an overall picture of the robustness of the target software
- Will not find all bugs: fuzzing may miss bugs that do not trigger a full program crash, and may be less likely to trigger bugs that are only triggered in highly specific circumstances
- The crashing test cases that are produced may be difficult to analyse, as the act of fuzzing does not give you much knowledge of how the software operates internally
- Programs with complex inputs can require much more work to produce a smart enough fuzzer to get sufficient code coverage
Smart and dumb fuzzing
Fuzzers provide random input to software. This may be in the form of a network protocol, a file of a certain format or direct user input. The fuzzed input can be completely random with no knowledge of what the expected input should look like, or it can be created to look like valid input with some alterations.
A fuzzer that generates completely random input is known as a “dumb” fuzzer, as it has no built-in intelligence about the program it is fuzzing. A dumb fuzzer requires the smallest amount of work to produce (it could be as simplistic as piping /dev/random into a program). This small amount of work can produce results for very little cost – one of fuzzing’s big advantages.
However, sometimes a program will only perform certain processing if particular aspects of the input are present. For example, a program may accept a “name” field in its input, and this field may have a “name length” associated with it. If these fields are not present in a form that is valid enough for the program to identify, it may never attempt to read the name. However, if these fields are present in a valid form, but the length value is set to the incorrect value, the program may read beyond the buffer containing the name and trigger a crash. Without input that is at least partly valid, this is very unlikely to happen. In these cases, “smart” fuzzers can be used. These are programmed with knowledge of the input format (i.e. a protocol definition or rules for a file format). The fuzzer can then construct mostly valid input and only fuzz parts of the input within that basic format.
The greater the level of intelligence that you build into a fuzzer, the deeper you may be able to go into a protocol or file format’s processing, but the more work you create for yourself. A balance needs to be found between these two extremes. It can be good to begin with a much more dumb fuzzer and increase its intelligence as the code quality of the software you are testing increases. If you get lots of crashes with a simplistic fuzzer, there is no point spending a long time making it more intelligent until the code quality increases to a point where the code requires it.
Types of fuzzer
Broadly speaking, fuzzers can be split into two categories based on how they create input to programs – mutation-based and generation-based. This section details those categories as well as offering a brief description of a more advanced technique called Evolutionary Fuzzing.
Mutation-based fuzzers are arguably one of the easier types of fuzzer to create. This technique suites dumb fuzzing but can be used with more intelligent fuzzers as well. With mutation, samples of valid input are mutated randomly to produce malformed input.
A dumb mutation fuzzer can simply select a valid sample input and alter parts of it randomly. For many programs, this can provide a surprising amount of mileage, as inputs are still often significantly similar enough to a valid input, so that good code coverage can be achieved without the need for further intelligence.
You can build in greater intelligence by allowing the fuzzer to do some level of parsing of the samples to ensure that it only modifies specific parts or that it does not break the overall structure of the input such that it is immediately rejected by the program. Some protocols or file formats will incorporate checksums that will fail if they are modified arbitrarily. A mutation-based fuzzer should usually fix these checksums so that the input is accepted for processing or the only code that will be tested is the checksum validation and nothing else.
Two useful techniques that can be used by mutation-based fuzzers are described below.
A fuzzer can take saved sample inputs and simply replay them after mutating them. This works well for file format fuzzing where a number of sample files can be saved and fuzzed to provide to the target program. Simple or stateless network protocols can also be fuzzed effectively with replay, as the fuzzer will not need to make lots of legitimate requests to get deep into the protocol. For a more complex protocol, replay may be more difficult as the fuzzer may need to respond in a dynamic way to the program to allow processing to continue deep into the protocol, or the protocol may simply be inherently non-replayable.
Man-in-the-Middle or Proxy
You may have heard of Man-in-the-Middle (MITM) as a technique used by penetration testers and hackers, but it can also be used for mutation-based network protocol fuzzing. MITM describes the situation where you place yourself in the middle of a client and server (or two clients in the case of peer-to-peer networking), intercepting and possibly modifying messages passed between them. In this way, you are acting like a proxy between the two. The term MITM is generally used when it is not expected that you will be acting like a proxy, but for our purposes the terms are largely interchangeable.
By setting your fuzzer up as a proxy, it can mutate requests or responses depending on whether you are fuzzing the server or the client. Again, the fuzzer could have no intelligence about the protocol and simply randomly alter some requests and not others, or it could intelligently target requests at the specific level of the protocol in which you are interested.
Proxy-based fuzzing can allow you to take an existing deployment of a networked program and quickly insert a fuzzing layer into it, without needing to make your fuzzer act like a client or server itself.
Generation-based fuzzers actually generate input from scratch rather than mutating existing input. Generation-based fuzzers usually require some level of intelligence in order to construct input that makes at least some sense to the program, although generating completely random data would also technically be generation.
Generation fuzzers often split a protocol or file format into chunks which they can build up in a valid order, and randomly fuzz some of those chunks independently. This can create inputs that preserve their overall structure, but contain inconsistent data within that structure. The granularity of these chunks and the intelligence with which they are constructed define the level of intelligence of the fuzzer.
While mutation-based fuzzing can have a similar effect as generation fuzzing (as, over time, mutations will be randomly applied without completely breaking the input’s structure), generating inputs ensures that this will be so. Generation fuzzing can also get deeper into a protocol more easily, as it can construct valid sequences of inputs applying fuzzing to specific parts of that communication. It also allows the fuzzer to act as a true client/server, generating correct, dynamic responses where these cannot be blindly replayed.
Evolutionary fuzzing is an advanced technique, which we will only briefly describe here. It allows the fuzzer to use feedback from each test case to learn over time the format of the input. For example, by measuring the code coverage of each test case, the fuzzer can heuristically work out which properties of the test case exercise a given area of code, and it can gradually evolve a set of test cases that cover the majority of the program code. Evolutionary fuzzing often relies on other techniques similar to genetic algorithms and may require some form of binary instrumentation to operate correctly.
What are you really fuzzing?
Even for relatively dumb fuzzers, it is important to keep in mind what part of the code your test cases are actually likely to hit. To give a simple example, if you are fuzzing an application protocol that uses TCP/IP and your fuzzer randomly mutates a raw packet capture, you are likely to be corrupting the TCP/IP packets themselves and your input is unlikely to get processed by the application at all. Or, if you were testing an OCR program that parsed images of text into real text, but you were mutating the whole of an image file, you could end up testing its image parsing code more often than the actual OCR code. If you wanted to target that OCR processing specifically, you might wish to keep the headers of the image file valid.
Likewise, you may be generating input that is so random that it does not pass an initial sanity check in the program, or the code contains a checksum that you do not correct. You are then only testing that first branch in the program, never getting deeper into the program code.
Anatomy of a fuzzer
To operate effectively, a fuzzer needs to perform a number of important tasks:
- Generate test cases
- Record test cases or any information needed to reproduce them
- Interface with the target program to provide test cases as input
- Detect crashes
- Fuzzers often split many of these tasks out into separate modules, for example having one library that can mutate data or generate it based on a definition and another to provide test cases to the target program and so on. Below are some notes on each of these tasks.
Test case generation
Generating test cases will vary depending on whether mutation-based or generation-based fuzzing is being employed. With either, there will be something that needs randomly transforming, whether it is a field of a particular type or an arbitrary chunk of data.
These transformations can be completely random, but it is worth remembering that edge and corner cases can often be the source of bugs in programs. As such, you may wish to favour such cases and include values such as:
- Very long or completely blank strings
- Maximum and minimum values for integers on the platform
- Values like -1, 0, 1 and 2
- Depending on what you are fuzzing, there may be specific values or characters that are more likely to trigger bugs. For example:
- Null characters
- New line characters
- Format string values (%n, %s, etc.)
- Application specific keywords
The simplest way to reproduce a test case is to record the exact input used when a crash is detected. However, there are other ways to ensure reproducibility that can be more convenient in certain circumstances.
One way to do this is to store the initial seed used for the random component of test case generation, and ensure that all subsequent random behaviour follows a path that can be traced back to that seed. By re-running the fuzzer with the same seed, the behaviour should be reproducible. For example, you may only record the test case number and the initial seed and then quickly re-execute generation with that seed until you reach the given test case.
This technique can be useful when the target program may accumulate dependencies based on past inputs. Previous inputs may have caused the program to initialise various items in its memory that are required to be present to trigger the bug. In these situations, simply recording the crashing test case would not be sufficient to reproduce the bug.
Interfacing with the target
Interfacing with the target program to provide the fuzzed input is often straightforward. For network protocols, it may simply involve sending the test case over the network, or responding to a client request; for file formats, it may simply mean executing the program with a command line argument pointing to the test case.
However, sometimes the input is provided in a form that is not trivial to generate in an automated way or where scripting the program to execute each test case has a high overhead and proves to be very slow. Creative thinking in these cases can reveal ways to exercise the relevant piece of code with the right data. For example, this may be performed by instrumenting a program in memory artificially to execute a parsing function with the input provided as an argument entirely in memory. This can remove the need for the program to go through a lengthy loading procedure before each test case, and further speed increases could be obtained by having test cases generated and provided completely in memory rather than going via the hard drive.
Crash detection is critical for fuzzing. If you cannot accurately determine when a program has crashed, you will not be able to identify a test case as triggering a bug. There are a number of common ways to approach this:
Attach a debugger
This will provide you with the most accurate results and you can script the debugger to provide you with a crash trace as soon as a crash is detected. However, having a debugger attached can slow programs significantly and this can cause quite an overhead. The fewer test cases you can generate in a given period of time, the fewer chances you have of finding a crash.
See if the process disappears
Rather than attaching a debugger, you can simply see if the process ID of the target still exists on the system after executing the test case. If the process has disappeared, it probably crashed. You can re-run the test case in a debugger later if you want some more information about the crash, and you can even do this automatically for each crash, while still avoiding the slowdown of having the debugger attached for every case.
If the program normally responds to your test cases, you can set a timeout after which you assume the program has either crashed or frozen. This can also detect bugs that cause the program to become unresponsive but not necessarily to terminate.
Whichever method you use, the program should be restarted whenever it crashes or becomes unresponsive, in order to allow fuzzing to continue.
There are a number of things you can do to measure or improve the quality of your fuzzing. While these are all good things to keep in mind, you may not need to bother with them all if you are already getting lots of unique crashes within a useful timeframe.
Possibly one of the most important factors in fuzzing is speed. How many test cases per second/minute can you run? Sensible values will of course depend on the target, but the more test cases you can execute, the more likely you will be to find a crash in a given time period. Fuzzing is random, so every test case is like a lottery ticket, and you want as many of them as you can get.
There are lots of things you can do to increase the speed of your test cases, such as improving the efficiency of your generation or mutation routines, parallelising test cases, decreasing timeouts or running programs in “headless” modes where they do not display a GUI. And of course if you want to, you can simply buy faster kit!
Finding crashes is of course only the start of the process. Once you find a crashing test case, you will need to analyse it, work out what the bug is and either fix it or write an exploit for it depending on your motivation. If you have thousands of crashing test cases, this can be quite daunting.
By categorising crashes you can prioritise them according to which ones are most interesting to you. This can also help you identify when one test case is triggering the same bug as another, so you only keep the cases relating to unique crashes. In order to do this, you will need some automated information about the crash so you can make a decision. Running the test case with the target attached to a debugger can provide a crash trace which you can parse to find values such as the exception type, register values, stack contents and so on.
One tool by Microsoft which can help with this is called, !exploitable (pronounced “bang exploitable”), which works with the Windbg debugger to categorise crashes according to how exploitable it thinks the bug is.
Test case reduction
As fuzzing randomly alters input, it is common for a crashing test case to have multiple alterations which are not relevant to triggering the bug. Test case reduction is the act of narrowing down a test case to the minimum set of alterations from a valid input required to trigger the bug, so that you only need to focus on that part of the input in your analysis.
This reduction can be performed manually, but it can also be performed automatically by the fuzzer. When a crashing test case is encountered, the fuzzer can re-execute the test case several times, gradually reducing the alterations made to the input until the smallest set of changes remains, whilst still triggering the bug. This can simplify your analysis and may also help to categorise crashing test cases as you will know precisely what parts of the input are affected.
Code coverage is a measure of how much of the program’s code has been executed by the fuzzer. The idea is that the more coverage you get, the more of the program you have actually tested. Measuring code coverage can be tricky and often requires binary instrumentation to track which portions of code are being executed. You can also measure code coverage in different ways, such as by line, by basic block, by branch or by code path.
Code coverage is not a perfect measure with regards to fuzzing, as it is possible to execute code without revealing bugs in it, and there are often areas of code that almost never get executed, such as safety error checks that are unlikely to really be needed and are very unlikely to be interesting to us anyway. Nevertheless, some form of code coverage measurement can provide insight into what your fuzzer is triggering within the program, especially when your fuzzing is completely black box and you may not yet know much about the program’s inner workings.
Some tools and technologies that may help with code coverage include Pai Mei, Valgrind, DynamoRIO and DTrace.
There are a number of existing frameworks that allow you to create fuzzers without having to work from scratch. Some of these frameworks are complex and it may still take a while to create a working fuzzer for your target; by contrast, others take a very simple approach. A selection of these frameworks and fuzzers is listed here for your reference:
Radamsa is designed to be easy to use and flexible. It attempts to “just work” for a variety of input types and contains a number of different fuzzing algorithms for mutation.
Sulley provides a comprehensive generation framework, allowing structured data to be represented for generation based fuzzing. It also contains components to help with recording test cases and detecting crashes.
The Peach framework can perform smart fuzzing for file formats and network protocols. It can perform both generation- and mutation-based fuzzing and it contains components to help with modelling and monitoring the target.
SPIKE is a network protocol fuzzer. It requires good knowledge of C to use and is designed to run on Linux.
Grinder is a web browser fuzzer, which also has features to help in managing large numbers of crashes.
NodeFuzz is a nodejs-based harness for web browsers, which includes instrumentation modules to gain further information from the client side.