Removing Uninteresting Bytes in Software Fuzzing

Aftab Hussain and Mohammad Amin Alipour
Department of Computer Science
University of Houston

5th International Workshop on the Next Level of Test Automation (NEXTA) 2022
Co-located with the 15th IEEE International Conference on Software Testing, Veriﬁcation and Validation (ICST) 2022

Removing Uninteresting Bytes in
Software Fuzzing

Software Fuzzing?

Software Fuzzing?
A Random Software Testing technique.

Found bugs in many software packages.
Regularly used in large software companies.
Popular Fuzzers – AFL, AFL++, honggfuzz, libfuzzer

Software Fuzzing
Input Seed
P
Execute
Program P
Coverage Guided
Fuzzer
Mutated
Seeds
Interesting
Seeds

Software Fuzzing
Input Seed
P
Execute
Program P
Coverage Guided
Fuzzer
Mutated
Seeds
Interesting
Seeds
Data on discovered paths, branches, lines, crashes.

Software Fuzzing
Input Seed
P
Execute
Program P
Coverage Guided
Fuzzer
Mutated
Seeds
Data on discovered paths, branches, lines, crashes.

Fuzzers try to increase the speed of this discovery.
Interesting
Seeds

The Problem?

Input Seed
The Problem

Input Seed
Large number of
uninteresting bytes, which
do not contribute to new
program behaviour when
they are changed.

Example: payload data in a
network protocol
The Problem

The Problem
Mutating a lot of uninteresting bytes in a large input seed wastes
fuzzing resources and slows down exploration.

Thus, the seeds we use in fuzzing have a signiﬁcant impact on
fuzzing performance.

Previous Strategies

Previous Strategies for improving Fuzzer Performance
Structure-aware fuzzing. Use grammar speciﬁcations to generate meaningful
input seeds e.g. Peach, LangFuzz, CSmith, or learn structure of the input while
fuzzing to do so, e.g. Grimoire 2019, Skyﬁre 2017.

Smart Mutation Scheduling. Probabilistically schedule mutation operators
based on heuristics, e.g. MOPT [25].

Seed Selection. Use diﬀerent strategies to discard seeds from a
corpus, e.g. by SAT solving seed execution traces for redundant seeds
OPTIMIN [5], and discarding seeds by execution time and ﬁle size, e.g.
MINSET [Rebert et al. USENIX Security 2014].

Previous Strategies for improving Fuzzer Performance
Structure-aware fuzzing. Use grammar speciﬁcations to generate meaningful
input seeds e.g. Peach, LangFuzz, CSmith, or learn structure of the input while
fuzzing to do so, e.g. Grimoire 2019, Skyﬁre 2017.

Smart Mutation Scheduling. Probabilistically schedule mutation operators
based on heuristics, e.g. MOPT [Lyu et al. USENIX Security 2019].

Seed Selection. Use diﬀerent strategies to discard seeds from a
corpus, e.g. by SAT solving seed execution traces for redundant seeds
OPTIMIN [5], and discarding seeds by execution time and ﬁle size, e.g.
MINSET [Rebert et al. USENIX Security 2014].

Previous Strategies for improving Fuzzer Performance
Structure-aware fuzzing. Use grammar speciﬁcations to generate meaningful
input seeds e.g. Peach, LangFuzz, CSmith, or learn structure of the input while
fuzzing to do so, e.g. Grimoire 2019, Skyﬁre 2017.

Smart Mutation Scheduling. Probabilistically schedule mutation operators
based on heuristics, e.g. MOPT [Lyu et al. USENIX Security 2019].

Seed Selection. Use diﬀerent strategies to discard seeds from a corpus, e.g. by
SAT solving seed execution traces for redundant seeds (OPTIMIN [Herrera et al.
ISSTA 2021]), and discarding seeds by execution time and ﬁle size, e.g.
MINSET [Rebert et al. USENIX Security 2014].

Can removing uninteresting bytes
from seeds improve fuzzing
performance?

DIAR

DIAR
Reduces the size of a seed by removing uninteresting bytes, while
maintaining a certain level of coverage.

Customizable – allows users to set a time budget for reduction and set
the maximum allowable loss of test coverage.

The Core of DIAR:

Delta Debugging

DIAR uses delta debugging (DD) [Zeller, Hildebrandt TSE 2002]:

In DIAR, we modify DD to perform inadequate reduction [Alipour et al.
ASE 2016], where the reduced seed preserves a % of the coverage of
the original seed, set by the Coverage Similarity constraint, along with
some other constraints.

The Core of DIAR
DD Module
O
P
Program
Original
seed R
Reduced seed
(Preserves coverage of the
Original Seed in P)

The Core of DIAR | Delta Debugging
Say we want to reduce S0,
Coverage Similarity Constraint = x%
S0
Covers a set
of lines L

The Core of DIAR | Delta Debugging
S0
Covers a set
of lines L
Covers > x% of L
Say we want to reduce S0,
Coverage Similarity Constraint = x%
S1

The Core of DIAR | Delta Debugging
S0
Covers a set
of lines L
Covers > x% of L
Say we want to reduce S0,
Coverage Similarity Constraint = x%
S1
S2
Covers < x% of L

The Core of DIAR | Delta Debugging
S0
Covers a set
of lines L
Covers > x% of L
Say we want to reduce S0,
Coverage Similarity Constraint = x%
S1
S2
Covers < x% of L
X

The Core of DIAR | Delta Debugging
S0
R
Covers a set
of lines L
Covers > x% of L
Say we want to reduce S0,
Coverage Similarity Constraint = x%
S1
S2
Covers < x% of L
X
Covers > x% of L
A 1-minimal seed
(reducing it any further will cause
it lose the property constraint)

The Core of DIAR | Delta Debugging
(A modiﬁed version of the Delta Debugging algorithm in Zeller and Hilderbrant, “Simplifying and Isolating
Failure-Inducing Inputs”, TSE 28(2), 2002.)

DIAR Architecture

O
T
Target binary
Original seed
DIAR Architecture
DIAR

O O
T
Target binary
Original seed
.
.
.
Delta Debugging
Component
1. Generate
chunks
Chunks (or deltas)
DIAR Architecture
DIAR

O O
T
Target binary
Original seed
.
.
.
Delta Debugging
Component
1. Generate
chunks
Chunks (or deltas)
User-deﬁned
Constraints
DIAR Architecture
DIAR

O O
T
Target binary
Original seed
T.
.
.
Delta Debugging
Component
1. Generate
chunks
2. Execute
generated seed
on target
Generated
seeds
Chunks (or deltas)
User-deﬁned
Constraints
DIAR Architecture
DIAR

O O
T
Target binary
Original seed
T.
.
.
Delta Debugging
Component
1. Generate
chunks
2. Execute
generated seed
on target
3. Analyze generated
coverage data
Generated
seeds
Chunks (or deltas)
User-deﬁned
Constraints
DIAR Architecture
DIAR

DIAR Architecture
O O
T
Target binary
Original seed
T.
.
.
Delta Debugging
Component
1. Generate
chunks
2. Execute
generated seed
on target
3. Analyze generated
coverage data
Generated
seeds
R
4. Returns a 1-minimal seed
satisfying user-deﬁned
constraints.
Chunks (or deltas)
User-deﬁned
Constraints
DIAR

Experiments

Experimental Setup
Test Subjects

xmllint (from libxml2)
readelf (from binutils)

Fuzzing Campaigns

Each test subject was fuzzed with AFL for 24 hours with the original seed and the reduced seed.

Each fuzzing campaign was repeated 3 times.

In total, 6 fuzzing sessions were performed for each test subject.

Path Coverage
Results
AFL found signiﬁcantly more paths with the DIAR reduced seed (almost four times as many) with readelf.

Fuzzing with the original seed seemed to suﬀer more stagnation.

Seed Reduction
Results
DIAR can reduce a seed by more than 80% in under 5 minutes, retaining at least
75% coverage similarity with the coverage of the original seed.

Results
Lines and Branches
Covered

Results
Lines and Branches
Covered
Fuzzing campaigns with the reduced seed covered more lines and branches for readelf.

Crashes Found
Results

Crashes Found
Results
Fuzzing campaigns with the reduced seed found more crashes for readelf.

Seed Reduction
Results
The readelf seed has a signiﬁcantly high number of uninteresting bytes, e.g. bytes
in data/text section in the ELF binary seed, that were removed.

Seed Reduction
Results
The xmllint seed had a relatively lesser number of uninteresting bytes, and also
there may have been a loss of some important bytes in the inadequate reduction
process.

Takeaways

More crashes found.
Fuzzing with the DIAR-reduced seed found more crashes (with readelf).

More paths discovered.
Fuzzing with a DIAR-reduced seed yielded signiﬁcantly higher number
of paths (with readelf).

The more the uninteresting bytes, the better.
A seed that has more uninteresting bytes seems to beneﬁt more from DIAR.

Preserve 100% coverage for small seeds.
When seeds are very small, adequate reduction may be used instead. t

.
Takeaways from the Experiments

More crashes found.
Fuzzing with the DIAR-reduced seed found more crashes (with readelf).

More paths discovered.
Fuzzing with a DIAR-reduced seed yielded signiﬁcantly higher number
of paths (with readelf).

The more the uninteresting bytes, the better.
A seed that has more uninteresting bytes seems to beneﬁt more from DIAR.

Preserve 100% coverage for small seeds.
When seeds are very small, adequate reduction may be used instead. t

Takeaways from the Experiments

More crashes found.
Fuzzing with the DIAR-reduced seed found more crashes (with readelf).

More paths discovered.
Fuzzing with a DIAR-reduced seed yielded signiﬁcantly higher number
of paths (with readelf).

The more uninteresting bytes, the better.
A seed that has more uninteresting bytes seems to beneﬁt more from DIAR.

Preserve 100% coverage for small seeds.
When seeds are very small, adequate reduction may be used instead. t

Takeaways from the Experiments

Concluding Remarks

DIAR Architecture

Input Seed
Large number of uninteresting
bytes, which do not contribute
to new program behaviour
when they are changed.

Example: payload data in a
network protocol
DIAR Architecture

Crashes Found Paths Found
Input Seed
Large number of uninteresting
bytes, which do not contribute
to new program behaviour
when they are changed.

Example: payload data in a
network protocol
DIAR Architecture

Crashes Found Paths Found
Input Seed
Large number of uninteresting
bytes, which do not contribute
to new program behaviour
when they are changed.

Example: payload data in a
network protocol
DIAR Architecture
Thank You

Removing Uninteresting Bytes in Software Fuzzing

About This Presentation

Slide Content

Tags

Categories

Download

Quick Actions

Statistics

Related Slideshows

Removing Uninteresting Bytes in Software Fuzzing

About This Presentation

Slide Content

Slide 1

Slide 2

Slide 3

Slide 4

Slide 5

Slide 6

Slide 7

Slide 8

Slide 9

Slide 10

Slide 11

Slide 12

Slide 13

Slide 14

Slide 15

Slide 16

Slide 17

Slide 18

Slide 19

Slide 20

Slide 21

Slide 22

Slide 23

Slide 24

Slide 25

Slide 26

Slide 27

Slide 28

Slide 29

Slide 30

Slide 31

Slide 32

Slide 33

Slide 34

Slide 35

Slide 36

Slide 37

Slide 38

Slide 39

Slide 40

Slide 41

Slide 42

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47

Slide 48

Slide 49

Slide 50

Slide 51

Slide 52

Slide 53

Tags

Categories

Download

Quick Actions

Statistics

Related Slideshows

8-top-ai-courses-for-customer-support-representatives-in-2025.pptx

7-essential-ai-courses-for-call-center-supervisors-in-2025.pptx

25-essential-ai-courses-for-user-support-specialists-in-2025.pptx

8-essential-ai-courses-for-insurance-customer-service-representatives-in-2025.pptx

Know for Certain

PPT OPD LES 3ertt4t4tqqqe23e3e3rq2qq232.pptx