A tool that automatically creates fuzzing harnesses based on a library

Overview

AutoHarness

Created by Akshat Parikh

What is this tool?

AutoHarness is a tool that automatically generates fuzzing harnesses for you. This idea stems from a concurrent problem in fuzzing codebases today: large codebases have thousands of functions and pieces of code that can be embedded fairly deep into the library. It is very hard or sometimes even impossible for smart fuzzers to reach that codepath. Even for large fuzzing projects such as oss-fuzz, there are still parts of the codebase that are not covered in fuzzing. Hence, this program tries to alleviate this problem in some capacity as well as provide a tool that security researchers can use to initially test a code base. This program only supports code bases which are coded in C and C++.

Setup/Demonstration

This program utilizes llvm and clang for libfuzzer, Codeql for finding functions, and python for the general program. This program was tested on Ubuntu 20.04 with llvm 12 and python 3. Here is the initial setup.

sudo apt-get update;
sudo apt-get install python3 python3-pip llvm-12* clang-12 git;
pip3 install pandas lief subprocess os argparse ast;

Follow the installation procedure for Codeql on https://github.com/github/codeql. Make sure to install the CLI tools and the libraries. For my testing, I have stored both the tools and libraries under one folder. Finally, clone this repository or download a release. Here is the program's output after running on nginx with the multiple argument mode set. This is the command I used.

python3 harness.py -L /home/akshat/nginx-1.21.0/objs/ -C /home/akshat/codeql-h/ -M 1 -O /home/akshat/autoharness/ -D nginx -G 1 -Y 1 -F "-I /home/akshat/nginx-1.21.0/objs -I /home/akshat/nginx-1.21.0/src/core -I /home/akshat/nginx-1.21.0/src/event -I /home/akshat/nginx-1.21.0/src/http -I /home/akshat/nginx-1.21.0/src/mail -I /home/akshat/nginx-1.21.0/src/misc -I /home/akshat/nginx-1.21.0/src/os -I /home/akshat/nginx-1.21.0/src/stream -I /home/akshat/nginx-1.21.0/src/os/unix" -X ngx_config.h,ngx_core.h

Results: image It is definitely possible to raise the success by further debugging the compilation and adding more header files and more. Note the nginx project does not have any shared objects after compiling. However, this program does have a feature that can convert PIE executables into shared libraries.

Planned Features (in order of progress)

  1. Struct Fuzzing

The current way implemented in the program to fuzz functions with multiple arguments is by using fuzzing data provider. There are some improvements to make in this integration; however, I believe I can incorporate this feature with data structures. A problem which I come across when coding this is with codeql and nested structs. It becomes especially hard without writing multiple queries which vary for every function. In short, this feature needs more work. I was also thinking about a simple solution using protobufs.

  1. Implementation Based Harness Creation

Using codeql, it is possible to use to generate a control flow graph that maps how the parameters in a function are initialized. Using that information, we can create a better harness. Another way is to look for implementations for the function that exist in the library and use that information to make an educated guess on an implementation of the function as a harness. The problems I currently have with this are generating the control flow graphs with codeql.

  1. Parallelized fuzzing/False Positive Detection

I can create a simple program that runs all the harnesses and picks up on any of the common false positives using ASAN. Also, I can create a new interface that runs all the harnesses at once and displays their statistics.

Contribution/Bugs

If you find any bugs with this program, please create an issue. I will try to come up with a fix. Also, if you have any ideas on any new features or how to implement performance upgrades or the current planned features, please create a pull request or an issue with the tag (contribution).

PSA

This tool generates some false positives. Please first analyze the crashes and see if it is valid bug or if it is just an implementation bug. Also, you can enable the debug mode if some functions are not compiling. This will help you understand if there are some header files that you are missing or any linkage issues. If the project you are working on does not have shared libraries but an executable, make sure to compile the executable in PIE form so that this program can convert it into a shared library.

References

  1. https://lief.quarkslab.com/doc/latest/tutorials/08_elf_bin2lib.html
Comments
  • Fuzzers never compile

    Fuzzers never compile

    Hello, I made it to the final step in your README:

    python3 harness.py -L /root/nginx/ -C /root/codeql/ -M 1 -O /root/autoharness/ -D nginx -G 1 -Y 1 -F "-I /root/nginx/objs -I /root/nginx/src/core -I /root/nginx/src/event -I /root/nginx/src/http -I /root/nginx/src/mail -I /root/nginx/src/misc -I /root/nginx/src/os -I /root/nginx/src/stream -I /root/nginx/src/os/unix" -X ngx_config.h,ngx_core.h

    and it appears to be doing something:

    Compiling query plan for /root/codeql/multiargfunc.ql.
    [1/1 comp 19.4s] Compiled /root/codeql/multiargfunc.ql.
    Starting evaluation of codeql-cpp/multiargfunc.ql.
    [1/1 eval 4.9s] Evaluation done; writing results to /root/autoharness/multiarg.bqrs.
    Shutting down query evaluator.
    

    however, it never builds any fuzzers, this is all that appears on screen:

                      f             g                                                  t
    0             _Exit          void                                              [int]
    1        __asprintf           int    [const char *__restrict__, char **__restrict__]
    2    __asprintf_chk           int  [int, const char *__restrict__, char **__restr...
    3        __bswap_16    __uint16_t                                       [__uint16_t]
    4        __bswap_32    __uint32_t                                       [__uint32_t]
    ..              ...           ...                                                ...
    812         waitpid       __pid_t                              [int, __pid_t, int *]
    813        wcstombs        size_t  [size_t, char *__restrict__, const wchar_t *__...
    814          wctomb           int                                  [char *, wchar_t]
    815           write       ssize_t                        [int, size_t, const void *]
    816          zError  const char *                                              [int]
    
    [817 rows x 3 columns]****
    

    Am using Ubuntu clang version 12.0.0-3ubuntu1~21.04.2 and have installed all of the requirements as mentioned in the README. Any tips would be much appreciated.

    opened by geeknik 11
  • Could not resolve type ...

    Could not resolve type ...

    Hi, The following command failed to generate harness.

    python3 harness.py -L /local/codeql-home/nginx-1.21.1/objs/ -C /local/codeql-home/codeql/ -M 1 -O /local/nginx/autoharness/ -D nginx -G 1 -Y 1 -F "-I /local/codeql-home/nginx-1.21.1/objs -I /local/codeql-home/nginx-1.21.1/src/core -I /local/codeql-home/nginx-1.21.1/src/event -I /local/codeql-home/nginx-1.21.1/src/http -I /local/codeql-home/nginx-1.21.1/src/mail -I /local/codeql-home/nginx-1.21.1/src/misc -I /local/codeql-home/nginx-1.21.1/src/os -I /local/codeql-home/nginx-1.21.1/src/stream -I /local/codeql-home/nginx-1.21.1/src/os/unix" -X ngx_config.h,ngx_core.h
    

    Error messages:

    Compiling query plan for /local/codeql-home/codeql/multiargfunc.ql.
    ERROR: Could not resolve module cpp. There should probably be a qlpack.yml file declaring dependencies in /local/codeql-home/codeql or an enclosing directory. (/local/codeql-home/codeql/multiargfunc.ql:1,8-11)
    ERROR: Could not resolve type Type (/local/codeql-home/codeql/multiargfunc.ql:3,1-5)
    ERROR: Could not resolve type Parameter (/local/codeql-home/codeql/multiargfunc.ql:3,30-39)
    ERROR: Could not resolve type PointerType (/local/codeql-home/codeql/multiargfunc.ql:6,40-51)
    ERROR: Could not resolve type Type (/local/codeql-home/codeql/multiargfunc.ql:9,1-5)
    ERROR: Could not resolve type Parameter (/local/codeql-home/codeql/multiargfunc.ql:9,27-36)
    ERROR: Could not resolve type PointerType (/local/codeql-home/codeql/multiargfunc.ql:10,65-76)
    ERROR: Could not resolve type Function (/local/codeql-home/codeql/multiargfunc.ql:13,6-14)
    ERROR: Could not resolve type Type (/local/codeql-home/codeql/multiargfunc.ql:13,18-22)
    ERROR: Could not resolve type Parameter (/local/codeql-home/codeql/multiargfunc.ql:14,18-27)
    ERROR: Could not resolve type Struct (/local/codeql-home/codeql/multiargfunc.ql:14,91-97)
    ERROR: 'result' cannot be used in this context (/local/codeql-home/codeql/multiargfunc.ql:4,3-9)
    ERROR: 'result' cannot be used in this context (/local/codeql-home/codeql/multiargfunc.ql:6,3-9)
    ERROR: 'result' cannot be used in this context (/local/codeql-home/codeql/multiargfunc.ql:10,3-9)
    ERROR: 'result' cannot be used in this context (/local/codeql-home/codeql/multiargfunc.ql:10,47-53)
    
    opened by JerryWang304 10
  • Master

    Master

    Lots of changes, some that I can remember:

    • Factoring out code to create bash commands into command_builder.py.
    • Change from readelf to nm to only harness dynamic, defined, exported function symbols.
    • Ignore functions with void * or array arguments. Arrays may be supported later.
    • Get parameters in the right order.
    • Handle const.

    Note: only tested using mode=1 and detect=1 on libsodium, with other settings you'll likely still get lots of errors but let's fix those by creating and resolving issues.

    opened by Jelle-Nauta 0
  • Support array parameters

    Support array parameters

    Functions with array parameters are currently ignored because they would lead to code like:

    auto data1= provider.ConsumeIntegral<char[16]>();
    

    Proposal: handle array-parameters as a special case, e.g. generating code like:

    char data1[16];
    for (size_t idx = 0; idx < 16; ++idx) {
        data1[idx] = provider.ConsumeIntegral<char>();
    }
    

    There may be a more elegant way to do this, but I don't see it at the moment.

    opened by Jelle-Nauta 0
  • Create test suite

    Create test suite

    Currently there are many different combinations of settings (mode, detect) and library-properties (exported or not, defined or not, function or other symbols, etc.) and many of these probably lead to errors, e.g. through uncompilable harness code.

    Proposal: create a minimal set of libraries with a comprehensive set of symbols and properties, and a test workflow to verify that autoharness works - or find bugs that can then be addressed.

    opened by Jelle-Nauta 0
Releases(1.0)
  • 1.0(Jul 10, 2021)

    Initial Release of AutoHarness -added executable to shared object functionality -added automatic header detection or function reconstruction -added automatic fuzzing harness creation for one argument and multiple arguments

    Source code(tar.gz)
    Source code(zip)
A module to develop and apply old-style links

Old-Linkage-Dev (OLD) Old Linkage Development is a module to develop and apply old-style links. Old-style links stand for some traditional or conventi

Tarcadia 2 Dec 04, 2021
personal dotfiles for rolling release linux distros

dotfiles Screenshots: Directions: Deploy my dotfiles with yadm Packages from arch listed in .installed-packages Information on osu! see ~/Games/osu!/.

-pacer- 0 Sep 18, 2022
The purpose is to have a fairly simple python assignment that introduces the basic features and tools of python

This repository contains the code for the python introduction lab. The purpose is to have a fairly simple python assignment that introduces the basic

1 Jan 24, 2022
This repository containing cross-section cut and fill calculations using Python programming language.

cross-section This repository is containing cut and fill calculations for cross-section using Python programming language. This codes is made to calcu

3 Jun 15, 2022
Agora-token-helper - Some help tools for AgoraToken

Agora Token Helper Support AgoraToken version 001 - 006. But for security reason

A simply program to find active jackbox.tv game codes

PeepingJack A simply program to find active jackbox.tv game codes How does this work? It uses a threadpool to loop through all possible codes in a ran

3 Mar 20, 2022
This bot uploads telegram files to MixDrop.co,File.io.

What is about this bot ? This bot uploads telegram files to MixDrop.co, File.io. Usage: Send any file, and the bot will upload it to MixDrop.co, File.

Abhijith NT 3 Feb 26, 2022
Built as part of an assignment for S5 OOSE Subject CSE

Installation Steps: Download and install Python from here based on your operating system. I have used Python v3.8.10 for this. Clone the repository gi

Abhinav Rajesh 2 Sep 09, 2022
🌌A Python library to exhaustively enumerate a combinatorial space represented by a function

exhaust A Python library to exhaustively enumerate a combinatorial space represented by a function. The API is modelled after Python's random module a

Maik Riechert 1 Dec 05, 2021
a pull switch (or BYO button) that gets you out of video calls, quick

zoomout a pull switch (or BYO button) that gets you out of video calls, quick. As seen on Twitter System compatibility Tested on macOS Catalina (10.15

Brian Moore 422 Dec 30, 2022
An animal facts python module

An animal facts python module

Fayas Noushad 3 Dec 19, 2021
A python script that will automate the boring task of login to the captive portal again and again

A python script that will automate the boring task of login to the captive portal again and again

Rakib Hasan 2 Feb 09, 2022
JHBuild is a tool designed to ease building collections of source packages, called “modules”.

JHBuild README JHBuild is a tool designed to ease building collections of source packages, called “modules”. JHBuild was originally written for buildi

GNOME Github Mirror 46 Nov 22, 2022
Project in which we modelise an Among Us problem using graph theories.

Python-AmongUsProblem Project in which we modelise an Among Us problem using graph theories. The rules are as following: Total of 100 players 10 playe

Gabriel Shenouda 1 Feb 09, 2022
Repositório do programa ConstruDelas - Trilha Python - Módulos 1 e 2

ConstruDelas - Introdução ao Python Nome: Visão Geral Bem vinda ao repositório do curso ConstruDelas, módulo de Introdução ao Python. Aqui vamos mante

WoMakersCode 8 Oct 14, 2022
An implementation of multimap with per-item expiration backed up by Redis.

MultiMapWithTTL An implementation of multimap with per-item expiration backed up by Redis. Documentation: https://loggi.github.io/python-multimapwitht

Loggi 2 Jan 17, 2022
sumCulator Это калькулятор, который умеет складывать 2 числа.

sumCulator Это калькулятор, который умеет складывать 2 числа. Но есть условия: Эти 2 числа не могут быть отрицательными (всё-таки это вычитание, а не

0 Jul 12, 2022
Extra scripts to improve user experience related to OpenTaiko

OpenTaiko-Utils Extra scripts to improve user experience related to OpenTaiko osu2tja /!\ IMPORTANT NOTE /!\ Converted charts that aren't yours are fo

2 Dec 25, 2022
This speeds up PyCharm's package index processes and avoids CPU & memory overloading

This speeds up PyCharm's package index processes and avoids CPU & memory overloading

1 Feb 09, 2022
RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.

RDFLib RDFLib is a pure Python package for working with RDF. RDFLib contains most things you need to work with RDF, including: parsers and serializers

RDFLib 1.8k Jan 02, 2023