
SBFT24 Competition #1941

Open · wants to merge 109 commits into master
Conversation

@phi-go commented Jan 8, 2024

This PR combines all fuzzers submitted to SBFT24 and the mutation measurer to allow experiments for the competition.

@DonggeLiu (Contributor) commented Jan 17, 2024

> Can we confirm that this only happens in this branch and not master?

I think we can.
Earlier I prepared these two experiments on master as a comparison:
#1945 (comment)

TuneFuzz had no such error in that experiment:
[screenshot]

BTW, libFuzzer had no build error either:
[screenshot]

There were some "Coverage run failed." errors, but they were fuzzer runtime errors like "ERROR: libFuzzer: out-of-memory (used: 2065Mb; limit: 2048Mb)", which is unrelated.

@phi-go (Author) commented Jan 17, 2024

So it seems that in this code:

    def set_up_corpus_directories(self):
        """Set up corpora for fuzzing. Set up the input corpus for use by the
        fuzzer and set up the output corpus for the first sync so the initial
        seeds can be measured."""
        fuzz_target_name = environment.get('FUZZ_TARGET')
        target_binary = fuzzer_utils.get_fuzz_target_binary(
            FUZZ_TARGET_DIR, fuzz_target_name)
        input_corpus = environment.get('SEED_CORPUS_DIR')
        os.makedirs(input_corpus, exist_ok=True)
        if not environment.get('CUSTOM_SEED_CORPUS_DIR'):
            _unpack_clusterfuzz_seed_corpus(target_binary, input_corpus)
        else:
            _copy_custom_seed_corpus(input_corpus)

The variable target_binary is set to None. The responsible function, get_fuzz_target_binary, is here: https://github.com/phi-go/fuzzbench/blob/72926c0bdf8614f16adaef2b4cd658e1908f6186/common/fuzzer_utils.py#L73.

In that function the target binary path is only returned if the file exists, so this is probably a build error rather than something corpus-specific. This is under the assumption that FUZZ_TARGET is set.
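For reference, the check described above boils down to something like this minimal sketch (simplified for illustration, not the exact FuzzBench implementation; the name is suffixed to make that clear):

    import os

    # Simplified sketch of the behaviour described above (not the exact
    # FuzzBench code): the path is returned only when the binary file exists,
    # so a failed build later surfaces as target_binary being None.
    def get_fuzz_target_binary_sketch(search_directory, fuzz_target_name):
        if not fuzz_target_name:
            return None
        candidate = os.path.join(search_directory, fuzz_target_name)
        if os.path.exists(candidate):
            return candidate
        return None  # e.g. the build never produced the binary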

@phi-go (Author) commented Jan 17, 2024

I modified part of the Makefile generation to support the mutation testing Docker builds, so maybe I broke something there. @alan32liu, could you take a look at the following changes to these files? I thought they should be fine, so another pair of eyes would be good:

https://github.com/google/fuzzbench/pull/1941/files#diff-9ba00a2744edb4b6e8a4768b520cd4b147e26ddec73c13337aac6a79ccfa99a0

  • docker/generate_makefile.py
  • docker/image_types.yaml
  • experiment/build/gcb_build.py

@phi-go (Author) commented Jan 17, 2024

Oh, I see now. On the cloud it seems the mutation analysis build process is used for the fuzzer builds, which is definitely wrong... though I don't yet understand why that happens.

https://www.googleapis.com/download/storage/v1/b/fuzzbench-data/o/sbft-standard-cov-01-16%2Fbuild-logs%2Fbenchmark-bloaty_fuzz_target-fuzzer-libfuzzer.txt?generation=1705424198204020&alt=media

@DonggeLiu (Contributor)

> I modified part of the Makefile generation to support the mutation testing Docker builds, so maybe I broke something there. @alan32liu, could you take a look at the following changes to these files? I thought they should be fine, so another pair of eyes would be good:
>
> https://github.com/google/fuzzbench/pull/1941/files#diff-9ba00a2744edb4b6e8a4768b520cd4b147e26ddec73c13337aac6a79ccfa99a0
>
> • docker/generate_makefile.py
> • docker/image_types.yaml
> • experiment/build/gcb_build.py

I did not notice anything either: the changes replicate what is done for coverage, and nothing seems too strange.

However, I noticed that TuneFuzz works fine on some benchmarks (e.g., freetype2_ftfuzzer):
[screenshot]

In fact, ignoring the build errors, the "Error doing trails" failures only occur on the jsoncpp_jsoncpp_fuzzer benchmark:
[screenshot]

Not too sure about the build errors, though. The log is not very useful:

{
  "insertId": "yys9kff991ita",
  "jsonPayload": {
    "component": "dispatcher",
    "traceback": "Traceback (most recent call last):\n  File \"/work/src/experiment/build/builder.py\", line 191, in build_fuzzer_benchmark\n    buildlib.build_fuzzer_benchmark(fuzzer, benchmark)\n  File \"/work/src/experiment/build/gcb_build.py\", line 140, in build_fuzzer_benchmark\n    _build(config, config_name)\n  File \"/work/src/experiment/build/gcb_build.py\", line 124, in _build\n    raise subprocess.CalledProcessError(result.retcode, command)\nsubprocess.CalledProcessError: Command '['gcloud', 'builds', 'submit', '/work/src', '--config=/tmp/tmprsixo8fm', '--timeout=14400s', '--worker-pool=projects/fuzzbench/locations/us-central1/workerPools/buildpool-e2-std-32']' returned non-zero exit status 1.\n",
    "message": "Failed to build benchmark: curl_curl_fuzzer_http, fuzzer: tunefuzz.",
    "experiment": "sbft-standard-cov-01-16",
    "instance_name": "d-sbft-standard-cov-01-16"
  },
  "resource": {
    "type": "gce_instance",
    "labels": {
      "project_id": "fuzzbench",
      "zone": "projects/1097086166031/zones/us-central1-c",
      "instance_id": "7879014797801950091"
    }
  },
  "timestamp": "2024-01-16T12:42:50.595247299Z",
  "severity": "ERROR",
  "logName": "projects/fuzzbench/logs/fuzzbench",
  "receiveTimestamp": "2024-01-16T12:42:50.595247299Z"
}

I suppose one thing we can do is add extensive logging around these build errors (which seem to affect more fuzzers and so deserve higher priority).
Then we run a simple test experiment and debug with the logs.

@phi-go (Author) commented Jan 17, 2024

Luckily the build logs do show something; see my other comment. They are not available on a local build, though, so I'll try to patch that in. Maybe I messed something up in the Dockerfile dependencies; I can test that locally for now.

Also thank you for taking a look.

@phi-go (Author) commented Jan 17, 2024

Ok, I can confirm that mutation testing is not used locally to build bloaty_fuzz_target-libfuzzer, and more importantly it builds without a problem. So it should be something in the GCB-specific code, which I do not understand that well. Also, the build logs are truncated, so we do not see the remaining info: https://www.googleapis.com/download/storage/v1/b/fuzzbench-data/o/sbft-standard-cov-01-16%2Fbuild-logs%2Fbenchmark-bloaty_fuzz_target-fuzzer-libfuzzer.txt?generation=1705424198204020&alt=media.

However, before I dig deeper: we can still complete the evaluation without fixing this. For now we plan to do the mutation analysis part locally on our blades; we do not support that many benchmarks, so this is fine. The coverage for all benchmarks we could do on a branch without our changes and with only the fuzzer PRs.

@phi-go (Author) commented Jan 17, 2024

> I suppose one thing we can do is add extensive logging around these build errors (which seem to affect more fuzzers and so deserve higher priority).
> Then we run a simple test experiment and debug with the logs.

The missing information seems to just be truncated from the build log. I changed the code a bit to store everything from the gcb_build execute call, which I hope will reveal the missing info. Let's try the simple test experiment, if you feel comfortable with that.
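For illustration, the change is roughly along the following lines (a rough sketch with a hypothetical helper name, not the exact code in this PR):

    import subprocess

    # Rough sketch (hypothetical helper, not the exact change in this PR):
    # run the build command and persist its complete combined output to a
    # file, instead of relying on the truncated build log.
    def run_and_store_full_log(command, log_path):
        result = subprocess.run(command, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT, text=True,
                                check=False)
        with open(log_path, 'w') as log_file:
            log_file.write(result.stdout)
        if result.returncode != 0:
            raise subprocess.CalledProcessError(result.returncode, command)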

@DonggeLiu (Contributor)

How about:

--fuzzers libafl libfuzzer tunefuzz pastis
--benchmarks freetype2_ftfuzzer jsoncpp_jsoncpp_fuzzer bloaty_fuzz_target lcms_cms_transform_fuzzer

This selection includes good success/failure comparisons across both fuzzers and benchmarks.

benchmark \ fuzzer          libafl        libfuzzer     tunefuzz            pastis
freetype2_ftfuzzer          Can Run       Can Run       Can Run             Can Run but NaN
jsoncpp_jsoncpp_fuzzer      Can Run       Can Run       Error doing trails  Error doing trails
bloaty_fuzz_target          Cannot Build  Cannot Build  Cannot Build        Cannot Build
lcms_cms_transform_fuzzer   Can Run       Can Run       Can Run             Can Run

Let me know if you'd like to add more.

@phi-go (Author) commented Jan 18, 2024

Thank you for looking into this so thoroughly. This sounds like a plan. If you want to reduce compute further, even one-hour runs and a few trials should give us enough to debug, though I don't know how to do this with flags.

/gcbrun run_experiment.py -a --mutation-analysis --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name sbft-dev-01-18 --fuzzers libafl libfuzzer tunefuzz pastis --benchmarks freetype2_ftfuzzer jsoncpp_jsoncpp_fuzzer bloaty_fuzz_target lcms_cms_transform_fuzzer

@whexy commented Jan 18, 2024

Hello, I'm the author of BandFuzz. I've noticed that stb_stbi_read_fuzzer shows NaN in the 01-16 report, but it appears in the mutation tests. Additionally, we also have NaN values for harfbuzz_hb-shape-fuzzer and sqlite3_ossfuzz in the sbft-standard-cov-01-18 report.

I am able to successfully build and run these three targets on my local setup. I have reviewed the build logs above but haven't found any valid reasons for this discrepancy.

@phi-go (Author) commented Jan 18, 2024

@whexy, both of those experiments did not complete, so I don't think you need to worry yet. Thank you for being watchful, though. The final coverage report will be posted in this PR.

@whexy commented Jan 18, 2024

@phi-go Thank you for your reply, I greatly appreciate it. Since the 01-16 experiment has completed and there are still several NaN values in the report, I am curious whether the experiment was intentionally stopped or whether something went wrong.

@phi-go (Author) commented Jan 18, 2024

Oh, the 01-16 experiment was run on this PR; sadly, there still seem to be some issues with our mutation testing integration using GCB, and we didn't have time to look into them yet, as discussed above. The other PR is the master branch plus the competitors' fuzzers, so there shouldn't be an issue.

@kdsjZh commented Jan 19, 2024

Hi @phi-go, thanks for your patience in answering our questions and for your efforts in hosting this competition!

I have questions regarding the final metric. Will we still use two sets of benchmarks this year (bug/cov, both public and new private programs like last year) and measure coverage and unique bugs? Or will the mutation analysis results also be taken into consideration for those programs that are compatible?

Do we use a ranking or a normalized score as the final metric?

@phi-go (Author) commented Jan 19, 2024

@kdsjZh, thank you for participating, which makes it all much more worthwhile!

We want to use mutation analysis as the main measurement result; after all, it was the main reason for us to do the competition in the first place, though this is not final yet. You will get the FuzzBench coverage on the default benchmarks and the new mutation analysis results on a limited set of benchmarks. Sadly, we didn't have time to do a more extensive evaluation.

Regarding ranking versus normalized score, we have not yet made a final decision. I expect we would use something along the lines of the geometric mean of killed mutants across the benchmarks. If you or others want to provide input on what we should choose, please feel free to do so.
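To illustrate what such a score could look like, here is a hedged sketch (the kill rates are invented for the example; this is not the final scoring rule):

    import math

    # Hedged sketch of a geomean-style aggregation; the numbers below are
    # made up for illustration and this is not the final scoring rule.
    def geomean(values):
        return math.exp(sum(math.log(v) for v in values) / len(values))

    # Fraction of mutants killed per benchmark for one fuzzer (invented data).
    kill_rates = {
        'freetype2_ftfuzzer': 0.42,
        'jsoncpp_jsoncpp_fuzzer': 0.55,
        'lcms_cms_transform_fuzzer': 0.37,
    }

    print(f'geomean kill rate: {geomean(list(kill_rates.values())):.3f}')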

@phi-go (Author) commented Jan 19, 2024

@alan32liu do you think we could run the experiment mentioned here: #1941 (comment)

This is not urgent; I would just like to fix the issue with running mutation testing on the cloud version of FuzzBench.

@kdsjZh commented Jan 19, 2024

Thanks for your reply!

> You will get the FuzzBench coverage on the default benchmarks and the new mutation analysis results on a limited set of benchmarks. Sadly, we didn't have time to do a more extensive evaluation.

If I understand correctly, "mutants killed" is used to assess fuzzers' bug-finding capability. In that case, we'll only run mutation analysis and coverage measurements on the default 23 coverage programs, right? Do we have an extended private coverage benchmark like last year to avoid overfitting?

@phi-go (Author) commented Jan 19, 2024

To be clear, for mutation testing we plan to provide data for 9 subjects; sadly, we didn't have time to get more to work. We do not plan for extended private benchmarks. We would have liked a more thorough evaluation, including private subjects, as you rightly suggest; it just was not possible for us time-wise. However, note that mutation testing is more resistant to overfitting than a coverage- or bug-based measurement (@vrthra might want to chime in here).

@vrthra commented Jan 19, 2024

As @phi-go mentions, we did not have enough time for extended benchmarks.

@DonggeLiu (Contributor)

/gcbrun run_experiment.py -a --mutation-analysis --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name sbft-dev-01-20 --fuzzers libafl libfuzzer tunefuzz pastis --benchmarks freetype2_ftfuzzer jsoncpp_jsoncpp_fuzzer bloaty_fuzz_target lcms_cms_transform_fuzzer

@DonggeLiu (Contributor) commented Jan 20, 2024

Experiment sbft-dev-01-20 data and results will be available later at:
  • The experiment data.
  • The experiment report.
  • OR: The experiment report.
