81 changes: 81 additions & 0 deletions .github/workflows/clusterfuzzlite.yml
@@ -0,0 +1,81 @@
# SPDX-FileCopyrightText: 2026 PyThaiNLP Project
# SPDX-License-Identifier: Apache-2.0

name: ClusterFuzzLite

on:
  push:
    branches:
      - dev
    paths-ignore:
      - '**.cff'
      - '**.json'
      - '**.md'
      - '**.rst'
      - '**.txt'
      - 'docs/**'
  pull_request:
    branches:
      - dev
    paths-ignore:
      - '**.cff'
      - '**.json'
      - '**.md'
      - '**.rst'
      - '**.txt'
      - 'docs/**'
  schedule:
    - cron: '0 6 * * *'  # Daily at 06:00 UTC

# Avoid duplicate runs for the same source branch and repository.
# For pull_request events, uses the source repo name from
# github.event.pull_request.head.repo.full_name; otherwise uses github.repository.
# For push events, uses the branch name from github.ref_name.
# For pull_request events, uses the source branch name from github.head_ref.
# This ensures events for the same repo and branch share the same group,
# and avoids cross-fork collisions when branch names are reused.
concurrency:
  group: >-
    ${{ github.workflow }}-${{
    github.event.pull_request.head.repo.full_name || github.repository
    }}-${{ github.head_ref || github.ref_name }}
  cancel-in-progress: true

permissions:
  contents: write
  issues: write

jobs:
  fuzzing:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        sanitizer: [address]
    steps:
      - name: Build Fuzzers (${{ matrix.sanitizer }})
        id: build
        uses: google/clusterfuzzlite/actions/build_fuzzers@v1
        with:
          sanitizer: ${{ matrix.sanitizer }}
          language: python
          dockerfile-path: fuzz/Dockerfile

      - name: Run Fuzzers (${{ matrix.sanitizer }})
        id: run
        uses: google/clusterfuzzlite/actions/run_fuzzers@v1
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          fuzz-seconds: 300
          mode: ${{ github.event_name == 'pull_request' && 'code-change' || 'batch' }}
          sanitizer: ${{ matrix.sanitizer }}
          storage-repo: https://${{ secrets.GITHUB_TOKEN }}@github.com/${{ github.repository }}.git
          storage-repo-branch: gh-pages
          storage-repo-branch-coverage: gh-pages
Comment on lines +72 to +74
Copilot AI Feb 5, 2026

The workflow configuration specifies storage-repo-branch: gh-pages and storage-repo-branch-coverage: gh-pages for storing fuzzing corpus and coverage data. This assumes that a gh-pages branch exists in the repository. If this branch doesn't exist, the workflow may fail when trying to store corpus/coverage data. Consider either: 1) ensuring the gh-pages branch exists before this workflow runs, 2) documenting the need to create this branch in the PR description or setup documentation, or 3) using a different branch that already exists (like a dedicated fuzzing-data branch).
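
One way to act on the first suggestion is a small, manually triggered workflow that creates the branch only when it is missing. This is a minimal sketch, not part of this PR: the workflow name, job name, and commit message are hypothetical, and it assumes the default `GITHUB_TOKEN` is permitted to push new branches to this repository.

```yaml
# Hypothetical one-off workflow: creates an empty gh-pages branch
# if the repository does not have one yet.
name: Create fuzzing storage branch

on: workflow_dispatch

permissions:
  contents: write

jobs:
  create-branch:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Create orphan gh-pages branch if missing
        run: |
          # ls-remote --exit-code returns non-zero when no matching ref exists
          if git ls-remote --exit-code --heads origin gh-pages >/dev/null; then
            echo "gh-pages already exists; nothing to do"
          else
            # An orphan branch starts with no history and no tracked files
            git switch --orphan gh-pages
            git -c user.name="github-actions[bot]" \
                -c user.email="41898282+github-actions[bot]@users.noreply.github.com" \
                commit --allow-empty -m "Initialize fuzzing storage branch"
            git push origin gh-pages
          fi
```

Running it once before the first ClusterFuzzLite run would cover the case where the storage step requires an existing branch; if the action creates the branch on its own, the sketch is unnecessary.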

Member

@copilot From a security point of view, is storing the fuzzing corpus in gh-pages (publicly available) a concern?

Contributor Author

Storing fuzzing corpus in gh-pages (public) is generally acceptable for open-source projects. The corpus contains test inputs that triggered interesting code paths, not secrets or sensitive data. However, if crash-triggering inputs are concerning:

Options:

  1. Keep gh-pages public - Standard practice for OSS fuzzing (e.g., OSS-Fuzz projects)
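  2. Use a private repository - Every branch of a public repository is publicly readable, so keeping the corpus private means pointing storage-repo at a separate private repository rather than at another branch of this one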
  3. Disable corpus storage - Remove the storage-repo* parameters (fuzzing will still work, just won't persist corpus between runs)

The current setup follows ClusterFuzzLite best practices for public repositories. Crash artifacts (lines 76-83) are uploaded only as workflow artifacts (not to gh-pages), which have limited retention and access.
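
For reference, option 3 would amount to roughly the following: the Run Fuzzers step from this workflow with the three storage-repo* inputs dropped. This is a sketch derived from the step in this diff, not a tested configuration.

```yaml
# Sketch of option 3: same step, without the storage-repo* inputs,
# so nothing is written to gh-pages and no corpus persists between runs.
- name: Run Fuzzers (${{ matrix.sanitizer }})
  id: run
  uses: google/clusterfuzzlite/actions/run_fuzzers@v1
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    fuzz-seconds: 300
    mode: ${{ github.event_name == 'pull_request' && 'code-change' || 'batch' }}
    sanitizer: ${{ matrix.sanitizer }}
```

The trade-off is that every run starts without a saved corpus, so each 300-second fuzzing window has to rediscover inputs that earlier runs had already found.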


      - name: Upload crash artifacts
        if: failure() && steps.run.outcome == 'failure'
        uses: actions/upload-artifact@v4
        with:
          name: ${{ matrix.sanitizer }}-artifacts
          path: ./out/artifacts
11 changes: 11 additions & 0 deletions .gitignore
@@ -129,3 +129,14 @@ logs/
# Temp files
*.tmp
*.temp

# Fuzzing artifacts
fuzz/corpus/
fuzz/crashes/
fuzz/artifacts/
fuzz/*.profraw
fuzz/*.profdata
crash-*
leak-*
timeout-*
oom-*
29 changes: 29 additions & 0 deletions fuzz/Dockerfile
@@ -0,0 +1,29 @@
# SPDX-FileCopyrightText: 2026 PyThaiNLP Project
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileType: SOURCE

# Dockerfile for ClusterFuzzLite fuzzing
# This extends the OSS-Fuzz base builder image for Python projects

FROM gcr.io/oss-fuzz-base/base-builder-python

# Install system dependencies
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        build-essential \
        libicu-dev \
        pkg-config && \
    rm -rf /var/lib/apt/lists/*

# Copy repository to $SRC/pythainlp
COPY . $SRC/pythainlp

# Set working directory
WORKDIR $SRC/pythainlp

# Install pythainlp in development mode with minimal dependencies
# This installs the package without heavy ML dependencies to speed up builds
RUN pip install --no-cache-dir -e .

# Copy build script to $SRC/build.sh as expected by OSS-Fuzz/ClusterFuzzLite
COPY fuzz/build.sh $SRC/