feat: add glob-based file inclusion and exclusion filtering with dry run support #41

Open
kulnor wants to merge 8 commits into MIT-LCP:main from kulnor:main

@kulnor
Collaborator

@kulnor kulnor commented Mar 29, 2026

Processing all supported data files in the directory tree is often not desirable.

To provide greater control over which data files are processed, I added repeatable --include and --exclude options that use glob patterns for filtering. A --dry-run option can be used to preview the selected files.

See docs/file_filtering.md for details.
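
As a rough illustration of what "repeatable" options plus a dry-run switch look like on the command line, here is a minimal sketch. It uses the standard library's `argparse` purely to stay dependency-free; the CLI in this PR is described in the review as built on `typer`, so this is not the PR's code. The program name `filter-demo` and the helper names `build_parser` and `parse` are hypothetical.

```python
import argparse
from typing import Sequence


def build_parser() -> argparse.ArgumentParser:
    # "filter-demo" is a hypothetical program name used only for this sketch.
    parser = argparse.ArgumentParser(prog="filter-demo")
    # action="append" lets a flag be passed several times, collecting values in a list.
    parser.add_argument("--include", action="append", default=[], metavar="GLOB")
    parser.add_argument("--exclude", action="append", default=[], metavar="GLOB")
    # A boolean switch: present means "preview only".
    parser.add_argument("--dry-run", action="store_true")
    return parser


def parse(argv: Sequence[str]) -> argparse.Namespace:
    """Parse an explicit argument list (handy for testing without sys.argv)."""
    return build_parser().parse_args(argv)
```

Passing `--include` twice yields a two-element list in `args.include`, and `args.dry_run` is `True` only when the switch is given.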

Copilot AI review requested due to automatic review settings March 29, 2026 02:55
Contributor

Copilot AI left a comment

Pull request overview

Adds glob-based file filtering to Croissant Maker’s discovery pipeline and exposes it via the CLI, including a dry-run mode to preview selected files.

Changes:

  • Extend discover_files() to support include_patterns / exclude_patterns filtering.
  • Wire include/exclude options through CLI → MetadataGenerator → discovery.
  • Add --dry-run mode and document file filtering behavior; add unit tests for filtering.
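
To make the filtering step in the list above concrete, here is a minimal sketch of include/exclude matching on relative paths. It is not the PR's `discover_files()` implementation. The function name `filter_relative_paths` is hypothetical, and the semantics are assumptions: a file is kept if it matches at least one include pattern (or none are given) and matches no exclude pattern. It uses `fnmatch.fnmatchcase`, where `*` also matches `/`.

```python
from fnmatch import fnmatchcase
from pathlib import Path
from typing import Iterable, List, Optional


def filter_relative_paths(
    root: Path,
    include_patterns: Optional[Iterable[str]] = None,
    exclude_patterns: Optional[Iterable[str]] = None,
) -> List[str]:
    """Return sorted POSIX-style relative file paths under root that pass the filters.

    Assumed semantics (not confirmed by the PR): keep a file when it matches
    any include pattern (or no include patterns exist) and no exclude pattern.
    """
    includes = list(include_patterns or [])
    excludes = list(exclude_patterns or [])
    selected = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        # Match against the path relative to root, using forward slashes.
        rel = path.relative_to(root).as_posix()
        if includes and not any(fnmatchcase(rel, p) for p in includes):
            continue
        if any(fnmatchcase(rel, p) for p in excludes):
            continue
        selected.append(rel)
    return sorted(selected)
```

Because `fnmatchcase` lets `*` cross directory separators, a pattern like `*.csv` selects CSV files at any depth. A real implementation might prefer stricter glob rules.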

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/croissant_maker/files.py | Adds glob-based include/exclude filtering on discovered relative paths. |
| src/croissant_maker/__main__.py | Introduces --include/--exclude options and --dry-run early-exit behavior; passes patterns into generation. |
| src/croissant_maker/metadata_generator.py | Stores include/exclude patterns and applies them during discovery. |
| tests/test_files.py | Adds unit tests validating include/exclude filtering behavior. |
| docs/file_filtering.md | New documentation describing filtering and dry-run usage. |

Comments suppressed due to low confidence (1)

docs/file_filtering.md:40

  • Docs refer to an early-exit flag --list-files, but the CLI option implemented in this PR is --dry-run. Update the documentation to use the correct flag name (and ensure wording matches the CLI help) to avoid confusion.
### CLI (`croissant_maker.__main__`)
The CLI uses `typer` to handle multi-value options for `--include` and `--exclude`. It also implements the early-exit logic for `--list-files` to provide a fast "dry-run" experience.



@kulnor
Collaborator Author

kulnor commented Mar 29, 2026

Patches issue #43

@kulnor kulnor added the enhancement New feature or request label Mar 29, 2026
Collaborator

@rafiattrach rafiattrach left a comment

Really nice addition: glob filtering and dry-run are genuinely useful features, thank you!
Two small things before merging, listed below:


if not output:
    output = _get_default_output_name(input)
    typer.echo(f"Auto-generated output filename: {output}")
Collaborator

This message prints even when `--dry-run` is passed. No file is actually created in that case, so the message is misleading. The `if not output:` block (lines 120–122) should be guarded, for example with `and not dry_run`, so it's skipped during dry runs.
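
A minimal sketch of the suggested guard, written as a standalone function so it can run without the CLI. The names `resolve_output` and `default_output_name`, the `echo` parameter, and the `-croissant.jsonld` suffix are hypothetical stand-ins; the real helper in the PR is `_get_default_output_name`, whose behavior is not shown here.

```python
from typing import Callable, Optional


def default_output_name(input_dir: str) -> str:
    # Hypothetical stand-in for the PR's _get_default_output_name helper.
    return f"{input_dir.rstrip('/')}-croissant.jsonld"


def resolve_output(
    input_dir: str,
    output: Optional[str],
    dry_run: bool,
    echo: Callable[[str], None] = print,
) -> Optional[str]:
    """Pick the output filename, staying silent during dry runs."""
    # Guard with `not dry_run` so no filename is announced when nothing is written.
    if not output and not dry_run:
        output = default_output_name(input_dir)
        echo(f"Auto-generated output filename: {output}")
    return output
```

With `dry_run=True` and no explicit output, the function returns `None` and emits no message.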

Collaborator Author

@kulnor kulnor Mar 30, 2026

Good catch. Just pushed the fix.

The `MetadataGenerator` class stores the include/exclude patterns and passes them to the discovery utility. This ensures that the generated `distribution` (FileObjects) and `recordSet` (RecordSets) only contain the filtered subset.

### CLI (`croissant_maker.__main__`)
The CLI uses `typer` to handle multi-value options for `--include` and `--exclude`. It also implements the early-exit logic for `--list-files` to provide a fast "dry-run" experience.
Collaborator

Small nit: `--list-files` on the last line was never an actual CLI flag; it leaked from the original internal variable name `list_files`. More broadly, per-feature .md files tend to be really hard to maintain without a full docs pipeline to validate them, and this PR is actually a great example of how quickly they can drift! I'd suggest holding off on this pattern until we have a full docs site with proper CI validation. Really appreciate the addition though!
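
For illustration only, one small piece of such validation could be a check that every long flag mentioned in a doc is a flag the CLI actually defines. This is a sketch, not an existing tool in this repo; the function name `unknown_flags_in_docs` and the regex are assumptions, and the set of known flags would need to come from the real CLI.

```python
import re
from typing import Iterable, Set

# Matches long options such as --dry-run; the lookbehind avoids matching
# inside words or longer dash runs.
FLAG_PATTERN = re.compile(r"(?<![\w-])--[a-z][a-z0-9-]*")


def unknown_flags_in_docs(doc_text: str, known_flags: Iterable[str]) -> Set[str]:
    """Return long flags mentioned in doc_text that are not in known_flags."""
    mentioned = set(FLAG_PATTERN.findall(doc_text))
    return mentioned - set(known_flags)
```

Run against the sentence quoted above with `--include`, `--exclude`, and `--dry-run` as the known flags, this would report `--list-files`.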

Collaborator Author

@kulnor kulnor Mar 30, 2026

I deleted the file for now, and this is for sure an interesting point. I usually ask the agents to update the documentation as needed, but it is easy to miss one. Maybe I should integrate this into my pre-commit habits.

@kulnor
Collaborator Author

kulnor commented Mar 31, 2026

@rafiattrach let me know if you need anything else to approve this PR so I can bring my fork up to date.

@rafiattrach
Collaborator

> @rafiattrach let me know if you need anything else to approve this PR so I can bring my fork up to date.

Thanks for the print fix! However, for the docs, did you mean to remove technical_overview.md instead of file_filtering.md? We'll aim to have proper docs separately and can then include, for example, the different flags with some use cases.

@kulnor
Collaborator Author

kulnor commented Apr 1, 2026

@rafiattrach Oops, I did indeed delete the wrong file... Just fixed this (restored the technical_overview.md and deleted the file_filtering.md). Sorry about that :-)

Collaborator

@rafiattrach rafiattrach left a comment

Thanks @kulnor! Looks great! Could you just rebase and possibly remove the inline import if it already exists at the top? (Small comment below for the exact line.)


# If just listing files, output and exit
if dry_run:
    from croissant_maker.handlers.registry import (
Collaborator

One more thing I noticed: I believe this is already imported at the top of the file. Do we need it again as an inline import?

Collaborator Author

I just accepted the rebase pull request. I don't see these imports at the top, so I didn't make any other changes.

Now it looks like we have conflicts, and this is where my git skills are starting to fall apart (merge conflict in PR). Is this something you can help with or provide guidance on?

…ponents--croissant-maker

chore(main): release croissant-maker 0.2.0

Labels

enhancement New feature or request


3 participants