Commit

from-pdf-pt now uses *.pdf as default glob; prepared for release
fracpete committed Jul 4, 2024
1 parent c93bf71 commit 5b63994
Showing 3 changed files with 9 additions and 3 deletions.
6 changes: 6 additions & 0 deletions CHANGES.rst
@@ -1,6 +1,12 @@
Changelog
=========

+0.0.2 (2024-07-05)
+------------------
+
+- `from-pdf-pt` now uses `*.pdf` as default glob
+
+
0.0.1 (2024-05-02)
------------------

4 changes: 2 additions & 2 deletions setup.py
@@ -31,11 +31,11 @@ def _read(f):
},
packages=find_namespace_packages(where='src'),
install_requires=[
-"llm-dataset-converter>=0.2.1",
+"llm-dataset-converter>=0.2.4",
"pypdf",
"simple-range>=0.0.3",
],
-version="0.0.1",
+version="0.0.2",
author='Peter Reutemann',
author_email='fracpete@waikato.ac.nz',
entry_points={
2 changes: 1 addition & 1 deletion src/ldc_pdf/pretrain/_pdf.py
@@ -102,7 +102,7 @@ def initialize(self):
Initializes the reading, e.g., for opening files or databases.
"""
super().initialize()
-self._inputs = locate_files(self.source, input_lists=self.source_list, fail_if_empty=True)
+self._inputs = locate_files(self.source, input_lists=self.source_list, fail_if_empty=True, default_glob="*.pdf")
if self.page_range is None:
self.page_range = "first-last"
if self.combine_pages is None:
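
Note on the `default_glob` change: the sketch below illustrates what a default glob is presumably for, namely letting a bare directory passed as input expand to the PDF files inside it. It is not the library's `locate_files` implementation; `expand_inputs` is a hypothetical stand-in built on the standard library, and the assumption that the default glob is only applied to directory inputs is mine, not something this commit confirms.

```python
import glob
import os
import tempfile


def expand_inputs(inputs, default_glob=None):
    """Hypothetical stand-in for a file locator (NOT the real locate_files).

    Assumption: when an input is a directory and a default glob is given,
    the glob is appended to the directory before expansion.
    """
    result = []
    for inp in inputs:
        if default_glob is not None and os.path.isdir(inp):
            inp = os.path.join(inp, default_glob)
        # Sort for a stable order; glob.glob itself makes no ordering promise.
        result.extend(sorted(glob.glob(inp)))
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Create one PDF and one unrelated file in the directory.
        for name in ("a.pdf", "notes.txt"):
            open(os.path.join(tmp, name), "w").close()
        # Only the PDF is picked up when the directory is passed as input.
        print(expand_inputs([tmp], default_glob="*.pdf"))
```

Under that assumption, a user could point `from-pdf-pt` at a directory rather than spelling out `some_dir/*.pdf`, which is how the changelog entry reads.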
