chore: add experimental blob.pdf_chunking function #1370

Merged
merged 10 commits into from
Feb 11, 2025

@shuoweil shuoweil commented Feb 6, 2025

Verified in test env screen/6VAwRjCVt9aqYj6 screen/9B3P7VQPD667HUr

pypdf is licensed under the BSD-3-Clause license: https://pypdf.readthedocs.io/en/stable/meta/faq.html#which-license-does-pypdf-use

  1. blob.pdf_chunk takes a GCS link as input, extracts the PDF file's content, and chunks it based on the user-specified chunk_size and overlap_size
  2. blob.pdf_extract takes a GCS link as input and extracts the PDF file's content
  3. Refactor the blob.image_blur code for better readability
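
For readers unfamiliar with how chunk_size and overlap_size interact, here is a minimal sketch of overlapping chunking over already-extracted text. It is not the code in this PR: the helper name `chunk_text` is hypothetical, and it assumes both sizes are counted in characters, which the description above does not specify.

```python
def chunk_text(text: str, chunk_size: int, overlap_size: int) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Hypothetical helper for illustration only. Assumes sizes are measured
    in characters; consecutive chunks share overlap_size characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap_size < chunk_size:
        raise ValueError("overlap_size must be in [0, chunk_size)")

    # Each new chunk starts this many characters after the previous one.
    step = chunk_size - overlap_size

    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end of the text, so no trailing
        # chunk consisting only of overlap is emitted.
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap_size=1))
```

Requiring overlap_size to be smaller than chunk_size keeps the step positive, so the loop always advances through the text.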

@shuoweil shuoweil self-assigned this Feb 6, 2025
@shuoweil shuoweil requested review from a team as code owners February 6, 2025 05:19
@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Feb 6, 2025
@shuoweil shuoweil enabled auto-merge (squash) February 6, 2025 05:20
@product-auto-label product-auto-label bot added size: l Pull request size is large. and removed size: m Pull request size is medium. labels Feb 6, 2025
@product-auto-label product-auto-label bot added size: xl Pull request size is extra large. and removed size: l Pull request size is large. labels Feb 7, 2025
@shuoweil shuoweil force-pushed the shuowei-pdf-chunking-v1 branch from fd440c0 to 6873ce5 Compare February 7, 2025 20:00
@product-auto-label product-auto-label bot added size: m Pull request size is medium. and removed size: xl Pull request size is extra large. labels Feb 7, 2025
@shuoweil shuoweil requested a review from GarrettWu February 7, 2025 20:00
@shuoweil shuoweil added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Feb 7, 2025
@bigframes-bot bigframes-bot removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Feb 7, 2025
@shuoweil shuoweil force-pushed the shuowei-pdf-chunking-v1 branch from 4e4bd5f to 5619c31 Compare February 11, 2025 00:08
@shuoweil shuoweil requested a review from GarrettWu February 11, 2025 00:12
@shuoweil shuoweil merged commit ffe7dc6 into main Feb 11, 2025
23 checks passed
@shuoweil shuoweil deleted the shuowei-pdf-chunking-v1 branch February 11, 2025 01:16
Labels
api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: m Pull request size is medium.
3 participants