Retrain with parquet files #87

brabbit61
Collaborator

  • Extracts the corrected entities from the parquet file and stores them as JSON
  • Populates the data_path folder (passed as an argument) with the JSON files
  • Can process a mix of txt and JSON files in the same training run
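
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function names, the `doc_id` and `corrected_entities` column names, and the output JSON layout are all hypothetical, and reading parquet assumes pandas with a parquet engine such as pyarrow is installed.

```python
import json
from pathlib import Path


def read_parquet_records(parquet_path):
    """Load parquet rows as plain Python dicts (requires pandas and a parquet engine)."""
    import pandas as pd  # imported lazily so the helpers below work without pandas

    # Round-tripping through to_json converts numpy types into JSON-safe values.
    return json.loads(pd.read_parquet(parquet_path).to_json(orient="records"))


def write_entities_as_json(records, data_path,
                           id_key="doc_id", entities_key="corrected_entities"):
    """Write one JSON file per record into data_path (key names are assumptions)."""
    out_dir = Path(data_path)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for record in records:
        out_file = out_dir / f"{record[id_key]}.json"
        payload = {"entities": record[entities_key]}
        out_file.write_text(json.dumps(payload), encoding="utf-8")
        written.append(out_file)
    return written


def load_training_files(data_path):
    """Collect .txt and .json files from the same folder in a single pass."""
    examples = []
    for path in sorted(Path(data_path).iterdir()):
        if path.suffix == ".json":
            examples.append(json.loads(path.read_text(encoding="utf-8")))
        elif path.suffix == ".txt":
            examples.append({"text": path.read_text(encoding="utf-8")})
    return examples
```

With helpers like these, `write_entities_as_json(read_parquet_records(parquet_file), data_path)` would populate the folder, and `load_training_files(data_path)` would then pick up the new JSON files alongside any existing txt files.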

@codecov

codecov bot commented Jun 28, 2023

Codecov Report

Patch coverage: 44.00% and project coverage change: -0.69 ⚠️

Comparison is base (97e145e) 48.01% compared to head (c319474) 47.33%.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev      #87      +/-   ##
==========================================
- Coverage   48.01%   47.33%   -0.69%     
==========================================
  Files          19       19              
  Lines        1918     1971      +53     
==========================================
+ Hits          921      933      +12     
- Misses        997     1038      +41     
Impacted Files                                          Coverage Δ
src/preprocessing/labelling_preprocessing.py            18.06% <ø> (-0.53%) ⬇️
src/preprocessing/labelling_data_split.py               65.66% <43.54%> (-15.70%) ⬇️
.../hf_token_classification/huggingface_preprocess.py   75.47% <46.15%> (-9.64%) ⬇️

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@tieandrews
Collaborator

See PR #86 for these changes, cherry-picked into the organized structure.

@tieandrews tieandrews closed this Jun 28, 2023