Hugging Face Model Training #45

Merged
merged 66 commits into from
Jun 21, 2023
Commits
281cb57
feat: added entity label file loading function
tieandrews May 26, 2023
79fd36f
feat: script to coordinate training process
tieandrews May 29, 2023
f289c36
feat: bash script to run training and set params
tieandrews May 29, 2023
ca9c5fa
feat: script to generate hf formatted data
tieandrews May 29, 2023
dd3696f
docs: updated ner_training target name
tieandrews May 29, 2023
7f4ac68
docs: model training readme
tieandrews May 29, 2023
c64e778
feat: automated utils to evaluate trained models
tieandrews May 30, 2023
7361e91
feat: bash script to run evaluation on checkpoint
tieandrews May 30, 2023
26b20ce
bug: added _init_ to src to make module
tieandrews May 30, 2023
8d69aa6
bug: token based evaluation same as entity based
tieandrews May 30, 2023
2cb379a
feat: added new return objects
tieandrews May 30, 2023
eff149e
docs: updated docs to match new input format
tieandrews May 30, 2023
df022a7
feat: improved error checking and logging
tieandrews May 30, 2023
234aa3d
feat: cleaned up imports and constants
tieandrews May 30, 2023
a5348f8
feat: last imports cleaned
tieandrews May 30, 2023
0944ea3
feat: added custom metrics to mlflow logging
tieandrews May 30, 2023
9cf2140
feat: model training bash script
tieandrews May 30, 2023
1a18d96
feat: basic notebook to use on colab to train
tieandrews May 30, 2023
2357286
docs: initial commit
tieandrews May 30, 2023
9ea5abe
docs: results and models directory setup
tieandrews May 30, 2023
9192ad8
feat: removed unused label2id object
tieandrews May 31, 2023
0153d73
feat: separated folder location setting
tieandrews May 31, 2023
8e31500
bug: setup individual copied objects for pred/true
tieandrews May 31, 2023
1ac2bbc
bug: added defined max token length
tieandrews May 31, 2023
89be2df
bug: added quotes for paths with spaces in colab
tieandrews May 31, 2023
4534ba2
bug: fixed labelled file location with spaces
tieandrews May 31, 2023
f2bf1ac
Merge remote-tracking branch 'origin/dev' into 22-fine-tune-allenai-s…
tieandrews Jun 1, 2023
4cbfaaf
feat: rename and update standard params
tieandrews Jun 1, 2023
5ca03dd
docs: switch default data folder
tieandrews Jun 1, 2023
e7faccb
docs: clean up and make better defaults
tieandrews Jun 1, 2023
fd60e9b
feat: update to use new train/val/test data struct
tieandrews Jun 1, 2023
24eb8fb
feat: setting up __init__ files
tieandrews Jun 1, 2023
e40751c
feat: added final model name field
tieandrews Jun 1, 2023
00dd4bf
feat: added custom model logging at training end
tieandrews Jun 1, 2023
fb534fa
feat: setup huggingface batch inference
tieandrews Jun 1, 2023
de94f40
docs: init file setup
tieandrews Jun 1, 2023
2578015
bug: fixed batch prediction
tieandrews Jun 1, 2023
3f30515
feat: include max_samples by default for local run
tieandrews Jun 1, 2023
bb14221
feat: added better input checking from tests
tieandrews Jun 2, 2023
8ea406b
tests: fixes to complete test suite
tieandrews Jun 2, 2023
0b43a54
docs: added transformers requirements
tieandrews Jun 2, 2023
31c2fc8
bug: moved docopt to main
tieandrews Jun 3, 2023
c135b09
tests: initial basic tests
tieandrews Jun 3, 2023
92df812
feat: colab train notebook autoreload used
tieandrews Jun 3, 2023
da53411
docs: delete model card of specter2
tieandrews Jun 3, 2023
67088c4
tests: added tests for calculate/plot methods
tieandrews Jun 6, 2023
ca10b58
feat: added stride to generating overlap windows
tieandrews Jun 8, 2023
37fefaf
feat: added early stopping with patience 5
tieandrews Jun 8, 2023
5357087
feat: added warmup and seed
tieandrews Jun 8, 2023
9521d9c
bug: added load_best_model_at_end for early stop
tieandrews Jun 8, 2023
a9572c6
bug: added control for when mlflow diasbled
tieandrews Jun 19, 2023
babdb8b
feat: updated preprocessing stride length
tieandrews Jun 19, 2023
50ac915
feat: updated grouped_entities to aggregationstrat
tieandrews Jun 19, 2023
af0e2d4
feat: baseline test results
tieandrews Jun 19, 2023
bc669ec
docs: roberta finetune v3 results
tieandrews Jun 19, 2023
386c058
feat: bert-multilanguage results
tieandrews Jun 20, 2023
b84e7ae
feat: specter2 results
tieandrews Jun 20, 2023
7215a39
docs: removed results folder
tieandrews Jun 20, 2023
129fac0
docs: HF model training readme update
tieandrews Jun 21, 2023
1c617e2
docs: update colab notebook
tieandrews Jun 21, 2023
743780b
feat: final roberta model training script
tieandrews Jun 21, 2023
81bc738
feat: automated baseline evaluation script
tieandrews Jun 21, 2023
8d99f29
docs: labelsetudio setup guide
tieandrews Jun 21, 2023
9211ba3
Merge branch 'dev' into 22-fine-tune-allenai-specter2-model-for-ner
tieandrews Jun 21, 2023
aa181eb
bug: docopt outside of main
tieandrews Jun 21, 2023
fc50bd7
bug: mermaid diagram not rendering in README
tieandrews Jun 21, 2023
4 changes: 4 additions & 0 deletions .gitignore
@@ -11,6 +11,10 @@ data/**/*.json
data/**/*.csv
!data/entity-extraction/raw/taxa.csv

# ignore files in models folder but keep .gitkeep
models/ner/*
!.gitkeep

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]