UCLA-BD2K/Aztec-Duplicate

Download a new set of data from Solr.
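As a rough illustration of the download step, the sketch below pages through a Solr core using the standard `/select` endpoint with `start` and `rows`. The host, port, core name (`tools`), and page size are assumptions made for the example; this README does not state them, and the repository may fetch the data differently.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical values: the README does not give the Solr host or core name.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "tools"


def build_select_url(base, core, start=0, rows=500, query="*:*"):
    """Build a Solr /select URL that returns one page of JSON results."""
    params = urllib.parse.urlencode(
        {"q": query, "wt": "json", "start": start, "rows": rows}
    )
    return f"{base.rstrip('/')}/{core}/select?{params}"


def download_all(base=SOLR_BASE, core=CORE, rows=500):
    """Page through the core with start/rows until every document is fetched."""
    docs, start = [], 0
    while True:
        url = build_select_url(base, core, start, rows)
        with urllib.request.urlopen(url) as resp:
            response = json.load(resp)["response"]
        docs.extend(response["docs"])
        start += rows
        # Stop once the reported total is covered or a page comes back empty.
        if start >= response["numFound"] or not response["docs"]:
            return docs
```

Calling `download_all()` against a reachable Solr instance returns a list of documents that can be written out with `json.dump` for the later steps.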

Run Main.java to create the mappings.
- Change the id on line 355 of BaseCases.java to the updated id.
- Final results are in all_name_combined_tools.json and all_name_mapping.csv.
- The Train folder should be filled with data.

Run preprocess.py to prepare the data for doc2vec, then run doc2vec_train.py to train the doc2vec model and get the vectors.
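The two scripts are not shown here, so the following is only a minimal sketch of what a preprocess-then-train flow might look like. The function names (`tokenize`, `build_corpus`, `train_doc2vec`), the tokenization rule, and the hyperparameters are all assumptions for illustration; the training part relies on the third-party gensim library (version 4 or later) and its `Doc2Vec` and `TaggedDocument` classes.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_corpus(records):
    """Turn {id: description} into (tag, tokens) pairs, skipping empty texts."""
    corpus = []
    for tool_id, text in records.items():
        tokens = tokenize(text or "")
        if tokens:
            corpus.append((str(tool_id), tokens))
    return corpus


def train_doc2vec(corpus, vector_size=100, epochs=20):
    """Train gensim's Doc2Vec on the corpus and return {tag: vector}."""
    # Imported here so the preprocessing helpers work without gensim installed.
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    documents = [TaggedDocument(words=tokens, tags=[tag]) for tag, tokens in corpus]
    model = Doc2Vec(vector_size=vector_size, min_count=1, epochs=epochs)
    model.build_vocab(documents)
    model.train(documents, total_examples=model.corpus_count, epochs=model.epochs)
    return {tag: model.dv[tag] for tag, _ in corpus}
```

Keeping preprocessing separate from training mirrors the two-script layout described above and lets the tokenization be checked on its own before a longer training run.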

Notes: If the metadata changes, the combination function should be updated to match. Matching is currently done on names; to match on different criteria, update the code block at line 44 of Main.java.
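To make the note concrete, here is a small Python sketch of name-based matching with a separate combination step. It is not the Java code in Main.java: the normalization rule, the "first non-empty value wins" merge policy, and the field names (`id`, `name`, `description`) are assumptions chosen for illustration. Swapping the matching criterion would mean changing `normalize_name`, and a metadata change would mean revisiting `combine`.

```python
from collections import defaultdict


def normalize_name(name):
    """Matching key: the name, case-insensitive, with outer whitespace removed."""
    return name.strip().lower()


def group_by_name(tools):
    """Group tool records that share the same normalized name."""
    groups = defaultdict(list)
    for tool in tools:
        groups[normalize_name(tool["name"])].append(tool)
    return dict(groups)


def combine(records):
    """Merge duplicates, keeping the first non-empty value seen for each field."""
    merged = {}
    for record in records:
        for field, value in record.items():
            if value not in (None, "", []) and field not in merged:
                merged[field] = value
    return merged
```

For example, two records named `BLAST` and ` blast ` fall into the same group, and `combine` fills an empty description on the first record from the second.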
