This repository focuses on recognizing entities in news articles and mapping them to predefined entity names with corresponding IDs.
- Install MongoDB by following the steps at: https://www.mongodb.com/docs/manual/tutorial/install-mongodb-on-windows/
- Install Milvus by following the steps at: https://milvus.io/docs/install_standalone-docker.md
- Install the Python dependencies: `pip3 install -r requirements.txt`
- If you run the embedding model on your local machine, also install PyTorch:
  - `pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118`
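After installing PyTorch from the `cu118` index (the CUDA 11.8 builds), it can be useful to confirm that a GPU is actually visible before choosing the `Local` embedding mode. The helper below is a minimal sketch and not part of this repository; the name `describe_torch_device` is hypothetical.

```python
def describe_torch_device() -> str:
    """Return a short description of the device PyTorch would use."""
    try:
        import torch  # imported lazily so the check also works when torch is absent
    except ImportError:
        return "PyTorch is not installed"
    if torch.cuda.is_available():
        # Report the name of the first visible CUDA device.
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return "PyTorch installed, CPU only"


print(describe_torch_device())
```

If this reports CPU only, the local embedding model may still run, but likely much more slowly.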
Prerequisites for `generate_mapping_company.py`:
- Create the Milvus vector database using `Create_VDB.py` under `Scripts.VDB_Similiarity_Search`.
- Set the embedding model mode to either `Server` or `Local`.
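Creating the vector database requires Milvus to be running, and the later steps also need MongoDB. The sketch below checks whether both services accept TCP connections. It assumes each service listens on its default port (27017 for MongoDB, 19530 for Milvus) on `localhost`; adjust these if your installation differs. The helper names are hypothetical and not part of this repository.

```python
import socket

# Assumed default ports; change them if your services are configured differently.
SERVICES = {"MongoDB": 27017, "Milvus": 19530}


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost") -> dict:
    """Map each service name to whether its port is reachable."""
    return {name: is_port_open(host, port) for name, port in SERVICES.items()}
```

Calling `check_services()` returns a dictionary such as `{"MongoDB": True, "Milvus": False}`, which tells you which service still needs to be started. An open port only shows that something is listening, not that the service is healthy.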
Option 1: run the code step by step
- Run `mongodb.py` to insert the raw news data.
- Run `ner.py` to extract companies from each sentence.
- Run `ner_output_processor.py` to further process the NER output.
- Run `generate_mapping_company.py --embedding_method Local` to perform the similarity calculation.
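The four steps above can also be chained from a small driver script that stops at the first failure. This is a minimal sketch, not the repository's own code: it assumes the scripts sit in the current working directory and that only `generate_mapping_company.py` takes an argument. `build_commands` and `run_pipeline` are hypothetical names.

```python
import subprocess
import sys
from typing import List


def build_commands(embedding_method: str = "Local") -> List[List[str]]:
    """Build the command line for each step, in the order listed above."""
    if embedding_method not in ("Server", "Local"):
        raise ValueError("embedding_method must be 'Server' or 'Local'")
    return [
        [sys.executable, "mongodb.py"],               # insert raw news data
        [sys.executable, "ner.py"],                   # extract companies per sentence
        [sys.executable, "ner_output_processor.py"],  # post-process the NER output
        [sys.executable, "generate_mapping_company.py",
         "--embedding_method", embedding_method],     # similarity calculation
    ]


def run_pipeline(embedding_method: str = "Local") -> None:
    """Run each step in order; check=True raises if a step exits non-zero."""
    for cmd in build_commands(embedding_method):
        print("Running:", " ".join(cmd[1:]))
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline("Local")` from the directory that contains the scripts runs them one after another, so a failure in `ner.py` prevents the later steps from running on incomplete data.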
Option 2: run the code in one shot
- Run `main.py --embedding_method Local` to execute the whole pipeline.
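For reference, a flag like `--embedding_method` is typically declared with `argparse`. The sketch below illustrates how the two documented values, `Server` and `Local`, could be validated. It is an illustration only; the actual `main.py` may parse its arguments differently, and the default of `Local` is an assumption.

```python
import argparse


def parse_args(argv=None):
    """Parse the --embedding_method flag, accepting only the documented values."""
    parser = argparse.ArgumentParser(
        description="Run entity recognition and mapping (illustrative sketch)."
    )
    parser.add_argument(
        "--embedding_method",
        choices=["Server", "Local"],
        default="Local",  # assumed default, not confirmed by the repository
        help="Where the embedding model runs.",
    )
    return parser.parse_args(argv)
```

With `choices` set, a value other than `Server` or `Local` makes the script exit with a usage message instead of failing later during the similarity step.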