Contribution to Kaggle's COVID-19 Open Research Dataset Challenge (CORD-19) using various NLP techniques.
Our initial approach to the problem combines text summarization and topic modeling.
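The README does not specify which summarization method is used. As an illustration only, a classic baseline for extractive summarization scores each sentence by how frequent its words are in the document and keeps the top-scoring sentences. The sketch below assumes plain-text abstracts and uses only the Python standard library. The stop-word list, the length-normalized scoring rule, and the names `split_sentences`, `tokenize` and `summarize` are hypothetical choices, not this project's implementation.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "to", "was", "were", "with",
}


def split_sentences(text: str) -> list[str]:
    """Naively split on '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence: str) -> list[str]:
    """Lowercase alphabetic tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text: str, n_sentences: int = 2) -> list[str]:
    """Return the n highest-scoring sentences in their original order.

    A sentence's score is the mean of its tokens' document frequencies,
    each normalized by the most frequent token's count.
    """
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))
    if not freq:
        return sentences[:n_sentences]
    max_freq = max(freq.values())

    def score(sentence: str) -> float:
        tokens = tokenize(sentence)
        if not tokens:
            return 0.0
        # Dividing by len(tokens) keeps long sentences from winning by length alone.
        return sum(freq[w] / max_freq for w in tokens) / len(tokens)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])  # restore original sentence order
    return [sentences[i] for i in keep]


if __name__ == "__main__":
    abstract = (
        "Coronavirus transmission was studied in hospital wards. "
        "Transmission of the coronavirus occurred mainly through droplets. "
        "The weather was mild during the study. "
        "Masks reduced coronavirus transmission in wards."
    )
    print(summarize(abstract, n_sentences=2))
    # → ['Coronavirus transmission was studied in hospital wards.', 'Masks reduced coronavirus transmission in wards.']
```

On real CORD-19 papers, the regex sentence splitter would likely be replaced by a proper tokenizer (for example Stanza, listed below). Frequency scoring could also be swapped for TF-IDF or embedding similarity. The ranking-then-reordering structure stays the same.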
Existing kernels (mainly EDA):
- https://www.kaggle.com/tanulsingh077/a-comprehensive-resource-notebook-for-beginners
- https://www.kaggle.com/docxian/cord-19-metadata-evaluation
- https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset
- https://www.kaggle.com/danielwolffram/topic-modeling-finding-related-articles
- https://www.kaggle.com/finalepoch/medical-ner-using-spacy
Useful resources:
- https://towardsdatascience.com/comparing-text-summarization-techniques-d1e2e465584e
- https://paperswithcode.com/task/extractive-document-summarization/latest
- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4261035/
- https://www.sciencedirect.com/science/article/pii/S1532046418302156
- https://github.com/NLPatVCU/medaCy
- https://github.com/stanfordnlp/stanza
- https://github.com/allenai/scibert
- https://arxiv.org/pdf/1903.10676.pdf
- https://www.analyticsvidhya.com/blog/2019/10/how-to-build-knowledge-graph-text-using-spacy/
- https://medium.com/@mgalkin/knowledge-graphs-in-natural-language-processing-acl-2019-7a14eb20fce8
- https://github.com/MicheleNuijten/statcheck