GitHub - midas-network/term-extraction

To run:

cd project_dir
source venv/bin/activate
python update_data.py
python term_counter.py

In PyCharm:

Make sure that you are using the correct Python interpreter, specifically the one in the virtual environment.

To do this, go to:

PyCharm -> Settings -> Project -> Python Interpreter -> Add -> Existing environment -> Interpreter

and set it to project_dir/venv/bin/python
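
A quick way to confirm which interpreter is actually running is to ask Python itself. The snippet below is a minimal sketch and not part of this repository; the helper name in_virtualenv is hypothetical. It relies on the fact that, inside an environment created with venv, sys.prefix and sys.base_prefix differ.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix points at the Python installation it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    # The path printed here should end in venv/bin/python for this project.
    print("Interpreter:", sys.executable)
    print("Inside a virtual environment:", in_virtualenv())
```

Run it from the PyCharm run configuration or from the terminal after activating the environment. If the second line prints False, the interpreter setting is probably still pointing at the system Python.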

Todo for Jeff:

Update update_data.py to grab grants from the database
Update extraction_utils.build_corpus_words_only to accept a filename as a parameter instead of using the hardcoded papers.json
Update term_counter.py to pass in the filename to extraction_utils.build_corpus_words_only
Update term_counter.py to pass in the correct fields (the field names in grants.json that you want to extract text from)
Change the output file names to something better, and include the datasource in the name (e.g. "term_counts_grants.json")
Rename do_manual() to something more descriptive (e.g. extract_terms_from_datasource)
Call do_manual() once for papers, and once for grants
Look at our regex term list in term_counter.py and try to group the variants of each disease into a single line. For example:
1. "lassa fever" and "lassa hemorrhagic fever" are counted separately. A suggested improvement would be:
2. "(lassa fever|lassa hemorrhagic fever)"
3. Another example could be "meningococcal disease" and "meningococcal"; perhaps "(meningococcal disease|meningococcal)" is better
4. Verify these combos with Alice?
Instead of CSV output, add HTML output. In the HTML, instead of printing the 2nd column in the huge file, just highlight the search terms in red or something. Have fun with it.
1. I'm probably oversimplifying this, but really you could just write:

 <html><body><table><tr><th>search term</th><th>result</th></tr>
 then for every term that is a match: <tr><td>term</td><td>this is <span class="highlight">all</span> the text</td></tr>
 finally: </table></body></html>
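
The last two todo items (grouped disease patterns and HTML output with highlighted matches) could be sketched roughly as follows. This is only an illustration under assumptions: GROUPED_TERMS, highlight_matches and render_html are hypothetical names that do not exist in term_counter.py, and the real term list and text source will differ. Matching is case-insensitive here, which is also an assumption.

```python
import html
import re

# Hypothetical grouped patterns, one line per disease, as suggested in the todo.
# When one spelling is a prefix of another, the longer one goes first,
# otherwise "meningococcal" would match and leave " disease" unhighlighted.
GROUPED_TERMS = [
    r"(lassa fever|lassa hemorrhagic fever)",
    r"(meningococcal disease|meningococcal)",
]


def highlight_matches(pattern, text):
    """Return (match_count, html_text) with every match wrapped in a span."""
    regex = re.compile(pattern, re.IGNORECASE)
    pieces = []
    count = 0
    last_end = 0
    for match in regex.finditer(text):
        # Escape the text between matches so it cannot break the HTML.
        pieces.append(html.escape(text[last_end:match.start()]))
        pieces.append(
            '<span class="highlight">' + html.escape(match.group(0)) + "</span>"
        )
        last_end = match.end()
        count += 1
    pieces.append(html.escape(text[last_end:]))
    return count, "".join(pieces)


def render_html(patterns, text):
    """Build one table row per pattern that matches the text at least once."""
    rows = []
    for pattern in patterns:
        count, marked = highlight_matches(pattern, text)
        if count:
            rows.append(
                f"<tr><td>{html.escape(pattern)}</td><td>{marked}</td></tr>"
            )
    return (
        "<html><head><style>.highlight { color: red; }</style></head><body>"
        "<table><tr><th>search term</th><th>result</th></tr>"
        + "".join(rows)
        + "</table></body></html>"
    )


if __name__ == "__main__":
    sample = "Lassa fever and Lassa hemorrhagic fever were reported."
    print(render_html(GROUPED_TERMS, sample))
```

The patterns have no word boundaries, so "meningococcal" would also match inside a longer word. Adding \b around each group may be worth checking with Alice along with the combos.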

Folders and files:

output
utils
.gitignore
readme.MD
requirements.txt
term_counter.py
update_data.py
