For 45symbols submission: http://www.45symbols.com/symbols/portfolio/screen-scrying-ali-razzak/
AHTARazzak/protein_sentiment_analysis
MADE BY ALI RAZZAK FOR 45SYMBOLS PROPOSAL OF 19/20. NO COMMERCIAL USE INTENDED.

To be run with Python 3 on Linux Mint "Sarah" (Ubuntu-based).

This script reads a list of keywords from the "pathogenlist.txt" file (one per line, separated by "\n") and enters each into the rcsb.org search engine. It then takes the top 25 hits for each keyword and downloads the FASTA sequence for each of those proteins. It scans each sequence for English words and scores them by sentiment, using the scores in the "sentiment-words-DFE-785960.csv" file. Finally, it writes ".txt" files into one directory per search term, each named after its protein. Each file lists the sentiment ("+" positive or "-" negative) in column 1, the matched word in column 2, and the score in column 3.

When using, remember to:
1. Keep "sentiment-words-DFE-785960.csv", "words.txt", and "pathogenlist.txt" in the same directory as "protein_search_word_sentiment.py", and use that directory as the working directory.
2. Edit "pathogenlist.txt" to use other search terms.
3. Check the path to Chrome, which may differ on your system; if so, change it at "note #1" in the script.
4. Make sure the download path variable in the script points to the directory that files are downloaded to by default.
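The word-scanning and scoring step can be illustrated with a minimal sketch. This is not the code from "protein_search_word_sentiment.py"; the function names `find_words` and `score_hits`, the tiny in-memory vocabulary, the example sequence, and the signed score values are all assumptions made for illustration. In the real script the vocabulary would come from "words.txt" and the scores from "sentiment-words-DFE-785960.csv", whose exact layout and score scale are not described here.

```python
def find_words(sequence, vocabulary, min_len=3):
    """Return (position, word) for each vocabulary word found as a substring.

    Hypothetical helper: a brute-force scan over every substring whose length
    lies between min_len and the longest word in the vocabulary.
    """
    seq = sequence.lower()
    max_len = max((len(w) for w in vocabulary), default=0)
    hits = []
    for start in range(len(seq)):
        last_end = min(len(seq), start + max_len)
        for end in range(start + min_len, last_end + 1):
            candidate = seq[start:end]
            if candidate in vocabulary:
                hits.append((start, candidate))
    return hits


def score_hits(hits, sentiment):
    """Turn hits into (sign, word, score) rows, mirroring the three output columns.

    Assumes signed scores: zero or above is "+", below zero is "-".
    Words without a sentiment score are skipped.
    """
    rows = []
    for _, word in hits:
        if word in sentiment:
            score = sentiment[word]
            sign = "+" if score >= 0 else "-"
            rows.append((sign, word, score))
    return rows


# Illustrative stand-ins for "words.txt" and the sentiment CSV (made-up values).
vocabulary = {"sad", "glad", "war", "kind", "vile"}
sentiment = {"sad": -0.6, "glad": 0.7, "war": -0.8, "kind": 0.5, "vile": -0.9}

# A made-up amino-acid sequence in one-letter code, not a real protein.
sequence = "MKGLADWARTSADPLE"

rows = score_hits(find_words(sequence, vocabulary), sentiment)
for sign, word, score in rows:
    print(f"{sign}\t{word}\t{score}")
```

A set is used for the vocabulary so that each substring lookup is constant time; for a full dictionary file a minimum word length of three or more keeps the number of trivial matches down.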