Ross and Shah Submission #6

Open: wants to merge 2 commits into base: main
78 changes: 78 additions & 0 deletions bioinformaticsProject/BioCompProject.sh
@@ -0,0 +1,78 @@
# Shell script for the Biocomputing Bash Project, Fall 2021

# Usage: bash BioCompProject.sh
# The script asks the user for the path to each folder. To make this
# easier, we suggest organizing the project's files as follows:
#   /bioinformaticsProject : /muscle, /ref_sequences, /BioCompProject.sh, /proteomes,
#   /hmmer/bin/ : hmmbuild, hmmsearch


# The following code gets the paths of the McrA genes, HSP70 genes, and proteomes from the user.
printf "Hi, I hope you are doing great.\nCould you please give me the path of the McrA genes?\n"
printf "Example: if the ref_sequences directory is in Biocomp-Project, which is the current working directory, enter './ref_sequences'\n"
printf "If any path is your current directory, please enter only a '.'\n"

# read -r keeps backslashes in the entered paths literal
read -r Path1
printf "Awesome, could you also share the path of the HSP70 genes?\n"
read -r Path2
printf "Great! Now kindly enter the path where the proteomes are stored.\n"
read -r Path3
printf "Thank you. Almost there...I promise. Please tell us the path where your Muscle program is stored.\n"
read -r Path4
printf "Finally, could you give the path for your hmmbuild and hmmsearch?\n"
read -r Path5

printf "The paths you have entered are: "
echo "$Path1" "$Path2" "$Path3" "$Path4" "$Path5"

# The following commands concatenate all the McrA gene files and all the HSP70 gene files,
# producing one file for the McrA genes and another for the HSP70 genes.
# Paths are quoted so that directories containing spaces still work.
cat "$Path1"/mcrAgene*.fasta > "$Path1/mcrAlist.fasta"
cat "$Path2"/hsp70gene*.fasta > "$Path2/hsp70list.fasta"

# The next set of commands runs the Muscle program to create an alignment file
# for each type of reference sequence. hmmbuild then builds a profile from each
# alignment for use in the search (the *build.fasta files hold HMM profiles
# despite their .fasta extension).

"$Path4/muscle" -in "$Path1/mcrAlist.fasta" -out "$Path1/mcrAlignment.fasta"

"$Path4/muscle" -in "$Path2/hsp70list.fasta" -out "$Path2/hsp70Alignment.fasta"

"$Path5/hmmbuild" "$Path1/mcrAbuild.fasta" "$Path1/mcrAlignment.fasta"

"$Path5/hmmbuild" "$Path2/hsp70build.fasta" "$Path2/hsp70Alignment.fasta"

# Remove any genehitlist.csv left over from a previous run
# (-f suppresses the error when the file does not exist yet)
rm -f genehitlist.csv

# Print a tab-separated header row for the results file
echo -e 'Proteome\tMcrA Hits\tHSP70 Hits' >> genehitlist.csv

# Loop over all 50 proteomes: search each one with the McrA and HSP70 profiles
# and record the number of hits for each gene, so that the proteomes with the
# most McrA and HSP70 hits/matches can be identified
for i in {01..50}
do
    "$Path5/hmmsearch" --tblout mcrAsearch.fasta "$Path1/mcrAbuild.fasta" "$Path3"/proteome_"$i"*

    # Count the hit rows: every line of the table that is not a '#' comment
    mcrAhit=$(grep -vc "^#" mcrAsearch.fasta)

    "$Path5/hmmsearch" --tblout hsp70search.fasta "$Path2/hsp70build.fasta" "$Path3"/proteome_"$i"*

    hsp70hit=$(grep -vc "^#" hsp70search.fasta)

    # One tab between columns, so each row has the same three fields as the header
    printf 'Proteome_%s\t%s\t%s\n' "$i" "$mcrAhit" "$hsp70hit" >> genehitlist.csv
done


# The following pipeline traverses genehitlist.csv to find the best 5 candidates that are
# methanogens with pH resistance qualities. First, we skip the header row (NR>1) and discard
# any proteome with 0 McrA hits. We then sort the remaining proteomes by their number of
# HSP70 gene hits/matches, highest first. Finally, the names of the top 5 are stored in a
# new .txt file and displayed for the user to see.

awk -F '\t' 'NR>1 && $2>0' genehitlist.csv | sort -k 3 -nr | head -n 5 | cut -f 1 > UltimateHitList.txt

echo "This is the ultimate list of Proteomes to use for your experiment...Good luck!"
cat UltimateHitList.txt
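
Review note: the hit counting in the loop above could be factored into one small helper. The sketch below is only an illustration. The function name `count_hits` and the demo table are hypothetical and not part of the submission, and the demo rows are placeholders rather than real hmmsearch output. It assumes, as the script does, that every line of a `--tblout` file that does not start with `#` is one hit.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the submitted script): count the hit rows
# in an hmmsearch --tblout file by counting lines that do not start with '#'.
count_hits() {
    local table="$1"
    # grep -c prints the number of selected lines. It exits with status 1
    # when that number is 0, so the status is masked with '|| true'.
    grep -vc '^#' "$table" || true
}

# Demo on a made-up table: two comment lines and two placeholder hit rows.
demo_table=$(mktemp)
printf '%s\n' \
    '# target name  query name' \
    'hit_one  placeholder' \
    'hit_two  placeholder' \
    '# end of table' > "$demo_table"

demo_count=$(count_hits "$demo_table")
echo "hits: $demo_count"
rm -f "$demo_table"
```

Using `grep -c` instead of `cat | grep | wc -l` avoids two extra processes and also avoids the leading spaces that some `wc` implementations print before the count.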

Binary file not shown.
5 changes: 5 additions & 0 deletions bioinformaticsProject/UltimateHitList.txt
@@ -0,0 +1,5 @@
Proteome_50
Proteome_45
Proteome_42
Proteome_03
Proteome_24
51 changes: 51 additions & 0 deletions bioinformaticsProject/genehitlist.csv
@@ -0,0 +1,51 @@
Proteome McrA Hits HSP70 Hits
Proteome_01 0 4
Proteome_02 0 2
Proteome_03 1 3
Proteome_04 0 4
Proteome_05 1 2
Proteome_06 0 0
Proteome_07 1 2
Proteome_08 0 5
Proteome_09 0 1
Proteome_10 0 3
Proteome_11 0 6
Proteome_12 0 6
Proteome_13 0 3
Proteome_14 0 2
Proteome_15 1 1
Proteome_16 1 1
Proteome_17 0 4
Proteome_18 0 8
Proteome_19 2 1
Proteome_20 0 3
Proteome_21 0 5
Proteome_22 0 9
Proteome_23 2 2
Proteome_24 1 2
Proteome_25 0 5
Proteome_26 0 1
Proteome_27 0 1
Proteome_28 0 1
Proteome_29 1 0
Proteome_30 0 1
Proteome_31 0 7
Proteome_32 0 4
Proteome_33 0 0
Proteome_34 0 2
Proteome_35 0 1
Proteome_36 0 3
Proteome_37 0 1
Proteome_38 1 1
Proteome_39 1 1
Proteome_40 0 2
Proteome_41 0 1
Proteome_42 1 3
Proteome_43 0 3
Proteome_44 1 1
Proteome_45 1 3
Proteome_46 0 2
Proteome_47 0 1
Proteome_48 1 1
Proteome_49 0 3
Proteome_50 1 3
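
Review note: the selection step can be tried on a few rows of this table without rerunning the searches. The sketch below is a hedged illustration, not the submitted pipeline. The function name `top_candidates` is hypothetical, it assumes a file with one header row and exactly three columns separated by single tabs, and it breaks ties by proteome name, which is an assumption of this sketch and may order tied proteomes differently from the script. The demo rows are copied from the table above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the top N proteomes that have at least one McrA
# hit, ranked by HSP70 hits (highest first). Assumes a tab-separated table
# with one header row and three columns: name, McrA hits, HSP70 hits.
top_candidates() {
    local table="$1" n="${2:-5}"
    local tab
    tab=$(printf '\t')
    # NR > 1 skips the header; $2 > 0 keeps only rows with McrA hits.
    awk -F '\t' 'NR > 1 && $2 > 0' "$table" |
        sort -t "$tab" -k3,3nr -k1,1 |   # HSP70 hits descending, then name
        head -n "$n" |
        cut -f 1
}

# Demo with five rows taken from genehitlist.csv.
demo_csv=$(mktemp)
printf '%s\t%s\t%s\n' \
    'Proteome'    'McrA Hits' 'HSP70 Hits' \
    'Proteome_03' 1 3 \
    'Proteome_08' 0 5 \
    'Proteome_19' 2 1 \
    'Proteome_23' 2 2 \
    'Proteome_29' 1 0 > "$demo_csv"

demo_top=$(top_candidates "$demo_csv" 3)
echo "$demo_top"
rm -f "$demo_csv"
```

Proteome_08 has the most HSP70 hits in the demo but is dropped because it has no McrA hit, which is the filtering the script's comments describe.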
Binary file added bioinformaticsProject/hmmer/bin/alimask
Binary file not shown.
Binary file added bioinformaticsProject/hmmer/bin/hmmalign
Binary file not shown.
Binary file added bioinformaticsProject/hmmer/bin/hmmbuild
Binary file not shown.
Binary file added bioinformaticsProject/hmmer/bin/hmmconvert
Binary file not shown.
Binary file added bioinformaticsProject/hmmer/bin/hmmemit
Binary file not shown.
Binary file added bioinformaticsProject/hmmer/bin/hmmfetch
Binary file not shown.
Binary file added bioinformaticsProject/hmmer/bin/hmmlogo
Binary file not shown.
Binary file added bioinformaticsProject/hmmer/bin/hmmpgmd
Binary file not shown.
Binary file added bioinformaticsProject/hmmer/bin/hmmpgmd_shard
Binary file not shown.
Binary file added bioinformaticsProject/hmmer/bin/hmmpress
Binary file not shown.
Binary file added bioinformaticsProject/hmmer/bin/hmmscan
Binary file not shown.
Binary file added bioinformaticsProject/hmmer/bin/hmmsearch
Binary file not shown.
Binary file added bioinformaticsProject/hmmer/bin/hmmsim
Binary file not shown.
Binary file added bioinformaticsProject/hmmer/bin/hmmstat
Binary file not shown.
Binary file added bioinformaticsProject/hmmer/bin/jackhmmer
Binary file not shown.
Binary file added bioinformaticsProject/hmmer/bin/makehmmerdb
Binary file not shown.
Binary file added bioinformaticsProject/hmmer/bin/nhmmer
Binary file not shown.
Binary file added bioinformaticsProject/hmmer/bin/nhmmscan
Binary file not shown.
Binary file added bioinformaticsProject/hmmer/bin/phmmer
Binary file not shown.