Commit

fix bugs
Guangyuan Li authored and Guangyuan Li committed Dec 28, 2020
1 parent f979f70 commit f2f67a4
Showing 8 changed files with 102 additions and 10 deletions.
Binary file modified .DS_Store
Binary file not shown.
20 changes: 18 additions & 2 deletions README.md
@@ -7,7 +7,7 @@ We recommend trying out our web application for that: https://deepimmuno.herokua
> Please refer to **DeepImmuno-GAN** if you want to generate immunogenic peptides
Enjoy and don't hesitate to ask me questions (contact at the bottom), I will be responsive!
Enjoy and don't hesitate to ask me questions (contact at the bottom); I will be responsive! Feel free to raise an issue on the GitHub page!

## Citation
If you find this tool useful in your research, please consider citing our preprint:
@@ -25,12 +25,16 @@ numpy = 1.18.5

pandas = 1.1.1

```
Note: This is the environment I used for development and testing. As long as you use Python 3 with TensorFlow 2.3, it should also work.
```
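One way to reproduce this environment is a plain pip install. This is only a minimal sketch under the assumption that you manage packages with pip (conda works just as well), and the exact TensorFlow wheel depends on your platform:

```
pip install tensorflow==2.3.0 numpy==1.18.5 pandas==1.1.1
```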

#### How to use?

If you want to query a single epitope (peptide + HLA), for example peptide _**HPPLMNVER**_ along with _**HLA-A*0201**_, you need to run:

```
python3 deepimmuno-cnn.py --mode "single" --epitope "HPPLMNVER" --HLA "HLA-A*0201"
python3 deepimmuno-cnn.py --mode "single" --epitope "HPPLMNVER" --hla "HLA-A*0201"
```

If you want to query multiple epitopes, you just need to prepare a CSV file like this:
@@ -47,6 +51,10 @@ Then you run:
python3 deepimmuno-cnn.py --mode "multiple" --intdir "/path/to/above/file" --outdir "/path/to/output/folder"
```
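If you prefer to build that input file programmatically, here is a minimal pandas sketch. The `peptide` and `HLA` column names are an assumption based on how deepimmuno-cnn.py reads the dataframe internally, so double-check them against the example file above:

```
import pandas as pd

# Hypothetical input table: the column names ("peptide", "HLA") are assumed
# from the script's internal usage and should match the example file above.
df = pd.DataFrame({
    "peptide": ["HPPLMNVER", "KLGGALQAK"],
    "HLA": ["HLA-A*0201", "HLA-A*0301"],
})
df.to_csv("/path/to/above/file", index=False)
```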

**1. Please note: when you specify the output directory, do not include a trailing slash; for example, use "~/Desktop" instead of "~/Desktop/".**

**2. Please note: if `python3` doesn't work, replace `python3` with `python`, depending on which Python interpreter is installed.**
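A quick shell check to see which interpreter is available on your machine:

```
python3 --version   # if this reports "command not found",
python --version    # fall back to plain python
```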

The full help prompt is shown below:

```
@@ -76,6 +84,10 @@ numpy = 1.18.4

pandas = 1.0.5

```
Note: This is the environment I used for development and testing. As long as you use Python 3 with PyTorch 1.4, it should also work.
```
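As above, a minimal pip-based sketch for this environment (assuming a CPU build of PyTorch; GPU wheels for 1.4.0 may require the install selector on the PyTorch website):

```
pip install torch==1.4.0 numpy==1.18.4 pandas==1.0.5
```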

#### How to use

Pretty simple, just run it like this:
@@ -97,6 +109,10 @@ optional arguments:
--outdir OUTDIR specifying your output folder
```

**1. Please note: when you specify the output directory, do not include a trailing slash; for example, use "~/Desktop" instead of "~/Desktop/".**

**2. Please note: if `python3` doesn't work, replace `python3` with `python`, depending on which Python interpreter is installed.**
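Putting these notes together, a typical invocation would look something like this (the output path is a placeholder):

```
python3 deepimmuno-gan.py --outdir "/path/to/output/folder"
```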

## Contact

Guangyuan (Frank) Li
63 changes: 63 additions & 0 deletions data/hla2paratopeTable_aligned.txt
@@ -0,0 +1,63 @@
HLA pseudo
HLA-A*0203 ------------YYEKVAH---TVDTLYRY-YYTKWEQWYTWY---
HLA-A*0205 ------Y-----YYEKVV---------YDYYRDTKWQWYWY-----
HLA-A*0206 -----MY-----YYEKVAH---TVDTLYRYHYYTKWVQLYTWY---
HLA-A*0207 ------M-----YYEKVAH---TVDTLYR-HYYTKWVQLYTWY---
HLA-A*3001 ------M-----YYENV-QT-----DTLYIYHYTKWWQLYWY----
HLA-A*6801 -----MY-----YYRNNAQT---V--DTLYYDYTKWAVQWYTWY--
HLA-A*9235 ------M-----YYEKV-HT-----DTYRYHYYTKWVQLYWY----
HLA-A*9253 ------M-----YYEKHT---------VDTRYYTKWYTWY------
HLA-B*0602 ----YDS------YEKYRQ-------ANKYWYYTKWEQWYWY----
HLA-B*1557 ---------------YQWN-----TLYRYEFYWTKWAAYW------
HLA-B*1801 ------Y----HSYRNITNTYES-NLYLR-YSYTKWVQYTW-----
HLA-B*3505 -------------YSGFYNDMNH-NSYPG-----------------
HLA-B*4001 ---MYHT-----KYREITNT-ESNY-----RYYSKVQYWY------
HLA-B*4002 ---MYHT----FKYREISNTYES-NLYLSYYYITKWVQLYEWY---
HLA-B*4044 ---MYHT----FKYREISNTYES-NLYLSYYYITKWVQLYEWY---
HLA-B*4102 ----YHT-----KYREIT---ESNL---YLYYYTKWQDYWY-----
HLA-B*4104 ---MYHT-----KYREISNT-ESNL---YLYYYTKWAVQDYTWY--
HLA-B*4202 ------------YYRNIYAQTES-NLYL--YYYTKWAVQDYWY---
HLA-B*4601 -----------MYYREKY-QTVS-NLY---RYYTKWEQWYLWY---
HLA-B*5706 ------M-----YYENSTNI---------YIYYIKWQLYWY-----
HLA-B*5712 ------M-----YYENMTYN------NYIYDYWTKWQLYWY-----
HLA-B*8103 -----------YYYRNIYAQTES-NYY---NYYSKVQDYEWY----
HLA-B*9234 ------------MYEKH--T-----DTLYRYYYTKWVQYTWY----
HLA-C*0102 -----------YHYRES--ATIF-NT-YWHFYWSKSEHQYWY----
HLA-C*0304 ------M-----YREKRQT-----SNLY--RYYYTWEQYWY-----
HLA-C*0401 ------Y-----YREKYRQD----NKLYLRFFYTKWRYWY------
HLA-C*0517 ------M-----YYEKRQT-----NKLYRYNFYTKWERYWY-----
HLA-C*0602 MYDSYEK------YRQADV-------NKLYWYYTKWQWYWY-----
HLA-C*0756 -----YD----FSYREKY-QAVS-NLYR--SDYTKLAQLYTWY---
HLA-C*1604 -----------YYYRNIFNTYES-NLY---RYYTKWQLYLWY----
HLA-A*0101 ------M-----YYQENTHT---NTLYIIYRDYTKWAQRYYRGRY-
HLA-A*0201 ---MYMM-----YYEKVVHHTHTVDTLYRYYYYTKWVQLYYYYY--
HLA-A*0224 ----MYM-----YYEKH-HT----DTLYRYYYYTKWVQLYWY----
HLA-A*0301 -----MY-----YQENVVAQ------DTYYRDYTKWEQLYTWY---
HLA-A*0362 ------M-----YYENA-QT-----DTLYYRDYTKWEQLYWY----
HLA-A*1101 ------Y-----YYENAQV---TVDTLYYYRDYTKWAAQQYWY---
HLA-A*2301 ------------YYEKVAHT------NIYLFYYTKWVQQYTGY---
HLA-A*2402 ------M-----YYEKKVHT----NIYLMFHYYTKWVQYYTYRY--
HLA-A*3003 ----YSM-----YYENVHTE----NTLYIYEHYTKWRLYTWY----
HLA-A*9234 ------------MYEKV-HT----HDT-YRYYYTKWVLLYWY----
HLA-B*0702 -----------YYYRNIY-QTES-NLYYY-DYYTKWEQRYEWY---
HLA-B*0801 ---MYDY----YFYRNIFNTDES-NLYLSYNYYTKWVVQDYWYY--
HLA-B*1402 ---YYSY----YSEYNICTNTES-NLYLW-YFYTKWELYTW-----
HLA-B*1501 MYYAMYY----YYREI--NTYES-NLYLRYSYYTKWAEQWYLWY--
HLA-B*2703 ----YHT-----EHREICTE-D---TLY-LYDYTKWVQLYEWY---
HLA-B*2705 ---YYHT----EYREEICTE-DEDTLYL-YYDYTKWVQLAYEWY--
HLA-B*2709 ---MYHT----EYREQICTE-TDDTY---YYHYTKWVVQLYEWY--
HLA-B*2713 ---MYHTEVVRFYREEICAK-DTDYYYHYHDAYTKWVRQLYYECWY
HLA-B*3501 -----MM----YYRNIFTNTYES-NLYIRYSSYTKWAVQLYWY---
HLA-B*3508 ----MMY----YYRQIITNTYES-NLYIR-YSYTKWQRWYYWY---
HLA-B*3901 -----YY----YSEYNIC-TTES-NLYLR-YFYTKWVQYYWY----
HLA-B*4201 ------------YYRNIYAQTES-NY-L--YYYTKWAVQDYWY---
HLA-B*4402 ---MYYT-----KYREISNT-ENNTT--YIYDYTKWVVQDYLSRY-
HLA-B*4403 ---MYYT-----KYREITTE---NTY--IRYDYTKWVVQLYLSRY-
HLA-B*4405 ---MYYT-----KYREITTE---NTY--I-YYYTKWVVQDYLSRY-
HLA-B*5101 ------Y-----YYRNINT------YNIYWYYYTKWEQLYLW----
HLA-B*5301 ------Y-----YYRNIFT----NTENIYIRYYTKWVQLYWY----
HLA-B*5701 ------M-----YYENTYNI------YIYDSYWTKWVQLYWY----
HLA-B*5703 ----MYY-----YEENA-ST----NTYNIYYYYTKWVQLYWYY---
HLA-B*5801 ------------YYENSTYE---NIYIRYSAYYTKWVQLYWYY---
HLA-B*8102 ------M----YYYRNIYAQTES-NLYY--NYYSKAVQLYWY----
HLA-C*1510 -----MY-----YYENMQTD---DNIYIYIYDYTKWVQLYLWY---
15 changes: 10 additions & 5 deletions deepimmuno-cnn.py
@@ -10,8 +10,6 @@
from tensorflow.keras import layers
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import auc,precision_recall_curve,roc_curve,confusion_matrix
import argparse
import os

@@ -138,7 +136,7 @@ def rescue_unknown_hla(hla, dic_inventory):



def hla_data_aaindex(hla_dic,hla_type,after_pca): # return numpy array [34,12,1]
def hla_data_aaindex(hla_dic,hla_type,after_pca,dic_inventory): # return numpy array [46,12,1]
    try:
        seq = hla_dic[hla_type]
    except KeyError:
@@ -148,7 +146,7 @@ def hla_data_aaindex(hla_dic,hla_type,after_pca): # return numpy array [34,12
    encode = encode.reshape(encode.shape[0], encode.shape[1], -1)
    return encode

def construct_aaindex(ori,hla_dic,after_pca):
def construct_aaindex(ori,hla_dic,after_pca,dic_inventory):
    series = []
    for i in range(ori.shape[0]):
        peptide = ori['peptide'].iloc[i]
@@ -157,7 +155,7 @@ def construct_aaindex(ori,hla_dic,after_pca):

        encode_pep = peptide_data_aaindex(peptide,after_pca) # [10,12]

        encode_hla = hla_data_aaindex(hla_dic,hla_type,after_pca) # [46,12]
        encode_hla = hla_data_aaindex(hla_dic,hla_type,after_pca,dic_inventory) # [46,12]
        series.append((encode_pep, encode_hla, immuno))
    return series

@@ -167,6 +165,7 @@ def hla_df_to_dic(hla):
        col1 = hla['HLA'].iloc[i] # HLA allele
        col2 = hla['pseudo'].iloc[i] # pseudo sequence
        dic[col1] = col2
    return dic



@@ -217,13 +216,19 @@ def file_process(upload,download):
def main(args):
    mode = args.mode
    if mode == 'single':
        print("mode is single")
        epitope = args.epitope
        print("queried epitope is {}".format(epitope))
        hla = args.hla
        print("queried hla is {}".format(hla))
        score = computing_s(epitope,hla)
        print(score)
    elif mode == 'multiple':
        print("mode is multiple")
        intFile = args.intdir
        print("input file is {}".format(intFile))
        outFolder = args.outdir
        print("output will be in {}".format(outFolder))
        file_process(intFile,outFolder)


14 changes: 11 additions & 3 deletions deepimmuno-gan.py
@@ -7,9 +7,6 @@
import torch.nn.functional as F
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
import argparse
import os

@@ -126,6 +123,16 @@ def __len__(self):

# auxiliary function during training GAN
def sample_generator(batch_size):
    # hyper-parameters carried over from training; only seq_len, hidden,
    # n_chars and batch_size are used below (the hard-coded batch_size
    # overrides the function argument)
    batch_size = 64
    lr = 0.0001
    num_epochs = 100
    seq_len = 10
    hidden = 128
    n_chars = 21
    d_steps = 10
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    # load the trained Wasserstein GAN generator weights
    G = Generator(hidden,seq_len,n_chars,batch_size).to(device)
    G.load_state_dict(torch.load('./models/wassGAN_G.pth'))
    noise = torch.randn(batch_size, 128).to(device) # [N, 128]
    generated_data = G(noise) # [N, seq_len, n_chars]
    return generated_data
@@ -285,6 +292,7 @@ def inverse_transform(hard): # [N,seq_len]
def main(args):
    #batch= args.batch
    outdir = args.outdir
    print("outdir is {}".format(outdir))
    generation = sample_generator(64).detach().cpu().numpy() # [N,seq_len,n_chars]
    hard = np.argmax(generation, axis=2) # [N,seq_len]
    pseudo = inverse_transform(hard)
Binary file modified models/.DS_Store
Binary file not shown.
Binary file modified models/cnn_model_331_3_7/.data-00000-of-00001
Binary file not shown.
Binary file modified models/cnn_model_331_3_7/.index
Binary file not shown.
