Output format in P-group and G-group #13

Open
jpgsu opened this issue May 31, 2021 · 4 comments

Comments

@jpgsu

jpgsu commented May 31, 2021

Hi,

Thank you for this great tool.
Can HATK output the logistic regression result, Manhattan plot, and heatmap at P-group/G-group level?
I noticed that HATK.py has --Pgroup/--Ggroup parameters.
However, I got an error message when I tried to run the P-group analysis on the example data.
Any help is appreciated.

Jen-Ping
#######################

python3 HATK.py \
    --variants example/wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18 \
    --hped example/wtccc_filtered_58C_RA.hatk.300+300.hped \
    --Pgroup \
    --pheno example/wtccc_filtered_58C_RA.hatk.300+300.phe \
    --pheno-name RA \
    --out MyHLAStudyP/RESULT_EXAMPLE_wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18 \
    --imgt 3320 \
    --hg 18 \
    --imgt-dir example/IMGTHLA3320 \
    --multiprocess 8

Namespace(Ggroup=False, HLA=None, NoCaption=False, Pgroup=True, aa=None, ar=None, bmarkergenerator=False, chped=None, condition=None, condition_list=None, covar=None, covar_name=None, dict_AA=None, dict_SNPS=None, fam=None, fourF=False, hat=None, heatmap=False, hg='18', hla2hped=False, hped='example/wtccc_filtered_58C_RA.hatk.300+300.hped', imgt='3320', imgt2seq=False, imgt_dir='example/IMGTHLA3320', input=None, leave_NotFound=False, logistic=False, manhattan=False, maptable=None, metaanalysis=False, multiprocess=8, no_indel=False, nomencleaner=False, omnibus=False, oneF=False, out='MyHLAStudyP/RESULT_EXAMPLE_wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18', phased=None, pheno='example/wtccc_filtered_58C_RA.hatk.300+300.phe', pheno_name='RA', platform=None, point_color='#778899', point_size='15', reference_allele=None, rhped=None, s1_bim=None, s1_logistic_result=None, s2_bim=None, s2_logistic_result=None, save_intermediates=False, threeF=False, top_color='#FF0000', twoF=False, variants='example/wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18', yaxis_unit='10')

[IMGT2Seq.py::WARNING]: Given '--Pgroup' argument will be overridden to '--4field'.

[ProcessIMGT.py]: Generating sequence information dictionary for HLA A.

[ProcessIMGT.py]: Generating sequence information dictionary for HLA B.

[ProcessIMGT.py]: Generating sequence information dictionary for HLA C.

[ProcessIMGT.py]: Generating sequence information dictionary for HLA DPA1.

[ProcessIMGT.py]: Generating sequence information dictionary for HLA DPB1.

[ProcessIMGT.py]: Generating sequence information dictionary for HLA DQA1.

[ProcessIMGT.py]: Generating sequence information dictionary for HLA DQB1.

[ProcessIMGT.py]: Generating sequence information dictionary for HLA DRB1.

[HLA_Study.py]: IMGT2Seq result :
< IMGT2Sequence(Newly generated.) >

  • HLA Amino Acids : MyHLAStudyP/HLA_DICTIONARY_AA.hg18.imgt3320
  • HLA SNPs : MyHLAStudyP/HLA_DICTIONARY_SNPS.hg18.imgt3320
  • HLA Allele Table : MyHLAStudyP/HLA_ALLELE_TABLE.imgt3320.hat
  • Maptables for heatmap :
    A : MyHLAStudyP/HLA_MAPTABLE_A.hg18.imgt3320.txt
    B : MyHLAStudyP/HLA_MAPTABLE_B.hg18.imgt3320.txt
    C : MyHLAStudyP/HLA_MAPTABLE_C.hg18.imgt3320.txt
    DPA1: MyHLAStudyP/HLA_MAPTABLE_DPA1.hg18.imgt3320.txt
    DPB1: MyHLAStudyP/HLA_MAPTABLE_DPB1.hg18.imgt3320.txt
    DQA1: MyHLAStudyP/HLA_MAPTABLE_DQA1.hg18.imgt3320.txt
    DQB1: MyHLAStudyP/HLA_MAPTABLE_DQB1.hg18.imgt3320.txt
    DRB1: MyHLAStudyP/HLA_MAPTABLE_DRB1.hg18.imgt3320.txt

[HLA_Study.py]: Given HPED file('example/wtccc_filtered_58C_RA.hatk.300+300.hped') is to be processed by NomenCleaner.

[NomenCleaner.py]: Generating CHPED with P code HLA alleles.

[bMarkerGenerator.py]: Making Reference Panel for "MyHLAStudyP/RESULT_EXAMPLE_wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18"

[1] Generating Amino acid(AA)sequences from HLA types.
[2] Encoding Amino acids positions.
Error: No variants remaining after --exclude.
[3] Encoding HLA alleles.
[4] Generating DNA(SNPS) sequences from HLA types.
[5] Encoding SNP positions.
Error: No variants remaining after --exclude.
[6] Extracting founders.
Error: Failed to open
MyHLAStudyP/RESULT_EXAMPLE_wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18.SNPS.CODED.bed.
Error: Failed to open
MyHLAStudyP/RESULT_EXAMPLE_wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18.AA.CODED.bed.
[7] Merging SNP, HLA, and amino acid datasets.
Error: Failed to open
MyHLAStudyP/RESULT_EXAMPLE_wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18.AA.FOUNDERS.fam.
[8] Performing quality control.
Error: Failed to open
MyHLAStudyP/RESULT_EXAMPLE_wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18.MERGED.FOUNDERS.bed.
awk: cannot open MyHLAStudyP/RESULT_EXAMPLE_wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18.MERGED.FOUNDERS.FRQ.frq (No such file or directory)
awk: cannot open MyHLAStudyP/RESULT_EXAMPLE_wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18.MERGED.FOUNDERS.FRQ.frq (No such file or directory)
Error: Failed to open
MyHLAStudyP/RESULT_EXAMPLE_wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18.MERGED.FOUNDERS.bed.
Error: Failed to open
MyHLAStudyP/RESULT_EXAMPLE_wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18.bed.
[9] Making reference panel for HLA-AA,SNPS,HLA and Normal variants(SNPs) is Done!

[HLA_Study.py]: bMarkerGenerator result(Prefix) :
MyHLAStudyP/RESULT_EXAMPLE_wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18
Traceback (most recent call last):
  File "HATK.py", line 243, in <module>
    myStudy = HLA_Study(args)
  File "/gfs/hp48/jps/HLA160/HATK/src/HLA_Study.py", line 224, in __init__
    _ref_allele=_args.reference_allele)
  File "/gfs/hp48/jps/HLA160/HATK/HLA_Analysis/HLA_Analysis.py", line 147, in __init__
    kwargs['_ref_allele'] = MakeDefaultReferenceAllele(_bfile, _out)
  File "/gfs/hp48/jps/HLA160/HATK/HLA_Analysis/HLA_Analysis.py", line 430, in MakeDefaultReferenceAllele
    bim = pd.read_csv(_bfile+".bim", sep='\t', header=None, usecols=[1,4,5], names=["Label", "Al1", "Al2"])
  File "/home/jps/miniconda3/envs/CookHLA/lib/python3.6/site-packages/pandas/io/parsers.py", line 685, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/jps/miniconda3/envs/CookHLA/lib/python3.6/site-packages/pandas/io/parsers.py", line 457, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/home/jps/miniconda3/envs/CookHLA/lib/python3.6/site-packages/pandas/io/parsers.py", line 895, in __init__
    self._make_engine(self.engine)
  File "/home/jps/miniconda3/envs/CookHLA/lib/python3.6/site-packages/pandas/io/parsers.py", line 1135, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/jps/miniconda3/envs/CookHLA/lib/python3.6/site-packages/pandas/io/parsers.py", line 1917, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 382, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 689, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] File b'MyHLAStudyP/RESULT_EXAMPLE_wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18.bim' does not exist: b'MyHLAStudyP/RESULT_EXAMPLE_wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18.bim'
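
Editorial note: the traceback ends inside pandas only because the `.bim` file was never written by the earlier, failed steps. A guard along the following lines, run before `pd.read_csv`, would surface that more directly. This is a minimal sketch, not HATK code: the helper names `missing_plink_files` and `require_plink_fileset` are hypothetical, and it assumes the prefix is expected to point at a PLINK binary fileset (`.bed`/`.bim`/`.fam`).

```python
from pathlib import Path


def missing_plink_files(prefix):
    """Return the fileset members (.bed/.bim/.fam) that are missing for `prefix`."""
    # Append the extension to the string form: the prefix itself contains dots
    # (e.g. "...chr6.hg18"), so Path.with_suffix() would replace part of it.
    return [ext for ext in (".bed", ".bim", ".fam")
            if not Path(str(prefix) + ext).is_file()]


def require_plink_fileset(prefix):
    """Raise a descriptive error if any member of the fileset is absent."""
    missing = missing_plink_files(prefix)
    if missing:
        raise FileNotFoundError(
            "PLINK fileset '{}' is incomplete (missing: {}). "
            "An earlier step probably failed; check the log for 'Error:' lines."
            .format(prefix, ", ".join(missing))
        )
```

Calling `require_plink_fileset(_bfile)` before reading the `.bim` file would point the user back to the first `Error:` line in the log rather than to a pandas internals frame.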

@WansonChoi
Owner

@jpgsu

That argument combination was bound to fail. Sorry for the incomplete exception handling.

I intentionally blocked the '--Ggroup' and '--Pgroup' arguments in IMGT2Seq so that it won't generate the HLA sequence dictionary, maptables, etc. at G/P-group level. As I recall, there was an issue in generalizing IMGT2Seq to the G/P-group level. If you run IMGT2Seq with '--Ggroup' or '--Pgroup', it overrides that argument to '--4field'.

Meanwhile, NomenCleaner does transform an hped file to G-group or P-group nomenclature when '--Ggroup' or '--Pgroup' is given.
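
To make the difference between the nomenclature levels concrete, here is a small sketch that classifies an allele name by its form. It is illustrative only and not part of HATK: the function name `nomenclature_level` is hypothetical, and it assumes names follow the usual IMGT/HLA convention in which P-group and G-group designations end in `P` and `G` respectively, while N-field names are colon-separated.

```python
def nomenclature_level(allele):
    """Classify an HLA allele name such as 'A*01:01:01:01' or 'A*01:01P'."""
    name = allele.split("*", 1)[-1]  # drop the gene prefix, if present
    if name.endswith("P"):
        return "P-group"
    if name.endswith("G"):
        return "G-group"
    # Otherwise count colon-separated fields (1 to 4).
    return "{}-field".format(name.count(":") + 1)


for example in ("A*01:01:01:01", "A*01:01P", "A*01:01:01G", "A*01:01"):
    print(example, "->", nomenclature_level(example))
```

A dictionary keyed by 4-field names has no entry for a name like `A*01:01P`, which is the kind of mismatch that arises when the two modules work at different levels.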

HATK's main workflow chains those modules together. Your argument combination failed because the output nomenclatures of IMGT2Seq (4-field) and NomenCleaner (P-group) do not match. Sorry again for the incomplete exception handling.
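
One way the missing exception handling could look is a pre-flight check that rejects the combination before any module runs. This is only a sketch under stated assumptions, not the actual HATK implementation: `check_group_arguments` and `NomenclatureMismatchError` are hypothetical names, and the check assumes the parsed arguments carry boolean `Pgroup` and `Ggroup` attributes, as in the `Namespace(...)` printed in the log above. It is meant for the whole-pipeline mode only, since NomenCleaner on its own does support these flags.

```python
from argparse import Namespace


class NomenclatureMismatchError(ValueError):
    """Raised when the requested nomenclature cannot be honoured end to end."""


def check_group_arguments(args):
    """Fail fast if G/P-group output is requested for the full pipeline."""
    requested = [flag for flag, enabled in (
        ("--Pgroup", getattr(args, "Pgroup", False)),
        ("--Ggroup", getattr(args, "Ggroup", False)),
    ) if enabled]
    if requested:
        raise NomenclatureMismatchError(
            "{} given: IMGT2Seq falls back to 4-field while NomenCleaner "
            "would emit group-level alleles, so their outputs cannot be combined."
            .format(" and ".join(requested))
        )


# Passes silently when neither flag is set.
check_group_arguments(Namespace(Pgroup=False, Ggroup=False))
```

Raising here would replace the cascade of `Error: Failed to open` messages with a single explanation at start-up.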

HATK encourages conducting a study with N-field HLA nomenclature. The G/P-group transformation in NomenCleaner was originally added for a special purpose. However, if you do need to conduct your study at G/P-group level, let me know. I'll try to manage it.

@jpgsu
Author

jpgsu commented May 31, 2021

We want to do our HLA analysis at various resolutions.
We would greatly appreciate your help with this.

Jen-Ping

@WansonChoi
Owner

@jpgsu

I'll look into fixing the P/G-group compatibility of IMGT2Seq. However, due to other work in my lab, it may take a couple of weeks or more.

@jpgsu
Author

jpgsu commented Jun 1, 2021

It's okay. Many thanks.
