Output format in P-group and G-group #13

jpgsu · 2021-05-31T02:35:04Z

Hi,

Thank you for this great tool.
Can HATK output the logistic regression result, Manhattan plot, and heatmap based on P-group/G-group?
I noticed that HATK.py has --Pgroup/--Ggroup parameters.
But I got an error message when I tried to run the P-group analysis using the example data.
Any help is appreciated.

Jen-Ping
#######################

python3 HATK.py \
    --variants example/wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18 \
    --hped example/wtccc_filtered_58C_RA.hatk.300+300.hped \
    --Pgroup \
    --pheno example/wtccc_filtered_58C_RA.hatk.300+300.phe \
    --pheno-name RA \
    --out MyHLAStudyP/RESULT_EXAMPLE_wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18 \
    --imgt 3320 \
    --hg 18 \
    --imgt-dir example/IMGTHLA3320 \
    --multiprocess 8

Namespace(Ggroup=False, HLA=None, NoCaption=False, Pgroup=True, aa=None, ar=None, bmarkergenerator=False, chped=None, condition=None, condition_list=None, covar=None, covar_name=None, dict_AA=None, dict_SNPS=None, fam=None, fourF=False, hat=None, heatmap=False, hg='18', hla2hped=False, hped='example/wtccc_filtered_58C_RA.hatk.300+300.hped', imgt='3320', imgt2seq=False, imgt_dir='example/IMGTHLA3320', input=None, leave_NotFound=False, logistic=False, manhattan=False, maptable=None, metaanalysis=False, multiprocess=8, no_indel=False, nomencleaner=False, omnibus=False, oneF=False, out='MyHLAStudyP/RESULT_EXAMPLE_wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18', phased=None, pheno='example/wtccc_filtered_58C_RA.hatk.300+300.phe', pheno_name='RA', platform=None, point_color='#778899', point_size='15', reference_allele=None, rhped=None, s1_bim=None, s1_logistic_result=None, s2_bim=None, s2_logistic_result=None, save_intermediates=False, threeF=False, top_color='#FF0000', twoF=False, variants='example/wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18', yaxis_unit='10')

[IMGT2Seq.py::WARNING]: Given '--Pgroup' argument will be overridden to '--4field'.

[ProcessIMGT.py]: Generating sequence information dictionary for HLA A.

[ProcessIMGT.py]: Generating sequence information dictionary for HLA B.

[ProcessIMGT.py]: Generating sequence information dictionary for HLA C.

[ProcessIMGT.py]: Generating sequence information dictionary for HLA DPA1.

[ProcessIMGT.py]: Generating sequence information dictionary for HLA DPB1.

[ProcessIMGT.py]: Generating sequence information dictionary for HLA DQA1.

[ProcessIMGT.py]: Generating sequence information dictionary for HLA DQB1.

[ProcessIMGT.py]: Generating sequence information dictionary for HLA DRB1.

[HLA_Study.py]: IMGT2Seq result :
< IMGT2Sequence(Newly generated.) >

HLA Amino Acids : MyHLAStudyP/HLA_DICTIONARY_AA.hg18.imgt3320
HLA SNPs : MyHLAStudyP/HLA_DICTIONARY_SNPS.hg18.imgt3320
HLA Allele Table : MyHLAStudyP/HLA_ALLELE_TABLE.imgt3320.hat
Maptables for heatmap :
A : MyHLAStudyP/HLA_MAPTABLE_A.hg18.imgt3320.txt
B : MyHLAStudyP/HLA_MAPTABLE_B.hg18.imgt3320.txt
C : MyHLAStudyP/HLA_MAPTABLE_C.hg18.imgt3320.txt
DPA1: MyHLAStudyP/HLA_MAPTABLE_DPA1.hg18.imgt3320.txt
DPB1: MyHLAStudyP/HLA_MAPTABLE_DPB1.hg18.imgt3320.txt
DQA1: MyHLAStudyP/HLA_MAPTABLE_DQA1.hg18.imgt3320.txt
DQB1: MyHLAStudyP/HLA_MAPTABLE_DQB1.hg18.imgt3320.txt
DRB1: MyHLAStudyP/HLA_MAPTABLE_DRB1.hg18.imgt3320.txt

[HLA_Study.py]: Given HPED file('example/wtccc_filtered_58C_RA.hatk.300+300.hped') is to be processed by NomenCleaner.

[NomenCleaner.py]: Generating CHPED with P code HLA alleles.

[bMarkerGenerator.py]: Making Reference Panel for "MyHLAStudyP/RESULT_EXAMPLE_wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18"

[1] Generating Amino acid(AA)sequences from HLA types.
[2] Encoding Amino acids positions.
Error: No variants remaining after --exclude.
[3] Encoding HLA alleles.
[4] Generating DNA(SNPS) sequences from HLA types.
[5] Encoding SNP positions.
Error: No variants remaining after --exclude.
[6] Extracting founders.
Error: Failed to open
MyHLAStudyP/RESULT_EXAMPLE_wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18.SNPS.CODED.bed.
Error: Failed to open
MyHLAStudyP/RESULT_EXAMPLE_wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18.AA.CODED.bed.
[7] Merging SNP, HLA, and amino acid datasets.
Error: Failed to open
MyHLAStudyP/RESULT_EXAMPLE_wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18.AA.FOUNDERS.fam.
[8] Performing quality control.
Error: Failed to open
MyHLAStudyP/RESULT_EXAMPLE_wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18.MERGED.FOUNDERS.bed.
awk: cannot open MyHLAStudyP/RESULT_EXAMPLE_wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18.MERGED.FOUNDERS.FRQ.frq (No such file or directory)
awk: cannot open MyHLAStudyP/RESULT_EXAMPLE_wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18.MERGED.FOUNDERS.FRQ.frq (No such file or directory)
Error: Failed to open
MyHLAStudyP/RESULT_EXAMPLE_wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18.MERGED.FOUNDERS.bed.
Error: Failed to open
MyHLAStudyP/RESULT_EXAMPLE_wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18.bed.
[9] Making reference panel for HLA-AA,SNPS,HLA and Normal variants(SNPs) is Done!

[HLA_Study.py]: bMarkerGenerator result(Prefix) :
MyHLAStudyP/RESULT_EXAMPLE_wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18
Traceback (most recent call last):
File "HATK.py", line 243, in <module>
myStudy = HLA_Study(args)
File "/gfs/hp48/jps/HLA160/HATK/src/HLA_Study.py", line 224, in __init__
_ref_allele=_args.reference_allele)
File "/gfs/hp48/jps/HLA160/HATK/HLA_Analysis/HLA_Analysis.py", line 147, in __init__
kwargs['_ref_allele'] = MakeDefaultReferenceAllele(_bfile, _out)
File "/gfs/hp48/jps/HLA160/HATK/HLA_Analysis/HLA_Analysis.py", line 430, in MakeDefaultReferenceAllele
bim = pd.read_csv(_bfile+".bim", sep='\t', header=None, usecols=[1,4,5], names=["Label", "Al1", "Al2"])
File "/home/jps/miniconda3/envs/CookHLA/lib/python3.6/site-packages/pandas/io/parsers.py", line 685, in parser_f
return _read(filepath_or_buffer, kwds)
File "/home/jps/miniconda3/envs/CookHLA/lib/python3.6/site-packages/pandas/io/parsers.py", line 457, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "/home/jps/miniconda3/envs/CookHLA/lib/python3.6/site-packages/pandas/io/parsers.py", line 895, in __init__
self._make_engine(self.engine)
File "/home/jps/miniconda3/envs/CookHLA/lib/python3.6/site-packages/pandas/io/parsers.py", line 1135, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/home/jps/miniconda3/envs/CookHLA/lib/python3.6/site-packages/pandas/io/parsers.py", line 1917, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 382, in pandas._libs.parsers.TextReader.__cinit__
File "pandas/_libs/parsers.pyx", line 689, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] File b'MyHLAStudyP/RESULT_EXAMPLE_wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18.bim' does not exist: b'MyHLAStudyP/RESULT_EXAMPLE_wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18.bim'

WansonChoi · 2021-05-31T05:46:47Z

@jpgsu

That run was bound to fail. Sorry for the incomplete exception handling.
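As a rough illustration of what fuller exception handling could look like here, the sketch below checks that a PLINK binary fileset (.bed/.bim/.fam) exists before anything tries to read it, so the run stops with one clear message instead of a pandas traceback. The function names are hypothetical and this is not HATK's actual code.

```python
from pathlib import Path


def missing_plink_files(prefix, extensions=(".bed", ".bim", ".fam")):
    """Return the members of the PLINK binary fileset that do not exist."""
    # The prefix itself contains dots (e.g. "...chr6.hg18"), so the extension
    # is appended as a string rather than set with Path.with_suffix().
    return [prefix + ext for ext in extensions if not Path(prefix + ext).is_file()]


def require_plink_fileset(prefix):
    """Raise a single, readable error if any fileset member is missing."""
    missing = missing_plink_files(prefix)
    if missing:
        raise FileNotFoundError(
            "Expected marker panel files were not generated: " + ", ".join(missing)
        )
```

Calling `require_plink_fileset(out_prefix)` right after the marker-generation step, with `out_prefix` being the value passed to `--out`, would report the missing `.bim` before the association analysis starts.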

I intentionally blocked the use of the '--Ggroup' and '--Pgroup' arguments in IMGT2Seq so that it won't generate HLA sequence dictionaries, maptables, etc. for G/P-group. I remember there was some issue in generalizing IMGT2Seq to the G/P-group level. If you run IMGT2Seq with '--Ggroup' or '--Pgroup', it overrides those arguments to '--4field'.
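The override described above can be pictured with a small argparse sketch. It is only an illustration under assumptions, not the real IMGT2Seq code: the flag names and the `fourF` destination are taken from the Namespace dump in this thread, the warning text mirrors the logged message, and `parse_imgt2seq_args` is a hypothetical helper.

```python
import argparse
import sys


def parse_imgt2seq_args(argv):
    """Parse a reduced, hypothetical subset of IMGT2Seq-style options."""
    parser = argparse.ArgumentParser(prog="IMGT2Seq-sketch")
    parser.add_argument("--Ggroup", action="store_true")
    parser.add_argument("--Pgroup", action="store_true")
    parser.add_argument("--4field", dest="fourF", action="store_true")
    args = parser.parse_args(argv)

    # G/P-group output is blocked: warn, clear the flag, fall back to 4-field.
    for flag in ("Ggroup", "Pgroup"):
        if getattr(args, flag):
            print(
                f"[IMGT2Seq sketch::WARNING]: Given '--{flag}' argument "
                "will be overridden to '--4field'.",
                file=sys.stderr,
            )
            setattr(args, flag, False)
            args.fourF = True
    return args


args = parse_imgt2seq_args(["--Pgroup"])
print(args.Pgroup, args.fourF)
```

Because only a warning is printed, the rest of the pipeline keeps running with 4-field dictionaries even though P-group output was requested.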

Meanwhile, NomenCleaner transforms an HPED file to G-group or P-group nomenclature if the '--Ggroup' or '--Pgroup' argument is given.

The main usage of HATK is based on the collective usage of those modules. Your argument combination must have failed because the output nomenclatures of IMGT2Seq (4-field) and NomenCleaner (P-group) do not match. Sorry again for the incomplete exception handling.
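A toy example may make the mismatch concrete. The allele names and data structures below are illustrative assumptions, not HATK's real file formats, and `check_nomenclature` is a hypothetical guard showing how the combination could be rejected up front.

```python
# Keys as a 4-field sequence dictionary might hold them (illustrative).
dictionary_4field = {
    "A*01:01:01:01": "<sequence>",
    "A*02:01:01:01": "<sequence>",
}
# Alleles as a P-group CHPED might hold them (illustrative).
chped_pgroup = ["A*01:01P", "A*02:01P"]

# Exact-name lookup: no P-group name is a key of the 4-field dictionary,
# so nothing is left to encode downstream.
matched = [allele for allele in chped_pgroup if allele in dictionary_4field]
print(matched)


def check_nomenclature(imgt2seq_format, nomencleaner_format):
    """Raise early if the two modules were asked for different formats."""
    if imgt2seq_format != nomencleaner_format:
        raise ValueError(
            f"IMGT2Seq output is '{imgt2seq_format}' but NomenCleaner output "
            f"is '{nomencleaner_format}'; use the same nomenclature for both."
        )


try:
    check_nomenclature("4field", "Pgroup")
except ValueError as err:
    print(f"Stopped before marker generation: {err}")
```

With a guard like this, the run would end at the first inconsistent step rather than continuing through the later "Failed to open" errors.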

HATK encourages conducting a study with N-field HLA nomenclature. The G/P-group transformation in NomenCleaner was originally meant for a special purpose. However, if you do need to conduct your study with G/P-group, let me know. I'll try to manage it.

jpgsu · 2021-05-31T06:52:13Z

We want to do our HLA analysis at various resolutions.
We would very much appreciate it if you could help us with this.

Jen-Ping

WansonChoi · 2021-06-01T01:07:25Z

@jpgsu

I'll take a look at fixing the P/G-group compatibility of IMGT2Seq. But due to other work in my lab, it may take a couple of weeks or more.

jpgsu · 2021-06-01T10:26:14Z

It's okay. Many thanks.
