PAML filename I/O still not stable #60

Open
lonelyjoeparker opened this issue Jul 25, 2017 · 2 comments

@lonelyjoeparker
Owner

lonelyjoeparker commented Jul 25, 2017

PAML is still sensitive to overly long (>99 chars?) pathnames, both inside the .ctl control file and in the command-line arguments (argv) when PAML itself is invoked.

Error 1

Typically errors will manifest in the pipeline STDOUT as:

attempting command /usr/bin/perl -w runCmd.pl /Users/joeparker/Documents/all_work/programming/java/eclipse/qmul-genome-convergence-pipeline/junit-test-inputs/test_GenomeConvergencePipeline/a_debug_data_dir_2017/tmp_data/. /Users/joeparker/Documents/all_work/programming/java/eclipse/qmul-genome-convergence-pipeline/junit-test-inputs/test_GenomeConvergencePipeline/a_debug_data_dir_2017/tmp_data/./codeml /Users/joeparker/Documents/all_work/programming/java/eclipse/qmul-genome-convergence-pipeline/junit-test-inputs/test_GenomeConvergencePipeline/a_debug_data_dir_2017/tmp_data/./aamlOnTreeOne.ctl
o0 PamlTestWrapper
o1 	/Users/joeparker/Documents/all_work/programming/java/eclipse/qmul-genome-convergence-pipeline/junit-test-inputs/test_GenomeConvergencePipeline/a_debug_data_dir_2017/tmp_data/.
o2 	/Users/joeparker/Documents/all_work/programming/java/eclipse/qmul-genome-convergence-pipeline/junit-test-inputs/test_GenomeConvergencePipeline/a_debug_data_dir_2017/tmp_data/./codeml
o3 	/Users/joeparker/Documents/all_work/programming/java/eclipse/qmul-genome-convergence-pipeline/junit-test-inputs/test_GenomeConvergencePipeline/a_debug_data_dir_2017/tmp_data/./aamlOnTreeOne.ctl
o4 dir change to /Users/joeparker/Documents/all_work/programming/java/eclipse/qmul-genome-convergence-pipeline/junit-test-inputs/test_GenomeConvergencePipeline/a_debug_data_dir_2017/tmp_data/.
o5 runCmd exe:/Users/joeparker/Documents/all_work/programming/java/eclipse/qmul-genome-convergence-pipeline/junit-test-inputs/test_GenomeConvergencePipeline/a_debug_data_dir_2017/tmp_data/./codeml /Users/joeparker/Documents/all_work/programming/java/eclipse/qmul-genome-convergence-pipeline/junit-test-inputs/test_GenomeConvergencePipeline/a_debug_data_dir_2017/tmp_data/./aamlOnTreeOne.ctl 	
done with output
error:
done
trying to read site patterns' lnL from /Users/joeparker/Documents/all_work/programming/java/eclipse/qmul-genome-convergence-pipeline/junit-test-inputs/test_GenomeConvergencePipeline/a_debug_data_dir_2017/tmp_data/./lnf
trying to read site patterns' lnL from /Users/joeparker/Documents/all_work/programming/java/eclipse/qmul-genome-convergence-pipeline/junit-test-inputs/test_GenomeConvergencePipeline/a_debug_data_dir_2017/tmp_data/./lnf
/Users/joeparker/Documents/all_work/programming/java/eclipse/qmul-genome-convergence-pipeline/junit-test-inputs/test_GenomeConvergencePipeline/a_debug_data_dir_2017/tmp_data/./lnf
SERIOUS: unable to open this file.
Not able to proceed without a taxon list. Check path and retry.
java.lang.NullPointerException
	at uk.ac.qmul.sbcs.evolution.convergence.handlers.AamlAnalysisSGE.getAllPatternSSLS(AamlAnalysisSGE.java:170)
	at uk.ac.qmul.sbcs.evolution.convergence.handlers.AamlAnalysisSGE.getPatternSSLS(AamlAnalysisSGE.java:146)
	at uk.ac.qmul.sbcs.evolution.convergence.analyses.MultiHnCongruenceAnalysis.run(MultiHnCongruenceAnalysis.java:359)
	at uk.ac.qmul.sbcs.evolution.convergence.runners.CongruenceRunner.main(CongruenceRunner.java:70)

This can be reproduced directly in the shell, outside the pipeline, with:

localhost:tmp_data (master*) joeparker$ /Users/joeparker/Documents/all_work/programming/java/eclipse/qmul-genome-convergence-pipeline/junit-test-inputs/test_GenomeConvergencePipeline/a_debug_data_dir_2017/tmp_data/./codeml /Users/joeparker/Documents/all_work/programming/java/eclipse/qmul-genome-convergence-pipeline/junit-test-inputs/test_GenomeConvergencePipeline/a_debug_data_dir_2017/tmp_data/./aamlOnTreeOne.ctl
Abort trap: 6

If the argument pathnames are the only issue, this can be fixed by calling codeml with relative rather than absolute paths, e.g.:

localhost:pamlTest joeparker$ ./codeml aamlOnTreeOne.ctl

 14         verbose | verbose                0.00
  7         runmode | runmode                0.00
  4         seqtype | seqtype                2.00
 18      aaRatefile | aaRatefile             1.00
... etc
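The same relative-path call could be made from the Java side. This is only a rough sketch: the class name `RelativeCodemlLauncher` and its method are hypothetical, not anything in the repo, and it assumes codeml and the .ctl file both sit in the working directory. It also assumes a Unix-like system, where the JDK changes into the working directory before executing the command; the JDK documentation does not guarantee how a relative executable name is resolved.

```java
import java.nio.file.Path;

final class RelativeCodemlLauncher {

    /**
     * Builds (but does not start) a process that runs codeml from inside
     * workDir, so that only short relative names end up in codeml's argv.
     * Assumption: ./codeml and the ctl file both live in workDir.
     */
    static ProcessBuilder build(Path workDir, String ctlFileName) {
        ProcessBuilder pb = new ProcessBuilder("./codeml", ctlFileName);
        pb.directory(workDir.toFile());   // child process starts in workDir
        pb.redirectErrorStream(true);     // merge stderr into stdout for logging
        return pb;
    }
}
```

Calling `build(dir, "aamlOnTreeOne.ctl").start()` would then be the equivalent of `cd`-ing into the directory and running `./codeml aamlOnTreeOne.ctl` as above.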

Error 2

But if long paths are present in the aamlOnTreeOne.ctl file itself, these will also cause problems, e.g.:

	seqfile = /Users/joeparker/Documents/all_work/programming/java/eclipse/qmul-genome-convergence-pipeline/junit-test-inputs/test_GenomeConvergencePipeline/a_debug_data_dir_2017/tmp_data/XLOC_000538_Locus_1.cds.fastaXLOC_000538_Locus_1.cds.fastaconv1501008504647_pamlAA.phy 		* sequence data filename
	treefile = /Users/joeparker/Documents/all_work/programming/java/eclipse/qmul-genome-convergence-pipeline/junit-test-inputs/test_GenomeConvergencePipeline/a_debug_data_dir_2017/tmp_data/H0.tre.pruned.tre    	* tree structure file name

	outfile = /Users/joeparker/Documents/all_work/programming/java/eclipse/qmul-genome-convergence-pipeline/junit-test-inputs/test_GenomeConvergencePipeline/a_debug_data_dir_2017/tmp_data/./aaml.out       * main result file name

This fails with:

localhost:tmp_data (master*) joeparker$ ./codeml aamlOnTreeOne.ctl

Error: err: option file. add space around the equal sign?.

Fix

Error 1, Error 2: In the short term I'm advising users to keep directory paths short.

In the long term, possible solutions are:

  • Hack PAML some more to allow longer char arrays in all fnames
  • Keep recommending users use short paths
    • (and possibly throw an error if over 'some' length???)
  • some sort of horrible fudge involving tmp dirs?
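For the tmp-dir fudge, one possible shape is to give the long working directory a short alias via a symbolic link and hand PAML the alias instead. This is an untested idea, not something the pipeline does: `ShortPathAlias` is a hypothetical name, symlink creation can fail on some platforms (e.g. Windows without privileges), and the system temp directory is not guaranteed to be short (on macOS it is often under /var/folders/...), so the resulting length would still need checking.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class ShortPathAlias {

    /**
     * Creates a fresh temp directory and, inside it, a symbolic link named "w"
     * pointing at the (long) working directory. Returns the path of the link.
     * The caller is responsible for deleting the link and temp dir afterwards.
     */
    static Path aliasFor(Path longWorkDir) throws IOException {
        Path tmp = Files.createTempDirectory("paml");
        Path link = tmp.resolve("w");
        return Files.createSymbolicLink(link, longWorkDir.toAbsolutePath());
    }
}
```

Paths built from the alias (e.g. `alias.resolve("aamlOnTreeOne.ctl")`) refer to the same files as the originals, but whether PAML is happy with them is exactly what would need testing.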

Error 2 only: Built SimpleCongruenceRunner-lauraDev.jar. This uses file names only, not paths (e.g. MultiHnCongruenceAnalysis.java:318, not pushed to repo), which is more robust to long names in the ctl file itself, but requires all I/O to be in the same directory (and doesn't fix Error 1 either).
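To illustrate the file-names-only idea (this is not the code in the dev build, just a sketch with a hypothetical `CtlPathStripper` class), a ctl line's path can be cut down to its last component while leaving the `* comment` part alone. It assumes the `key = value * comment` layout shown above and paths without spaces, and only touches seqfile, treefile and outfile.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CtlPathStripper {

    // Captures: (1) "key = " prefix, (2) the path value, (3) trailing text such as "* comment".
    private static final Pattern FILE_OPTION =
        Pattern.compile("^(\\s*(?:seqfile|treefile|outfile)\\s*=\\s*)(\\S+)(.*)$");

    /** Returns the line with its directory part removed, or the line unchanged if it is not a file option. */
    static String stripDirectory(String ctlLine) {
        Matcher m = FILE_OPTION.matcher(ctlLine);
        if (!m.matches()) {
            return ctlLine;
        }
        Path name = Paths.get(m.group(2)).getFileName();
        if (name == null) {               // e.g. the value was just "/"
            return ctlLine;
        }
        return m.group(1) + name + m.group(3);
    }
}
```

This only makes sense if codeml is then run from the directory that holds all of those files.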

@lonelyjoeparker
Owner Author

  • find out what the PAML length limits actually are and
  • implement a length limit check for now (probably in aamlAnalysisWrapperSGE)
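A length check along these lines might look like the sketch below. `PamlPathLengthCheck` is a hypothetical name, and the 99-character figure is only the guess from the top of this issue, not a confirmed PAML constant, so the limit is passed in as a parameter until the real value is known.

```java
final class PamlPathLengthCheck {

    /** Provisional value from the ">99 chars?" guess; NOT a confirmed PAML limit. */
    static final int ASSUMED_MAX_PATH_CHARS = 99;

    /** Throws if the path is longer than maxChars, so the pipeline fails early with a clear message. */
    static void requireShortEnough(String path, int maxChars) {
        if (path.length() > maxChars) {
            throw new IllegalArgumentException(
                "Path is " + path.length() + " chars, over the assumed PAML limit of "
                + maxChars + ": " + path);
        }
    }
}
```

The check would apply to every path written into the ctl file and to every argument passed to codeml, before codeml is launched.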

@lonelyjoeparker
Owner Author

Also built a dev build for Laura K. which uses just file names (not paths). This should be slightly more robust (to error 2) but requires the .ctl file and all inputs/outputs to be in the same dir.
