GuyKha edited this page Jul 31, 2016 · 49 revisions

General files that are created during processing (datasets)

Processing time is limited to 6 hours, except for SnpCgh microarray projects, which are limited to 1 hour of processing.

Note: the original uploaded file and, in the case of an archive, the files inside it are deleted by process_input_files.php.

  1. process_log.text - log file for the project processing.

  2. working.txt - generated at the start of processing; contains the processing start timestamp and indicates to the front end that the installation is still running.

  3. working_done.txt - when the installation is done, working.txt is renamed to working_done.txt.

  4. condensed_log.txt - condensed process log contains process levels to be displayed in the UI during processing

  5. completed.txt - created when processing is done; contains the timestamp of when processing finished.

  6. error.txt - created if an error occurred during processing.

    It can contain the following errors:

    • Error : FASTA file uploaded as input. Upload FASTQ, or ZIP or GZ archives.
    • Error : Archive contained a file with no extension and the file type could not be determined.\nUpload FASTQ, or ZIP or GZ archives containing a FASTQ file.
    • Error : File had no extension and the file type could not be determined.\nUpload FASTQ, or ZIP or GZ archives containing a FASTQ file.
    • Error : Unknown file type as input.\nUpload FASTQ, or ZIP or GZ archives containing a FASTQ file.
  7. zipTemp.txt - contains the output of unzipping uploaded zip files.

  8. gzTemp.txt - contains the output of gzip (unpacking) for the uploaded files.

  9. ploidy.txt - first line: ploidy of the experiment; second line: baseline ploidy (created by project.create_server.php).

  10. name.txt - contains the name of the project (created by project.create_server.php).

  11. parent.txt - contains the parental strain of the project (created by project.create_server.php).

  12. dataType.txt - contains the code for the datatype (created by project.create_server.php):

    • 0 = SnpCghArray
    • 1:0 = WGseq single end read or 1:1 = WGseq Paired end read
    • 2:0 = ddRADseq single end read or 2:1 = ddRADseq Paired end read
    • 3:0 = RNAseq single end read or 3:1 = RNAseq Paired end read
    • 4:0 = IonExpress-seq single end read or 4:1 = IonExpress-seq Paired end read
  13. dataBiases.txt - contains data biases based on the sequencing technology (created by project.create_server.php). Each row is TRUE or FALSE for one bias. The following biases are available for each data type, listed in the order they appear in the file:

    • SnpCghArray - GC-content bias, chromosome-end bias.
    • WGseq - GC-content bias, chromosome-end bias.
    • ddRADseq - fragment-length bias, GC-content bias, chromosome-end bias.
    • RNAseq - ORF-length bias, GC-content bias, chromosome-end bias.
    • IonExpress-seq - GC-content bias, chromosome-end bias.
  14. restrictionEnzymes.txt - (created by project.create_server.php) created only for ddRADseq; contains the name of the restriction enzymes (currently only MfeI_MboI).

  15. snowAnnotations.txt - (created by project.create_server.php) contains 1 if showing annotations was chosen during project creation, otherwise 0.

  16. genome.txt - (created by project.create_server.php)

    • 1st line: (String) genome name.
    • 2nd line: (String) hapmap name (appears only if a hapmap was chosen).
  17. manualLOH.txt (not active) - (can be created by project.create_server.php) if created, contains manual LOH annotation information. Format: one entry per line, if input was provided, with tab-delimited columns (chrID, startbp, endbp, R, G, B).

  18. datafile_<index>.<ext> - created by process_input_files.php when called during the installation process; saves the final data that was received from the user (all other data is deleted). ext can be fastq, xls, txt, or none to signify a problem.
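The marker files and codes listed above can be read back with a few lines of code. The following is a minimal Python sketch, not part of the pipeline itself: the function names are hypothetical, the precedence between marker files (error first, then completed, then working) is an assumption, and dataBiases.txt is assumed to hold one TRUE or FALSE per row as described above.

```python
from pathlib import Path

# First field of dataType.txt mapped to the data type name, as listed above.
DATA_TYPES = {
    "0": "SnpCghArray",
    "1": "WGseq",
    "2": "ddRADseq",
    "3": "RNAseq",
    "4": "IonExpress-seq",
}

# Bias names in the order their rows appear in dataBiases.txt.
BIAS_ORDER = {
    "SnpCghArray": ["GC-content", "chromosome-end"],
    "WGseq": ["GC-content", "chromosome-end"],
    "ddRADseq": ["fragment-length", "GC-content", "chromosome-end"],
    "RNAseq": ["ORF-length", "GC-content", "chromosome-end"],
    "IonExpress-seq": ["GC-content", "chromosome-end"],
}


def project_status(project_dir):
    """Hypothetical helper: derive a status from the marker files.

    The precedence used here is an assumption, not documented behavior.
    """
    p = Path(project_dir)
    if (p / "error.txt").exists():
        return "error"
    if (p / "completed.txt").exists() or (p / "working_done.txt").exists():
        return "completed"
    if (p / "working.txt").exists():
        return "working"
    return "unknown"


def parse_data_type(code):
    """Split a dataType.txt code such as '2:1' into (name, read layout)."""
    kind, _, read = code.strip().partition(":")
    name = DATA_TYPES[kind]
    if kind == "0":
        # Microarray data has no single-end/paired-end field.
        return name, None
    return name, {"0": "single-end", "1": "paired-end"}[read]


def parse_biases(name, text):
    """Pair each TRUE/FALSE row of dataBiases.txt with its bias name."""
    rows = [line.strip() for line in text.splitlines() if line.strip()]
    flags = [row.upper() == "TRUE" for row in rows]
    return dict(zip(BIAS_ORDER[name], flags))
```

For example, `parse_data_type("2:1")` yields the ddRADseq name with a paired-end layout, and that name can then be passed to `parse_biases` together with the contents of dataBiases.txt.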

Files created during Whole Genome NGS - Single end read (including intermediate files that may be deleted during the process)

  1. parent.txt - used to save the name of the directory of the project
  2. dataType.txt - contains a code that describes the uploaded data type:
    • WGseq single end read - 1:0
  3. upload_size_1.txt - saves the size (in bytes) of the uploaded file
  4. datafiles.txt - contains the names of the datafiles to work on (if a file in Gareth's pileup format was uploaded, the file will contain null1 and null2)
  5. data.sam - the SAM file created from the user input
    • If the user uploaded a BAM file, data.sam is created by scripts_seqModules/bam2sam.sh (the original BAM file and the .bai file are deleted)
  6. data_r1.b.fastq, data_r2.b.fastq, data_r2.c.fastq - temporary files created by scripts_seqModules/sam2fastq.sh when converting SAM files to FASTQ (these files are deleted by scripts_seqModules/sam2fastq.sh)
  7. data_r1.fastq, data_r2.fastq - final files created by scripts_seqModules/sam2fastq.sh when converting a SAM file to paired FASTQ files
  8. SNP_CNV_v1.txt - created by scripts_seqModules/Gareth2pileups.sh to convert the uploaded tab-delimited text data to the pileup formats used in the pipeline. During processing, the file temp_dir/temp.SNP_CNV_v1.txt is created and deleted.
  9. putative_SNPs_v4.txt - created by scripts_seqModules/Gareth2pileups.sh to convert the uploaded tab-delimited text data to the pileup formats used in the pipeline. During processing, the file temp_dir/temp.putative_SNPs_v4.txt is created and deleted.
  10. genome.txt - contains the genome and hapmap names in use, created by

Files created during Whole Genome NGS - Paired end read (including intermediate files that may be deleted during the process)

In addition to the files that are created in single end read:

  1. upload_size_2.txt - saves the size of the second uploaded file (in bytes)

Quota System files

  1. globalquota.txt - should be in the users folder; contains a single number that specifies the global quota for all users.
  2. quota.txt - if located inside a user folder, sets a personal quota for that user.
  3. totalSize.txt - created by the quota system for each project/genome/hapmap; saves the total size of the entire project (so the server does not have to recalculate the size each time).
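The way these three files interact can be sketched in Python. This is an illustration under assumptions, not the quota system's actual code: the function names are hypothetical, a personal quota.txt is assumed to take precedence over globalquota.txt, each file is assumed to hold a single number, and the totalSize.txt files are assumed to sit somewhere below the user's folder. Units are left as stored, since the page does not state them.

```python
from pathlib import Path


def read_number(path):
    """Return the first whitespace-separated token of a file as a float."""
    return float(Path(path).read_text().split()[0])


def effective_quota(users_dir, user):
    """Hypothetical helper: personal quota if present, else the global one.

    Assumes quota.txt overrides globalquota.txt when both exist.
    """
    users_dir = Path(users_dir)
    personal = users_dir / user / "quota.txt"
    if personal.exists():
        return read_number(personal)
    return read_number(users_dir / "globalquota.txt")


def cached_usage(users_dir, user):
    """Sum every cached totalSize.txt found below the user's folder.

    The folder layout below the user folder is an assumption; rglob is used
    so the sketch does not depend on exact subfolder names.
    """
    user_dir = Path(users_dir) / user
    return sum(read_number(f) for f in user_dir.rglob("totalSize.txt"))
```

Comparing `cached_usage(...)` against `effective_quota(...)` relies only on the cached totals, which matches the stated purpose of totalSize.txt: avoiding a full size calculation on every check.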