GuyKha edited this page Jul 31, 2016 · 49 revisions

General files that are created during processing

Processing time is limited to 6 hours, except for SnpCgh microarray projects, which are limited to 1 hour of processing.

Note: the original uploaded file, and the files inside it if it is an archive, are deleted by process_input_files.php.

  1. process_log.text - log file for the project processing.

  2. working.txt - generated at the start of processing; contains the processing start timestamp and indicates to the front end that the installation is still running.

  3. working_done.txt - when the installation is done, working.txt is renamed to working_done.txt.

  4. condensed_log.txt - condensed process log; contains the process levels to be displayed in the UI during processing.

  5. completed.txt - created when processing is done; contains the timestamp of when processing finished.

  6. error.txt - created if an error occurs during processing.

    It can contain the following errors:

    • Error : FASTA file uploaded as input. Upload FASTQ, or ZIP or GZ archives.
    • Error : Archive contained a file with no extension and the file type could not be determined.\nUpload FASTQ, or ZIP or GZ archives containing a FASTQ file.
    • Error : File had no extension and the file type could not be determined.\nUpload FASTQ, or ZIP or GZ archives containing a FASTQ file.
    • Error : Unknown file type as input.\nUpload FASTQ, or ZIP or GZ archives containing a FASTQ file.
  7. zipTemp.txt - contains the output of unzipping the uploaded ZIP files.

  8. gzTemp.txt - contains the output of gzip (unpacking) for the uploaded files.

  9. ploidy.txt - the first line is the ploidy of the experiment and the second line is the baseline ploidy (created by project.create_server.php).

  10. name.txt - contains the name of the project (created by project.create_server.php).

  11. parent.txt - contains the parental strain of the project (created by project.create_server.php).

  12. dataType.txt - contains the code for the datatype (created by project.create_server.php):

    • 0 = SnpCghArray
    • 1:0 = WGseq single end read or 1:1 = WGseq Paired end read
    • 2:0 = ddRADseq single end read or 2:1 = ddRADseq Paired end read
    • 3:0 = RNAseq single end read or 3:1 = RNAseq Paired end read
    • 4:0 = IonExpress-seq single end read or 4:1 = IonExpress-seq Paired end read
  13. dataBiases.txt - contains the data biases for the sequencing technology (created by project.create_server.php). Each row is TRUE or FALSE, one row per bias. The following biases are available for each technology, listed in the order they appear in the file:

    • SnpCghArray - GC-content bias, chromosome-end bias.
    • WGseq - GC-content bias, chromosome-end bias.
    • ddRADseq - fragment-length bias, GC-content bias, chromosome-end bias.
    • RNAseq - ORF-length bias, GC-content bias, chromosome-end bias.
    • IonExpress-seq - GC-content bias, chromosome-end bias.
  14. restrictionEnzymes.txt - created only for ddRADseq projects; contains the names of the restriction enzymes (created by project.create_server.php).
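
The codes and marker files listed above can be read back with a few lines of Python. This is a minimal sketch rather than code from the pipeline: the function names (`decode_data_type`, `decode_biases`, `project_status`), the status labels, and the precedence between marker files (error.txt first, then completed.txt, working_done.txt, working.txt) are assumptions made for illustration. Only the code values, bias order, and file names come from the list above.

```python
from pathlib import Path

# Technology codes and bias order, as listed in the section above.
TECHNOLOGIES = {
    "0": "SnpCghArray",
    "1": "WGseq",
    "2": "ddRADseq",
    "3": "RNAseq",
    "4": "IonExpress-seq",
}
BIAS_ORDER = {
    "SnpCghArray": ["GC-content", "chromosome-end"],
    "WGseq": ["GC-content", "chromosome-end"],
    "ddRADseq": ["fragment-length", "GC-content", "chromosome-end"],
    "RNAseq": ["ORF-length", "GC-content", "chromosome-end"],
    "IonExpress-seq": ["GC-content", "chromosome-end"],
}


def decode_data_type(code):
    """Split a dataType.txt code such as '2:1' into (technology, read layout)."""
    tech_code, _, layout_code = code.strip().partition(":")
    technology = TECHNOLOGIES[tech_code]
    if tech_code == "0":
        return technology, None  # the array code carries no read layout
    layout = {"0": "single-end", "1": "paired-end"}[layout_code]
    return technology, layout


def decode_biases(technology, text):
    """Pair each TRUE/FALSE row of dataBiases.txt with its bias name."""
    rows = [line.strip() for line in text.splitlines() if line.strip()]
    names = BIAS_ORDER[technology]
    if len(rows) != len(names):
        raise ValueError(f"expected {len(names)} rows, found {len(rows)}")
    return {name: row.upper() == "TRUE" for name, row in zip(names, rows)}


def project_status(project_dir):
    """Guess the processing state from marker files (precedence is assumed)."""
    project_dir = Path(project_dir)
    markers = (
        ("error.txt", "error"),
        ("completed.txt", "completed"),
        ("working_done.txt", "done"),
        ("working.txt", "running"),
    )
    for filename, status in markers:
        if (project_dir / filename).exists():
            return status
    return "not started"
```

For example, `decode_data_type("2:1")` yields `("ddRADseq", "paired-end")`, and that technology name selects the three-row bias order to pass to `decode_biases`.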

Files created during Whole Genome NGS - Single end read (including intermediate files that may be deleted during the process)

  1. parent.txt - used to save the name of the directory of the project
  2. dataType.txt - contains a code describing the uploaded data type:
    • WGseq single end read - 1:0
  3. upload_size_1.txt - saves the size (in bytes) of the uploaded file
  4. datafiles.txt - contains the name of the datafiles to work on (if a Gareth's pileup format file was uploaded then the file will contain null1 and null2)
  5. data.sam - the sam file created from the user input
    • If the user uploaded a BAM file, data.sam is created by scripts_seqModules/bam2sam.sh (the original BAM file and its .bai file are deleted)
  6. data_r1.b.fastq, data_r2.b.fastq, data_r2.c.fastq - temp files created by scripts_seqModules/sam2fastq.sh when converting sam files to fastq (these files are deleted by scripts_seqModules/sam2fastq.sh)
  7. data_r1.fastq, data_r2.fastq - final files created by scripts_seqModules/sam2fastq.sh when converting sam file to paired-FASTQ files
  8. SNP_CNV_v1.txt - created by scripts_seqModules/Gareth2pileups.sh, which converts the uploaded tab-delimited text data to the pileup formats used in the pipeline. During processing, the file temp_dir/temp.SNP_CNV_v1.txt is created and then deleted
  9. putative_SNPs_v4.txt - created by scripts_seqModules/Gareth2pileups.sh, which converts the uploaded tab-delimited text data to the pileup formats used in the pipeline. During processing, the file temp_dir/temp.putative_SNPs_v4.txt is created and then deleted
  10. genome.txt - contains the genome and hapmap names in use, created by

Files created during Whole Genome NGS - Paired end read (including intermediate files that may be deleted during the process)

In addition to the files created for single end reads:

  1. upload_size_2.txt - saves the size of the second uploaded file (in bytes)

Quota System files

  1. globalquota.txt - should be located in the users folder; contains a single number that specifies the global quota for all users
  2. quota.txt - if located inside a user folder, sets a personal quota for that user.
  3. totalSize.txt - created by the quota system for each project/genome/hapmap; stores the total size of the entire project so the server does not have to recalculate it each time.
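
The way these three files could fit together is sketched below. This is a hedged illustration, not the quota system's actual code: the helper names (`read_number`, `effective_quota`, `used_space`) are hypothetical, the sketch assumes a personal quota.txt takes precedence over globalquota.txt, and it assumes every totalSize.txt somewhere under a user's folder counts toward that user's usage. The unit of the stored numbers is not stated above, so the sketch only compares them with each other.

```python
from pathlib import Path


def read_number(path):
    """Read the single number stored in a quota or size file."""
    return float(Path(path).read_text().strip())


def effective_quota(users_dir, user):
    """Return the personal quota if quota.txt exists, else the global quota."""
    users_dir = Path(users_dir)
    personal = users_dir / user / "quota.txt"
    if personal.exists():
        return read_number(personal)  # assumed to override the global value
    return read_number(users_dir / "globalquota.txt")


def used_space(users_dir, user):
    """Sum the cached totalSize.txt values found under the user's folder."""
    user_dir = Path(users_dir) / user
    return sum(read_number(p) for p in user_dir.rglob("totalSize.txt"))


def is_over_quota(users_dir, user):
    """True when the cached sizes exceed the user's effective quota."""
    return used_space(users_dir, user) > effective_quota(users_dir, user)
```

Reading the cached totalSize.txt values keeps the check cheap, which matches the stated purpose of that file.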