add sompy app and indexing vcfs #5
Reviewable status: 0 of 3 files reviewed, all discussions resolved (waiting on @aledj2)
Reviewable status: 0 of 3 files reviewed, 3 unresolved discussions (waiting on @aledj2 and @rebeccahaines1)
dxapp.json
line 2 at r1 (raw file):
{ "name": "tso500_output_parser_v2.0",
Can we use semantic versioning (v2.0.0)? I think this may be v1.2.0.
src/code.sh
line 8 at r1 (raw file):
# make output folder
mkdir -p /home/dnanexus/out/logfiles/logfiles/ /home/dnanexus/out/vcf_index/vcf_index
#install tabix (needed for indexing the vcfs)
remove if not used
src/code.sh
line 119 at r1 (raw file):
cd /home/dnanexus/out/vcf_index/vcf_index
tabix -p vcf $filename.gz
cd ~
can we delete the downloaded vcf so we don't have duplicates in the project?
Reviewable status: 0 of 3 files reviewed, 4 unresolved discussions (waiting on @aledj2 and @rebeccahaines1)
src/code.sh
line 118 at r1 (raw file):
bgzip -c $filename > $gzip_vcf_path
cd /home/dnanexus/out/vcf_index/vcf_index
tabix -p vcf $filename.gz
Should this be tabix -p vcf $gzip_vcf_path instead?
Reviewable status: 0 of 3 files reviewed, 4 unresolved discussions (waiting on @aledj2)
dxapp.json
line 2 at r1 (raw file):
Previously, aledj2 (Aled Jones) wrote…
Can we use semantic versioning (v2.0.0)? I think this may be v1.2.0.
done
src/code.sh
line 8 at r1 (raw file):
Previously, aledj2 (Aled Jones) wrote…
remove if not used
done
src/code.sh
line 118 at r1 (raw file):
Previously, aledj2 (Aled Jones) wrote…
Should this be tabix -p vcf $gzip_vcf_path instead?
Probably; it's doing the same thing.
I have changed this section to output the indexed VCFs to the Results folder.
src/code.sh
line 119 at r1 (raw file):
Previously, aledj2 (Aled Jones) wrote…
can we delete the downloaded vcf so we don't have duplicates in the project?
The downloaded VCF isn't output, only the bgzipped VCF. The bgzipped copy could be deleted, but without it loading the VCF in IGV is very slow.
Reviewable status: 0 of 3 files reviewed, 3 unresolved discussions (waiting on @aledj2 and @rebeccahaines1)
src/code.sh
line 118 at r1 (raw file):
Previously, rebeccahaines1 wrote…
Probably; it's doing the same thing.
I have changed this section to output the indexed VCFs to the Results folder.
I'm a little confused as to what filefolder is returning (line 115)
src/code.sh
line 119 at r1 (raw file):
Previously, rebeccahaines1 wrote…
The downloaded VCF isn't output, only the bgzipped VCF. The bgzipped copy could be deleted, but without it loading the VCF in IGV is very slow.
OK, missed the bgzip step!
src/code.sh
line 117 at r3 (raw file):
filefolder=$(jq -r '.folder' <<< $genome_vcf)
#gzip_vcf_path=/home/dnanexus/out/vcf_index/vcf_index/$filename.gz
gzip_vcf_path=$vcf_index_output$filefolder
is there a missing / between the two variables?
src/code.sh
line 121 at r3 (raw file):
gzip_vcf=$gzip_vcf_path/$filename.gz
bgzip -c $filename > $gzip_vcf
cd $gzip_vcf_path
So this creates a folder for each sample/VCF, builds the file path, bgzips into that subfolder, then cds into the subfolder and indexes there? Not the easiest to follow!
Could you add comments, or simplify to something like:
mkdir -p "$gzip_vcf_path" && cd "$_"
bgzip -c "$OLDPWD/$filename" > "$filename.gz"
tabix -p vcf "$filename.gz"
cd "$OLDPWD"
Code quote:
gzip_vcf=$gzip_vcf_path/$filename.gz
bgzip -c $filename > $gzip_vcf
cd $gzip_vcf_path
tabix -p vcf $gzip_vcf
Reviewable status: 0 of 3 files reviewed, 3 unresolved discussions (waiting on @aledj2)
src/code.sh
line 118 at r1 (raw file):
Previously, aledj2 (Aled Jones) wrote…
I'm a little confused as to what filefolder is returning (line 115)
It takes the path (folder) of the VCF file from the JSON generated by dx describe, which is used to make sure the zipped VCF and its index end up in the right folder afterwards.
src/code.sh
line 117 at r3 (raw file):
Previously, aledj2 (Aled Jones) wrote…
is there a missing / between the two variables?
No, $vcf_index_output ends with a "/".
I do need to delete the commented-out line above it, though.
src/code.sh
line 121 at r3 (raw file):
Previously, aledj2 (Aled Jones) wrote…
So this creates a folder for each sample/VCF, builds the file path, bgzips into that subfolder, then cds into the subfolder and indexes there? Not the easiest to follow!
Could you add comments, or simplify to something like:
mkdir -p "$gzip_vcf_path" && cd "$_"
bgzip -c "$OLDPWD/$filename" > "$filename.gz"
tabix -p vcf "$filename.gz"
cd "$OLDPWD"
I have added comments to clarify. This step creates the paths and writes the bgzipped VCF and its index to the sample folder in analysis_folder/Results.
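The step described above (create the per-sample folder, bgzip the VCF into it, then index it) can also be written without changing directory, which is what the readability comment was after. A minimal sketch, assuming bgzip and tabix from htslib are on PATH and that the VCF passed in is the uncompressed file downloaded into the working directory; index_vcf_into is a hypothetical helper name, not something in the app:

```shell
#!/usr/bin/env bash
# Sketch only: bgzip a VCF and write it, plus its tabix index, into a
# per-sample output folder using full paths (no cd).
# Assumes bgzip and tabix (htslib) are on PATH.
# index_vcf_into is a hypothetical helper, not part of the app.
index_vcf_into() {
    local vcf="$1"      # uncompressed VCF in the working directory
    local out_dir="$2"  # e.g. "$vcf_index_output$filefolder"
    local gz
    gz="$out_dir/$(basename "$vcf").gz"

    # Create the sample folder and any missing parent folders
    mkdir -p "$out_dir" || return 1
    # Compress to BGZF; tabix cannot index plain gzip output
    bgzip -c "$vcf" > "$gz" || return 1
    # Writes "$gz.tbi" next to the compressed VCF
    tabix -p vcf "$gz"
}
```

Called as index_vcf_into "$filename" "$gzip_vcf_path", this would leave $filename.gz and $filename.gz.tbi in the sample folder. Using full paths instead of cd keeps the working directory unchanged, so later steps in code.sh do not depend on where this step left off.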
Reviewable status: 0 of 3 files reviewed, 3 unresolved discussions (waiting on @aledj2)
src/code.sh
line 117 at r3 (raw file):
Previously, rebeccahaines1 wrote…
No, $vcf_index_output ends with a "/".
I do need to delete the commented-out line above it, though.
Actually, I got this wrong: $filefolder STARTS with "/". Sorry for the confusion.
Reviewed 2 of 2 files at r4, all commit messages.
Reviewable status: 2 of 3 files reviewed, all discussions resolved (waiting on @aledj2)
src/code.sh
line 117 at r3 (raw file):
Previously, rebeccahaines1 wrote…
No, $vcf_index_output ends with a "/".
I do need to delete the commented-out line above it, though.
on line 8?
Reviewable status: 2 of 3 files reviewed, all discussions resolved (waiting on @aledj2)
src/code.sh
line 117 at r3 (raw file):
Previously, aledj2 (Aled Jones) wrote…
on line 8?
No, $vcf_index_output (set on line 8) doesn't end with "/", but $filefolder starts with "/", so there's no need to put one in between.
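Plain concatenation ($vcf_index_output$filefolder) works only because of the two conventions settled in this thread: the output root has no trailing "/" and the DNAnexus folder returned by jq -r '.folder' has a leading one. A small sketch that makes the join explicit, so it still yields exactly one "/" if either side changes; join_output_path is a hypothetical helper name, not part of the app:

```shell
#!/usr/bin/env bash
# Sketch: join an output root and a DNAnexus folder path with exactly one "/".
# In the PR the folder comes from: filefolder=$(jq -r '.folder' <<< $genome_vcf)
# join_output_path is a hypothetical helper, not part of the app.
join_output_path() {
    local root="${1%/}"    # drop a single trailing "/" from the root, if present
    local folder="${2#/}"  # drop a single leading "/" from the folder, if present
    printf '%s/%s\n' "$root" "$folder"
}
```

Usage would be gzip_vcf_path=$(join_output_path "$vcf_index_output" "$filefolder"). The design choice is to normalise both sides rather than rely on remembering which variable carries the slash, which is exactly the point that caused confusion above.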
Reviewed 1 of 3 files at r1, 1 of 2 files at r2.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @rebeccahaines1)