This repository has been archived by the owner on Mar 28, 2023. It is now read-only.

add sompy app and indexing vcfs #5

Merged
merged 4 commits into from
Jun 1, 2022

Conversation


@rebeccahaines1 rebeccahaines1 commented May 31, 2022

This change is Reviewable

@aledj2 aledj2 self-assigned this May 31, 2022

@aledj2 aledj2 left a comment


+@aledj2

Reviewable status: 0 of 3 files reviewed, all discussions resolved (waiting on @aledj2)


@aledj2 aledj2 left a comment


Reviewable status: 0 of 3 files reviewed, 3 unresolved discussions (waiting on @aledj2 and @rebeccahaines1)


dxapp.json line 2 at r1 (raw file):

{
	"name": "tso500_output_parser_v2.0",

can we use semantic versioning (v2.0.0). I think this may be v1.2.0
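
A minimal sketch of how the version suffix in the app name could be checked, assuming the name always ends in `_<version>` and that a plain `vMAJOR.MINOR.PATCH` tag is what is wanted. `is_semver_tag` is a hypothetical helper, not something in this repository, and it deliberately ignores pre-release and build-metadata suffixes.

```shell
# Hypothetical helper: succeeds only for tags shaped like vMAJOR.MINOR.PATCH.
is_semver_tag() {
    printf '%s\n' "$1" |
        grep -Eq '^v(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)$'
}

# App name as quoted from dxapp.json at r1.
app_name="tso500_output_parser_v2.0"

# Strip everything up to and including the last underscore to get the version.
version="${app_name##*_}"

if is_semver_tag "$version"; then
    echo "semantic version: $version"
else
    echo "not a semantic version: $version"
fi
# prints "not a semantic version: v2.0"
```

With the name changed to end in `v1.2.0` or `v2.0.0`, the same check passes.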


src/code.sh line 8 at r1 (raw file):

# make output folder
mkdir -p /home/dnanexus/out/logfiles/logfiles/ /home/dnanexus/out/vcf_index/vcf_index
#install tabix (needed for indexing the vcfs)

remove if not used


src/code.sh line 119 at r1 (raw file):

        cd /home/dnanexus/out/vcf_index/vcf_index
        tabix -p vcf $filename.gz
        cd ~

can we delete the downloaded vcf so we don't have duplicates in the project?


@aledj2 aledj2 left a comment


Reviewable status: 0 of 3 files reviewed, 4 unresolved discussions (waiting on @aledj2 and @rebeccahaines1)


src/code.sh line 118 at r1 (raw file):

        bgzip -c $filename > $gzip_vcf_path
        cd /home/dnanexus/out/vcf_index/vcf_index
        tabix -p vcf $filename.gz

should this be tabix -p vcf $gzip_vcf_path

Contributor Author

@rebeccahaines1 rebeccahaines1 left a comment


Reviewable status: 0 of 3 files reviewed, 4 unresolved discussions (waiting on @aledj2)


dxapp.json line 2 at r1 (raw file):

Previously, aledj2 (Aled Jones) wrote…

can we use semantic versioning (v2.0.0). I think this may be v1.2.0

done


src/code.sh line 8 at r1 (raw file):

Previously, aledj2 (Aled Jones) wrote…

remove if not used

done


src/code.sh line 118 at r1 (raw file):

Previously, aledj2 (Aled Jones) wrote…

should this be tabix -p vcf $gzip_vcf_path

probably, it's doing the same thing.
Have changed this section to output the indexed vcfs to the results folder


src/code.sh line 119 at r1 (raw file):

Previously, aledj2 (Aled Jones) wrote…

can we delete the downloaded vcf so we don't have duplicates in the project?

the downloaded vcf isn't output, only the bgzipped vcf. This could be deleted but it makes loading it in IGV very slow


@aledj2 aledj2 left a comment


Reviewable status: 0 of 3 files reviewed, 3 unresolved discussions (waiting on @aledj2 and @rebeccahaines1)


src/code.sh line 118 at r1 (raw file):

Previously, rebeccahaines1 wrote…

probably, it's doing the same thing.
Have changed this section to output the indexed vcfs to the results folder

I'm a little confused as to what filefolder is returning (line 115)


src/code.sh line 119 at r1 (raw file):

Previously, rebeccahaines1 wrote…

the downloaded vcf isn't output, only the bgzipped vcf. This could be deleted but it makes loading it in IGV very slow

OK, missed the bgzip step!


src/code.sh line 117 at r3 (raw file):

        filefolder=$(jq -r '.folder' <<< $genome_vcf)
        #gzip_vcf_path=/home/dnanexus/out/vcf_index/vcf_index/$filename.gz
        gzip_vcf_path=$vcf_index_output$filefolder

is there a missing / between the two variables?


src/code.sh line 121 at r3 (raw file):

        gzip_vcf=$gzip_vcf_path/$filename.gz
        bgzip -c $filename > $gzip_vcf
        cd $gzip_vcf_path

so creating a folder for each sample/vcf, creating the filepath, bgzipping with the filepath (into a subfolder) then cd-ing into subfolder and then indexing into the subfolder? not the easiest to follow!

could you comment or

mkdir -p $gzip_vcf_path && cd "$_"
bgzip -c ../$filename > $filename.gz
tabix -p vcf $filename.gz
cd ..

Code quote:

        gzip_vcf=$gzip_vcf_path/$filename.gz
        bgzip -c $filename > $gzip_vcf
        cd $gzip_vcf_path
        tabix -p vcf $gzip_vcf

Contributor Author

@rebeccahaines1 rebeccahaines1 left a comment


Reviewable status: 0 of 3 files reviewed, 3 unresolved discussions (waiting on @aledj2)


src/code.sh line 118 at r1 (raw file):

Previously, aledj2 (Aled Jones) wrote…

I'm a little confused as to what filefolder is returning (line 115)

It's taking the path (folder) of the vcf file from the JSON generated by dx describe, which is used to make sure the zipped vcf and index end up in the right folder afterwards.
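
A small sketch of that lookup, assuming `jq` is installed and that the JSON passed in has a top-level `folder` field, as the `jq -r '.folder'` call in the diff implies. The JSON in the usage comment is an illustrative stand-in, not real `dx describe` output, and `vcf_folder` is a hypothetical wrapper name.

```shell
# Hypothetical wrapper: print the "folder" field of a JSON file description.
# Quoting "$1" keeps names containing spaces intact.
vcf_folder() {
    printf '%s' "$1" | jq -r '.folder'
}

# Usage (illustrative JSON only):
#   genome_vcf='{"name": "sample1.genome.vcf", "folder": "/analysis_folder/Results/sample1"}'
#   filefolder=$(vcf_folder "$genome_vcf")
#   # filefolder now holds the project folder, which starts with "/"
```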


src/code.sh line 117 at r3 (raw file):

Previously, aledj2 (Aled Jones) wrote…

is there a missing / between the two variables?

no- $vcf_index_output ends with a "/"
I do need to delete the commented out line above it though


src/code.sh line 121 at r3 (raw file):

Previously, aledj2 (Aled Jones) wrote…

so creating a folder for each sample/vcf, creating the filepath, bgzipping with the filepath (into a subfolder) then cd-ing into subfolder and then indexing into the subfolder? not the easiest to follow!

could you comment or

mkdir -p $gzip_vcf_path && cd "$_"
bgzip -c ../$filename > $filename.gz
tabix -p vcf $filename.gz
cd ..

Have added comments to clarify. This step is creating the paths and adding the bgzipped vcf and index to the sample folder in analysis_folder/Results
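
For readers following the thread, here is one way the compress-and-index step could be written without changing directory at all. It is a sketch under assumptions, not the code that was merged: `index_vcf` and its two arguments are hypothetical, and it relies only on `bgzip -c` writing to stdout and `tabix -p vcf` placing the `.tbi` index next to the file it is given.

```shell
# Hypothetical helper: bgzip a VCF into out_dir and index it in place.
# The body runs in a subshell, so its variables do not leak to the caller.
index_vcf() (
    vcf="$1"        # path to the uncompressed VCF
    out_dir="$2"    # destination folder, created if it does not exist
    gz="$out_dir/$(basename "$vcf").gz"

    mkdir -p "$out_dir" &&
        bgzip -c "$vcf" > "$gz" &&   # compressed copy goes straight to out_dir
        tabix -p vcf "$gz" &&        # index is written as "$gz.tbi"
        printf '%s\n' "$gz"          # report where the output landed
)

# Usage (paths are illustrative):
#   index_vcf sample1.genome.vcf /home/dnanexus/out/vcf_index/vcf_index/sample1
```

Passing full paths to both tools avoids the `cd` in and out of the sample folder that the review found hard to follow.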

@rebeccahaines1 rebeccahaines1 marked this pull request as ready for review June 1, 2022 12:06
Contributor Author

@rebeccahaines1 rebeccahaines1 left a comment


Reviewable status: 0 of 3 files reviewed, 3 unresolved discussions (waiting on @aledj2)


src/code.sh line 117 at r3 (raw file):

Previously, rebeccahaines1 wrote…

no- $vcf_index_output ends with a "/"
I do need to delete the commented out line above it though

actually i got this wrong- the $filefolder STARTS with "/". sorry for the confusion


@aledj2 aledj2 left a comment


Reviewed 2 of 2 files at r4, all commit messages.
Reviewable status: 2 of 3 files reviewed, all discussions resolved (waiting on @aledj2)


src/code.sh line 117 at r3 (raw file):

Previously, rebeccahaines1 wrote…

no- $vcf_index_output ends with a "/"
I do need to delete the commented out line above it though

on line 8?

Contributor Author

@rebeccahaines1 rebeccahaines1 left a comment


Reviewable status: 2 of 3 files reviewed, all discussions resolved (waiting on @aledj2)


src/code.sh line 117 at r3 (raw file):

Previously, aledj2 (Aled Jones) wrote…

on line 8?

no, $vcf_index_output (set on line 8) doesn't end with "/", but $filefolder starts with "/", so we don't need to put one in between
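
Since which side carries the "/" caused some back and forth, a join that tolerates either case may be easier to reason about. This is only a sketch: `join_path` is a hypothetical helper and the paths in the comments are illustrative.

```shell
# Hypothetical helper: join two path fragments with exactly one "/" between them.
# The body runs in a subshell, so its variables do not leak to the caller.
join_path() (
    base="${1%/}"    # drop one trailing slash from the left part, if present
    rest="${2#/}"    # drop one leading slash from the right part, if present
    printf '%s/%s\n' "$base" "$rest"
)

# Usage (illustrative values):
#   join_path /home/dnanexus/out/vcf_index /analysis_folder/Results
#   # prints /home/dnanexus/out/vcf_index/analysis_folder/Results
#   join_path out/ /sub
#   # prints out/sub
```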


@aledj2 aledj2 left a comment


Reviewed 1 of 3 files at r1, 1 of 2 files at r2.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @rebeccahaines1)

@aledj2 aledj2 merged commit 5693d59 into main Jun 1, 2022
@aledj2 aledj2 deleted the v2.0 branch June 1, 2022 16:12