
MD5 mismatch for several SGDP FASTQs #5

@Stikus

Hello, in case anyone is still watching this repository: maybe I can get some help here.

We are trying to download the SGDP data (everything was fine with HGDP), and we repeatedly get wrong MD5 checksums for several FASTQ files:

#HGDP00530
b4a74b70718b0c245712d23d825c1d5d  /srv/data/DATA/IGSR/SGDP/HGDP00530/ERR1019053_1.fastq.gz.bak  
b4a74b70718b0c245712d23d825c1d5d  /srv/data/DATA/IGSR/SGDP/HGDP00530/ERR1019053_1.fastq.gz.bak2 
b4a74b70718b0c245712d23d825c1d5d  /srv/data/DATA/IGSR/SGDP/HGDP00530/ERR1019053_1.fastq.gz.bak3 

ce70ba1e8f3ed4bf070712750cc0a36d  /srv/data/DATA/IGSR/SGDP/HGDP00530/ERR1019053_2.fastq.gz.bak  
7d04683c31741672b6e199994835c7d7  /srv/data/DATA/IGSR/SGDP/HGDP00530/ERR1019053_2.fastq.gz.bak2 
ce70ba1e8f3ed4bf070712750cc0a36d  /srv/data/DATA/IGSR/SGDP/HGDP00530/ERR1019053_2.fastq.gz.bak3 

#HGDP01172
318f4efaeef3f263aec8e5191a5feed2  /srv/data/DATA/IGSR/SGDP/HGDP01172/ERR1019036_1.fastq.gz.bak  
318f4efaeef3f263aec8e5191a5feed2  /srv/data/DATA/IGSR/SGDP/HGDP01172/ERR1019036_1.fastq.gz.bak2 
318f4efaeef3f263aec8e5191a5feed2  /srv/data/DATA/IGSR/SGDP/HGDP01172/ERR1019036_1.fastq.gz.bak3 

7413c7feb13c4e739c24afe32c6d7482  /srv/data/DATA/IGSR/SGDP/HGDP01172/ERR1019036_2.fastq.gz.bak  
7413c7feb13c4e739c24afe32c6d7482  /srv/data/DATA/IGSR/SGDP/HGDP01172/ERR1019036_2.fastq.gz.bak2 
b395f5a5766b073f4e1a48a8a556497d  /srv/data/DATA/IGSR/SGDP/HGDP01172/ERR1019036_2.fastq.gz.bak3 

#HGDP01240
bca581c20769a6b798c9224ca7da73fd  /srv/data/DATA/IGSR/SGDP/HGDP01240/ERR1025627_1.fastq.gz.bak  
bca581c20769a6b798c9224ca7da73fd  /srv/data/DATA/IGSR/SGDP/HGDP01240/ERR1025627_1.fastq.gz.bak2 
bca581c20769a6b798c9224ca7da73fd  /srv/data/DATA/IGSR/SGDP/HGDP01240/ERR1025627_1.fastq.gz.bak3 

ba7054c4006a61a11b0cda71bb6315d1  /srv/data/DATA/IGSR/SGDP/HGDP01240/ERR1025627_2.fastq.gz.bak  
ba7054c4006a61a11b0cda71bb6315d1  /srv/data/DATA/IGSR/SGDP/HGDP01240/ERR1025627_2.fastq.gz.bak2 
ba7054c4006a61a11b0cda71bb6315d1  /srv/data/DATA/IGSR/SGDP/HGDP01240/ERR1025627_2.fastq.gz.bak3 

As you can see, the problematic files are paired FASTQs. The checksums are identical across download attempts (for most files), and none of them match the checksums in the file list:

url	md5	Data collection	Data type	Analysis group	Sample	Population	Data reuse policy
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR101/003/ERR1019053/ERR1019053_1.fastq.gz	89aaa5d174f6e04db4bd691763810e1e	Simons Genome Diversity Project	sequence	PCR-free high coverage	HGDP00530	French in France (SGDP)	http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/simons_diversity_data/README_Simons_diversity_datareuse_statement.md
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR101/003/ERR1019053/ERR1019053_2.fastq.gz	72f044469c51e9af660b7db6d55fda99	Simons Genome Diversity Project	sequence	PCR-free high coverage	HGDP00530	French in France (SGDP)	http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/simons_diversity_data/README_Simons_diversity_datareuse_statement.md
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR101/006/ERR1019036/ERR1019036_2.fastq.gz	ba7eb8c2fa30a8e37fd29c1d538a6811	Simons Genome Diversity Project	sequence	PCR-free high coverage	HGDP01172	Bergamo in Italy(Bergamo) (SGDP)	http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/simons_diversity_data/README_Simons_diversity_datareuse_statement.md
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR101/006/ERR1019036/ERR1019036_1.fastq.gz	ff79890ba960ce0e724131635f8deaea	Simons Genome Diversity Project	sequence	PCR-free high coverage	HGDP01172	Bergamo in Italy(Bergamo) (SGDP)	http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/simons_diversity_data/README_Simons_diversity_datareuse_statement.md
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR102/007/ERR1025627/ERR1025627_1.fastq.gz	878024b924c7a39cda5d2588d70cc0bb	Simons Genome Diversity Project	sequence	PCR-free high coverage	HGDP01240	Hezhen in China (SGDP)	http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/simons_diversity_data/README_Simons_diversity_datareuse_statement.md
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR102/007/ERR1025627/ERR1025627_2.fastq.gz	68336a8afee8cd2b83c14fc16af4120d	Simons Genome Diversity Project	sequence	PCR-free high coverage	HGDP01240	Hezhen in China (SGDP)	http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/simons_diversity_data/README_Simons_diversity_datareuse_statement.md
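For anyone reproducing this, the comparison above can be scripted. The sketch below is a minimal Python example and not part of any IGSR or ENA tooling. It streams each download through `hashlib.md5` in 1 MiB chunks (so multi-GB FASTQs never have to fit in memory), groups local copies by the `.bak`, `.bak2`, `.bak3` suffixes used in the listing (an assumption about local naming), and checks each group against the `url` and `md5` columns of the index. The function names `md5_of_file`, `md5s_by_attempt`, `expected_md5s` and `compare` are hypothetical.

```python
import csv
import hashlib
import io
import os
from collections import defaultdict


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large FASTQs need not fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def md5s_by_attempt(directory):
    """Hash every '<name>.bak', '<name>.bak2', ... copy and group the digests by <name>.

    The .bak naming scheme is an assumption taken from the listing in this issue.
    """
    groups = defaultdict(list)
    for entry in sorted(os.listdir(directory)):
        base, sep, suffix = entry.rpartition(".bak")
        if sep and (suffix == "" or suffix.isdigit()):
            groups[base].append(md5_of_file(os.path.join(directory, entry)))
    return dict(groups)


def expected_md5s(index_text):
    """Map file name -> md5 from a tab-separated index with 'url' and 'md5' columns."""
    reader = csv.DictReader(io.StringIO(index_text), delimiter="\t")
    return {row["url"].rsplit("/", 1)[-1]: row["md5"] for row in reader}


def compare(observed, expected):
    """Return {name: (all attempts agree, any attempt matches the index)}."""
    return {
        name: (len(set(digests)) == 1, expected.get(name) in digests)
        for name, digests in observed.items()
    }


if __name__ == "__main__":
    # Values copied from this report for HGDP00530 (index trimmed to two columns).
    index = (
        "url\tmd5\n"
        "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR101/003/ERR1019053/ERR1019053_1.fastq.gz"
        "\t89aaa5d174f6e04db4bd691763810e1e\n"
        "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR101/003/ERR1019053/ERR1019053_2.fastq.gz"
        "\t72f044469c51e9af660b7db6d55fda99\n"
    )
    observed = {
        "ERR1019053_1.fastq.gz": ["b4a74b70718b0c245712d23d825c1d5d"] * 3,
        "ERR1019053_2.fastq.gz": [
            "ce70ba1e8f3ed4bf070712750cc0a36d",
            "7d04683c31741672b6e199994835c7d7",
            "ce70ba1e8f3ed4bf070712750cc0a36d",
        ],
    }
    for name, (agree, match) in compare(observed, expected_md5s(index)).items():
        print(f"{name}: attempts agree={agree}, matches index={match}")
```

With the HGDP00530 values from this report, the example reports that the three `_1` attempts agree with each other but not with the index, while the `_2` attempts disagree among themselves and none matches the index. On real data, `observed` would come from `md5s_by_attempt("/srv/data/DATA/IGSR/SGDP/HGDP00530")` instead.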

Could you check whether there is a problem with these files on your end?
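While waiting for a server-side check, a local test can help tell a damaged transfer apart from an index entry that does not describe the file being served. The sketch below (Python, hypothetical helper name `gzip_is_intact`) decompresses each copy end to end, which makes the standard-library `gzip` module verify the CRC-32 and length stored in the gzip trailer. A copy that fails is almost certainly corrupt; a copy that passes is internally consistent but is not thereby proven to be the file the index describes.

```python
import gzip
import sys
import zlib


def gzip_is_intact(path, chunk_size=1 << 20):
    """Decompress the whole file, discarding output; True if the trailer checks pass."""
    try:
        with gzip.open(path, "rb") as fh:
            while fh.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # gzip.BadGzipFile (an OSError subclass) covers bad headers and CRC/length
        # mismatches, EOFError covers truncated files, zlib.error covers corrupt
        # deflate data.
        return False


if __name__ == "__main__":
    # Hypothetical usage: python check_gz.py /srv/data/DATA/IGSR/SGDP/HGDP00530/*.bak*
    for path in sys.argv[1:]:
        print(path, "OK" if gzip_is_intact(path) else "CORRUPT")
```

Run over the three copies of `ERR1019053_2.fastq.gz`, this could show whether the attempt with the odd checksum (`7d04683c...`) is the damaged one.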
