CHANGELOG
version-1.5.5
- Remove compilation warnings (#61, #64)
- Fixes in mpiBWAByChr (#61)
- Fix memory leaks (#62, #63, #64)
- Fix fastq parsing (#65)
version-1.5.4
- Improve I/O (#59)
version-1.5.3
- Fix a memory leak with trimmed reads in single-end mode (#57)
- Improve performance with trimmed alignments
version-1.5.2
- Fix argument parsing with getopt (issue #53)
- Fix a compilation error (issue #54)
- Fix fixmate (issue #55)
version-1.5.1
- Fix a bug in the trimmed fastq condition (issue #45)
- Fix load balancing (thanks to Vasim) (issue #47)
version-1.5
- Multithreaded I/O reading/writing (issue #40)
- Add PG line in header (issue #40)
- Fix mate score computation (issue #41)
version-1.4 2021-04-13
- Update bwa-mem to 0.7.17 (issue #38)
- Add an option to generate BAM/BGZF output (issue #33)
- Refactor and clean up the code
- Multithread some parts of the code
version-1.3 2021-03-26
- Add an option -f to fix the mate score, quality and CIGAR (like samtools fixmate does).
Useful for compatibility with mpiMarkdup or samtools markdup.
When -f is set, discordant.sam is no longer produced.
A typical workflow is mpiBWAByChr (f) + mpiSort (u + b) + samtools index + samtools markdup,
or mpiBWAByChr (f) + mpiMarkdup (issue #31).
- Add the option -K to set the number of bases to analyse (used for reproducibility) (issue #32).
- Fix a bug in read offset initialization and an invalid free (issue #34)
- Fix load balancing (issue #30)
- Fix some memory leaks (issue #36)
version-1.2
- Fix broken release link in README.md (issue #28)
version-1.1 2020-06-09
- Switch to multithread support in MPI_Init
- Add a benchmark section in documentation
version-1.0 2020-03-25
- mpiBWA: MPI version of BWA MEM
----------------------------------------------------------------------
Preliminary releases
Release from the 11/05/2020
1) Switch to multithread support in MPI_Init
2) Add a benchmark section in documentation
Release from the 06/03/2020
1) Review of the management of discordant reads in mpiBWAByChr, addressing the question raised in the previous release.
For discordant or unmapped fragments, we decided that a discordant fragment goes both into the SAM of the chromosome it belongs to and into the discordant SAM (see the sketch after this list).
The discordant SAM is there to help the mpiMarkDup step: discordant duplicates are sorted and marked like the others, which then makes it possible to mark discordant reads in the chromosome SAM.
2) Fix memory leaks
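As an illustration of that routing rule, here is a minimal C sketch; the struct, function names, and flag tests are hypothetical stand-ins, not the actual mpiBWA code:

    #include <stdio.h>

    typedef struct {
        int is_unmapped;    /* e.g., SAM flag 0x4 */
        int is_discordant;  /* mates on different chromosomes, etc. */
        int chr_index;      /* index into the per-chromosome SAM handles */
    } read_info_t;          /* hypothetical record, not the real struct */

    void route_read(const read_info_t *r, const char *sam_line,
                    FILE **chr_sam, FILE *discordant_sam, FILE *unmapped_sam)
    {
        if (r->is_unmapped) {
            fputs(sam_line, unmapped_sam);
            return;
        }
        /* A mapped fragment always goes into the SAM of its chromosome. */
        fputs(sam_line, chr_sam[r->chr_index]);
        /* A discordant fragment is additionally copied into the discordant
         * SAM so that mpiMarkDup can sort and mark duplicates there. */
        if (r->is_discordant)
            fputs(sam_line, discordant_sam);
    }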
Release from the 15/01/2020
1) Add a version of the main program that aligns and splits the result by chromosome.
We create the output files with the header: one SAM file per chromosome, plus a SAM for discordant and unmapped reads.
Rationale:
The current version of mpiBWA creates one big SAM file. When this SAM is big (around one terabyte or more for whole-genome sequencing data with high depth of coverage), it is difficult to sort that file. The idea with this version is to create one SAM file per chromosome.
Each chromosome name comes from the header lines of the genome reference. This way, the SAM file to sort and/or mark duplicates in is much smaller, so we need less RAM and CPU per chromosome.
The extra overhead of the splitting is negligible compared with the previous version.
Now we can sort individual chromosomes in parallel; for instance, chr1 of a 300X WGS is equivalent to a 30X genome.
Warning:
This version is under construction. It has been tested in the case where the fastq files are trimmed.
Other tests are ongoing. The sorting program has not yet been updated to deal with chromosomes independently;
this is under construction too.
How to use:
In the Makefile.am, replace the line
    pbwa7_SOURCES = main_parallel_version.c
with
    pbwa7_SOURCES = main_parallel_version_split_by_chr.c
What's next:
1) Shall we include secondary alignments?
2) Shall we include discordant reads in the chromosome they belong to?
Release from the 12/12/2019
1) Remove MPI calls after MPI_Finalize
Release from the 3/04/2019
Changes in Master branch
1) Add support for single reads, trimmed or not
Release from the 10/07/2018
Changes in Master branch
1) Fix a bug in the shared-memory mapping of the reference genome.
This bug did not appear with OpenMPI, but the Intel compiler complained.
2) Creation of a Google group
https://groups.google.com/forum/#!forum/hpc-bioinformatics-pipeline
3) To improve performance on the Lustre file system, remove the "suid" mount option (rw,nosuid,flock,lazystatfs).
With BeeGFS, set "flock" to "on" for reproducibility.
Release from the 30/04/2018
Changes in Master branch
1) Add support for trimmed reads.
2) Be aware of the flock mode on parallel file systems (Lustre, BeeGFS): flock must be on,
otherwise reproducibility is not guaranteed.
Release from the 21/03/2018
1) Fix an inversion in file handles (lines 1039 and 1056)
Release from 23/12/2017
Changes in Master branch
1) 100% reproducibility with the control pipeline (bwa mem -t 8)
2) Remove memory limits: offsets are now computed on a 1 GB buffer
3) Remove some memory leaks
4) Code optimizations
Release from 04/12/2017
Changes in the branch Master.
1) Fix a memory leak.
Release from 01/12/2017
Changes in the branch Master.
Fix the mpi_read_at buffer size limit.
Release from 29/11/2017
Add a new branch called Experimental.
Warning: this is experimental work; do not use it in production, but please test it and send us reports.
Release from 21/11/2017
Changes in LAZYCHUNCK branch:
1) Fix a forgotten end condition
Release from 20/11/2017
Changes in LAZYCHUNCK branch:
1) Remove memory leaks
2) Update the results section
3) Test with 600 GB (x2) fastq files and 100 jobs: OK (20 min to approximate chunk offsets)
4) Reproducibility with a constant number of jobs: OK
Notes:
If you want 100% reproducibility whatever the number of jobs, use the master branch or the FULLMPI branch.
But if you want to go faster, use the LAZYCHUNK branch.
Release from 17/11/2017
Changes in LAZYCHUNCK branch
1) Improvement of the algorithm for window sizing.
2) Change the number of bases used for the estimation (line 541).
Release from 15/11/2017
Changes in LAZYCHUNK branch
1) Due to an mmap in the previous version, the virtual memory could become large when the fastq file is large,
and some schedulers (e.g., Torque) do not like this. We reworked the chunk-computation algorithm:
first we divide the file into windows of size (number of jobs) * 1 GB; within each window, jobs approximate chunks of 10 megabases.
This way the virtual memory stays low (see the sketch after this list).
2) We also saw problems on some architectures with MPI_File_read_shared, so we replaced it with MPI_File_read_at.
3) We also have other parallelization projects (sorting, marking duplicates, clustering...) and we are looking for people willing to help us with development and testing. Don't hesitate to contact me (frederic.jarlier@curie.fr).
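For illustration, a minimal C sketch of that windowed chunking, under stated assumptions: the constants, names, and the simplified record-boundary test are ours, not the actual mpiBWA implementation (a real fastq parser must also rule out '@' appearing as a quality character):

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define WINDOW_PER_JOB (1024L * 1024L * 1024L) /* 1 GB window per job */
    #define CHUNK_TARGET   (10L * 1024L * 1024L)   /* aim for ~10 MB chunks */

    /* Find the next offset in buf that could start a fastq record:
     * an '@' right after a newline (simplified boundary test). */
    static size_t next_record_start(const char *buf, size_t pos, size_t len)
    {
        while (pos < len &&
               !(buf[pos] == '@' && (pos == 0 || buf[pos - 1] == '\n')))
            pos++;
        return pos;
    }

    /* Each job reads only its own 1 GB window with MPI_File_read_at
     * (no shared file pointer, see item 2) and records approximate
     * chunk boundaries every CHUNK_TARGET bytes, snapped forward to a
     * record start, so virtual memory stays bounded by the window. */
    void approximate_chunks(MPI_File fh, int rank, MPI_Offset window_base)
    {
        char *buf = malloc(WINDOW_PER_JOB);
        MPI_Offset my_off = window_base + (MPI_Offset)rank * WINDOW_PER_JOB;
        MPI_Status st;
        int count;

        MPI_File_read_at(fh, my_off, buf, (int)WINDOW_PER_JOB, MPI_BYTE, &st);
        MPI_Get_count(&st, MPI_BYTE, &count); /* short read at end of file */

        for (size_t pos = 0; pos < (size_t)count; ) {
            size_t b = next_record_start(buf, pos, (size_t)count);
            if (b >= (size_t)count)
                break;
            printf("job %d: chunk starts at offset %lld\n",
                   rank, (long long)(my_off + b));
            pos = b + CHUNK_TARGET;
        }
        free(buf);
    }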
Release from 30/06/2017
Major changes:
First release.
We have implemented a new algorithm with master jobs and aligner jobs.
Master jobs are responsible for computing offsets and chunk sizes;
this way all the chunks have the same number of bases.
Master jobs then send the chunks to bwa-mem to be aligned,
and finally write to the resulting SAM file.
The total number of jobs must be (number of masters) * 8,
where 8 is the number of threads used by bwa-mem
(for instance, the 352 jobs below correspond to 44 masters).
First results, tested on Broadwell nodes at TGCC.
Condition: due to the mpi_read_at buffer limit of 2 GB,
you should limit the initial buffer read to 2 GB per master job.
A future release will remove this condition (a sketch of one workaround follows).
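The 2 GB ceiling comes from the int count parameter of MPI_File_read_at; here is a hedged C sketch of lifting it by looping over sub-2 GB reads (the function name is ours, not the project's actual code):

    #include <mpi.h>
    #include <limits.h>

    /* Read `total` bytes starting at `off` by issuing MPI_File_read_at
     * calls whose count always fits in an int, staying under the 2 GB
     * per-call ceiling. Illustrative sketch only. */
    void read_big(MPI_File fh, MPI_Offset off, char *buf, MPI_Offset total)
    {
        MPI_Offset done = 0;
        while (done < total) {
            MPI_Offset left = total - done;
            int count = (left > INT_MAX) ? INT_MAX : (int)left;
            MPI_Status st;
            MPI_File_read_at(fh, off + done, buf + done, count, MPI_BYTE, &st);
            MPI_Get_count(&st, MPI_BYTE, &count); /* actual bytes read */
            if (count == 0)
                break; /* reached end of file early */
            done += count;
        }
    }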
Tested on the SRR2052 WGS with 352*8 CPUs:
Alignment time: 26 min
Time to compute chunks: 8 s
Scalability: OK
Reproducibility: OK
Command line: mpi_run -n 352 -c 8
or: MSUB -n 352
    MSUB -c 8