Skip to content

Commit

Permalink
Added documentation on how to procceed with resequencing
Browse files Browse the repository at this point in the history
  • Loading branch information
svarona committed Jul 16, 2024
1 parent af3f1e7 commit 1200b79
Show file tree
Hide file tree
Showing 2 changed files with 151 additions and 0 deletions.
150 changes: 150 additions & 0 deletions docs/Handle-sample-resequencing.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,150 @@
# Handle sample resequencing

Sometimes, a sequencing run has to be repeated. Some of the possible causes are:

- The samples do not achieve the desired depth of the target organism (expired libraries or low-performance libraries are used because they are available and there is a hurry)
- The sequencing has poor quality, with few PF Clusters (Passing Filter clusters)
- R2 has failed and can only be used as single-end
- Others

In these cases, the researcher decides to resequence the samples in another run, and they usually provide the same name in the sequencing samplesheet. Therefore, upon demultiplexing, the samples might be named exactly the same, causing some issues when creating symbolic links in the RAW folder.

In these cases, there are two ways to proceed:

1. If the samples are named exactly the same in `/srv/fastq_repo` (e.g., `/srv/fastq_repo/MiSeq_GEN_235_20240702_user/20130635_S2_R1_001.fastq.gz` and `/srv/fastq_repo/NovaSeq_GEN_102_202408015_user/20130635_S2_R1_001.fastq.gz`).
- In this case, we cannot use the bu-isciii tools. The symbolic links must be created manually.
- When creating the symbolic link in RAW, we will rename the samples from the second run, adding the name of the second run, so that the RAW folder looks something like this:

```bash
20130635_S293_R1_001.fastq.gz -> /srv/fastq_repo/MiSeq_GEN_235_20240702_user/20130635_S293_R1_001.fastq.gz
20130635_S293_R2_001.fastq.gz -> /srv/fastq_repo/MiSeq_GEN_235_20240702_user/20130635_S293_R2_001.fastq.gz
20150424_S296_R1_001.fastq.gz -> /srv/fastq_repo/MiSeq_GEN_235_20240702_user/20150424_S296_R1_001.fastq.gz
20150424_S296_R2_001.fastq.gz -> /srv/fastq_repo/MiSeq_GEN_235_20240702_user/20150424_S296_R2_001.fastq.gz
20130635-NovaSeq102_R1.fastq.gz -> /srv/fastq_repo/NovaSeq_GEN_102_202408015_user/20130635_S293_R1_001.fastq.gz
20130635-NovaSeq102_R2.fastq.gz -> /srv/fastq_repo/NovaSeq_GEN_102_202408015_user/20130635_S293_R2_001.fastq.gz
20150424-NovaSeq102_R1.fastq.gz -> /srv/fastq_repo/NovaSeq_GEN_102_202408015_user/20150424_S296_R1_001.fastq.gz
20150424-NovaSeq102_R2.fastq.gz -> /srv/fastq_repo/NovaSeq_GEN_102_202408015_user/20150424_S296_R2_001.fastq.gz
```

2. If the samples are not named exactly the same in `/srv/fastq_repo` (because they differ in the _S) (e.g. `/srv/fastq_repo/MiSeq_GEN_235_20240702_user/20130635_S2_R1_001.fastq.gz` and `/srv/fastq_repo/NovaSeq_GEN_102_202408015_user/20130635_S25_R1_001.fastq.gz`).
- In this case, it can be done with the bu-isciii tools.
- Once we have used the bu-isciii tools, the files will need to be renamed manually to match exactly as in the previous case:

```bash
20130635_S293_R1_001.fastq.gz -> /srv/fastq_repo/MiSeq_GEN_235_20240702_user/20130635_S293_R1_001.fastq.gz
20130635_S293_R2_001.fastq.gz -> /srv/fastq_repo/MiSeq_GEN_235_20240702_user/20130635_S293_R2_001.fastq.gz
20150424_S296_R1_001.fastq.gz -> /srv/fastq_repo/MiSeq_GEN_235_20240702_user/20150424_S296_R1_001.fastq.gz
20150424_S296_R2_001.fastq.gz -> /srv/fastq_repo/MiSeq_GEN_235_20240702_user/20150424_S296_R2_001.fastq.gz
20130635-NovaSeq102_R1.fastq.gz -> /srv/fastq_repo/NovaSeq_GEN_102_202408015_user/20130635_S293_R1_001.fastq.gz
20130635-NovaSeq102_R2.fastq.gz -> /srv/fastq_repo/NovaSeq_GEN_102_202408015_user/20130635_S293_R2_001.fastq.gz
20150424-NovaSeq102_R1.fastq.gz -> /srv/fastq_repo/NovaSeq_GEN_102_202408015_user/20150424_S296_R1_001.fastq.gz
20150424-NovaSeq102_R2.fastq.gz -> /srv/fastq_repo/NovaSeq_GEN_102_202408015_user/20150424_S296_R2_001.fastq.gz
```

In these cases, what we are going to do is create different `samples_id.txt` files. We will have:

- `samples_id.txt`: This will contain the list of ALL the samples analyzed. In this specific example, it would look like this:

```bash
20130635
20150424
20130635-NovaSeq102
20150424-NovaSeq102
```

- `samples_id01.txt`: This will contain only the samples from the first delivery01. It would look like this:

```bash
20130635
20150424
```

- `samples_id02.txt`: This will contain only the samples from the second delivery02. It would look like this:

```bash
20130635-NovaSeq102
20150424-NovaSeq102
```

Next, what we will do is create a symbolic link in `00-reads` for the new samples, so that the 00-reads folder will look like this:

```bash
20130635_R1.fastq.gz
20130635_R2.fastq.gz
20150424_R1.fastq.gz
20150424_R2.fastq.gz
20130635-NovaSeq102_R1.fastq.gz
20130635-NovaSeq102_R2.fastq.gz
20150424-NovaSeq102_R1.fastq.gz
20150424-NovaSeq102_R2.fastq.gz
```

> [!IMPORTANT]
> This is slightly different in the case you have to **concatenate the sample's reads**.
When we have the scenario where the **samples are going to be concatenated**, for example, to increase sequencing depth, it is important to ensure that the reads can be concatenated, which means they must have been processed in the same way or even come from the same library. The first steps in the RAW folder are the same as in the previous case; we will create symbolic links to the samples in `/srv/fastq_repo`, renaming those from the second run. However, we will need to add a `lablog` where we have, for example in our case:
```bash
cat 20130635_S293_R1_001.fastq.gz 20130635-NovaSeq102_R1.fastq.gz > 20130635-merged_R1.fastq.gz
cat 20130635_S293_R2_001.fastq.gz 20130635-NovaSeq102_R2.fastq.gz > 20130635-merged_R2.fastq.gz
cat 20150424_S296_R1_001.fastq.gz 20150424-NovaSeq102_R1.fastq.gz > 20150424-merged_R1.fastq.gz
cat 20150424_S296_R2_001.fastq.gz 20150424-NovaSeq102_R2.fastq.gz > 20150424-merged_R2.fastq.gz
```
So that the RAW folder would look like this:
```bash
20130635_S293_R1_001.fastq.gz -> /srv/fastq_repo/MiSeq_GEN_235_20240702_user/20130635_S293_R1_001.fastq.gz
20130635_S293_R2_001.fastq.gz -> /srv/fastq_repo/MiSeq_GEN_235_20240702_user/20130635_S293_R2_001.fastq.gz
20150424_S296_R1_001.fastq.gz -> /srv/fastq_repo/MiSeq_GEN_235_20240702_user/20150424_S296_R1_001.fastq.gz
20150424_S296_R2_001.fastq.gz -> /srv/fastq_repo/MiSeq_GEN_235_20240702_user/20150424_S296_R2_001.fastq.gz
20130635-NovaSeq102_R1.fastq.gz -> /srv/fastq_repo/NovaSeq_GEN_102_202408015_user/20130635_S293_R1_001.fastq.gz
20130635-NovaSeq102_R2.fastq.gz -> /srv/fastq_repo/NovaSeq_GEN_102_202408015_user/20130635_S293_R2_001.fastq.gz
20150424-NovaSeq102_R1.fastq.gz -> /srv/fastq_repo/NovaSeq_GEN_102_202408015_user/20150424_S296_R1_001.fastq.gz
20150424-NovaSeq102_R2.fastq.gz -> /srv/fastq_repo/NovaSeq_GEN_102_202408015_user/20150424_S296_R2_001.fastq.gz
20130635-merged_R1.fastq.gz
20130635-merged_R2.fastq.gz
20150424-merged_R1.fastq.gz
20150424-merged_R2.fastq.gz
```
The next step is to create the `samples_id`, so that, just like in the previous case, we will still have several `samples_ids` but in this case, instead of adding `sample-NovaSeq102`, we will add `sample-merged`. They would look like this:
- `samples_id.txt`: This will contain the list of ALL the samples analyzed. In this specific example, it would look like this:
```bash
20130635
20150424
20130635-merged
20150424-merged
```
- `samples_id01.txt`: This will contain only the samples from the first delivery01. It would look like this:
```bash
20130635
20150424
```
- `samples_id02.txt`: This will contain only the samples from the second delivery02. It would look like this:
```bash
20130635-merged
20150424-merged
```
Next, what we will do is create a symbolic link in `00-reads` for the new samples, so that the 00-reads folder will look like this:
```bash
20130635_R1.fastq.gz
20130635_R2.fastq.gz
20150424_R1.fastq.gz
20150424_R2.fastq.gz
20130635-merged_R1.fastq.gz
20130635-merged_R2.fastq.gz
20150424-merged_R1.fastq.gz
20150424-merged_R2.fastq.gz
```
> [!IMPORTANT]
> Don't forget to change the `samples_id.txt` to `samples_id2.txt` in all the lablogs of the new analyses so that the correct samples are analyzed.
1 change: 1 addition & 0 deletions docs/_Sidebar.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@
7. [[Update policy (DB and software)]]
8. How to: guides for BU-ISCIII members
- [[How to manage services in iSkyLims]]
- [[Handle sample resequencing]]
- [[Assembly service]]
- [[Viralrecon service]]
- [[ExomeEB ExomeTrio WGSTrio service]]
Expand Down

0 comments on commit 1200b79

Please sign in to comment.