umi qiaseq #324

Closed
genesandbones opened this issue Mar 6, 2024 · 3 comments
Labels
question Further information is requested


genesandbones commented Mar 6, 2024

Description of the bug

I'm getting an error using the umi tools for a 2x100 qiaseq library.
The error is this: ValueError: barcode regex(es) do not include any umi groups (starting with 'umi_') regex.Regex('=', flags=regex.V0), None

I suspect this is because the QIAseq library was not set up on the sequencer with any base masking, so the UMIs are not added to the read header, which, as I understand it, is what umi_ is looking for.

My current command is this: --with_umi --umitools_extract_method regex --umitools_bc_pattern = ".+(?P<discard_1>AACTGTAGGCACCATCAAT){s<=2}(?P<umi_1>.{12}).+" which is the command I have found in #49

I'm looking for help either with writing the regex differently as a workaround or with finding out whether something is wrong with the UMI handling. Thanks!

Command used and terminal output

nextflow run nf-core/smrnaseq -profile docker -r 2.3.0 -c ${inputConfig} --input ${inputFile} --with_umi --umitools_extract_method regex --umitools_bc_pattern = '.+(?P<discard_1>AACTGTAGGCACCATCAAT){s<=2}(?P<umi_1>.{12})(?P<discard_2>.*)' --outdir "." -work-dir './work/'


Execution cancelled -- Finishing pending tasks before exit
-[nf-core/smrnaseq] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_SMRNASEQ:UMITOOLS_EXTRACT (BS898-test-umi)'

Caused by:
  Process `NFCORE_SMRNASEQ:UMITOOLS_EXTRACT (BS898-test-umi)` terminated with an error exit status (1)

Command executed:

  umi_tools \
      extract \
      -I BS898-test-umi.umi_dedup.sorted.fastq.gz \
      -S BS898-test-umi.umi_extract.fastq.gz \
      --extract-method=regex --bc-pattern='=' \
      > BS898-test-umi.umi_extract.log
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_SMRNASEQ:UMITOOLS_EXTRACT":
      umitools: $( umi_tools --version | sed '/version:/!d; s/.*: //' )
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  Matplotlib created a temporary cache directory at /tmp/matplotlib-vl6x_rga because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
  Traceback (most recent call last):
    File "/usr/local/bin/umi_tools", line 11, in <module>
      sys.exit(main())
    File "/usr/local/lib/python3.9/site-packages/umi_tools/umi_tools.py", line 61, in main
      module.main(sys.argv)
    File "/usr/local/lib/python3.9/site-packages/umi_tools/extract.py", line 335, in main
      extract_cell, extract_umi = U.validateExtractOptions(options)
    File "/usr/local/lib/python3.9/site-packages/umi_tools/Utilities.py", line 1177, in validateExtractOptions
      raise ValueError("barcode regex(es) do not include any umi groups "
  ValueError: barcode regex(es) do not include any umi groups (starting with 'umi_') regex.Regex('=', flags=regex.V0), None
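
The error message says the pattern that reached umi_tools was the single character `=`, which contains no named group starting with `umi_`. The check can be illustrated with a minimal sketch; `umi_group_names` is a hypothetical helper written for this note and is not the umi_tools implementation:

```python
import re


def umi_group_names(pattern):
    """List named groups in `pattern` whose names start with 'umi_'.

    Hypothetical helper: it scans the pattern text for (?P<name> markers
    instead of compiling it, so it needs only the standard library.
    """
    names = re.findall(r"\(\?P<([A-Za-z_]\w*)>", pattern)
    return [name for name in names if name.startswith("umi_")]


# Pattern reported in the traceback versus the pattern the user intended.
received = "="
intended = ".+(?P<discard_1>AACTGTAGGCACCATCAAT){s<=2}(?P<umi_1>.{12})(?P<discard_2>.*)"

print(umi_group_names(received))
print(umi_group_names(intended))
```

The first call finds no `umi_` group and the second finds `umi_1`, so the regex itself is not at fault; the wrong string was passed.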

Relevant files

/*
 * Nextflow standard config file for smRNAseq
 * Defines bundled input files and tools required
 * to run a pipeline
 */

params {
config_profile_name = 'name'
config_profile_description = 'nf-core smRNAseq profile'
max_memory = '120GB'
max_cpus = 30
max_time = '24.h'
cleanup = true

mirtrace_species = "hsa"
mirna_gtf = "/home/Documents/references/hsa.gff3"
mature = "/home/Documents/references/mature.fa"
hairpin = "/home/Documents/references/hairpin.fa"
genome = "GRCh38"

protocol = "qiaseq"
fastp_min_length = 15

profiles {
  debug {
    cleanup = false
  }
}

}

System information

No response

genesandbones added the bug (Something isn't working) label Mar 6, 2024
apeltzer (Member) commented Mar 7, 2024

This depends entirely on the sequence of UMIs you're using. The reads are deduplicated before the UMIs are removed; the UMIs are then extracted and added to the FASTQ header. This is what I used with some QIAGEN miRNA UMI data:

nextflow run nf-core/smrnaseq \
    --input Samplesheet.csv \
    --outdir 01_smrnaseq \
    --genome GRCh38 -profile yourprofile \
    --mirtrace_species "hsa" \
    --skip_mirdeep \
    --protocol "qiaseq" \
    --umitools_extract_method regex \
    --umitools_bc_pattern '.+(?P<discard_1>AACTGTAGGCACCATCAAT){s<=2}(?P<umi_1>.{12})(?P<discard_2>.*)' \
    --save_umi_intermeds \
    --with_umi \
    -c extra_resources.config \
    -resume
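
To see which part of a read the pattern above captures, here is a minimal sketch using Python's standard `re` module. Two things are assumptions: the fuzzy `{s<=2}` term (up to two substitutions in the adapter) is dropped because it needs the third-party `regex` module, and the read is synthetic, invented for illustration only.

```python
import re

# Exact-match variant of the pattern from the command above; the fuzzy
# "{s<=2}" term is omitted because the standard-library `re` lacks it.
EXACT_PATTERN = re.compile(
    r".+(?P<discard_1>AACTGTAGGCACCATCAAT)(?P<umi_1>.{12})(?P<discard_2>.*)"
)

# Synthetic read (not real data): insert + adapter + 12-nt UMI + tail.
INSERT = "TGAGGTAGTAGGTTGTATAGTT"
ADAPTER = "AACTGTAGGCACCATCAAT"
UMI = "ACGTACGTACGT"
TAIL = "GGATCCAA"
read = INSERT + ADAPTER + UMI + TAIL

match = EXACT_PATTERN.match(read)
umi = match.group("umi_1")                # the 12 bases right after the adapter
kept = read[: match.start("discard_1")]   # bases before the adapter

print("UMI:", umi)
print("kept sequence:", kept)
```

As far as I understand umi_tools' regex method, the `umi_` group is moved to the read name and the `discard_` groups are dropped, leaving only the insert.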

@apeltzer apeltzer added question Further information is requested and removed bug Something isn't working labels Mar 7, 2024
apeltzer (Member) commented Mar 7, 2024

This is weird:

[image]

The reason is that you should supply just the pattern, without an additional "=" in between.

--umitools_bc_pattern = '.+(?P<discard_1>AACTGTAGGCACCATCAAT){s<=....

Should be:

--umitools_bc_pattern '.+(?P<discard_1>AACTGTAGGCACCATCAAT){s<=2}(?P<umi_1>.{12})(?P<discard_2>.*)'

I'll close this as this should be fine
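
Why the stray "=" matters can be shown with a small sketch of shell-style word splitting, using Python's `shlex` as a stand-in for the shell. `value_after` is a hypothetical helper; the assumption, consistent with the `--bc-pattern='='` in the executed command, is that the token right after the flag is taken as its value.

```python
import shlex

# Pattern copied from the thread.
PATTERN = ".+(?P<discard_1>AACTGTAGGCACCATCAAT){s<=2}(?P<umi_1>.{12})(?P<discard_2>.*)"


def value_after(flag, command_line):
    """Return the token that directly follows `flag` after POSIX-style splitting."""
    tokens = shlex.split(command_line)
    return tokens[tokens.index(flag) + 1]


with_equals = f"--umitools_bc_pattern = '{PATTERN}'"
without_equals = f"--umitools_bc_pattern '{PATTERN}'"

# With spaces around "=", the "=" is a separate argument and becomes the value.
print(value_after("--umitools_bc_pattern", with_equals))
# Without it, the quoted pattern is the value.
print(value_after("--umitools_bc_pattern", without_equals))
```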

apeltzer closed this as completed Mar 7, 2024
genesandbones (Author) commented

Thanks, this does seem to work, though now I'm getting the mirtrace error that others have (see #262).

Also, the umitools_bc_pattern command I used came from the intro page of smrnaseq, so that page should probably be edited to reflect the suggestion above.
Thanks for your help!
