fix: ensure trailing newline is included when parsing GFF3 region #1573

Merged: 1 commit from fix/gff-reader-comments into master, Feb 24, 2025


ivan-aksamentov (Member)

Resolves #1572

When parsing GFF3, we split the file into blocks on `##sequence-region` pragmas. As per the specification, a file can contain several of these pragmas, and we need to detect each of them.
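Per the GFF3 specification, each such pragma has the form `##sequence-region seqid start end`. The Rust sketch below (hypothetical names, not Nextclade's code) collects every pragma in a document, which shows how a single file can declare several regions. Silently skipping malformed pragma lines instead of reporting an error is an assumption of the sketch.

```rust
/// One `##sequence-region` pragma: sequence ID plus start and end coordinates.
#[derive(Debug, PartialEq)]
pub struct SequenceRegion {
    pub seqid: String,
    pub start: u64,
    pub end: u64,
}

/// Collects every `##sequence-region seqid start end` pragma in a GFF3 document.
/// Assumption of this sketch: malformed pragma lines are skipped, not reported.
pub fn sequence_regions(gff: &str) -> Vec<SequenceRegion> {
    gff.lines()
        .filter_map(|line| {
            let rest = line.strip_prefix("##sequence-region")?;
            // The pragma name must be followed by whitespace, not more characters.
            if !rest.starts_with(|c: char| c.is_whitespace()) {
                return None;
            }
            let mut fields = rest.split_whitespace();
            let seqid = fields.next()?;
            let start = fields.next()?.parse().ok()?;
            let end = fields.next()?.parse().ok()?;
            // Reject lines with extra fields after `end`.
            if fields.next().is_some() {
                return None;
            }
            Some(SequenceRegion { seqid: seqid.to_owned(), start, end })
        })
        .collect()
}
```

For a file containing `##sequence-region chr1 1 1000` and `##sequence-region chr2 1 500`, `sequence_regions` returns two entries, one per region.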

The `bio` crate's `GffReader` does not support regions, so we split the file manually and pass each region's slice to `GffReader` separately. However, our code mistakenly trimmed the trailing newline off each region, so the `bio` GFF3 parser failed whenever the last line of a region was a comment. The error and a reproduction are described in #1572.
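To make the failure mode concrete, here is a small Rust sketch (not Nextclade code) of what `.trim()` does to a region whose last line is a comment: the comment text survives, but its terminating newline does not. The region text and the helper `tail` are hypothetical, and the sketch shows only the string-level effect; it does not call `bio`.

```rust
/// Hypothetical region text whose last line is a comment.
/// The `\` at each line end continues the literal without adding whitespace.
pub const REGION: &str = "##sequence-region chr1 1 100\n\
chr1\t.\tgene\t1\t50\t.\t+\t.\tID=g1\n\
# trailing comment\n";

/// Returns the last line of `text` (without its terminator) and whether
/// `text` ends with a newline, i.e. whether that last line is terminated.
pub fn tail(text: &str) -> (Option<&str>, bool) {
    (text.lines().last(), text.ends_with('\n'))
}
```

`tail(REGION)` reports the comment line as terminated, while `tail(REGION.trim())` reports the same comment line without its newline, which is the shape of input that tripped up the parser.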

In this PR I:
 - [x] make sure that the region content string is extracted correctly, including its trailing newline character
 - [x] remove the `.trim()` call so that this newline is preserved
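Together, these two changes amount to slicing the input between consecutive pragma lines and never trimming the result. A minimal Rust sketch of that idea follows; `split_regions` is a hypothetical name rather than Nextclade's actual function, and returning any text before the first pragma (such as a `##gff-version 3` header) as its own leading block is an assumption.

```rust
/// Splits a GFF3 document into blocks, starting a new block at every
/// `##sequence-region` pragma line. Each slice borrows from `gff` and ends
/// exactly where the next block begins, so its trailing newline is kept.
/// Nothing is trimmed.
pub fn split_regions(gff: &str) -> Vec<&str> {
    // Byte offsets at which blocks start; the first block starts at 0.
    let mut starts = vec![0];
    let mut offset = 0;
    // `split_inclusive` yields each line together with its '\n' terminator.
    for line in gff.split_inclusive('\n') {
        if offset > 0 && line.starts_with("##sequence-region") {
            starts.push(offset);
        }
        offset += line.len();
    }
    starts.push(gff.len());
    starts
        .windows(2)
        .map(|w| &gff[w[0]..w[1]])
        .filter(|block| !block.is_empty())
        .collect()
}
```

Because each slice ends exactly where the next pragma begins, a region whose last line is a comment still ends in `\n`, and concatenating the blocks reproduces the input.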

This way, `bio` can parse our GFF3 region blocks even when a block ends with a comment line.

ivan-aksamentov merged commit f2b4dc2 into master on Feb 24, 2025. 19 checks passed.
ivan-aksamentov deleted the fix/gff-reader-comments branch on February 24, 2025, 14:49.
Linked issue #1572: Starting a gff's line with # indicating a comment causes annotation parsing error