0.2.0 Release - Gibson Les Paul #51

Merged on Jan 13, 2025 (67 commits)

Commits
f06f421
move shifted position logic to its own function
donaldcampbelljr Dec 5, 2024
853389c
add variable_shifted_bam_to_bw for shifted_position workflow
donaldcampbelljr Dec 5, 2024
90f4751
minor adjustment removing let
donaldcampbelljr Dec 5, 2024
1c73d6a
add bamshift argument to uniwig
donaldcampbelljr Dec 5, 2024
7bdc691
some refactoring for bamshift flag
donaldcampbelljr Dec 5, 2024
6709167
update uniwig README.md
donaldcampbelljr Dec 5, 2024
050b515
change arg to `no-bamshift` use references for Flags
donaldcampbelljr Dec 9, 2024
4bab465
fix bug when assigning "shift", add clarity in CLI
donaldcampbelljr Dec 9, 2024
26f5dbe
update readme
donaldcampbelljr Dec 9, 2024
4b8b89d
streamline control flow and messaging
donaldcampbelljr Dec 9, 2024
fa1ae56
Merge pull request #50 from databio/dev_position_shift
donaldcampbelljr Dec 9, 2024
ee70949
update changelog and version in prep for 0.1.2 release
donaldcampbelljr Dec 9, 2024
a6014b3
Merge branch 'master' into dev
donaldcampbelljr Dec 9, 2024
5cd0d98
account for -1 shift in bam_to_bed and variable_shift_bam workflows
donaldcampbelljr Dec 11, 2024
d960854
attempt accumulation fix
donaldcampbelljr Dec 12, 2024
5bd44ab
Attempt fix for #43
donaldcampbelljr Dec 12, 2024
bb34c5d
clamp start position for #43
donaldcampbelljr Dec 12, 2024
f12fd2f
clamp number of counts based on chromsize for #43
donaldcampbelljr Dec 12, 2024
8a12cd6
more work towards #56, skip count for start less than current position
donaldcampbelljr Dec 12, 2024
f26bfed
remove checking first record during bam to bed workflow
donaldcampbelljr Dec 13, 2024
3685f94
add bamscale argument for #53
donaldcampbelljr Dec 13, 2024
ebb598a
update changelog (again) for 0.1.2 release
donaldcampbelljr Dec 16, 2024
df511da
update gitignore
donaldcampbelljr Dec 16, 2024
f0d7f2a
refactor and add wig_shift variable to reduce code duplication
donaldcampbelljr Dec 16, 2024
9673d0e
fix for #34, overwrite zoom
donaldcampbelljr Dec 16, 2024
baeebaa
fix scaling for #53 by changing count and scale to f32
donaldcampbelljr Dec 16, 2024
4ce49dd
add ga4gh refget digest functionality
nsheff Dec 17, 2024
99576b4
minor cleanup
nsheff Dec 17, 2024
85d5ed9
add py init for module
nsheff Dec 17, 2024
9684cd3
register digests module correctly
nsheff Dec 17, 2024
d17c7da
begin adding more tests to cover igd workflow
donaldcampbelljr Dec 18, 2024
5c53208
change nCnts incrementing
donaldcampbelljr Dec 18, 2024
d28ff7d
do not reset nCnts, use it for tests
donaldcampbelljr Dec 18, 2024
93fef4c
add fields to igd_t struct to help with testing during creation
donaldcampbelljr Dec 18, 2024
af8bbbc
some clean up
donaldcampbelljr Dec 18, 2024
2998139
add new test_igd_create_then_load_from_disk
donaldcampbelljr Dec 18, 2024
6f383aa
attempt to read from buffer for test_igd_create_then_load_from_disk f…
donaldcampbelljr Dec 18, 2024
925c056
update test assertions
donaldcampbelljr Dec 19, 2024
e53e457
add igd test create then search
donaldcampbelljr Dec 19, 2024
8f3dc68
potential fix #45, comment out debugging lines
donaldcampbelljr Dec 19, 2024
abaeb96
update rstest, use cases for new test, rethink source bedfiles and qu…
donaldcampbelljr Dec 19, 2024
508c827
Fix for #61
donaldcampbelljr Dec 19, 2024
b8afd94
update changelog
donaldcampbelljr Dec 20, 2024
b33c233
Merge pull request #60 from databio/dev_igd_45
donaldcampbelljr Dec 20, 2024
f8c8d4b
cargo fmt
donaldcampbelljr Dec 20, 2024
f52c093
comment out second test case because it sometimes changes order of se…
donaldcampbelljr Dec 20, 2024
3d3bddf
attempt to lessen code cov reqs
donaldcampbelljr Dec 20, 2024
a05b2ed
Revert "attempt to lessen code cov reqs"
donaldcampbelljr Dec 20, 2024
86ffa77
consolidate get_dynamic_reader
nsheff Dec 20, 2024
47b6316
Merge pull request #58 from databio/digests
nsheff Dec 20, 2024
1536d3d
add newlines to readme
nsheff Jan 8, 2025
f008db5
add R bindings to readme
nsheff Jan 8, 2025
33d4851
update docs
nsheff Jan 9, 2025
5c66be9
potential fix for #64
donaldcampbelljr Jan 9, 2025
27d52f5
attempt to use shared hashmap for #65 does not work
donaldcampbelljr Jan 9, 2025
391ba68
Revert "attempt to use shared hashmap for #65 does not work"
donaldcampbelljr Jan 9, 2025
5f5973b
working solution for #65
donaldcampbelljr Jan 9, 2025
8ae3d41
cargo fmt
donaldcampbelljr Jan 9, 2025
bb5bc89
comment out r-devel test
donaldcampbelljr Jan 10, 2025
ce0967a
Merge pull request #66 from databio/dev_64
donaldcampbelljr Jan 10, 2025
91f2171
fix for #52
donaldcampbelljr Jan 10, 2025
81cde28
cargo fmt
donaldcampbelljr Jan 10, 2025
c4ebf15
update changelog for 0.2.0 release
donaldcampbelljr Jan 13, 2025
32f0580
add license
donaldcampbelljr Jan 13, 2025
ce77a20
rust bindings readme
sanghoonio Jan 13, 2025
d851ce3
Merge branch 'dev' of github.com:databio/gtars into dev
sanghoonio Jan 13, 2025
df443c2
bump versions
nleroy917 Jan 13, 2025
2 changes: 1 addition & 1 deletion .github/workflows/R-CMD-check.yaml
@@ -19,7 +19,7 @@ jobs:
   # - {os: windows-latest, r: 'release', rust-version: 'stable-msvc', rust-target: 'x86_64-pc-windows-gnu'}
   - {os: macOS-latest, r: 'release', rust-version: 'stable'}
   - {os: ubuntu-latest, r: 'release', rust-version: 'stable'}
-  - {os: ubuntu-latest, r: 'devel', rust-version: 'stable'}
+  #- {os: ubuntu-latest, r: 'devel', rust-version: 'stable'}
   env:
   R_REMOTES_NO_ERRORS_FROM_WARNINGS: true
   steps:
1 change: 1 addition & 0 deletions .gitignore
@@ -25,3 +25,4 @@ bin/

.DS_Store
.Rhistory
+/gtars/tests/data/out/region_scoring_count.csv.gz
9 changes: 9 additions & 0 deletions LICENSE
@@ -0,0 +1,9 @@
Copyright 2024 gtars authors

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
9 changes: 9 additions & 0 deletions LICENSE.txt
@@ -0,0 +1,9 @@
Copyright 2024 gtars authors

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
25 changes: 20 additions & 5 deletions README.md
@@ -7,59 +7,73 @@

`gtars` is a rust crate that provides a set of tools for working with genomic interval data. Its primary goal is to provide processors for our python package, [`geniml`](https://github.com/databio/geniml), a library for machine learning on genomic intervals. However, it can be used as a standalone library for working with genomic intervals as well.

-`gtars` provides three things:
+`gtars` provides these things:

1. A rust library crate.
2. A command-line interface, written in rust.
-3. A Python package that provides bindings to the rust library.
+3. A Python package that provides Python bindings to the rust library.
+4. An R package that provides R bindings to the rust library

## Repository organization (for developers)

This repo is organized like so:

-1. A rust library crate (`/gtars/lib.rs`) that provides functions, traits, and structs for working with genomic interval data.
-2. A rust binary crate (in `/gtars/main.rs`), a small, wrapper command-line interface for the library crate.
-3. A rust crate (in `/bindings`) that provides Python bindings, and a resulting Python package, so that it can be used within Python.
+1. The main gtars rust package in `/gtars`, which contains two crates:
+1a. A rust library crate (`/gtars/lib.rs`) that provides functions, traits, and structs for working with genomic interval data.
+1b. A rust binary crate (in `/gtars/main.rs`), a small, wrapper command-line interface for the library crate.
+2. Python bindings (in `/bindings/python`), which consists of a rust package with a library crate (no binary crate) and Python package.
+3. R bindings (in `/bindings/r`), which consists of an R package.

This repository is a work in progress, and still in early development.

## Installation

To install `gtars`, you must have the rust toolchain installed. You can install it by following the instructions [here](https://www.rust-lang.org/tools/install).

You may build the binary locally using `cargo build --release`. This will create a binary in `target/release/gtars`. You can then add this to your path, or run it directly.

## Usage

`gtars` is very early in development, and as such, it does not have a lot of functionality yet. However, it does have a few useful tools. To see the available tools, run `gtars --help`. To see the help for a specific tool, run `gtars <tool> --help`.

Alternatively, you can link `gtars` as a library in your rust project. To do so, add the following to your `Cargo.toml` file:

```toml
[dependencies]
gtars = { git = "https://github.com/databio/gtars" }
```

## Testing

To run the tests, run `cargo test`.

## Contributing

### New internal library crate tools

If you'd like to add a new tool, you can do so by creating a new module within the src folder.

### New public library crate tools

If you want this to be available to users of `gtars`, you can add it to the `gtars` library crate as well. To do so, add the following to `src/lib.rs`:
```rust
pub mod <tool_name>;
```

### New binary crate tools

Finally, if you want to have command-line functionality, you can add it to the `gtars` binary crate. This requires two steps:

1. Create a new `cli` using `clap` inside the `interfaces` module of `src/cli.rs`:

```rust
pub fn make_new_tool_cli() -> Command {

}
```

2. Write your logic in a wrapper function. This will live inside the `functions` module of `src/cli.rs`:

```rust
// top of file:
use tool_name::{ ... }
@@ -73,6 +87,7 @@ pub fn new_tool_wrapper() -> Result<(), Box<dyn Error>> {
Please make sure you update the changelog and bump the version number in `Cargo.toml` when you add a new tool.

### VSCode users

If you are using VSCode, make sure you link to the `Cargo.toml` inside the `.vscode` folder, so that `rust-analyzer` can link it all together:
```json
{
2 changes: 1 addition & 1 deletion bindings/python/Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "gtars-py"
-version = "0.1.1"
+version = "0.2.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
31 changes: 18 additions & 13 deletions bindings/python/README.md
@@ -1,18 +1,23 @@
# gtars
-This is a python wrapper around the `gtars` crate. It provides an easy interface for using `gtars` in python. It is currently in early development, and as such, it does not have a lot of functionality yet, but new tools are being worked on right now.

-## Installation
-You can get `gtars` from PyPI:
-```bash
-pip install gtars
-```
+This is a Python package that wraps the `gtars` crate so you can call gtars code from Python.
+
+Documentation for Python bindings is hosted at: https://docs.bedbase.org/gtars/
+
+## Brief instructions

-## Usage
-Import the package, and use the tools:
-```python
-import gtars as gt
+To install the development version, you'll have to build it locally. Build Python bindings like this:

-gt.prune_universe(...)
+```console
+cd bindings/python
+maturin build --interpreter 3.11 --release
```
-## Developer docs
-Write the develop docs here...
+
+Then install the local wheel that was just built:
+
+```console
+gtars_version=`grep '^version =' Cargo.toml | cut -d '"' -f 2`
+python_version=$(python --version | awk '{print $2}' | cut -d '.' -f1-2 | tr -d '.')
+wheel_path=$(find target/wheels/gtars-${gtars_version}-cp${python_version}-cp${python_version}-*.whl)
+pip install --force-reinstall ${wheel_path}
+```
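
For readers who prefer to see the wheel lookup spelled out, here is a minimal Python sketch of what the shell pipeline in the README computes. It reads the crate version from `Cargo.toml` text and builds the glob pattern for the wheel. The function names `crate_version` and `wheel_glob` are hypothetical and not part of gtars, and the regular expression is slightly more permissive than the `grep '^version ='` in the README.

```python
import re


def crate_version(cargo_toml_text: str) -> str:
    """Return the first `version = "..."` value found at the start of a line."""
    match = re.search(r'^version\s*=\s*"([^"]+)"', cargo_toml_text, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no version line found in Cargo.toml text")
    return match.group(1)


def wheel_glob(cargo_toml_text: str, python_version: tuple) -> str:
    """Build the wheel glob pattern, e.g. (3, 11) becomes the tag cp311."""
    major, minor = python_version
    tag = f"cp{major}{minor}"
    version = crate_version(cargo_toml_text)
    return f"target/wheels/gtars-{version}-{tag}-{tag}-*.whl"


# Example input mirroring bindings/python/Cargo.toml after this PR.
EXAMPLE_TOML = '[package]\nname = "gtars-py"\nversion = "0.2.0"\nedition = "2021"\n'

if __name__ == "__main__":
    print(wheel_glob(EXAMPLE_TOML, (3, 11)))
```

In practice one would pass `sys.version_info[:2]` as the version tuple and hand the resulting pattern to `glob.glob` before calling `pip install`.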
1 change: 1 addition & 0 deletions bindings/python/gtars/digests/__init__.py
@@ -0,0 +1 @@
from .gtars.digests import * # noqa: F403
71 changes: 71 additions & 0 deletions bindings/python/src/digests/mod.rs
@@ -0,0 +1,71 @@
// This is intended to provide minimal Python bindings to functions in the `digests` module of the `gtars` crate.

use pyo3::prelude::*;
use gtars::digests::{sha512t24u, md5, DigestResult};

#[pyfunction]
pub fn sha512t24u_digest(readable: &str) -> String {
return sha512t24u(readable);
}

#[pyfunction]
pub fn md5_digest(readable: &str) -> String {
return md5(readable);
}

#[pyfunction]
pub fn digest_fasta(fasta: &str) -> PyResult<Vec<PyDigestResult>> {
match gtars::digests::digest_fasta(fasta) {
Ok(digest_results) => {
let py_digest_results: Vec<PyDigestResult> = digest_results.into_iter().map(PyDigestResult::from).collect();
Ok(py_digest_results)
},
Err(e) => Err(PyErr::new::<pyo3::exceptions::PyIOError, _>(format!("Error processing FASTA file: {}", e))),
}
}

#[pyclass]
#[pyo3(name="DigestResult")]
pub struct PyDigestResult {
#[pyo3(get,set)]
pub id: String,
#[pyo3(get,set)]
pub length: usize,
#[pyo3(get,set)]
pub sha512t24u: String,
#[pyo3(get,set)]
pub md5: String
}

#[pymethods]
impl PyDigestResult {
fn __repr__(&self) -> String {
format!("<DigestResult for {}>", self.id)
}

fn __str__(&self) -> PyResult<String> {
Ok(format!("DigestResult for sequence {}\n length: {}\n sha512t24u: {}\n md5: {}", self.id, self.length, self.sha512t24u, self.md5))
}
}

impl From<DigestResult> for PyDigestResult {
fn from(value: DigestResult) -> Self {
PyDigestResult {
id: value.id,
length: value.length,
sha512t24u: value.sha512t24u,
md5: value.md5
}
}
}

// This represents the Python module to be created
#[pymodule]
pub fn digests(_py: Python, m: &Bound<'_, PyModule>) -> PyResult<()> {
m.add_function(wrap_pyfunction!(sha512t24u_digest, m)?)?;
m.add_function(wrap_pyfunction!(md5_digest, m)?)?;
m.add_function(wrap_pyfunction!(digest_fasta, m)?)?;
m.add_class::<PyDigestResult>()?;
Ok(())
}
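
The bindings above only forward to `gtars::digests`, whose implementation is not shown in this diff. As a rough reference for what the two digests are expected to return, here is a Python sketch using only the standard library. It assumes the gtars functions follow the GA4GH `sha512t24u` definition (SHA-512, truncated to 24 bytes, base64url-encoded). The function names below are illustrative, not the gtars API.

```python
import base64
import hashlib


def sha512t24u(data: bytes) -> str:
    """GA4GH truncated digest: SHA-512, first 24 bytes, base64url-encoded."""
    digest = hashlib.sha512(data).digest()
    # 24 bytes encode to exactly 32 base64 characters, so no padding appears.
    return base64.urlsafe_b64encode(digest[:24]).decode("ascii")


def md5_hex(data: bytes) -> str:
    """Hex-encoded MD5 digest."""
    return hashlib.md5(data).hexdigest()


if __name__ == "__main__":
    print(sha512t24u(b""))  # z4PhNX7vuL3xVChQ1m2AB9Yg5AULVxXc
    print(md5_hex(b""))     # d41d8cd98f00b204e9800998ecf8427e
```

Once a wheel built from this branch is installed, comparing these values against `sha512t24u_digest` and `md5_digest` from `gtars.digests` would be one way to check that assumption.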

4 changes: 4 additions & 0 deletions bindings/python/src/lib.rs
@@ -5,6 +5,7 @@ mod ailist;
mod models;
mod tokenizers;
mod utils;
+mod digests;

pub const VERSION: &str = env!("CARGO_PKG_VERSION");

@@ -14,11 +15,13 @@ fn gtars(py: Python, m: &Bound<'_, PyModule>) -> PyResult<()> {
let ailist_module = pyo3::wrap_pymodule!(ailist::ailist);
let utils_module = pyo3::wrap_pymodule!(utils::utils);
let models_module = pyo3::wrap_pymodule!(models::models);
+let digests_module = pyo3::wrap_pymodule!(digests::digests);

m.add_wrapped(tokenize_module)?;
m.add_wrapped(ailist_module)?;
m.add_wrapped(utils_module)?;
m.add_wrapped(models_module)?;
+m.add_wrapped(digests_module)?;

let sys = PyModule::import_bound(py, "sys")?;
let binding = sys.getattr("modules")?;
@@ -29,6 +32,7 @@ fn gtars(py: Python, m: &Bound<'_, PyModule>) -> PyResult<()> {
sys_modules.set_item("gtars.ailist", m.getattr("ailist")?)?;
sys_modules.set_item("gtars.utils", m.getattr("utils")?)?;
sys_modules.set_item("gtars.models", m.getattr("models")?)?;
+sys_modules.set_item("gtars.digests", m.getattr("digests")?)?;

// add constants
m.add("__version__", VERSION)?;
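
The `sys_modules.set_item(...)` lines are what make dotted imports such as `from gtars.digests import ...` resolve for a submodule that exists only as an attribute of the extension module. A pure-Python sketch of the same mechanism follows. The package name `demo_pkg` and the helper `register_submodule` are hypothetical and stand in for the Rust code above.

```python
import sys
import types


def register_submodule(parent: types.ModuleType, child: types.ModuleType) -> None:
    """Attach child to parent and register it under its dotted name."""
    short_name = child.__name__.rpartition(".")[2]
    setattr(parent, short_name, child)   # enables parent.child attribute access
    sys.modules[child.__name__] = child  # enables `import parent.child`


# Build an in-memory package with one submodule (no files on disk).
parent = types.ModuleType("demo_pkg")
child = types.ModuleType("demo_pkg.digests")
child.answer = lambda: 42

sys.modules["demo_pkg"] = parent
register_submodule(parent, child)

# The import system finds the dotted name in sys.modules and returns it.
from demo_pkg.digests import answer

print(answer())  # 42
```

Without the `sys.modules` entry, the `from demo_pkg.digests import answer` line would raise `ModuleNotFoundError`, because `demo_pkg` has no `__path__` for the import system to search.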
2 changes: 1 addition & 1 deletion bindings/r/DESCRIPTION
@@ -1,6 +1,6 @@
Package: gtars
Title: Performance critical genomic interval analysis using Rust, in R
-Version: 0.0.0.9000
+Version: 0.0.1
Authors@R:
person("Nathan", "LeRoy", , "nleroy917@gmail.com", role = c("aut", "cre"),
comment = c(ORCID = "0000-0002-7354-7213"))
4 changes: 3 additions & 1 deletion bindings/r/R/igd.R
@@ -18,7 +18,7 @@ NULL
#' @examples
#' \dontrun{
#' # Create database with default name
-#' igd_create("path/to/output", "path/to/bed/files")
+#' r_igd_create("path/to/output", "path/to/bed/files")
#' }
#'
#' @export
@@ -49,6 +49,8 @@ r_igd_create <- function(output_path, filelist, db_name = "igd_database") {
#'
#' @examples
#' \dontrun{
+#' # Search database with default name
+#' r_igd_search("path/to/database", "path/to/query/file")
#' }
#'
#' @export
18 changes: 18 additions & 0 deletions bindings/r/README.md
@@ -0,0 +1,18 @@
# gtars

This is an R package that wraps the `gtars` Rust crate so you can call gtars code from R.

## Brief instructions

To install the development version, you'll have to build it locally. Build R bindings like this:

```console
cd bindings
R CMD build r
```

Then install the package that was just built:

```console
R CMD INSTALL gtars_0.0.1.tar.gz
```
2 changes: 1 addition & 1 deletion bindings/r/man/r_igd_create.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 2 additions & 0 deletions bindings/r/man/r_igd_search.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion bindings/r/src/rust/Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = 'gtars-r'
-version = '0.1.0'
+version = '0.2.0'
edition = '2021'

[lib]
7 changes: 0 additions & 7 deletions bindings/r/tests/set_A.bed

This file was deleted.

3 changes: 0 additions & 3 deletions bindings/r/tests/set_AA.bed

This file was deleted.
