Merge pull request #51 from databio/dev
0.2.0 Release - Gibson Les Paul
nleroy917 authored Jan 13, 2025
2 parents bfcabdc + df443c2 commit bde7a3d
Showing 51 changed files with 1,379 additions and 516 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/R-CMD-check.yaml
@@ -19,7 +19,7 @@ jobs:
# - {os: windows-latest, r: 'release', rust-version: 'stable-msvc', rust-target: 'x86_64-pc-windows-gnu'}
- {os: macOS-latest, r: 'release', rust-version: 'stable'}
- {os: ubuntu-latest, r: 'release', rust-version: 'stable'}
- {os: ubuntu-latest, r: 'devel', rust-version: 'stable'}
#- {os: ubuntu-latest, r: 'devel', rust-version: 'stable'}
env:
R_REMOTES_NO_ERRORS_FROM_WARNINGS: true
steps:
1 change: 1 addition & 0 deletions .gitignore
@@ -25,3 +25,4 @@ bin/

.DS_Store
.Rhistory
/gtars/tests/data/out/region_scoring_count.csv.gz
9 changes: 9 additions & 0 deletions LICENSE
@@ -0,0 +1,9 @@
Copyright 2024 gtars authors

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
9 changes: 9 additions & 0 deletions LICENSE.txt
@@ -0,0 +1,9 @@
Copyright 2024 gtars authors

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
25 changes: 20 additions & 5 deletions README.md
@@ -7,59 +7,73 @@

`gtars` is a rust crate that provides a set of tools for working with genomic interval data. Its primary goal is to provide processors for our python package, [`geniml`](https://github.com/databio/geniml), a library for machine learning on genomic intervals. However, it can be used as a standalone library for working with genomic intervals as well.

`gtars` provides three things:
`gtars` provides these things:

1. A rust library crate.
2. A command-line interface, written in rust.
3. A Python package that provides bindings to the rust library.
3. A Python package that provides Python bindings to the rust library.
4. An R package that provides R bindings to the rust library

## Repository organization (for developers)

This repo is organized like so:

1. A rust library crate (`/gtars/lib.rs`) that provides functions, traits, and structs for working with genomic interval data.
2. A rust binary crate (in `/gtars/main.rs`), a small, wrapper command-line interface for the library crate.
3. A rust crate (in `/bindings`) that provides Python bindings, and a resulting Python package, so that it can be used within Python.
1. The main gtars rust package in `/gtars`, which contains two crates:
1a. A rust library crate (`/gtars/lib.rs`) that provides functions, traits, and structs for working with genomic interval data.
1b. A rust binary crate (in `/gtars/main.rs`), a small, wrapper command-line interface for the library crate.
2. Python bindings (in `/bindings/python`), which consist of a rust package with a library crate (no binary crate) and a Python package.
3. R bindings (in `/bindings/r`), which consist of an R package.

This repository is a work in progress, and still in early development.

## Installation

To install `gtars`, you must have the rust toolchain installed. You can install it by following the instructions [here](https://www.rust-lang.org/tools/install).

You may build the binary locally using `cargo build --release`. This will create a binary in `target/release/gtars`. You can then add this to your path, or run it directly.

## Usage

`gtars` is very early in development, and as such, it does not have a lot of functionality yet. However, it does have a few useful tools. To see the available tools, run `gtars --help`. To see the help for a specific tool, run `gtars <tool> --help`.

Alternatively, you can link `gtars` as a library in your rust project. To do so, add the following to your `Cargo.toml` file:

```toml
[dependencies]
gtars = { git = "https://github.com/databio/gtars" }
```

## Testing

To run the tests, run `cargo test`.

## Contributing

### New internal library crate tools

If you'd like to add a new tool, you can do so by creating a new module within the src folder.

### New public library crate tools

If you want this to be available to users of `gtars`, you can add it to the `gtars` library crate as well. To do so, add the following to `src/lib.rs`:
```rust
pub mod <tool_name>;
```

### New binary crate tools

Finally, if you want to have command-line functionality, you can add it to the `gtars` binary crate. This requires two steps:

1. Create a new `cli` using `clap` inside the `interfaces` module of `src/cli.rs`:

```rust
pub fn make_new_tool_cli() -> Command {

}
```

2. Write your logic in a wrapper function. This will live inside the `functions` module of `src/cli.rs`:

```rust
// top of file:
use tool_name::{ ... }
@@ -73,6 +87,7 @@ pub fn new_tool_wrapper() -> Result<(), Box<dyn Error>> {
Please make sure you update the changelog and bump the version number in `Cargo.toml` when you add a new tool.

### VSCode users

If you are using VSCode, make sure you link to the `Cargo.toml` inside the `.vscode` folder, so that `rust-analyzer` can link it all together:
```json
{
2 changes: 1 addition & 1 deletion bindings/python/Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "gtars-py"
version = "0.1.1"
version = "0.2.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
31 changes: 18 additions & 13 deletions bindings/python/README.md
@@ -1,18 +1,23 @@
# gtars
This is a python wrapper around the `gtars` crate. It provides an easy interface for using `gtars` in python. It is currently in early development, and as such, it does not have a lot of functionality yet, but new tools are being worked on right now.

## Installation
You can get `gtars` from PyPI:
```bash
pip install gtars
```
This is a Python package that wraps the `gtars` crate so you can call gtars code from Python.

Documentation for Python bindings is hosted at: https://docs.bedbase.org/gtars/

## Brief instructions

## Usage
Import the package, and use the tools:
```python
import gtars as gt
To install the development version, you'll have to build it locally. Build Python bindings like this:

gt.prune_universe(...)
```console
cd bindings/python
maturin build --interpreter 3.11 --release
```
## Developer docs
Write the develop docs here...

Then install the local wheel that was just built:

```console
gtars_version=`grep '^version =' Cargo.toml | cut -d '"' -f 2`
python_version=$(python --version | awk '{print $2}' | cut -d '.' -f1-2 | tr -d '.')
wheel_path=$(find target/wheels/gtars-${gtars_version}-cp${python_version}-cp${python_version}-*.whl)
pip install --force-reinstall ${wheel_path}
```
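The shell snippet above assembles the wheel filename from the crate version in `Cargo.toml` and the running interpreter's version. As a rough, cross-platform sketch of the same lookup in Python (the helper names `read_crate_version` and `wheel_pattern` are hypothetical and not part of gtars; the sketch assumes the `gtars-<version>-cp<XY>-cp<XY>-*.whl` naming used in the snippet above):

```python
import re
import sys
from typing import Tuple


def read_crate_version(cargo_toml_text: str) -> str:
    """Return the first `version = "..."` value found at the start of a line."""
    match = re.search(r'^version\s*=\s*"([^"]+)"', cargo_toml_text, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no version field found in Cargo.toml text")
    return match.group(1)


def wheel_pattern(crate_version: str, py_version: Tuple[int, int] = tuple(sys.version_info[:2])) -> str:
    """Build the glob pattern for the locally built wheel (hypothetical helper)."""
    tag = f"cp{py_version[0]}{py_version[1]}"  # e.g. (3, 11) -> "cp311"
    return f"target/wheels/gtars-{crate_version}-{tag}-{tag}-*.whl"


# Example input mirroring bindings/python/Cargo.toml from this commit.
cargo_toml = '[package]\nname = "gtars-py"\nversion = "0.2.0"\nedition = "2021"\n'
pattern = wheel_pattern(read_crate_version(cargo_toml), (3, 11))
```

The resulting pattern could then be expanded with `glob.glob(pattern)` and the match handed to `pip install --force-reinstall`.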
1 change: 1 addition & 0 deletions bindings/python/gtars/digests/__init__.py
@@ -0,0 +1 @@
from .gtars.digests import * # noqa: F403
71 changes: 71 additions & 0 deletions bindings/python/src/digests/mod.rs
@@ -0,0 +1,71 @@
// This is intended to provide minimal Python bindings to functions in the `digests` module of the `gtars` crate.

use pyo3::prelude::*;
use gtars::digests::{sha512t24u, md5, DigestResult};

#[pyfunction]
pub fn sha512t24u_digest(readable: &str) -> String {
    return sha512t24u(readable);
}

#[pyfunction]
pub fn md5_digest(readable: &str) -> String {
    return md5(readable);
}

#[pyfunction]
pub fn digest_fasta(fasta: &str) -> PyResult<Vec<PyDigestResult>> {
    match gtars::digests::digest_fasta(fasta) {
        Ok(digest_results) => {
            let py_digest_results: Vec<PyDigestResult> = digest_results.into_iter().map(PyDigestResult::from).collect();
            Ok(py_digest_results)
        },
        Err(e) => Err(PyErr::new::<pyo3::exceptions::PyIOError, _>(format!("Error processing FASTA file: {}", e))),
    }
}

#[pyclass]
#[pyo3(name="DigestResult")]
pub struct PyDigestResult {
    #[pyo3(get,set)]
    pub id: String,
    #[pyo3(get,set)]
    pub length: usize,
    #[pyo3(get,set)]
    pub sha512t24u: String,
    #[pyo3(get,set)]
    pub md5: String
}

#[pymethods]
impl PyDigestResult {
    fn __repr__(&self) -> String {
        format!("<DigestResult for {}>", self.id)
    }

    fn __str__(&self) -> PyResult<String> {
        Ok(format!("DigestResult for sequence {}\n length: {}\n sha512t24u: {}\n md5: {}", self.id, self.length, self.sha512t24u, self.md5))
    }
}

impl From<DigestResult> for PyDigestResult {
    fn from(value: DigestResult) -> Self {
        PyDigestResult {
            id: value.id,
            length: value.length,
            sha512t24u: value.sha512t24u,
            md5: value.md5
        }
    }
}

// This represents the Python module to be created
#[pymodule]
pub fn digests(_py: Python, m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(sha512t24u_digest, m)?)?;
    m.add_function(wrap_pyfunction!(md5_digest, m)?)?;
    m.add_function(wrap_pyfunction!(digest_fasta, m)?)?;
    m.add_class::<PyDigestResult>()?;
    Ok(())
}
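To make the shape of the values returned by these bindings concrete, here is a minimal pure-Python sketch of what a `DigestResult` presumably contains. It is not the gtars implementation. It assumes `sha512t24u` follows the GA4GH definition (SHA-512, truncated to the first 24 bytes, base64url-encoded), that sequences are upper-cased before hashing, and that the record id is the first whitespace-delimited token of the header line. The function `digest_fasta_lines` is a hypothetical stand-in that reads lines rather than a file path.

```python
import base64
import hashlib
from dataclasses import dataclass
from typing import Iterable, List, Optional


def sha512t24u(data: bytes) -> str:
    """SHA-512 digest, truncated to 24 bytes, base64url-encoded (32 characters)."""
    digest = hashlib.sha512(data).digest()
    return base64.urlsafe_b64encode(digest[:24]).decode("ascii")


@dataclass
class DigestResult:
    # Field names mirror the PyDigestResult struct in the diff above.
    id: str
    length: int
    sha512t24u: str
    md5: str


def _finish(seq_id: str, chunks: List[str]) -> DigestResult:
    """Join the collected sequence lines and compute both digests."""
    seq = "".join(chunks).encode("ascii")
    return DigestResult(seq_id, len(seq), sha512t24u(seq), hashlib.md5(seq).hexdigest())


def digest_fasta_lines(lines: Iterable[str]) -> List[DigestResult]:
    """Digest every record in FASTA-formatted lines (hypothetical helper)."""
    results: List[DigestResult] = []
    seq_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if seq_id is not None:
                results.append(_finish(seq_id, chunks))
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""  # assumption: id is the first token
            chunks = []
        elif line and seq_id is not None:
            chunks.append(line.upper())  # assumption: hash the upper-cased sequence
    if seq_id is not None:
        results.append(_finish(seq_id, chunks))
    return results
```

With a file on disk, `digest_fasta_lines(open("genome.fa"))` would return one `DigestResult` per sequence, comparable in spirit to what `digest_fasta` exposes through the bindings.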

4 changes: 4 additions & 0 deletions bindings/python/src/lib.rs
@@ -5,6 +5,7 @@ mod ailist;
mod models;
mod tokenizers;
mod utils;
mod digests;

pub const VERSION: &str = env!("CARGO_PKG_VERSION");

@@ -14,11 +15,13 @@ fn gtars(py: Python, m: &Bound<'_, PyModule>) -> PyResult<()> {
let ailist_module = pyo3::wrap_pymodule!(ailist::ailist);
let utils_module = pyo3::wrap_pymodule!(utils::utils);
let models_module = pyo3::wrap_pymodule!(models::models);
let digests_module = pyo3::wrap_pymodule!(digests::digests);

m.add_wrapped(tokenize_module)?;
m.add_wrapped(ailist_module)?;
m.add_wrapped(utils_module)?;
m.add_wrapped(models_module)?;
m.add_wrapped(digests_module)?;

let sys = PyModule::import_bound(py, "sys")?;
let binding = sys.getattr("modules")?;
@@ -29,6 +32,7 @@ fn gtars(py: Python, m: &Bound<'_, PyModule>) -> PyResult<()> {
sys_modules.set_item("gtars.ailist", m.getattr("ailist")?)?;
sys_modules.set_item("gtars.utils", m.getattr("utils")?)?;
sys_modules.set_item("gtars.models", m.getattr("models")?)?;
sys_modules.set_item("gtars.digests", m.getattr("digests")?)?;

// add constants
m.add("__version__", VERSION)?;
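One plausible reading of the `sys_modules.set_item("gtars.digests", ...)` line is that a submodule attached only as an attribute cannot be reached with a dotted `import`, because the parent extension module is not a package. The sketch below reproduces that behavior in plain Python using stand-in modules; `demo_ext` is a hypothetical name, not the real `gtars` extension.

```python
import importlib
import sys
import types

# Stand-ins for the compiled parent module and its `digests` submodule.
parent = types.ModuleType("demo_ext")
child = types.ModuleType("demo_ext.digests")

# Rough equivalent of `m.add_wrapped(digests_module)`: attribute access works.
parent.digests = child
sys.modules["demo_ext"] = parent

# A dotted import fails: the parent has no `__path__`, so it is not a package.
try:
    importlib.import_module("demo_ext.digests")
    importable_before = True
except ModuleNotFoundError:
    importable_before = False

# Rough equivalent of `sys_modules.set_item("gtars.digests", ...)`.
sys.modules["demo_ext.digests"] = child
importable_after = importlib.import_module("demo_ext.digests") is child
```

After the `sys.modules` entry is registered, the import system finds the submodule by its dotted name without needing to locate it on disk.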
2 changes: 1 addition & 1 deletion bindings/r/DESCRIPTION
@@ -1,6 +1,6 @@
Package: gtars
Title: Performance critical genomic interval analysis using Rust, in R
Version: 0.0.0.9000
Version: 0.0.1
Authors@R:
person("Nathan", "LeRoy", , "nleroy917@gmail.com", role = c("aut", "cre"),
comment = c(ORCID = "0000-0002-7354-7213"))
4 changes: 3 additions & 1 deletion bindings/r/R/igd.R
@@ -18,7 +18,7 @@ NULL
#' @examples
#' \dontrun{
#' # Create database with default name
#' igd_create("path/to/output", "path/to/bed/files")
#' r_igd_create("path/to/output", "path/to/bed/files")
#' }
#'
#' @export
@@ -49,6 +49,8 @@ r_igd_create <- function(output_path, filelist, db_name = "igd_database") {
#'
#' @examples
#' \dontrun{
#' # Search database with default name
#' r_igd_search("path/to/database", "path/to/query/file")
#' }
#'
#' @export
18 changes: 18 additions & 0 deletions bindings/r/README.md
@@ -0,0 +1,18 @@
# gtars

This is an R package that wraps the `gtars` Rust crate so you can call gtars code from R.

## Brief instructions

To install the development version, you'll have to build it locally. Build R bindings like this:

```console
cd bindings
R CMD build r
```

Then install the package that was just built:

```console
R CMD INSTALL gtars_0.0.1.tar.gz
```
2 changes: 1 addition & 1 deletion bindings/r/man/r_igd_create.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 2 additions & 0 deletions bindings/r/man/r_igd_search.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion bindings/r/src/rust/Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = 'gtars-r'
version = '0.1.0'
version = '0.2.0'
edition = '2021'

[lib]
7 changes: 0 additions & 7 deletions bindings/r/tests/set_A.bed

This file was deleted.

3 changes: 0 additions & 3 deletions bindings/r/tests/set_AA.bed

This file was deleted.
