From 07ff4c39994bdd535e992bfc53a5713818bfe640 Mon Sep 17 00:00:00 2001
From: denosawr
Date: Wed, 4 Dec 2024 14:16:33 +1100
Subject: [PATCH] Updated documentation to reflect distribution vector of flexiplex-filter

---
 README.md         | 11 ++++++++++-
 scripts/README.md | 11 +++++++++--
 scripts/usage.md  | 34 +++++++++++++++-------------------
 3 files changed, 34 insertions(+), 22 deletions(-)

diff --git a/README.md b/README.md
index 11f122e..92f60d1 100644
--- a/README.md
+++ b/README.md
@@ -40,7 +40,16 @@ For usage information type `flexiplex -h` and `flexiplex-filter -h`.
 
 ## Installation
 
-Precompiled binaries are located in `bin/`. You can also install Flexiplex using Anaconda: `conda install -c bioconda -c conda-forge flexiplex`
+Precompiled binaries for Flexiplex are located in `bin/`. You can also install Flexiplex using Anaconda: `conda install -c bioconda -c conda-forge flexiplex`
+
+flexiplex-filter can be installed locally using Make, but we recommend using the [`uv` package manager](https://docs.astral.sh/uv/getting-started/installation/). Just invoke flexiplex-filter using
+
+```sh
+uvx --from git+https://github.com/davidsongroup/flexiplex.git#subdirectory=scripts \
+    flexiplex-filter --help
+```
+
+### Compiling from source
 
 To compile flexiplex, ensure that gcc is installed, then run:
 `make`
diff --git a/scripts/README.md b/scripts/README.md
index 74ae3bd..cca26ce 100644
--- a/scripts/README.md
+++ b/scripts/README.md
@@ -1,8 +1,15 @@
 # scripts
 
-This directory contains a script, `filter-barcodes.py`, which can be used to:
+This directory contains a Python tool, `flexiplex-filter`, which can be used to:
 * Find an inflection point automatically
 * Graph the knee plot of the barcode ranks/counts
 * Filter using a whitelist of known barcodes
 
-More information, including installation instructions, is detailed in the [usage file](usage.md).
\ No newline at end of file
+It is recommended that you run `flexiplex-filter` using the [`uv` package manager](https://docs.astral.sh/uv/getting-started/installation/):
+
+```sh
+uvx --from git+https://github.com/davidsongroup/flexiplex.git#subdirectory=scripts \
+    flexiplex-filter --help
+```
+
+More information, including installation instructions, is detailed in the [usage file](usage.md).
diff --git a/scripts/usage.md b/scripts/usage.md
index 662d824..b2fd1c9 100644
--- a/scripts/usage.md
+++ b/scripts/usage.md
@@ -16,7 +16,7 @@
 
 ## Arguments
 ```
-usage: filter-barcodes.py [-h] [-v] [-o ] [--dry-run] [--no-inflection]
+usage: flexiplex-filter [-h] [-v] [-o ] [--dry-run] [--no-inflection]
                           [-l ] [-u ] [-g] [--list-points ]
                           [--use-predetermined-rank ] [-w ]
                           [filename]
@@ -62,20 +62,16 @@ filter by whitelist file:
 ```
 
 ## Installation
-The script is designed to be minimal and depends on only `numpy` and `pandas`. `matplotlib` is optional, and is used for the graphing functionality.
+The recommended way to run flexiplex-filter is to use the [`uv` package manager](https://docs.astral.sh/uv/getting-started/installation/):
 
-```bash
-$ wget -O https://github.com/DavidsonGroup/flexiplex/blob/filters/scripts/filter-barcodes.py
-$ python filter_barcodes.py
+```sh
+uvx --from git+https://github.com/davidsongroup/flexiplex.git#subdirectory=scripts \
+    flexiplex-filter --help
 ```
 
-or even, to avoid downloading,
-
-```bash
-$ curl https://github.com/DavidsonGroup/flexiplex/blob/filters/scripts/filter-barcodes.py | python -
-```
+A `requirements.txt` is provided as well for reproducibility: `pip install -r requirements.txt` or `conda install --file requirements.txt`. The script is designed to be minimal and depends on only `numpy` and `pandas`. `matplotlib` is optional, and is used for the graphing functionality.
 
-A `requirements.txt` is provided as well for reproducibility: `pip install -r requirements.txt` or `conda install --file requirements.txt`.
+Python 3.7 or greater is required.
 
 ## Autopilot
 This script has sensible defaults which can automatically find an 'approximate' inflection point and perform whitelist filtering for you. It will never edit your files in-place and will always output to `stdout` or a given output file.
@@ -94,7 +90,7 @@ This automatic inflection will, by default, use:
 These can be overriden using the `-l` and `-u` parameters, if your dataset is small or needs fine-tuning.
 
 ```bash
-$ python filter_barcodes.py --outfile output.txt flexiplex_barcodes_counts.txt
+$ flexiplex-filter --outfile output.txt flexiplex_barcodes_counts.txt
 
 Rank of inflection point: 182
 ```
@@ -108,7 +104,7 @@ The `--whitelist ` argument will read a plaintext file of newline-separate
 As inflection point discovery is on by default, the `--no-inflection` argument should be used to disable it. This will just perform a whitelist filter.
 
 ```bash
-$ python filter_barcodes.py --whitelist 3M-february-2018.txt --no-inflection --outfile output.txt flexiplex_barcodes_counts.txt
+$ flexiplex-filter --whitelist 3M-february-2018.txt --no-inflection --outfile output.txt flexiplex_barcodes_counts.txt
 
 Filtered with whitelist, removed 656317 out of 766589 barcodes (85% of all barcodes)
 ```
@@ -121,7 +117,7 @@ $ sort <(gunzip -c 3M-february-2018.txt.gz) <(cut -f1 flexiplex_barcodes_counts.
 
 ### Whitelist filtering and inflection point discovery
 ```bash
-$ python filter_barcodes.py --whitelist 3M-february-2018.txt --outfile output.txt flexiplex_barcodes_counts.txt
+$ flexiplex-filter --whitelist 3M-february-2018.txt --outfile output.txt flexiplex_barcodes_counts.txt
 
 Filtered with whitelist, removed 656317 out of 766589 barcodes (85% of all barcodes)
 Rank of inflection point: 182
@@ -140,7 +136,7 @@ The extremes of the data tend to be sparsely populated and not very appropriate
 In the below example, the corresponding count is also given. Note that the minimum rank represents the largest count, and vice versa.
Thus, the barcode with rank 20 will have a count of 72575, and the barcode with rank 1000 has a count of 365. The discovered inflection point will necessarily have a rank between 20 and 1000.
 
 ```bash
-$ python filter-barcodes.py -l 20 -u 1000 --verbose [...]
+$ flexiplex-filter -l 20 -u 1000 --verbose [...]
 
 --list-points not given, setting it to 10
 No whitelist file given, skipping
@@ -153,7 +149,7 @@ Bounds interval: ranks [20, 1000] -> counts [365, 72575]
 If an optimal inflection point has already been determined, passing the `--use-predetermined-rank` $r$ argument will disable inflection point discovery and will instead filter all ranks $\leq r$.
 
 ```bash
-$ python filter-barcodes.py --use-predetermined-rank 532 ...
+$ flexiplex-filter --use-predetermined-rank 532 ...
 
 Using predetermined rank, not initiating discovery
 Rank of inflection point: 532
@@ -168,7 +164,7 @@ $$\frac{d\log_{10}(\text{count})}{d\log_{10}(\text{rank})}\approx \frac{\Delta\l
 Observe that in the following example, there are two distinct 'neighbourhoods' which could contain an inflection point: $r \approx 805$ and $r \approx 830$.
 
 ```
-$ python filter-barcodes.py --list-points 10 ...
+$ flexiplex-filter --list-points 10 ...
 
 Setting --dry-run as --list-points was given
 
@@ -199,11 +195,11 @@ Passing the `--graph` option will create a knee plot of the ranks compared with
 This requires a windowed environment and is not suitable for a headless computing setup.
 
 ```bash
-$ python filter-barcodes.py --graph ...
+$ flexiplex-filter --graph ...
 [Figure 1]
 ```
 
 ![](images/fig1.png)
 *Figure 1: The script running with `--graph` enabled.*
 
-Note that this can be used in conjunction with `--use-predetermined-rank` to visualise the location, rank, and count of any given barcode.
\ No newline at end of file
+Note that this can be used in conjunction with `--use-predetermined-rank` to visualise the location, rank, and count of any given barcode.