From c6a15f5b8746bceef1f50e1718e34fbbc65fe780 Mon Sep 17 00:00:00 2001
From: nsheff
Date: Wed, 24 Apr 2019 08:06:02 -0400
Subject: [PATCH 1/9] Change changelog format; version bump for release

---
 docs/changelog.md  | 327 +++++++++++++++++----------------------------
 looper/_version.py |   2 +-
 2 files changed, 122 insertions(+), 207 deletions(-)

diff --git a/docs/changelog.md b/docs/changelog.md
index 73c0e14ea..d67f19a14 100644
--- a/docs/changelog.md
+++ b/docs/changelog.md
@@ -1,233 +1,148 @@
 # Changelog
+## [0.12] - Unreleased
-- **Unreleased**
+## [0.11.1] - 2019-04-17
+### Changed
+- Improved documentation
+- Improved interaction with `peppy` and `divvy` dependencies
-- **v0.11 (*2019-04-17*)**
+## [0.11] - 2019-04-17
- - Added
+### Added
+- Implemented `looper rerun` command.
+- Support use of custom `resources` in pipeline's `compute` section
+- Listen for itemized compute resource specification on command-line with `--resources`
+- Support pointing to `Project` config file with folder path rather than full filepath
+- Add `selector-attribute` parameter for more generic sample selection.
- - Implemented `looper rerun` command.
-
- - Support use of custom `resources` in pipeline's `compute` section
-
- - Listen for itemized compute resource specification on command-line with `--resources`
-
- - Support pointing to `Project` config file with folder path rather than full filepath
-
- - Changed
-
- - Switched to a Jinja-style templating system for summary output
-
- - Made various UI changes to adapt to `caravel` use.
-
- - Using `attmap` for "attribute-style key-vale store" implementation
-
- - Removed Python 3.4 support.
-
- - UI: change parameter names `in/exclude-samples` to `selector-in/exclude`.
-
- - New:
-
- - Add `selector-attribute` parameter for more generic sample selection.
-
-
-- **v0.10.0 (*2018-12-20*)**
-
- - Changed
-
- - ``PipelineInterface`` now derives from ``peppy.AttributeDict``.
- - On ``PipelineInterface``, iteration over pipelines now is with ``iterpipes``.
-
- - Rename ``parse_arguments`` to ``build_parser``, which returns ``argparse.ArgumentParser`` object
-
- - Integers in HTML reports are made more human-readable by including commas.
-
- - Column headers in HTML reports are now stricly for sorting; there's a separate list for plottable columns.
-
- - More informative error messages
-
- - Fixed
-
- - HTML samples list is fully populated.
-
- - Existence of an object lacking an anchor image is no longer problematic for ``summarize``.
-
- - Basic package test in Python 3 now succeeds: ``python3 setup.py test``.
-
-- **v0.9.2** (*2018-11-12*):
-
- - Fixed
-
- - Fixed bugs with ``looper summarize`` when no summarizers were present
-
- - Added CLI flag to force ``looper destroy`` for programmatic access
-
- - Fixed a bug for samples with duplicate names
-
- - Added new display features (graphs, table display) for HTML summary output.
-
-
-- **v0.9.1** (*2018-06-30*):
-
- - Fixed
-
- - Fixed several bugs with ``looper summarize`` that caused failure on edge cases.
-
-
-- **v0.9.0** (*2018-06-25*):
-
- - New
-
- - Support for custom summarizers
-
- - Add ``allow-duplicate-names`` command-line options
-
- - Allow any variables in environment config files or other ``compute`` sections to be used in submission templates. This allows looper to be used with containers.
-
- - Add nice universal project-level HTML reporting
-
-
-- **v0.8.1** (*2018-04-02*):
-
- - Changed
-
- - Minor documentation and packaging updates for first Pypi release.
-
- - Fix a bug that incorrectly mapped protocols due to case sensitive issues
-
- - Fix a bug with ``report_figure`` that made it output pandas code
-
-
-- **v0.8.0** (*2018-01-19*):
-
- - Changed
-
- - Use independent `peppy` package, replacing ``models`` module for core data types.
-
- - Integrate ``ProtocolInterface`` functionality into ``PipelineInterface``.
-
-- **v0.7.2** (*2017-11-16*):
-
- - Fixed
+### Changed
+- Switched to a Jinja-style templating system for summary output
+- Made various UI changes to adapt to `caravel` use.
+- Using `attmap` for "attribute-style key-vale store" implementation
+- Removed Python 3.4 support.
+- UI: change parameter names `in/exclude-samples` to `selector-in/exclude`.
- - Correctly count successful command submissions when not using `--dry-run`.
+## [0.10.0] - 2018-12-20
-- **v0.7.1** (*2017-11-15*):
+### Changed
+- ``PipelineInterface`` now derives from ``peppy.AttributeDict``.
+- On ``PipelineInterface``, iteration over pipelines now is with ``iterpipes``.
+- Rename ``parse_arguments`` to ``build_parser``, which returns ``argparse.ArgumentParser`` object
+- Integers in HTML reports are made more human-readable by including commas.
+- Column headers in HTML reports are now stricly for sorting; there's a separate list for plottable columns.
+- More informative error messages
+- HTML samples list is fully populated.
+- Existence of an object lacking an anchor image is no longer problematic for ``summarize``.
+- Basic package test in Python 3 now succeeds: ``python3 setup.py test``.
- - Fixed
-
- - No longer falsely display that there's a submission failure.
-
- - Allow non-string values to be unquoted in the ``pipeline_args`` section.
+## [v0.9.2] - 2018-11-12
-- **v0.7** (*2017-11-15*):
+### Changed
+- Fixed bugs with ``looper summarize`` when no summarizers were present
+- Added CLI flag to force ``looper destroy`` for programmatic access
+- Fixed a bug for samples with duplicate names
+- Added new display features (graphs, table display) for HTML summary output.
- - New
-
- - Add ``--lump`` and ``--lumpn`` options
-
- - Catch submission errors from cluster resource managers
-
- - Implied columns can now be derived
-
- - Now protocols can be specified on the command-line `--include-protocols`
-
- - Add rudimentary figure summaries
-
- - Simplifies command-line help display
-
- - Allow wildcard protocol_mapping for catch-all pipeline assignment
-
- - Improve user messages
-
- - New sample_subtypes section in pipeline_interface
-
- - Changed
-
- - Sample child classes are now defined explicitly in the pipeline interface. Previously, they were guessed based on presence of a class extending Sample in a pipeline script.
-
- - Changed 'library' key sample attribute to 'protocol'
-
-- **v0.6** (*2017-07-21*):
-
- - New
- - Add support for implied_column section of the project config file
+## [0.9.1] - 2018-06-30
- - Add support for Python 3
+### Changed
+- Fixed several bugs with ``looper summarize`` that caused failure on edge cases.
- - Merges pipeline interface and protocol mappings. This means we now allow direct pointers to ``pipeline_interface.yaml`` files, increasing flexibility, so this relaxes the specified folder structure that was previously used for ``pipelines_dir`` (with ``config`` subfolder).
- - Allow URLs as paths to sample sheets.
-
- - Allow tsv format for sample sheets.
-
- - Checks that the path to a pipeline actually exists before writing the submission script.
+## [0.9.0] - 2018-06-25
- - Changed
+### Added
+- Support for custom summarizers
+- Add ``allow-duplicate-names`` command-line options
+- Allow any variables in environment config files or other ``compute`` sections to be used in submission templates. This allows looper to be used with containers.
+- Add nice universal project-level HTML reporting
- - Changed LOOPERENV environment variable to PEPENV, generalizing it to generic models
+## [0.8.1] - 2018-04-02
- - Changed name of ``pipelines_dir`` to ``pipeline_interfaces`` (but maintained backwards compatibility for now).
+### Changed
+- Minor documentation and packaging updates for first Pypi release.
+- Fix a bug that incorrectly mapped protocols due to case sensitive issues
+- Fix a bug with ``report_figure`` that made it output pandas code
- - Changed name of ``run`` column to ``toggle``, since ``run`` can also refer to a sequencing run.
- - Relaxes many constraints (like resources sections, pipelines_dir columns), making project configuration files useful outside looper. This moves us closer to dividing models from looper, and improves flexibility.
+## [0.8.0] - 2018-01-19
- - Various small bug fixes and dev improvements.
+### Changed
+- Use independent `peppy` package, replacing ``models`` module for core data types.
+- Integrate ``ProtocolInterface`` functionality into ``PipelineInterface``.
- - Require `setuptools` for installation, and `pandas 0.20.2`. If `numexpr` is installed, version `2.6.2` is required.
+## [0.7.2] - 2017-11-16
+### Changed
+- Correctly count successful command submissions when not using `--dry-run`.
- - Allows tilde in ``pipeline_interfaces``
+## [0.7.1] - 2017-11-15
-- **v0.5** (*2017-03-01*):
+### Changed
+- No longer falsely display that there's a submission failure.
+- Allow non-string values to be unquoted in the ``pipeline_args`` section.
- - New
-
- - Add new looper version tracking, with `--version` and `-V` options and printing version at runtime
-
- - Add support for asterisks in file paths
-
- - Add support for multiple pipeline directories in priority order
-
- - Revamp of messages make more intuitive output
-
- - Colorize output
-
- - Complete rehaul of logging and test infrastructure, using logging and pytest packages
-
- - Changed
-
- - Removes pipelines_dir requirement for models, making it useful outside looper
-
- - Small bug fixes related to `all_input_files` and `required_input_files` attributes
+## [0.7] - 2017-11-15
+### Added
+- Add ``--lump`` and ``--lumpn`` options
+- Catch submission errors from cluster resource managers
+- Implied columns can now be derived
+- Now protocols can be specified on the command-line `--include-protocols`
+- Add rudimentary figure summaries
+- Simplifies command-line help display
+- Allow wildcard protocol_mapping for catch-all pipeline assignment
+- Improve user messages
+- New sample_subtypes section in pipeline_interface
- - More robust installation and more explicit requirement of Python 2.7
-
-
-- **v0.4** (*2017-01-12*):
-
- - New
-
- - New command-line interface (CLI) based on sub-commands
-
- - New subcommand (``looper summarize``) replacing the ``summarizePipelineStats.R`` script
-
- - New subcommand (``looper check``) replacing the ``flagCheck.sh`` script
-
- - New command (``looper destroy``) to remove all output of a project
-
- - New command (``looper clean``) to remove intermediate files of a project flagged for deletion
-
- - Support for portable and pipeline-independent allocation of computing resources with Looperenv.
-
- - Changed
-
- - Removed requirement to have ``pipelines`` repository installed in order to extend base Sample objects
-
- - Maintenance of sample attributes as provided by user by means of reading them in as strings (to be improved further)
-
- - Improved serialization of Sample objects
+### Changed
+- Sample child classes are now defined explicitly in the pipeline interface. Previously, they were guessed based on presence of a class extending Sample in a pipeline script.
+- Changed 'library' key sample attribute to 'protocol'
+
+## [0.6] - 2017-07-21
+### Added
+ - Add support for implied_column section of the project config file
+ - Add support for Python 3
+ - Merges pipeline interface and protocol mappings. This means we now allow direct pointers to ``pipeline_interface.yaml`` files, increasing flexibility, so this relaxes the specified folder structure that was previously used for ``pipelines_dir`` (with ``config`` subfolder).
+ - Allow URLs as paths to sample sheets.
+ - Allow tsv format for sample sheets.
+ - Checks that the path to a pipeline actually exists before writing the submission script.
+
+### Changed
+- Changed LOOPERENV environment variable to PEPENV, generalizing it to generic models
+- Changed name of `pipelines_dir` to `pipeline_interfaces` (but maintained backwards compatibility for now).
+- Changed name of `run` column to `toggle`, since `run` can also refer to a sequencing run.
+- Relaxes many constraints (like resources sections, pipelines_dir columns), making project configuration files useful outside looper. This moves us closer to dividing models from looper, and improves flexibility.
+- Various small bug fixes and dev improvements.
+- Require `setuptools` for installation, and `pandas 0.20.2`. If `numexpr` is installed, version `2.6.2` is required.
+- Allows tilde in ``pipeline_interfaces``
+
+## [0.5] - 2017-03-01
+### Added
+- Add new looper version tracking, with `--version` and `-V` options and printing version at runtime
+- Add support for asterisks in file paths
+- Add support for multiple pipeline directories in priority order
+- Revamp of messages make more intuitive output
+- Colorize output
+- Complete rehaul of logging and test infrastructure, using logging and pytest packages
+
+### Changed
+- Removes pipelines_dir requirement for models, making it useful outside looper
+- Small bug fixes related to `all_input_files` and `required_input_files` attributes
+- More robust installation and more explicit requirement of Python 2.7
+
+
+## [0.4] - 2017-01-12
+### Added
+- New command-line interface (CLI) based on sub-commands
+- New subcommand (``looper summarize``) replacing the ``summarizePipelineStats.R`` script
+- New subcommand (``looper check``) replacing the ``flagCheck.sh`` script
+- New command (``looper destroy``) to remove all output of a project
+- New command (``looper clean``) to remove intermediate files of a project flagged for deletion
+- Support for portable and pipeline-independent allocation of computing resources with Looperenv.
+
+### Changed
+- Removed requirement to have ``pipelines`` repository installed in order to extend base Sample objects
+- Maintenance of sample attributes as provided by user by means of reading them in as strings (to be improved further)
+- Improved serialization of Sample objects
diff --git a/looper/_version.py b/looper/_version.py
index b19938576..008a1d204 100644
--- a/looper/_version.py
+++ b/looper/_version.py
@@ -1,2 +1,2 @@
-__version__ = "0.12dev"
+__version__ = "0.11.1"

From 8ac35233ade2df83eedba40c2888e27c1a2b1971 Mon Sep 17 00:00:00 2001
From: nsheff
Date: Wed, 24 Apr 2019 08:07:45 -0400
Subject: [PATCH 2/9] backticks to markdown

---
 docs/changelog.md | 44 ++++++++++++++++++++++----------------------
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/docs/changelog.md b/docs/changelog.md
index d67f19a14..aab4142ea 100644
--- a/docs/changelog.md
+++ b/docs/changelog.md
@@ -27,21 +27,21 @@
 ## [0.10.0] - 2018-12-20
 ### Changed
-- ``PipelineInterface`` now derives from ``peppy.AttributeDict``.
-- On ``PipelineInterface``, iteration over pipelines now is with ``iterpipes``.
-- Rename ``parse_arguments`` to ``build_parser``, which returns ``argparse.ArgumentParser`` object
+- `PipelineInterface` now derives from `peppy.AttributeDict`.
+- On `PipelineInterface`, iteration over pipelines now is with `iterpipes`.
+- Rename `parse_arguments` to `build_parser`, which returns `argparse.ArgumentParser` object
 - Integers in HTML reports are made more human-readable by including commas.
 - Column headers in HTML reports are now stricly for sorting; there's a separate list for plottable columns.
 - More informative error messages
 - HTML samples list is fully populated.
-- Existence of an object lacking an anchor image is no longer problematic for ``summarize``.
-- Basic package test in Python 3 now succeeds: ``python3 setup.py test``.
+- Existence of an object lacking an anchor image is no longer problematic for `summarize`.
+- Basic package test in Python 3 now succeeds: `python3 setup.py test`.
 ## [v0.9.2] - 2018-11-12
 ### Changed
-- Fixed bugs with ``looper summarize`` when no summarizers were present
-- Added CLI flag to force ``looper destroy`` for programmatic access
+- Fixed bugs with `looper summarize` when no summarizers were present
+- Added CLI flag to force `looper destroy` for programmatic access
 - Fixed a bug for samples with duplicate names
 - Added new display features (graphs, table display) for HTML summary output.
@@ -49,15 +49,15 @@
 ## [0.9.1] - 2018-06-30
 ### Changed
-- Fixed several bugs with ``looper summarize`` that caused failure on edge cases.
+- Fixed several bugs with `looper summarize` that caused failure on edge cases.
 ## [0.9.0] - 2018-06-25
 ### Added
 - Support for custom summarizers
-- Add ``allow-duplicate-names`` command-line options
-- Allow any variables in environment config files or other ``compute`` sections to be used in submission templates. This allows looper to be used with containers.
+- Add `allow-duplicate-names` command-line options
+- Allow any variables in environment config files or other `compute` sections to be used in submission templates. This allows looper to be used with containers.
 - Add nice universal project-level HTML reporting
 ## [0.8.1] - 2018-04-02
@@ -65,14 +65,14 @@
 ### Changed
 - Minor documentation and packaging updates for first Pypi release.
 - Fix a bug that incorrectly mapped protocols due to case sensitive issues
-- Fix a bug with ``report_figure`` that made it output pandas code
+- Fix a bug with `report_figure` that made it output pandas code
 ## [0.8.0] - 2018-01-19
 ### Changed
-- Use independent `peppy` package, replacing ``models`` module for core data types.
-- Integrate ``ProtocolInterface`` functionality into ``PipelineInterface``.
+- Use independent `peppy` package, replacing `models` module for core data types.
+- Integrate `ProtocolInterface` functionality into `PipelineInterface`.
 ## [0.7.2] - 2017-11-16
 ### Changed
@@ -82,11 +82,11 @@
 ### Changed
 - No longer falsely display that there's a submission failure.
-- Allow non-string values to be unquoted in the ``pipeline_args`` section.
+- Allow non-string values to be unquoted in the `pipeline_args` section.
 ## [0.7] - 2017-11-15
 ### Added
-- Add ``--lump`` and ``--lumpn`` options
+- Add `--lump` and `--lumpn` options
 - Catch submission errors from cluster resource managers
 - Implied columns can now be derived
 - Now protocols can be specified on the command-line `--include-protocols`
@@ -104,7 +104,7 @@
 ### Added
  - Add support for implied_column section of the project config file
  - Add support for Python 3
- - Merges pipeline interface and protocol mappings. This means we now allow direct pointers to ``pipeline_interface.yaml`` files, increasing flexibility, so this relaxes the specified folder structure that was previously used for ``pipelines_dir`` (with ``config`` subfolder).
+ - Merges pipeline interface and protocol mappings. This means we now allow direct pointers to `pipeline_interface.yaml` files, increasing flexibility, so this relaxes the specified folder structure that was previously used for `pipelines_dir` (with `config` subfolder).
  - Allow URLs as paths to sample sheets.
  - Allow tsv format for sample sheets.
  - Checks that the path to a pipeline actually exists before writing the submission script.
@@ -116,7 +116,7 @@
 - Relaxes many constraints (like resources sections, pipelines_dir columns), making project configuration files useful outside looper. This moves us closer to dividing models from looper, and improves flexibility.
 - Various small bug fixes and dev improvements.
 - Require `setuptools` for installation, and `pandas 0.20.2`. If `numexpr` is installed, version `2.6.2` is required.
-- Allows tilde in ``pipeline_interfaces``
+- Allows tilde in `pipeline_interfaces`
 ## [0.5] - 2017-03-01
 ### Added
@@ -136,13 +136,13 @@
 ## [0.4] - 2017-01-12
 ### Added
 - New command-line interface (CLI) based on sub-commands
-- New subcommand (``looper summarize``) replacing the ``summarizePipelineStats.R`` script
-- New subcommand (``looper check``) replacing the ``flagCheck.sh`` script
-- New command (``looper destroy``) to remove all output of a project
-- New command (``looper clean``) to remove intermediate files of a project flagged for deletion
+- New subcommand (`looper summarize`) replacing the `summarizePipelineStats.R` script
+- New subcommand (`looper check`) replacing the `flagCheck.sh` script
+- New command (`looper destroy`) to remove all output of a project
+- New command (`looper clean`) to remove intermediate files of a project flagged for deletion
 - Support for portable and pipeline-independent allocation of computing resources with Looperenv.
 ### Changed
-- Removed requirement to have ``pipelines`` repository installed in order to extend base Sample objects
+- Removed requirement to have `pipelines` repository installed in order to extend base Sample objects
 - Maintenance of sample attributes as provided by user by means of reading them in as strings (to be improved further)
 - Improved serialization of Sample objects

From 44dfdeb8c8786be24f84f06fb7cc5f89790a8ee2 Mon Sep 17 00:00:00 2001
From: nsheff
Date: Wed, 24 Apr 2019 08:17:55 -0400
Subject: [PATCH 3/9] docs updates

---
 docs/README.md              |  2 +-
 docs/changelog.md           |  5 +-
 docs_jupyter/hello-world.md | 97 -------------------------------------
 3 files changed, 4 insertions(+), 100 deletions(-)
 delete mode 100644 docs_jupyter/hello-world.md

diff --git a/docs/README.md b/docs/README.md
index 7d130a515..60678ad7c 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -4,7 +4,7 @@
 ## What is looper?
-`Looper` is a pipeline submitting engine. `Looper` deploys any command-line pipeline across samples organized in [standard PEP format](https://pepkit.github.io/docs/home/). You can think of `looper` as providing a single user interface to running, summarizing, monitoring, and otherwise managing *all* of your sample-intensive research projects.
+`Looper` is a pipeline submitting engine. `Looper` deploys any command-line pipeline for each sample in a project organized in [standard PEP format](https://pepkit.github.io/docs/home/). You can think of `looper` as providing a single user interface to running, summarizing, monitoring, and otherwise managing all of your sample-intensive research projects the same way, regardless of data type or pipeline used.
 ## What makes looper better?
diff --git a/docs/changelog.md b/docs/changelog.md
index aab4142ea..a7d8c0e1b 100644
--- a/docs/changelog.md
+++ b/docs/changelog.md
@@ -1,6 +1,8 @@
 # Changelog
-## [0.12] - Unreleased
+This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html) and [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) format.
+
+## [Unreleased]
 ## [0.11.1] - 2019-04-17
@@ -51,7 +53,6 @@
 ### Changed
 - Fixed several bugs with `looper summarize` that caused failure on edge cases.
-
 ## [0.9.0] - 2018-06-25
 ### Added
diff --git a/docs_jupyter/hello-world.md b/docs_jupyter/hello-world.md
deleted file mode 100644
index 1dc9bde11..000000000
--- a/docs_jupyter/hello-world.md
+++ /dev/null
@@ -1,97 +0,0 @@
-
-# Hello World! example for looper
-
-`Looper` is a pipeline submission engine (see [looper source code](https://github.com/pepkit/looper); [looper documentation](http://looper.readthedocs.org)). This repository contains a basic functional example project (in [/project](/project)) and a looper-compatible pipeline (in [/pipeline](/pipeline)) that can run on that project. This repository demonstrates how to install `looper` and use it to run the included pipeline on the included PEP project.
-
-## 1. Install the latest version of looper:
-
-```console
-pip install --user --upgrade https://github.com/pepkit/looper/zipball/master
-```
-
-## 2. Download and unzip the hello_looper repository
-
-
-
-```python
-!wget https://github.com/pepkit/hello_looper/archive/master.zip
-```
-
- --2019-04-11 18:14:45-- https://github.com/pepkit/hello_looper/archive/master.zip
- Resolving github.com (github.com)... 192.30.253.113, 192.30.253.112
- Connecting to github.com (github.com)|192.30.253.113|:443... connected.
- HTTP request sent, awaiting response... 302 Found
- Location: https://codeload.github.com/pepkit/hello_looper/zip/master [following]
- --2019-04-11 18:14:45-- https://codeload.github.com/pepkit/hello_looper/zip/master
- Resolving codeload.github.com (codeload.github.com)... 192.30.253.121, 192.30.253.120
- Connecting to codeload.github.com (codeload.github.com)|192.30.253.121|:443... connected.
- HTTP request sent, awaiting response... 200 OK
- Length: unspecified [application/zip]
- Saving to: ‘master.zip’
-
- master.zip [ <=> ] 5.24K --.-KB/s in 0.004s
-
- 2019-04-11 18:14:45 (1.18 MB/s) - ‘master.zip’ saved [5366]
-
-
-
-
-```python
-!unzip master.zip
-```
-
- Archive: master.zip
- 47b9584b59841d54418699aafc8d8d13f201dac3
- creating: hello_looper-master/
- inflating: hello_looper-master/README.md
- creating: hello_looper-master/data/
- inflating: hello_looper-master/data/frog1_data.txt
- inflating: hello_looper-master/data/frog2_data.txt
- inflating: hello_looper-master/looper_pipelines.md
- inflating: hello_looper-master/output.txt
- creating: hello_looper-master/pipeline/
- inflating: hello_looper-master/pipeline/count_lines.sh
- inflating: hello_looper-master/pipeline/pipeline_interface.yaml
- creating: hello_looper-master/project/
- inflating: hello_looper-master/project/project_config.yaml
- inflating: hello_looper-master/project/sample_annotation.csv
-
-
-## 3. Run it
-
-
-```python
-!cd hello_looper-master
-```
-
-
-```python
-!looper run project/project_config.yaml
-```
-
- Command: run (Looper version: 0.11.0dev)
- Using default config file, no global config file provided in environment variable(s): ['DIVCFG', 'PEPENV']
- Loading divvy config file: /home/nsheff/.local/lib/python3.5/site-packages/divvy/submit_templates/default_compute_settings.yaml
- Activating compute package 'default'
- Setting sample sheet from file '/home/nsheff/code/looper/docs_jupyter/hello_looper-master/project/sample_annotation.csv'
- Finding pipelines for protocol(s): anySampleType
- Known protocols: anySampleType
- ## [1 of 2] frog_1 (anySampleType)
- Writing script to /home/nsheff/hello_looper_results/submission/count_lines.sh_frog_1.sub
- Job script (n=1; 0.00 Gb): /home/nsheff/hello_looper_results/submission/count_lines.sh_frog_1.sub
- Compute node: puma
- Start time: 2019-04-11 18:14:46
- Number of lines: 4
- ## [2 of 2] frog_2 (anySampleType)
- Writing script to /home/nsheff/hello_looper_results/submission/count_lines.sh_frog_2.sub
- Job script (n=1; 0.00 Gb): /home/nsheff/hello_looper_results/submission/count_lines.sh_frog_2.sub
- Compute node: puma
- Start time: 2019-04-11 18:14:46
- Number of lines: 7
-
- Looper finished
- Samples valid for job generation: 2 of 2
- Successful samples: 2 of 2
- Commands submitted: 2 of 2
- Jobs submitted: 2
-

From 5abb39775b2fa481efb62f95c73bb0c699479ae5 Mon Sep 17 00:00:00 2001
From: nsheff
Date: Wed, 24 Apr 2019 08:36:24 -0400
Subject: [PATCH 4/9] rerun notebook

---
 docs_jupyter/hello-world.ipynb | 78 ++++++++++++++--------------------
 1 file changed, 31 insertions(+), 47 deletions(-)

diff --git a/docs_jupyter/hello-world.ipynb b/docs_jupyter/hello-world.ipynb
index 928490f2c..45dc9dd49 100644
--- a/docs_jupyter/hello-world.ipynb
+++ b/docs_jupyter/hello-world.ipynb
@@ -24,7 +24,7 @@
  },
  {
  "cell_type": "code",
- "execution_count": 5,
+ "execution_count": 1,
  "metadata": {
  "collapsed": false,
  "deletable": true,
@@ -35,21 +35,21 @@
  "name": "stdout",
  "output_type": "stream",
  "text": [
- "--2019-04-11 18:14:45-- https://github.com/pepkit/hello_looper/archive/master.zip\n",
- "Resolving github.com (github.com)... 192.30.253.113, 192.30.253.112\n",
- "Connecting to github.com (github.com)|192.30.253.113|:443... connected.\n",
+ "--2019-04-24 08:35:57-- https://github.com/pepkit/hello_looper/archive/master.zip\n",
+ "Resolving github.com (github.com)... 192.30.253.112, 192.30.253.113\n",
+ "Connecting to github.com (github.com)|192.30.253.112|:443... connected.\n",
  "HTTP request sent, awaiting response... 302 Found\n",
  "Location: https://codeload.github.com/pepkit/hello_looper/zip/master [following]\n",
- "--2019-04-11 18:14:45-- https://codeload.github.com/pepkit/hello_looper/zip/master\n",
- "Resolving codeload.github.com (codeload.github.com)... 192.30.253.121, 192.30.253.120\n",
- "Connecting to codeload.github.com (codeload.github.com)|192.30.253.121|:443... connected.\n",
+ "--2019-04-24 08:35:57-- https://codeload.github.com/pepkit/hello_looper/zip/master\n",
+ "Resolving codeload.github.com (codeload.github.com)... 192.30.253.120, 192.30.253.121\n",
+ "Connecting to codeload.github.com (codeload.github.com)|192.30.253.120|:443... connected.\n",
  "HTTP request sent, awaiting response... 200 OK\n",
  "Length: unspecified [application/zip]\n",
  "Saving to: ‘master.zip’\n",
  "\n",
- "master.zip [ <=> ] 5.24K --.-KB/s in 0.004s \n",
+ "master.zip [ <=> ] 5.24K --.-KB/s in 0.005s \n",
  "\n",
- "2019-04-11 18:14:45 (1.18 MB/s) - ‘master.zip’ saved [5366]\n",
+ "2019-04-24 08:35:57 (981 KB/s) - ‘master.zip’ saved [5366]\n",
  "\n"
  ]
  }
@@ -60,7 +60,7 @@
  },
  {
  "cell_type": "code",
- "execution_count": 6,
+ "execution_count": 2,
  "metadata": {
  "collapsed": false,
  "deletable": true,
@@ -107,7 +107,7 @@
  },
  {
  "cell_type": "code",
- "execution_count": 7,
+ "execution_count": 3,
  "metadata": {
  "collapsed": false,
  "deletable": true,
@@ -120,7 +120,7 @@
  },
  {
  "cell_type": "code",
- "execution_count": 8,
+ "execution_count": 4,
  "metadata": {
  "collapsed": false,
  "deletable": true,
@@ -131,31 +131,15 @@
  "name": "stdout",
  "output_type": "stream",
  "text": [
- "Command: run (Looper version: 0.11.0dev)\r\n",
- "Using default config file, no global config file provided in environment variable(s): ['DIVCFG', 'PEPENV']\r\n",
- "Loading divvy config file: /home/nsheff/.local/lib/python3.5/site-packages/divvy/submit_templates/default_compute_settings.yaml\r\n",
- "Activating compute package 'default'\r\n",
- "Setting sample sheet from file '/home/nsheff/code/looper/docs_jupyter/hello_looper-master/project/sample_annotation.csv'\r\n",
- "Finding pipelines for protocol(s): anySampleType\r\n",
- "Known protocols: anySampleType\r\n",
- "\u001b[36m## [1 of 2] frog_1 (anySampleType)\u001b[0m\r\n",
- "Writing script to /home/nsheff/hello_looper_results/submission/count_lines.sh_frog_1.sub\r\n",
- "Job script (n=1; 0.00 Gb): /home/nsheff/hello_looper_results/submission/count_lines.sh_frog_1.sub\r\n",
- "Compute node: puma\r\n",
- "Start time: 2019-04-11 18:14:46\r\n",
- "Number of lines: 4\r\n",
- "\u001b[36m## [2 of 2] frog_2 (anySampleType)\u001b[0m\r\n",
- "Writing script to /home/nsheff/hello_looper_results/submission/count_lines.sh_frog_2.sub\r\n",
- "Job script (n=1; 0.00 Gb): /home/nsheff/hello_looper_results/submission/count_lines.sh_frog_2.sub\r\n",
- "Compute node: puma\r\n",
- "Start time: 2019-04-11 18:14:46\r\n",
- "Number of lines: 7\r\n",
- "\r\n",
- "Looper finished\r\n",
- "Samples valid for job generation: 2 of 2\r\n",
- "Successful samples: 2 of 2\r\n",
- "Commands submitted: 2 of 2\r\n",
- "Jobs submitted: 2\r\n",
+ "Command: run (Looper version: 0.11.0)\r\n",
+ "Traceback (most recent call last):\r\n",
+ " File \"/home/nsheff/.local/bin/looper\", line 10, in \r\n",
+ " sys.exit(main())\r\n",
+ " File \"/home/nsheff/.local/lib/python3.5/site-packages/looper/looper.py\", line 802, in main\r\n",
+ " determine_config_path(args.config_file), subproject=args.subproject,\r\n",
+ " File \"/home/nsheff/.local/lib/python3.5/site-packages/looper/utils.py\", line 104, in determine_config_path\r\n",
+ " raise ValueError(\"Path doesn't exist: {}\".format(root))\r\n",
+ "ValueError: Path doesn't exist: project/project_config.yaml\r\n",
  "\u001b[0m"
  ]
  }
@@ -188,7 +172,7 @@
  },
  {
  "cell_type": "code",
- "execution_count": 14,
+ "execution_count": 5,
  "metadata": {
  "collapsed": false,
  "deletable": true,
@@ -199,13 +183,13 @@
  "name": "stdout",
  "output_type": "stream",
  "text": [
- "\u001b[01;34mhello_looper-master/data/\u001b[00m\r\n",
+ "hello_looper-master/data/\r\n",
  "├── frog1_data.txt\r\n",
  "└── frog2_data.txt\r\n",
- "\u001b[01;34mhello_looper-master/pipeline/\u001b[00m\r\n",
- "├── \u001b[01;32mcount_lines.sh\u001b[00m\r\n",
+ "hello_looper-master/pipeline/\r\n",
+ "├── count_lines.sh\r\n",
  "└── pipeline_interface.yaml\r\n",
- "\u001b[01;34mhello_looper-master/project/\u001b[00m\r\n",
+ "hello_looper-master/project/\r\n",
  "├── project_config.yaml\r\n",
  "└── sample_annotation.csv\r\n",
  "\r\n",
@@ -265,7 +249,7 @@
  },
  {
  "cell_type": "code",
- "execution_count": 16,
+ "execution_count": 6,
  "metadata": {
  "collapsed": false,
  "deletable": true,
@@ -299,7 +283,7 @@
  },
  {
  "cell_type": "code",
- "execution_count": 17,
+ "execution_count": 7,
  "metadata": {
  "collapsed": false,
  "deletable": true,
@@ -310,9 +294,9 @@
  "name": "stdout",
  "output_type": "stream",
  "text": [
- "\u001b[01;34m/home/nsheff/hello_looper_results\u001b[00m\r\n",
- "├── \u001b[01;34mresults_pipeline\u001b[00m\r\n",
- "└── \u001b[01;34msubmission\u001b[00m\r\n",
+ "/home/nsheff/hello_looper_results\r\n",
+ "├── results_pipeline\r\n",
+ "└── submission\r\n",
  " ├── count_lines.sh_frog_1.log\r\n",
  " ├── count_lines.sh_frog_1.sub\r\n",
  " ├── count_lines.sh_frog_2.log\r\n",

From 2214f8924f81a858ebc297171c34a78c1faa43d1 Mon Sep 17 00:00:00 2001
From: nsheff
Date: Wed, 24 Apr 2019 08:37:49 -0400
Subject: [PATCH 5/9] fix links

---
 docs/config-files.md        |  4 +--
 docs/config-files.rst       | 30 ---------------------
 docs/define-your-project.md |  2 +-
 docs/extended-tutorial.md   | 52 -------------------------------------
 4 files changed, 3 insertions(+), 85 deletions(-)
 delete mode 100644 docs/config-files.rst
 delete mode 100644 docs/extended-tutorial.md

diff --git a/docs/config-files.md b/docs/config-files.md
index 1fe4463a5..af175771a 100644
--- a/docs/config-files.md
+++ b/docs/config-files.md
@@ -19,7 +19,7 @@ or a **pipeline *developer*** (building your own pipeline).
 Users (non-developers) of pipelines only need to be aware of one or two config files:
-- The [project config](project-config.md): This file is specific to each project and
+- The [project config](define-your-project): This file is specific to each project and
 contains information about the project's metadata, where the processed files should be saved, and other variables that allow to configure the pipelines specifically for this project. It follows the standard `looper` format (now referred to as `PEP`, or "*portable encapsulated project*" format).
@@ -48,4 +48,4 @@ it uses a pipeline-specific configuration file, which is detailed in the [`pypip
 Essentially, each pipeline may provide a configuration file describing where software is, and parameters to use for tasks within the pipeline.
This configuration file is by default named like pipeline name, with a `.yaml` extension instead of `.py`. For example, by default `rna_seq.py` looks for an accompanying `rna_seq.yaml` file. -These files can be changed on a per-project level using the `pipeline_config` section of a [project configuration file](project-config.md). +These files can be changed on a per-project level using the `pipeline_config` section of a [project configuration file](define-your-project). diff --git a/docs/config-files.rst b/docs/config-files.rst deleted file mode 100644 index 9eec4996b..000000000 --- a/docs/config-files.rst +++ /dev/null @@ -1,30 +0,0 @@ - -Configuration files -========================= - -Looper uses `YAML `_ configuration files for several purposes. Looper is designed to be organized, modular, and very configurable, so there are several configuration files. We've organized the configuration files so they each handle a different level of infrastructure: environment, project, sample, or pipeline. This makes the system very adaptable and portable, but for a newcomer, it is easy to confuse what the different configuration files are used for. So, here's an explanation of each for you to use as a reference until you are familiar with the whole ecosystem. Which ones you need to know about will depend on whether you're a pipeline user (running pipelines on your project) or a pipeline developer (building your own pipeline). - - -Pipeline users -***************** - -Users (non-developers) of pipelines only need to be aware of one or two YAML files: - -- :ref:`project config file `: This file is specific to each project and contains information about the project's metadata, where the processed files should be saved, and other variables that allow to configure the pipelines specifically for this project. This file follows the standard looper format (now referred to as ``PEP`` format). 
- -If you are planning to submit jobs to a cluster, then you need to know about a second YAML file: - -- :ref:`PEPENV environment config `: This file tells looper how to use compute resource managers, like SLURM. This file doesn't require much editing or maintenance beyond initial setup. - -That should be all you need to worry about as a pipeline user. If you need to adjust compute resources or want to develop a pipeline or have more advanced project-level control over pipelines, then you'll need to know about a few others: - -Pipeline developers -********************** - -If you want to add a new pipeline to looper, tweak the way looper interacts with a pipeline for a given project, or change the default cluster resources requested by a pipeline, then you need to know about a configuration file that coordinates linking your pipeline in to your looper project. - -- :doc:`pipeline interface file `: Has two sections: 1) ``protocol_mapping`` tells looper which pipelines exist, and how to map each protocol (sample data type) to its pipelines; 2) ``pipelines`` links looper to the pipelines by describing options, arguments, and compute resources that the pipeline needs to run. - -Finally, if you're using Pypiper to develop pipelines, it uses a pipeline-specific configuration file (detailed in the Pypiper documentation): - -- `Pypiper pipeline config file `_: Each pipeline may (optionally) provide a configuration file describing where software is, and parameters to use for tasks within the pipeline. This configuration file is by default named identical to the pypiper script name, with a `.yaml` extension instead of `.py` (So `rna_seq.py` looks for an accompanying `rna_seq.yaml` file by default). These files can be changed on a per-project level using the `pipeline_config` section in your project config file. 
diff --git a/docs/define-your-project.md b/docs/define-your-project.md index 73b4e09d7..e35c463f4 100644 --- a/docs/define-your-project.md +++ b/docs/define-your-project.md @@ -2,4 +2,4 @@ Most pipelines require a unique way to organize samples, but `looper` subscribes to [standard Portable Encapsulated Project (PEP) format](http://pepkit.github.io). PEP is a standardized way to represent metadata about your project and each of its samples. If you follow this format, then your project can be read not only by `looper`, but also by other software, like the [pepr R package](http://github.com/pepkit/pepr), or the [peppy python package](http://github.com/pepkit/peppy). You should read the instructions on [how to create a PEP](https://pepkit.github.io/docs/simple_example/) to use with `looper`. -So, the first thing you should do is follow the [instructions for how to make a PEP](https://pepkit.github.io/docs/simple_example/). Once you've have a basic PEP created, the next section shows you [how to add looper-specific configuration to the PEP config file](project-config-looper.md), or you can jump ahead to [linking a project to a pipeline](link-a-pipeline.md). +So, the first thing you should do is follow the [instructions for how to make a PEP](https://pepkit.github.io/docs/simple_example/). Once you have a basic PEP created, the next section shows you [how to add looper-specific configuration to the PEP config file](project-config-looper.md), or you can jump ahead to [linking a project to a pipeline](linking-a-pipeline.md). diff --git a/docs/extended-tutorial.md b/docs/extended-tutorial.md deleted file mode 100644 index 960e1b562..000000000 --- a/docs/extended-tutorial.md +++ /dev/null @@ -1,52 +0,0 @@ -# Extended tutorial - -The best way to learn is by example, so here's an extended tutorial to get you started using looper to run pre-made pipelines on a pre-made project. - -First, install looper and pypiper. 
[`pypiper`](http://pypiper.readthedocs.io) is our pipeline development framework. While `pypiper` is not required to use `looper` (which can work with any command-line tool), we install it now since this tutorial uses `pypiper` pipelines: - -```bash -pip install --user https://github.com/epigen/looper/zipball/master -pip install --user https://github.com/epigen/pypiper/zipball/master -``` - -Now, you will need to grab a project to run, and some pipelines to run on it. We have a functional working project example and an open source pipeline repository on github. - - -```bash -git clone https://github.com/epigen/microtest.git -git clone https://github.com/epigen/open_pipelines.git -``` - -Now you can run this project with looper! Just use `looper run`: - -```bash -looper run microtest/config/microtest_config.tutorial.yaml -``` - -***Hint:*** You can add the `looper` executable to your shell path: -```bash -export PATH=~/.local/bin:$PATH -``` - - -## Pipeline outputs - -Outputs of pipeline runs will be under the directory specified in the `output_dir` variable under the `paths` section -in the project config file (see the [config files page](config-files.md)). - -Inside of an `output_dir` there will be two directories: -- `results_pipeline` - a directory with output of the pipeline(s), for each sample/pipeline combination (often one per sample) -- `submissions` - which holds a YAML representation of each sample and a log file for each submitted job - -In this example, we just ran one example sample (an amplicon sequencing library) through a pipeline that processes amplicon data -(to determine percentage of indels in amplicon) - -From here to running hundreds of samples of various sample types is virtually the same effort! - - -## On your own -To use `looper` on your own, you will need to prepare 2 things: a **project** (metadata that define *what* you want to process), and **pipelines** (*how* to process data). 
-The next sections provide detailed instructions on how to define these: -1. **Project**. To link your project to `looper`, you will need to [define your project](project-config.md) using a standard format. -2. **Pipelines**. You will want to either use pre-made `looper`-compatible pipelines or link your own custom-built pipelines. -Either way, the next section includes detailed instructions on how to [connect your pipeline](pipeline-interface.md) to `looper`. From 90d9579002245fb9f1b3bb37b1543531f3ca0daa Mon Sep 17 00:00:00 2001 From: nsheff Date: Wed, 24 Apr 2019 08:44:46 -0400 Subject: [PATCH 6/9] try adding peppy tutorial --- docs_jupyter/tutorial.ipynb | 188 ++++++++++++++++++++++++++++++++++++ 1 file changed, 188 insertions(+) create mode 100644 docs_jupyter/tutorial.ipynb diff --git a/docs_jupyter/tutorial.ipynb b/docs_jupyter/tutorial.ipynb new file mode 100644 index 000000000..0bd5d1a1a --- /dev/null +++ b/docs_jupyter/tutorial.ipynb @@ -0,0 +1,188 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "# Basic PEP example\n", + "\n", + "This vignette will show you a simple example PEP-formatted project, and how to read it into python using the `peppy` package. 
This example comes from the [example_peps repsitory](https://github.com/pepkit/example_peps) in the [example_basic](https://github.com/pepkit/example_peps/tree/master/example_basic) folder.\n", + "\n", + "\n", + "Start by importing `peppy`, and then let's take a look at the configuration file that defines our project:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "collapsed": true, + "deletable": true, + "editable": true + }, + "outputs": [], + "source": [ + "import peppy" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true, + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "metadata:\n", + " sample_annotation: sample_annotation.csv\n", + " output_dir: $HOME/hello_looper_results\n", + " pipeline_dir: $HOME/pipeline_dir\n", + "\n" + ] + } + ], + "source": [ + "project_config_file = \"../examples/example_peps-master/example_basic/project_config.yaml\"\n", + "with open(project_config_file) as f:\n", + " print(f.read())" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "It's a basic `yaml` file with one section, *metadata*, with just two variables. This is about the simplest possible PEP project configuration file. The *sample_annotation* points at the annotation file, which is stored in the same folder as `project_config.yaml`. 
Let's now glance at that annotation file: " + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "sample_name,library,file\n", + "frog_1,anySampleType,data/frog1_data.txt\n", + "frog_2,anySampleType,data/frog2_data.txt\n", + "\n" + ] + } + ], + "source": [ + "project_config_file = \"../examples/example_peps-master/example_basic/sample_annotation.csv\"\n", + "with open(project_config_file) as f:\n", + " print(f.read())" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "This *sample_annotation* file is a basic *csv* file, with rows corresponding to samples, and columns corresponding to sample attributes. Let's read this simple example project into python using `peppy`:" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "No local config file was provided\n", + "Found global config file in DIVCFG: /Users/mstolarczyk/Uczelnia/UVA/code/pepenv/uva_rivanna.yaml\n", + "Loading divvy config file: /Users/mstolarczyk/Uczelnia/UVA/code/pepenv/uva_rivanna.yaml\n", + "Use 'compute_packages' instead of 'compute'\n", + "Available packages: set(['singularity_local', 'default', 'largemem', 'singularity_slurm', 'sigterm', 'local', 'parallel'])\n", + "Activating compute package 'default'\n" + ] + } + ], + "source": [ + "proj = peppy.Project(\"../examples/example_peps-master/example_basic/project_config.yaml\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "deletable": true, + "editable": true + }, + "source": [ + "Now, we have access to all the project metadata in easy-to-use form using python objects. 
We can browse the samples in the project like this:" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true, + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "'data/frog1_data.txt'" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "proj.samples[0].file" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 2", + "language": "python", + "name": "python2" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 2 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython2", + "version": "2.7.12" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From a5348ed43810e42ed6a140cf9852f932e08798d6 Mon Sep 17 00:00:00 2001 From: nsheff Date: Wed, 24 Apr 2019 08:50:35 -0400 Subject: [PATCH 7/9] test removing mkdocs values --- docs_jupyter/tutorial.ipynb | 188 ------------------------------------ mkdocs.yml | 4 +- 2 files changed, 1 insertion(+), 191 deletions(-) delete mode 100644 docs_jupyter/tutorial.ipynb diff --git a/docs_jupyter/tutorial.ipynb b/docs_jupyter/tutorial.ipynb deleted file mode 100644 index 0bd5d1a1a..000000000 --- a/docs_jupyter/tutorial.ipynb +++ /dev/null @@ -1,188 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "deletable": true, - "editable": true - }, - "source": [ - "# Basic PEP example\n", - "\n", - "This vignette will show you a simple example PEP-formatted project, and how to read it into python using the `peppy` package. 
This example comes from the [example_peps repsitory](https://github.com/pepkit/example_peps) in the [example_basic](https://github.com/pepkit/example_peps/tree/master/example_basic) folder.\n", - "\n", - "\n", - "Start by importing `peppy`, and then let's take a look at the configuration file that defines our project:" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": { - "collapsed": true, - "deletable": true, - "editable": true - }, - "outputs": [], - "source": [ - "import peppy" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": { - "collapsed": false, - "deletable": true, - "editable": true, - "scrolled": true - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "metadata:\n", - " sample_annotation: sample_annotation.csv\n", - " output_dir: $HOME/hello_looper_results\n", - " pipeline_dir: $HOME/pipeline_dir\n", - "\n" - ] - } - ], - "source": [ - "project_config_file = \"../examples/example_peps-master/example_basic/project_config.yaml\"\n", - "with open(project_config_file) as f:\n", - " print(f.read())" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "deletable": true, - "editable": true - }, - "source": [ - "It's a basic `yaml` file with one section, *metadata*, with just two variables. This is about the simplest possible PEP project configuration file. The *sample_annotation* points at the annotation file, which is stored in the same folder as `project_config.yaml`. 
Let's now glance at that annotation file: " - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": { - "collapsed": false, - "deletable": true, - "editable": true - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "sample_name,library,file\n", - "frog_1,anySampleType,data/frog1_data.txt\n", - "frog_2,anySampleType,data/frog2_data.txt\n", - "\n" - ] - } - ], - "source": [ - "project_config_file = \"../examples/example_peps-master/example_basic/sample_annotation.csv\"\n", - "with open(project_config_file) as f:\n", - " print(f.read())" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "deletable": true, - "editable": true - }, - "source": [ - "This *sample_annotation* file is a basic *csv* file, with rows corresponding to samples, and columns corresponding to sample attributes. Let's read this simple example project into python using `peppy`:" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": { - "collapsed": false, - "deletable": true, - "editable": true - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "No local config file was provided\n", - "Found global config file in DIVCFG: /Users/mstolarczyk/Uczelnia/UVA/code/pepenv/uva_rivanna.yaml\n", - "Loading divvy config file: /Users/mstolarczyk/Uczelnia/UVA/code/pepenv/uva_rivanna.yaml\n", - "Use 'compute_packages' instead of 'compute'\n", - "Available packages: set(['singularity_local', 'default', 'largemem', 'singularity_slurm', 'sigterm', 'local', 'parallel'])\n", - "Activating compute package 'default'\n" - ] - } - ], - "source": [ - "proj = peppy.Project(\"../examples/example_peps-master/example_basic/project_config.yaml\")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "deletable": true, - "editable": true - }, - "source": [ - "Now, we have access to all the project metadata in easy-to-use form using python objects. 
We can browse the samples in the project like this:" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": { - "collapsed": false, - "deletable": true, - "editable": true, - "scrolled": true - }, - "outputs": [ - { - "data": { - "text/plain": [ - "'data/frog1_data.txt'" - ] - }, - "execution_count": 5, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "proj.samples[0].file" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 2", - "language": "python", - "name": "python2" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 2 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython2", - "version": "2.7.12" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/mkdocs.yml b/mkdocs.yml index fd14733aa..777230aa1 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -29,9 +29,7 @@ nav: theme: databio plugins: - - databio: - jupyter_source: "docs_jupyter" - jupyter_build: "docs_jupyter/build" + - databio - search From 1827fa95b2b23068d42e3eee4dff51c222d39664 Mon Sep 17 00:00:00 2001 From: nsheff Date: Wed, 24 Apr 2019 08:53:29 -0400 Subject: [PATCH 8/9] try adding hello world md --- docs_jupyter/build/hello-world.md | 214 ++++++++++++++++++++++++++++++ 1 file changed, 214 insertions(+) create mode 100644 docs_jupyter/build/hello-world.md diff --git a/docs_jupyter/build/hello-world.md b/docs_jupyter/build/hello-world.md new file mode 100644 index 000000000..3c0e55396 --- /dev/null +++ b/docs_jupyter/build/hello-world.md @@ -0,0 +1,214 @@ + +# Hello World! example for looper + +This tutorial demonstrates how to install `looper` and use it to run a pipeline on a PEP project. + +## 1. Install the latest version of looper: + +```console +pip install --user --upgrade https://github.com/pepkit/looper/zipball/master +``` + +## 2. 
Download and unzip the hello_looper repository + +The [hello looper repository](http://github.com/pepkit/hello_looper) contains a basic functional example project (in `/project`) and a looper-compatible pipeline (in `/pipeline`) that can run on that project. Let's download and unzip it: + + + +```python +!wget https://github.com/pepkit/hello_looper/archive/master.zip +``` + +```.output +--2019-04-24 08:35:57-- https://github.com/pepkit/hello_looper/archive/master.zip +Resolving github.com (github.com)... 192.30.253.112, 192.30.253.113 +Connecting to github.com (github.com)|192.30.253.112|:443... connected. +HTTP request sent, awaiting response... 302 Found +Location: https://codeload.github.com/pepkit/hello_looper/zip/master [following] +--2019-04-24 08:35:57-- https://codeload.github.com/pepkit/hello_looper/zip/master +Resolving codeload.github.com (codeload.github.com)... 192.30.253.120, 192.30.253.121 +Connecting to codeload.github.com (codeload.github.com)|192.30.253.120|:443... connected. +HTTP request sent, awaiting response... 
200 OK +Length: unspecified [application/zip] +Saving to: ‘master.zip’ + +master.zip [ <=> ] 5.24K --.-KB/s in 0.005s + +2019-04-24 08:35:57 (981 KB/s) - ‘master.zip’ saved [5366] + + +``` + + +```python +!unzip master.zip +``` + +```.output +Archive: master.zip +47b9584b59841d54418699aafc8d8d13f201dac3 + creating: hello_looper-master/ + inflating: hello_looper-master/README.md + creating: hello_looper-master/data/ + inflating: hello_looper-master/data/frog1_data.txt + inflating: hello_looper-master/data/frog2_data.txt + inflating: hello_looper-master/looper_pipelines.md + inflating: hello_looper-master/output.txt + creating: hello_looper-master/pipeline/ + inflating: hello_looper-master/pipeline/count_lines.sh + inflating: hello_looper-master/pipeline/pipeline_interface.yaml + creating: hello_looper-master/project/ + inflating: hello_looper-master/project/project_config.yaml + inflating: hello_looper-master/project/sample_annotation.csv + +``` + +## 3. Run it + +Run it by changing to the directory and then invoking `looper run` on the project configuration file. + + +```python +!cd hello_looper-master +``` + + +```python +!looper run project/project_config.yaml +``` + +```.output +Command: run (Looper version: 0.11.0) +Traceback (most recent call last): + File "/home/nsheff/.local/bin/looper", line 10, in + sys.exit(main()) + File "/home/nsheff/.local/lib/python3.5/site-packages/looper/looper.py", line 802, in main + determine_config_path(args.config_file), subproject=args.subproject, + File "/home/nsheff/.local/lib/python3.5/site-packages/looper/utils.py", line 104, in determine_config_path + raise ValueError("Path doesn't exist: {}".format(root)) +ValueError: Path doesn't exist: project/project_config.yaml + +``` + +Voila! You've run your very first pipeline across multiple samples using `looper`! + +# Exploring the results + +Now, let's inspect the `hello_looper` repository you downloaded. 
It has 3 components, each in a subfolder: + + +```python +!tree hello_looper-master/*/ +``` + +```.output +hello_looper-master/data/ +├── frog1_data.txt +└── frog2_data.txt +hello_looper-master/pipeline/ +├── count_lines.sh +└── pipeline_interface.yaml +hello_looper-master/project/ +├── project_config.yaml +└── sample_annotation.csv + +0 directories, 6 files + +``` + +These are: + + * `/data` -- contains 2 data files for 2 samples. These input files were each passed to the pipeline. + * `/pipeline` -- contains the script we want to run on each sample in our project. Our pipeline is a very simple shell script named `count_lines.sh`, which (duh!) counts the number of lines in an input file. + * `/project` -- contains 2 files that describe metadata for the project (`project_config.yaml`) and the samples (`sample_annotation.csv`). This particular project describes just two samples listed in the annotation file. These files together make up a [PEP](http://pepkit.github.io)-formatted project, and can therefore be read by any PEP-compatible tool, including `looper`. + + + + +When we invoked `looper` from the command line, we told it to `run project/project_config.yaml`. `looper` reads the [project/project_config.yaml](https://github.com/pepkit/hello_looper/blob/master/project/project_config.yaml) file, which points to a few things: + + * the [project/sample_annotation.csv](https://github.com/pepkit/hello_looper/blob/master/project/sample_annotation.csv) file, which specifies a few samples, their type, and the path to each data file + * the `output_dir`, which is where looper results are saved. Results will be saved in `$HOME/hello_looper_results`. + * the `pipeline_interface.yaml` file ([pipeline/pipeline_interface.yaml](https://github.com/pepkit/hello_looper/blob/master/pipeline/pipeline_interface.yaml)), which tells looper how to connect to the pipeline ([pipeline/count_lines.sh](https://github.com/pepkit/hello_looper/blob/master/pipeline/)). 
+ +The 3 folders (`data`, `project`, and `pipeline`) are modular; there is no need for these to live in any predetermined folder structure. For this example, the data and pipeline are included locally, but in practice, they are usually in a separate folder; you can point to anything (so data, pipelines, and projects may reside in distinct spaces on disk). You may also include more than one pipeline interface in your `project_config.yaml`, so in a looper project, many-to-many relationships are possible. + + + +## Pipeline outputs + +Outputs of pipeline runs will be under the directory specified in the `output_dir` variable under the `metadata` section in the project config file (see the [config files page](config-files.md)). Let's inspect that `project_config.yaml` file to see what it says under `output_dir`: + + + +```python +!cat hello_looper-master/project/project_config.yaml +``` + +```.output +metadata: + sample_annotation: sample_annotation.csv + output_dir: $HOME/hello_looper_results + pipeline_interfaces: ../pipeline/pipeline_interface.yaml + +``` + +Alright, next let's explore what this pipeline stuck into our `output_dir`: + + + +```python +!tree $HOME/hello_looper_results +``` + +```.output +/home/nsheff/hello_looper_results +├── results_pipeline +└── submission + ├── count_lines.sh_frog_1.log + ├── count_lines.sh_frog_1.sub + ├── count_lines.sh_frog_2.log + ├── count_lines.sh_frog_2.sub + ├── frog_1.yaml + └── frog_2.yaml + +2 directories, 6 files + +``` + + +Inside the `output_dir` there will be two directories: + +- `results_pipeline` - a directory with output of the pipeline(s), for each sample/pipeline combination (often one per sample) +- `submission` - which holds a YAML representation of each sample, plus a submission script and a log file for each submitted job + +From here to running hundreds of samples of various sample types is virtually the same effort! 
+ + + +## A few more basic looper options + +Looper also provides a few other simple arguments that let you adjust what it does. You can find a [complete reference of usage](usage) in the docs. Here are a few of the more common options: + +For `looper run`: + +- `-d`: Dry run mode (creates submission scripts, but does not execute them) +- `--limit`: Only run a few samples +- `--lumpn`: Run several commands together as a single job. This is useful when you have a quick pipeline to run on many samples and want to group them. + +There are also other commands: + +- `looper check`: checks on the status (running, failed, completed) of your jobs +- `looper summarize`: produces an output file that summarizes your project results +- `looper destroy`: completely erases all results so you can restart +- `looper rerun`: rerun only jobs that have failed. + + +## On your own + +To use `looper` on your own, you will need to prepare 2 things: a **project** (metadata that define *what* you want to process), and **pipelines** (*how* to process data). +The next sections define these: + +1. **Project**. To link your project to `looper`, you will need to [define your project](define-your-project.md) using PEP format. +2. **Pipelines**. You will want to either use pre-made `looper`-compatible pipelines or link your own custom-built pipelines. Read how to [connect your pipeline](linking-a-pipeline.md) to `looper`. + From 844041de7f712a5b7afbe03a7daeb350086e9c86 Mon Sep 17 00:00:00 2001 From: Vince Reuter Date: Wed, 24 Apr 2019 11:03:21 -0400 Subject: [PATCH 9/9] adjust reqs --- requirements/requirements-all.txt | 2 ++ 1 file changed, 2 insertions(+) diff --git a/requirements/requirements-all.txt b/requirements/requirements-all.txt index 3f5ff6a88..b144de7cd 100644 --- a/requirements/requirements-all.txt +++ b/requirements/requirements-all.txt @@ -3,4 +3,6 @@ colorama>=0.3.9 logmuse>=0.0.2 pandas>=0.20.2 pyyaml>=3.12 +divvy>=0.3.1 peppy>=0.20 +