This repository has been archived by the owner on Apr 11, 2023. It is now read-only.

Merge pull request #62 from ewels/v0.4devel
v0.4devel Merge
ewels committed May 11, 2015
2 parents 58d80cf + 1c66165 commit 39e35fd
Showing 136 changed files with 39,593 additions and 4,116 deletions.
4 changes: 3 additions & 1 deletion .gitignore
@@ -1,4 +1,6 @@
# /
/dev
/scripts
/clusterflow.config
/genomes.config
/genomes.config
*.pyc
6 changes: 6 additions & 0 deletions .gitmodules
@@ -0,0 +1,6 @@
[submodule "docs/_site/parsedown"]
path = docs/_site/parsedown
url = https://github.com/erusev/parsedown
[submodule "docs/_site/parsedown-extra"]
path = docs/_site/parsedown-extra
url = https://github.com/erusev/parsedown-extra
33 changes: 33 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,33 @@
# Cluster Flow: How to Contribute

### Making an Issue
First - before you start working on a change to the repository, please make
sure that there are no existing
[issues](https://github.com/ewels/clusterflow/issues) relating to
whatever change you intend to make. If there aren't, please create one
so that others know that you're working on something.

### Workflow
The workflow for adding to this repository should be as follows:

1. [Create an issue](https://github.com/ewels/clusterflow/issues)
describing what you intend to work on
2. Fork the [development branch](https://github.com/ewels/clusterflow/branches) of the repository to your own GitHub account
1. Any changes you make will be merged into this fork. If you make changes to the stable master branch instead, this will be a much more painful process.
3. Make your changes. Remember to note these in the `README.md` changelog.
4. Submit a Pull Request describing your changes. I will review your code and merge.

### Retrospective Workflow
This is all well and good if you haven't already started hacking the code. If you downloaded a static version of the code and made your changes, this is the ideal workflow:

1. Register with [github.com](https://github.com/) if you haven't already
1. There are excellent GitHub tutorials about [forking repositories](https://help.github.com/articles/fork-a-repo/) and creating [pull-requests](https://help.github.com/articles/using-pull-requests/).
2. Fork the [development branch](https://github.com/ewels/clusterflow/branches) of Cluster Flow to your own GitHub account
3. Pull this fork to your system using [`git clone`](https://help.github.com/articles/fetching-a-remote/)
4. Replace the downloaded files with your modified versions.
5. Push your updates using `git commit -a -m "Message"` and `git push`.
6. Create a [pull request](https://github.com/ewels/clusterflow/pulls) so that the changes can be merged with the development branch.


### Getting Help
If you have any queries, please get in touch with [Phil Ewels](https://github.com/ewels). Thanks for contributing!
110 changes: 93 additions & 17 deletions README.md
@@ -1,20 +1,85 @@
Cluster Flow
============
# Cluster Flow

Cluster Flow is a pipelining tool to automate and standardise bioinformatics analyses on high-performance cluster environments. It is designed to be easy to use, quick to set up and flexible to configure.

## Cluster Flow Website - [http://ewels.github.io/clusterflow/](http://ewels.github.io/clusterflow/)
## Documentation
For Cluster Flow documentation with information and examples, see: **[http://clusterflow.io](http://clusterflow.io)**

There's a new website for the Cluster Flow documentation, which has loads of helpful information and examples. You can see it here: [http://ewels.github.io/clusterflow/](http://ewels.github.io/clusterflow/)
## Download
You can find stable versions to download on the [releases page](https://github.com/ewels/clusterflow/releases).

If you're anxious to just get your hands on the code, check out the [releases page](https://github.com/ewels/clusterflow/releases)
You can get the development version of the code by cloning this repository:
```
git clone https://github.com/ewels/clusterflow.git
```
Alternatively, you can download a [.zip file](https://github.com/ewels/clusterflow/archive/master.zip)

Licence
-------
Cluster Flow is released with a GPL v3 licence. Cluster Flow is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. For more information, see the licence that comes bundled with Cluster Flow.
## Change Log

#### v0.4 devel
* **Warning: Break of backwards compatibility**
* The way that genome references are handled has been rewritten.
* Genome references are no longer tied to specific types; they are now agnostic
to the type of reference, making it far easier to use whatever type of reference you need.
Additionally, the wizard to add genome paths has been written and is now largely automated,
making it super fast to add new genomes.
* A consequence of this change is any `genomes.config` files written before v0.3 of
Cluster Flow will no longer work. Thankfully the fix is easy! Replace `@bowtie_path`
with `@reference bowtie`. `@gtf_path` changes to `@reference gtf` and so on.
`@genome_path` changes to `@reference fasta`.
* If you have any custom pipelines these will also need to be updated. `@require_bowtie`
changes to `@require_reference bowtie` and so on. See updated example module files
for examples on how to update custom modules.
* Apologies for any inconvenience that this change incurs. Feel free to [get in touch](https://github.com/ewels)
if you have any problems.
* `~/clusterflow/` directory moved to `~/.clusterflow/` to reduce home directory clutter.
* Cluster Flow won't find your old config file - run `mv ~/clusterflow/ ~/.clusterflow/` to fix.
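
The tag renames described above can be applied mechanically. The following is a minimal Python sketch, not part of Cluster Flow: `migrate_line` and the sample line layout are hypothetical, and it assumes that every old tag has the form `@<type>_path` and that only the tag itself changes while the rest of each line stays untouched. Check the result against the updated example files before relying on it.

```python
import re

# The change log names one irregular rename: @genome_path -> @reference fasta.
# Every other tag is ASSUMED to follow @<type>_path -> @reference <type>.
SPECIAL_TYPES = {"genome": "fasta"}

# Matches an old-style tag at the start of a line, e.g. "@bowtie_path".
OLD_TAG = re.compile(r"^(\s*)@([A-Za-z0-9]+)_path\b")


def migrate_line(line: str) -> str:
    """Rewrite one old-style genomes.config line; other lines pass through."""

    def rename(match: re.Match) -> str:
        indent, old_type = match.group(1), match.group(2)
        new_type = SPECIAL_TYPES.get(old_type, old_type)
        return f"{indent}@reference {new_type}"

    return OLD_TAG.sub(rename, line, count=1)


def migrate_text(text: str) -> str:
    """Apply migrate_line to every line of a config file's contents."""
    return "\n".join(migrate_line(line) for line in text.split("\n"))


# Hypothetical sample lines; the path is a placeholder.
print(migrate_line("@gtf_path /path/to/genes.gtf"))
# -> @reference gtf /path/to/genes.gtf
```

Only the leading tag is rewritten, so comments and any unrelated configuration lines are left as they were.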

* New Stuff
* You can now run Cluster Flow locally (new `@cluster_environment` `local`)
* Tested on Mac OSX and Linux. Includes `--qstat` and `--qdel` functionality
* Allows easy testing and use of pipelines for those without access to a HPC cluster.
* New `--environment` command line option allows you to set this at run time.
* The `--make_config` wizard has been renamed to `--setup` and does a lot more stuff
* Should make first-run of Cluster Flow much easier - just download and run `cf --setup`
* Support for STAR RNA-seq aligner (thanks to [@stu2](https://github.com/stu2))
* Modules are given more information via the run file to help
decide the amount of memory and cores they bid for (eg. number of files, reference)
* All perl scripts now have `env perl` in shebang to increase portability
* Modules can now have file extensions, as long as they have `.cfmod` at the end of the basename
* This helps editing tools with syntax highlighting, amongst other things
* Python comes to Cluster Flow! The first Python module is up and running, along with a `Helpers.py` module file
* See the `example_module.py` file for help in writing your own modules in Python
* The basic Perl module helpers are now available in the Python packages as well, more translation to follow
* Now using GRIDEngine `h_vmem` memory option instead of `vf`
* Gives a hard memory limit instead of a request limit at job submission time
* Thanks to [@stu2](https://github.com/stu2) and [@s-andrews](https://github.com/s-andrews)
* Support for explicit GRIDEngine queue nomination on the command line
* Modules now print their software versions to the log where possible.
* New `--merge` (command line) and `@merge_regex` (config file) options to automatically merge input files.
* This is implemented using a new module, `cf_merge_files`, which can also be used in pipelines
* If the supplied regexes only match single files, the module can be used to simply rename files
* New `--runfile_prefix` option to help avoid potential filename clashes.
* New `@cluster_project` config option to specify project for cluster jobs.
* Added compatibility with GRIDEngine `~/.sge_request` files (by ignoring them).
Thanks to [@s-andrews](https://github.com/s-andrews)
* New tophat module called `tophat` which introduces a workaround for buggy MAPQ
reporting by tophat whilst keeping unique alignments. Thanks to [@FelixKrueger](https://github.com/FelixKrueger).
* The previous tophat module is still available if you're not interested in MAPQ scores and
would like slightly faster processing. This is now called `tophat_broken_MAPQ.cfmod`.
* Pipeline completion e-mails are now written to disk as well (HTML and plain text)
* New log file containing the job submission commands as well as the output received from the cluster at submission (usually numeric job identifiers)
* Removed the `--qstatcols` command line option and added the `@colourful` config option to replace it
* The config wizard is also updated to add this to your personal config
* Added checks to make sure that we have at least one config file, and that the cluster environment is set
* Added new `@environment_module_always` config option to _always_ load certain environment modules at run time.
* Added new `@require_python_package` pipeline option to check that a Python package is installed before pipeline launch
* Bugs Squashed
* Fixed output filename problem in tophat with output cleaning
* Fixed bugs causing minimum memory allocation regardless of availability
* Fixed bug causing Bowtie2 to fail if Bowtie1 index absent
* Cleaned up some unrecognised output that always made it into the log file
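
To illustrate the idea behind the `--merge` and `@merge_regex` options listed above, here is a rough Python sketch of regex-based file grouping. It is not taken from the `cf_merge_files` module: the function name `group_for_merge`, the file names, and the pattern are all hypothetical, and it assumes that files sharing the same first capture group belong in one merged output. When a group contains a single file, this amounts to a rename, which matches the behaviour the change log describes.

```python
import re
from collections import defaultdict


def group_for_merge(filenames, pattern):
    """Group file names by the first capture group of `pattern`.

    ASSUMPTION: the captured text identifies the merged output; the real
    Cluster Flow module may work differently.
    """
    regex = re.compile(pattern)
    groups = defaultdict(list)
    for name in filenames:
        match = regex.search(name)
        if match:
            groups[match.group(1)].append(name)
    # Files that do not match the pattern are left out of the result.
    return dict(groups)


# Hypothetical input: two lanes of sampleA and one lane of sampleB.
files = ["sampleA_L001.fq.gz", "sampleA_L002.fq.gz", "sampleB_L001.fq.gz"]
merged = group_for_merge(files, r"^(sample[A-Z])_L\d+\.fq\.gz$")
for target, sources in merged.items():
    print(target, "<-", sources)
```

Here `sampleA` collects two input files while `sampleB` collects one.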

Change Log
----------
#### [v0.3](https://github.com/ewels/clusterflow/releases/tag/v0.3) - 2014-07-11
* New Stuff
* Awesome new HTML report e-mails
@@ -40,23 +105,34 @@ Change Log
* Fixed issue where modules using the CF::Constants Perl Module couldn't load the central config file
* Fixed typo in environment module loading in `bismark_align` module
* Reordered loading of the environment modules in `trim_galore` so that FastQC is loaded first, fixing dependency issues

#### [v0.2](https://github.com/ewels/clusterflow/releases/tag/v0.2) - 2014-05-29
* New Stuff
* Now compatible with SLURM
* Customise batch job commands in the config (see the [docs](http://ewels.github.io/clusterflow/installation/#making_cluster_flow_work_with_your_environment)
* Customise batch job commands in the config (see the
[docs](http://ewels.github.io/clusterflow/installation/#making_cluster_flow_work_with_your_environment))
* Created new GitHub pages website to hold documentation: http://ewels.github.io/clusterflow
* Updates
* Ported repository to github: https://github.com/ewels/clusterflow
* Wrote new readme for github
* Bugs Squashed
* Custom modules in `~/clusterflow/modules/` weren't being found
* General code clean-ups all over the place
* Custom modules in `~/.clusterflow/modules/` weren't being found
* General code clean-ups all over the place

#### [v0.1](https://github.com/ewels/clusterflow/releases/tag/v0.1) - 2014-04-25
* The first public release of Cluster Flow, although it's been in use at the Babraham Institute for around 6 months. It's been in heavy development throughout that time and is now approaching a state of being relatively stable.


Credits
-------
Cluster Flow was written by [Phil Ewels](http://phil.ewels.co.uk) whilst working in the [Babraham Bioinformatics](http://www.bioinformatics.babraham.ac.uk/) group in Cambridge, UK. He now maintains it whilst working at the [Science for Life Laboratory](http://www.scilifelab.se/) in Stockholm, Sweden.
## Credits

Cluster Flow was written by [Phil Ewels](http://phil.ewels.co.uk) whilst working in the
[Babraham Bioinformatics](http://www.bioinformatics.babraham.ac.uk/) group in Cambridge, UK.
He now maintains it whilst working at the [Science for Life Laboratory](http://www.scilifelab.se/) in Stockholm, Sweden.

Cluster Flow has also had contributions from [@stu2](https://github.com/stu2), [@orzechoj](https://github.com/orzechoj),
[@s-andrews](https://github.com/s-andrews) and [@FelixKrueger](https://github.com/FelixKrueger), amongst others.

## License
Cluster Flow is released with a GPL v3 licence. Cluster Flow is free software: you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the
License, or (at your option) any later version. For more information, see the licence that comes bundled with Cluster Flow.