Add checkformat #355
Open: hechth wants to merge 9 commits into master from hechth/issue325
Commits (all by hechth):

- 31fb478 initially added files for checkformat tool
- d5f417a updated version and license
- d0333f8 changed indentation to 4
- 7c63e3e removed outdated links
- d4318a4 removed stdio section
- 74284f9 lint
- f9be6e8 moved things to configfile, pinned R version
- ba90e14 added help texts
- 72a73fe Moved news section to changelog
New file, hunk `@@ -0,0 +1,8 @@`:

```yaml
name: checkformat
owner: ethevenot
description: '[W4M][Metabolomics][LC-MS][GC-MS][NMR] Checks the formats of the dataMatrix, sampleMetadata, and variableMetadata files.'
homepage_url: http://workflow4metabolomics.org
long_description: 'For all post-processing steps of the peak table, W4M uses a 3-table format for the data and metadata. This module therefore checks that the formats of the 3 files "dataMatrix.tsv", "sampleMetadata.tsv", and "variableMetadata.tsv" are correct. It can be used before any post-processing step (such as normalization or statistical analysis). Potential warnings or errors in the formats are returned in the "information.txt" output file.'
remote_repository_url: https://github.com/workflow4metabolomics/checkformat.git
categories:
- Metabolomics
```
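The long description says the module checks that the three files agree with one another. The Python sketch below shows a minimal version of that cross-table name check. It assumes tab-separated text whose first row holds the column names and whose first column holds the row names, with variables as dataMatrix rows and samples as columns, as the tool help states. `read_table`, `compare_names`, and `check_tables` are hypothetical helpers, and their messages are illustrative only. The tool's real logic is in `readAndCheckF` from `checkformat_script.R`, which is not shown here.

```python
import csv
import io


def read_table(text):
    """Parse tab-separated text into (column names, row names)."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    header, body = rows[0], rows[1:]
    # header[0] labels the row-name column, so only header[1:] are column names.
    return header[1:], [row[0] for row in body]


def compare_names(label, expected, observed):
    """Report names present in only one list, otherwise a difference in order."""
    if set(expected) != set(observed):
        unmatched = sorted(set(expected) ^ set(observed))
        return [f"Error: {label} names differ: {', '.join(unmatched)}"]
    if expected != observed:
        return [f"Warning: {label} names are in a different order"]
    return []


def check_tables(data_matrix, sample_metadata, variable_metadata):
    """Compare dataMatrix names with both metadata tables (hypothetical messages)."""
    data_samples, data_variables = read_table(data_matrix)  # columns, rows
    _, meta_samples = read_table(sample_metadata)
    _, meta_variables = read_table(variable_metadata)
    return (compare_names("sample", meta_samples, data_samples)
            + compare_names("variable", meta_variables, data_variables))


data = "dataMatrix\tS1\tS2\nV1\t1.0\t2.0\nV2\t3.0\tNA\n"
samples = "sampleName\tgroup\nS2\tA\nS1\tB\n"
variables = "variableName\tmz\nV1\t100.1\nV2\t200.2\n"
print(check_tables(data, samples, variables))
# ['Warning: sample names are in a different order']
```

Differences in order are kept separate from missing names here because only the former can be fixed by permuting the table.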
New file, hunk `@@ -0,0 +1,25 @@`:

```markdown
# Changelog
All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## [3.0.0+galaxy0] - 2025-07-16
### Changed
- Migrated from GitLab to GitHub.

## [3.0.0] - 2018-03-01
### Added
- Automated re-ordering (if necessary) of sample and/or variable names from `dataMatrix` based on `sampleMetadata` and `variableMetadata`.
- New argument to make sample and variable names syntactically valid.
- Output of `dataMatrix`, `sampleMetadata`, and `variableMetadata` files, whether they have been modified or not.

## [2.0.4] - 2017-06-06
### Changed
- Minor internal modifications.

## [2.0.2] - 2016-07-30
### Changed
- Test for R code.
- Planemo running validation.
- Planemo installing validation.
- Travis automated testing.
```
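The 3.0.0 entry mentions automated re-ordering of the `dataMatrix` to follow the metadata tables. The Python sketch below shows what such a permutation involves; it is not the tool's R code. The hypothetical `reorder_matrix` rearranges a list-of-lists matrix so that its rows follow the variableMetadata order and its columns follow the sampleMetadata order. It refuses to run when the name sets differ or contain duplicates, because a permutation alone cannot resolve those cases.

```python
def reorder_matrix(matrix, row_names, col_names, target_rows, target_cols):
    """Return `matrix` with rows in `target_rows` order and columns in `target_cols` order.

    matrix: list of rows (one per variable), labelled by row_names/col_names.
    target_rows/target_cols: orders taken from variableMetadata/sampleMetadata.
    """
    for names, targets in ((row_names, target_rows), (col_names, target_cols)):
        if len(set(names)) != len(names) or len(set(targets)) != len(targets):
            raise ValueError("duplicated names cannot be matched unambiguously")
        if set(names) != set(targets):
            raise ValueError("name sets differ; re-ordering alone cannot fix this")
    # Map each name to its current position, then read cells in the target order.
    row_index = {name: i for i, name in enumerate(row_names)}
    col_index = {name: j for j, name in enumerate(col_names)}
    return [[matrix[row_index[r]][col_index[c]] for c in target_cols]
            for r in target_rows]


# Rows V1, V2 and columns S1, S2; both metadata tables list them in reverse.
matrix = [[1.0, 2.0], [3.0, 4.0]]
print(reorder_matrix(matrix, ["V1", "V2"], ["S1", "S2"], ["V2", "V1"], ["S2", "S1"]))
# [[4.0, 3.0], [2.0, 1.0]]
```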
New file, hunk `@@ -0,0 +1,177 @@`:

```xml
<tool id="checkFormat" name="Check Format" version="3.0.0+galaxy0" license="CECILL-2.1" profile="23.0">
    <description>Checking/formatting the sample and variable names of the dataMatrix, sampleMetadata, and variableMetadata files</description>

    <requirements>
        <requirement type="package" version="4.3.3">r-base</requirement>
    </requirements>

    <required_files>
        <include path="checkformat_script.R" />
    </required_files>

    <command detect_errors="aggressive"><![CDATA[
        Rscript -e 'source("$__tool_directory__/checkformat_script.R")' -e "source('$run_script')"
    ]]></command>

    <configfiles>
        <configfile name="run_script"><![CDATA[
        sink("$information", append=TRUE, split=TRUE)

        resLs <- readAndCheckF(
            '$dataMatrix_in',
            '$sampleMetadata_in',
            '$variableMetadata_in',
            $makeNameL
        )

        write_dataMatrix(resLs[["datMN"]], "$dataMatrix_out")
        write_metadata(resLs[["samDF"]], "$sampleMetadata_out")
        write_metadata(resLs[["varDF"]], "$variableMetadata_out")

        if (resLs[["chkL"]]) {
            if (resLs[["newL"]]) {
                cat("\nWarning: The sample and/or variable names or orders from the input tables have been modified\n(see the information file for details); please use the new output tables for your analyses.\n")
            } else {
                cat("\nThe input tables have a correct format and can be used for your analyses.\n")
            }
        }
        ]]></configfile>
    </configfiles>

    <inputs>
        <param name="dataMatrix_in" type="data" label="Data matrix file" format="tabular"
               help="Tabular file containing the numeric data matrix (variables as rows, samples as columns). Row names must match the row names of the variable metadata file, and column names must match the row names of the sample metadata file. Use '.' as decimal and 'NA' for missing values. No extra metadata should be present." />
        <param name="sampleMetadata_in" type="data" label="Sample metadata file" format="tabular"
               help="Tabular file with sample metadata (samples as rows, metadata fields as columns). Row names must match the column names of the data matrix. Use '.' as decimal and 'NA' for missing values." />
        <param name="variableMetadata_in" type="data" label="Variable metadata file" format="tabular"
               help="Tabular file with variable metadata (variables as rows, metadata fields as columns). Row names must match the row names of the data matrix. Use '.' as decimal and 'NA' for missing values." />
        <param name="makeNameL" label="Make syntactically valid sample and variable names" type="select"
               help="If set to 'yes', sample and variable names will be converted to syntactically valid R names using the 'make.names' function (e.g., names starting with a digit will be prefixed with 'X', spaces will be replaced by '.', etc.).">
            <option value="TRUE">yes</option>
            <option value="FALSE" selected="true">no</option>
        </param>
    </inputs>

    <outputs>
        <data name="dataMatrix_out" label="${tool.name}_${dataMatrix_in.name}" format="tabular"/>
        <data name="sampleMetadata_out" label="${tool.name}_${sampleMetadata_in.name}" format="tabular"/>
        <data name="variableMetadata_out" label="${tool.name}_${variableMetadata_in.name}" format="tabular"/>
        <data name="information" label="${tool.name}_information.txt" format="txt"/>
    </outputs>

    <tests>
        <test>
            <param name="dataMatrix_in" value="input-dataMatrix.tsv"/>
            <param name="sampleMetadata_in" value="input-sampleMetadata.tsv"/>
            <param name="variableMetadata_in" value="input-variableMetadata.tsv"/>
            <param name="makeNameL" value="TRUE"/>
            <output name="information">
                <assert_contents>
                    <has_text text="Message: Converting sample and variable names to the standard R format" />
                    <has_text text="Warning: The sample and/or variable names or orders from the input tables have been modified" />
                </assert_contents>
            </output>
        </test>
    </tests>

    <help><![CDATA[

.. class:: infomark

**Author** Etienne Thevenot (W4M Core Development Team, MetaboHUB Paris, CEA)

---------------------------------------------------

============
Check Format
============

-----------
Description
-----------

| **Checks the format (row and column names)** of the dataMatrix, sampleMetadata and variableMetadata tables. If the order of the samples and/or variables differs between (some of) the tables, the **sample and/or variable orders of the dataMatrix are permuted** to match those of the sampleMetadata and/or the variableMetadata. Sample and variable names can also be made **syntactically valid** for R by selecting the corresponding parameter (e.g. an 'X' is prepended to names starting with a digit, blanks are converted to '.', etc.).

-----------------
Workflow position
-----------------

.. image:: ./static/images/checkFormat_workflowPositionImage.png

-----------
Input files
-----------

+----------------------------+---------+
| Parameter : num + label    | Format  |
+============================+=========+
| 1 : Data matrix file       | tabular |
+----------------------------+---------+
| 2 : Sample metadata file   | tabular |
+----------------------------+---------+
| 3 : Variable metadata file | tabular |
+----------------------------+---------+

| The **required formats** for the dataMatrix, sampleMetadata, and variableMetadata files are described in the **HowTo** entitled 'Format Data For Postprocessing' available on the main page of Workflow4Metabolomics.org (https://nextcloud.inrae.fr/s/qLkNZRf84QQ5YLY).

----------
Parameters
----------

Data matrix file
    | variable x sample **dataMatrix** tab-separated file of the numeric data matrix, with . as decimal and NA for missing values; the table must not contain metadata apart from row and column names; the row and column names must be identical to the row names of the variable and sample metadata, respectively (see below)
    |

Sample metadata file
    | sample x metadata **sampleMetadata** tab-separated file of the numeric and/or character sample metadata, with . as decimal and NA for missing values
    |

Variable metadata file
    | variable x metadata **variableMetadata** tab-separated file of the numeric and/or character variable metadata, with . as decimal and NA for missing values
    |

Make syntactically valid sample and variable names
    | if set to 'yes', sample and variable names will be converted to syntactically valid names with the 'make.names' R function when required (e.g. an 'X' is prepended to names starting with a digit, blanks are converted to '.', etc.)
    |

------------
Output files
------------

dataMatrix_out.tabular
    | dataMatrix data file; identical to the input dataMatrix when no sample/variable names were renamed and no samples/variables were re-ordered (see the 'information' file for the presence/absence of modifications)
    |

sampleMetadata_out.tabular
    | sampleMetadata data file; identical to the input sampleMetadata when no sample names were renamed and no samples were re-ordered (see the 'information' file for the presence/absence of modifications)
    |

variableMetadata_out.tabular
    | variableMetadata data file; identical to the input variableMetadata when no variable names were renamed and no variables were re-ordered (see the 'information' file for the presence/absence of modifications)
    |

information.txt
    | Text file with all messages from the check, including any errors or warnings detected in the formats
    |

---------------------------------------------------

---------------
Working example
---------------

.. class:: infomark

See the **W4M00001a_sacurine-subset-statistics**, **W4M00001b_sacurine-complete**, **W4M00002_mtbls2**, or **W4M00003_diaplasma** shared histories in the **Shared Data/Published Histories** menu.

    ]]></help>

    <citations>
        <citation type="doi">10.1021/acs.jproteome.5b00354</citation>
        <citation type="doi">10.1016/j.biocel.2017.07.002</citation>
        <citation type="doi">10.1093/bioinformatics/btu813</citation>
    </citations>
</tool>
```
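The `makeNameL` option relies on R's `make.names`. To make its conversion rules concrete, here is a Python approximation of `make.names(names, unique = FALSE)` that assumes ASCII names. R also accepts locale-specific letters, which this sketch treats as invalid. This diff does not show whether the tool passes `unique = TRUE`. `make_names` and `R_RESERVED` are hypothetical names chosen for illustration.

```python
import re

# R reserved words (see ?Reserved); make.names() appends "." to these.
R_RESERVED = {
    "if", "else", "repeat", "while", "function", "for", "in", "next", "break",
    "TRUE", "FALSE", "NULL", "Inf", "NaN", "NA",
    "NA_integer_", "NA_real_", "NA_character_", "NA_complex_",
}


def make_names(names):
    """Approximate R's make.names(names, unique = FALSE) for ASCII strings."""
    result = []
    for name in names:
        # A valid name starts with a letter, or with a dot not followed by a digit;
        # otherwise "X" is prepended (so "" becomes "X").
        if not re.match(r"[A-Za-z]|\.(?!\d)", name):
            name = "X" + name
        # Every character other than a letter, digit, "." or "_" becomes ".".
        name = re.sub(r"[^A-Za-z0-9._]", ".", name)
        if name in R_RESERVED:
            name += "."
        result.append(name)
    return result


print(make_names(["1abc", "sample 1", "_id", "NA", "ok.name"]))
# ['X1abc', 'sample.1', 'X_id', 'NA.', 'ok.name']
```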