Filter Metadata Program
The Filter Metadata program applies a sequence of filtering operations to a previously generated Metadata Table, using the rules provided in the Filter Table. It belongs to the Optional Programs group, so this step can be skipped if you do not need to filter your metadata file.
The Filter Table file must contain five columns of interest:
- Variable. Indicates the names of the columns in the Metadata Table to be used during the filtering process. This column must be named “variable” in the table header.
- Filter Type. Indicates the filter type to be applied. This column must be named “filter_type” in the table header. Valid options are “categorical” or “numerical”. Note that numeric values can also be treated as categorical.
- Action. Indicates the action to be applied with the filter. This column must be named “action” in the table header. For the categorical filter, the permitted actions are “keep” and “drop”. For the numerical filter, the permitted actions are “greater”, “greater_equal”, “equal”, “less” and “less_equal”.
- NA Treatment. Indicates how not available (NA) values are handled by the filter. This column must be named “NA_treatment” in the table header. The permitted actions are “no” (no NA Treatment will be applied for filtering), “keep” (apply the Action filter and keep NAs) and “drop” (apply the Action filter and drop NAs).
- Values. Indicates the values of the associated columns of the Metadata Table to be used during the filtering process. This column must be named “values” in the table header. The values are expected to be formatted as a Python list. For numerical filters, only one numeric value is expected. For categorical filters, multiple values can be provided.
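The column rules above can be checked before running the program. The following is a minimal sketch, not the program's own validation code: the function name `validate_rule` and the error messages are hypothetical, and each Filter Table row is assumed to have been read into a dict of strings keyed by the header names.

```python
import ast

REQUIRED_COLUMNS = ("variable", "filter_type", "action", "NA_treatment", "values")
# Permitted actions per filter type, as listed in the column descriptions.
ACTIONS = {
    "categorical": {"keep", "drop"},
    "numerical": {"greater", "greater_equal", "equal", "less", "less_equal"},
}
NA_TREATMENTS = {"no", "keep", "drop"}


def validate_rule(rule):
    """Return a list of problems found in one Filter Table row (empty if none)."""
    missing = [column for column in REQUIRED_COLUMNS if column not in rule]
    if missing:
        return [f"missing columns: {missing}"]

    problems = []
    filter_type = rule["filter_type"]
    if filter_type not in ACTIONS:
        problems.append(f"unknown filter_type: {filter_type!r}")
    elif rule["action"] not in ACTIONS[filter_type]:
        problems.append(f"action {rule['action']!r} is not valid for {filter_type}")
    if rule["NA_treatment"] not in NA_TREATMENTS:
        problems.append(f"unknown NA_treatment: {rule['NA_treatment']!r}")

    # The values cell is expected to be a Python list literal such as "[10000]".
    try:
        values = ast.literal_eval(rule["values"])
    except (ValueError, SyntaxError):
        problems.append("values is not a Python list literal")
    else:
        if not isinstance(values, list):
            problems.append("values is not a list")
        elif filter_type == "numerical" and (
            len(values) != 1 or not isinstance(values[0], (int, float))
        ):
            problems.append("numerical filters expect exactly one numeric value")
    return problems


good_rule = {"variable": "read_count", "filter_type": "numerical",
             "action": "greater_equal", "NA_treatment": "no", "values": "[10000]"}
bad_rule = {"variable": "tissue", "filter_type": "numerical",
            "action": "keep", "NA_treatment": "keep", "values": "['Brain', 'Heart']"}

print(validate_rule(good_rule))  # no problems reported
print(validate_rule(bad_rule))   # wrong action and wrong values for a numerical filter
```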
The PRJEB10949_filterfile_example.tsv test file illustrates this format. It contains the following information:
variable | filter_type | action | NA_treatment | values |
---|---|---|---|---|
read_count | numerical | greater_equal | no | [10000] |
scientific_name | categorical | drop | no | ['synthetic metagenome'] |
miseq_kit | numerical | equal | keep | [3] |
replicate | categorical | keep | keep | ['new'] |
tissue | categorical | drop | keep | ['Brain', 'Heart', 'Muscle'] |
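To make the sequential behaviour of such rules concrete, here is a minimal pure-Python sketch of how they could be applied one after another. It is an illustration under stated assumptions, not the program's implementation: the function names, the `run` column and the sample rows are hypothetical, the strings treated as NA are a guess, and the handling of NA cells under the “no” treatment (an NA never satisfies a numerical comparison and is compared as a plain string in categorical filters) is an assumption.

```python
import ast
import operator

# Numerical actions from the Filter Table description, mapped to comparisons.
NUMERICAL_ACTIONS = {
    "greater": operator.gt,
    "greater_equal": operator.ge,
    "equal": operator.eq,
    "less": operator.lt,
    "less_equal": operator.le,
}

# Assumption: these strings mark a not-available cell in the Metadata Table.
NA_STRINGS = {"", "NA", "NaN", "nan"}


def row_passes(row, rule):
    """Return True if a metadata row (dict) survives one Filter Table rule (dict)."""
    cell = row[rule["variable"]]
    values = ast.literal_eval(rule["values"])  # "[10000]" -> [10000]
    is_na = cell in NA_STRINGS

    if is_na and rule["NA_treatment"] == "keep":
        return True
    if is_na and rule["NA_treatment"] == "drop":
        return False

    if rule["filter_type"] == "numerical":
        if is_na:
            return False  # assumption: NA never satisfies a comparison
        return NUMERICAL_ACTIONS[rule["action"]](float(cell), values[0])

    # Categorical: compare as strings, so numeric values can act as categories.
    found = cell in {str(value) for value in values}
    return found if rule["action"] == "keep" else not found


def apply_filters(rows, rules):
    """Apply the rules in order; each rule filters the output of the previous one."""
    for rule in rules:
        rows = [row for row in rows if row_passes(row, rule)]
    return rows


# Three rules taken from the example table above.
rules = [
    {"variable": "read_count", "filter_type": "numerical",
     "action": "greater_equal", "NA_treatment": "no", "values": "[10000]"},
    {"variable": "scientific_name", "filter_type": "categorical",
     "action": "drop", "NA_treatment": "no", "values": "['synthetic metagenome']"},
    {"variable": "tissue", "filter_type": "categorical",
     "action": "drop", "NA_treatment": "keep", "values": "['Brain', 'Heart', 'Muscle']"},
]

# Hypothetical metadata rows; cells are strings, as read from a TSV file.
rows = [
    {"run": "A", "read_count": "25000", "scientific_name": "gut metagenome", "tissue": "Liver"},
    {"run": "B", "read_count": "500", "scientific_name": "gut metagenome", "tissue": "Liver"},
    {"run": "C", "read_count": "12000", "scientific_name": "synthetic metagenome", "tissue": "Liver"},
    {"run": "D", "read_count": "30000", "scientific_name": "gut metagenome", "tissue": "Brain"},
    {"run": "E", "read_count": "30000", "scientific_name": "gut metagenome", "tissue": ""},
]

kept = apply_filters(rows, rules)
print([row["run"] for row in kept])  # prints ['A', 'E']
```

Row B fails the read count threshold, C is a synthetic metagenome, and D comes from an excluded tissue. Row E has no tissue value and survives only because that rule's NA treatment is “keep”.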
Input Elements:
Input | Type | Description |
---|---|---|
PROJECT_metadata.tsv | File | Metadata Table. One of the Metadata Tables generated in the different steps of the workflow by the Download Metadata ENA program (PROJECT_ENA_metadata.tsv), the Merge Metadata program (PROJECT_merged_metadata.tsv) or the Filter Metadata program (PROJECT_filtered_metadata.tsv) |
PROJECT_filter_file.tsv | File | Filter Table |
Output Elements:
Output | Type | Description |
---|---|---|
PROJECT_filtered_metadata.tsv | File | Filtered Metadata Table |
The resulting PROJECT_filtered_metadata.tsv file is the one used in the next workflow step, the Check Metadata ENA program. Depending on your particular case, it can also be used in other workflow steps, including the Download Fastqs, Check Fastqs and Make Treatment Template programs. To find the next step for your case, check the workflow diagram.
Usage:
filter_metadata [-h] -t METADATA_TABLE -f FILTER_TABLE [-o OUTPUT_DIRECTORY] [-x] [-v]
Options:
Parameter | Description |
---|---|
-h, --help | Show help message and exit. |
-t, --metadata_table | Metadata Table [Expected sep=TABS]. Indicate the path to the Metadata Table file. |
-f, --filter_table | Filter Table [Expected sep=TABS]. Indicate the path to the Filter Table file. |
-o, --output_directory | Output Directory (Optional). Indicate the path to the Output Directory. Output files will be created in the current directory if not indicated. |
-x, --plain_text | Plain Text Mode (Optional). If indicated, it will enable Plain Text mode, and text will appear without colors. |
-v, --version | Show program's version number and exit. |
Commands:
- Filter metadata with colored text stdout:
filter_metadata -t PRJEB10949_merged_metadata.tsv -f PRJEB10949_filterfile_example.tsv
- Filter metadata with plain text stdout:
filter_metadata -t PRJEB10949_merged_metadata.tsv -f PRJEB10949_filterfile_example.tsv --plain_text
- Filter metadata and save results in the specified directory (Example):
filter_metadata -t PRJEB10949_merged_metadata.tsv -f PRJEB10949_filterfile_example.tsv -o Example
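When the program is called from a script rather than typed by hand, the commands above can be assembled programmatically. The sketch below uses only the options documented in the Options table. The helper name `build_filter_command` is hypothetical, and running the result assumes that `filter_metadata` is installed and available on your PATH.

```python
import shlex


def build_filter_command(metadata_table, filter_table,
                         output_directory=None, plain_text=False):
    """Build the argument list for a filter_metadata call."""
    command = ["filter_metadata", "-t", metadata_table, "-f", filter_table]
    if output_directory is not None:
        command += ["-o", output_directory]
    if plain_text:
        command.append("--plain_text")
    return command


command = build_filter_command(
    "PRJEB10949_merged_metadata.tsv",
    "PRJEB10949_filterfile_example.tsv",
    output_directory="Example",
    plain_text=True,
)
print(shlex.join(command))

# To execute it (assumes filter_metadata is on PATH):
#     import subprocess
#     subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.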
To see a full and detailed example of dataset curation, see the Tutorial Full Example page.