
Filter Metadata Program

sarpiens edited this page Mar 13, 2024 · 8 revisions

Description

The Filter Metadata program applies one or more filtering operations, one after another, to a previously generated Metadata Table, using the filtering criteria provided in the Filter Table. This program belongs to the Optional Programs group, so this step can be skipped if you do not need to filter your metadata file.

The Filter Table file must contain the following five columns:

  • Variable. The name of the Metadata Table column to which the filter is applied. Its column header must be “variable”.

  • Filter Type. The type of filter to apply. Its column header must be “filter_type”. Valid options are “categorical” and “numerical”. Note that numeric values can also be treated as categorical.

  • Action. The action the filter performs. Its column header must be “action”. For categorical filters, the permitted actions are “keep” and “drop”; for numerical filters, they are “greater”, “greater_equal”, “equal”, “less” and “less_equal”.

  • NA Treatment. How the filter handles missing (not available, NA) values. Its column header must be “NA_treatment”. The permitted options are “no” (no NA treatment is applied during filtering), “keep” (apply the Action filter and keep NAs) and “drop” (apply the Action filter and drop NAs).

  • Values. The value or values of the corresponding Metadata Table column used by the filter. Its column header must be “values”. The values are expected to be formatted as a Python list. Numerical filters expect exactly one numeric value, whereas categorical filters accept multiple values.
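To illustrate the constraints listed above, the following Python sketch reads a Filter Table and rejects rows that break them. It is not the program's own code: the function name parse_filter_table and the use of ast.literal_eval to read the values column are assumptions made for this example.

```python
import ast
import csv
import io

# Column names and allowed options, taken from the column descriptions above.
REQUIRED_COLUMNS = ["variable", "filter_type", "action", "NA_treatment", "values"]
ALLOWED_ACTIONS = {
    "categorical": {"keep", "drop"},
    "numerical": {"greater", "greater_equal", "equal", "less", "less_equal"},
}
ALLOWED_NA_TREATMENTS = {"no", "keep", "drop"}


def parse_filter_table(text):
    """Parse tab-separated Filter Table text into validated rows (hypothetical helper).

    Raises ValueError when a column is missing or a row breaks the rules above.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        ftype, action = row["filter_type"], row["action"]
        if ftype not in ALLOWED_ACTIONS:
            raise ValueError(f"line {line_no}: unknown filter_type {ftype!r}")
        if action not in ALLOWED_ACTIONS[ftype]:
            raise ValueError(f"line {line_no}: action {action!r} not allowed for {ftype}")
        if row["NA_treatment"] not in ALLOWED_NA_TREATMENTS:
            raise ValueError(f"line {line_no}: unknown NA_treatment {row['NA_treatment']!r}")
        # The values column holds a Python list literal, e.g. ['Brain', 'Heart'].
        values = ast.literal_eval(row["values"])
        if not isinstance(values, list):
            raise ValueError(f"line {line_no}: values must be a Python list")
        if ftype == "numerical" and (
            len(values) != 1 or not isinstance(values[0], (int, float))
        ):
            raise ValueError(f"line {line_no}: numerical filters take one numeric value")
        rows.append({**row, "values": values})
    return rows


if __name__ == "__main__":
    example = (
        "variable\tfilter_type\taction\tNA_treatment\tvalues\n"
        "read_count\tnumerical\tgreater_equal\tno\t[10000]\n"
    )
    print(parse_filter_table(example)[0]["values"])  # [10000]
```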

For instance, see the PRJEB10949_filterfile_example.tsv test file, which contains the following information:

variable         filter_type  action         NA_treatment  values
read_count       numerical    greater_equal  no            [10000]
scientific_name  categorical  drop           no            ['synthetic metagenome']
miseq_kit        numerical    equal          keep          [3]
replicate        categorical  keep           keep          ['new']
tissue           categorical  drop           keep          ['Brain', 'Heart', 'Muscle']
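To make the sequential behavior concrete, the sketch below applies these five example filters, one after another, to a small hypothetical Metadata Table (the run names and the “gut metagenome” value are invented). The filtering semantics are assumptions inferred from the column descriptions, not taken from the program's source. In particular, it assumes that NA_treatment “no” gives missing values no special handling, so they fail numerical comparisons and count as “not in the list” for categorical filters. Missing values are represented as None.

```python
import operator

# Comparison used for each numerical action (assumed mapping).
NUMERIC_OPS = {
    "greater": operator.gt,
    "greater_equal": operator.ge,
    "equal": operator.eq,
    "less": operator.lt,
    "less_equal": operator.le,
}


def passes(cell, flt):
    """Return True if a non-missing cell satisfies one filter (assumed semantics)."""
    if flt["filter_type"] == "numerical":
        # Metadata cells are read as text, so convert before comparing.
        return NUMERIC_OPS[flt["action"]](float(cell), flt["values"][0])
    # Categorical: compare as strings so numeric codes can also be used as categories.
    in_list = cell in {str(v) for v in flt["values"]}
    return in_list if flt["action"] == "keep" else not in_list


def apply_filters(rows, filters):
    """Apply each filter in turn; every filter sees the previous filter's output."""
    for flt in filters:
        kept = []
        for row in rows:
            cell = row.get(flt["variable"])
            if cell is None:  # missing (NA) value
                treatment = flt["NA_treatment"]
                if treatment == "keep":
                    kept.append(row)
                elif treatment == "no":
                    # Assumption: NA fails numerical tests and is "not in the list"
                    # for categorical ones, so only a categorical "drop" keeps it.
                    if flt["filter_type"] == "categorical" and flt["action"] == "drop":
                        kept.append(row)
                # treatment == "drop": the row is discarded.
            elif passes(cell, flt):
                kept.append(row)
        rows = kept
    return rows


# The five filters from PRJEB10949_filterfile_example.tsv, already parsed.
FILTER_COLUMNS = ["variable", "filter_type", "action", "NA_treatment", "values"]
FILTERS = [dict(zip(FILTER_COLUMNS, f)) for f in [
    ("read_count", "numerical", "greater_equal", "no", [10000]),
    ("scientific_name", "categorical", "drop", "no", ["synthetic metagenome"]),
    ("miseq_kit", "numerical", "equal", "keep", [3]),
    ("replicate", "categorical", "keep", "keep", ["new"]),
    ("tissue", "categorical", "drop", "keep", ["Brain", "Heart", "Muscle"]),
]]

# Hypothetical metadata rows; None marks a missing (NA) cell.
COLUMNS = ["run", "read_count", "scientific_name", "miseq_kit", "replicate", "tissue"]
METADATA = [dict(zip(COLUMNS, r)) for r in [
    ("run_A", "25000", "gut metagenome", "3", "new", "Liver"),
    ("run_B", "5000", "gut metagenome", "3", "new", "Liver"),
    ("run_C", "40000", "synthetic metagenome", "3", "new", "Liver"),
    ("run_D", "30000", "gut metagenome", None, "new", "Brain"),
    ("run_E", "12000", "gut metagenome", "2", "new", "Liver"),
    ("run_F", "15000", "gut metagenome", "3", None, None),
]]

if __name__ == "__main__":
    print([row["run"] for row in apply_filters(METADATA, FILTERS)])  # ['run_A', 'run_F']
```

With this toy data, run_B fails the read_count filter, run_C is dropped as a synthetic metagenome, run_E fails the miseq_kit filter, and run_D survives the miseq_kit filter (its NA is kept) but is then dropped for its Brain tissue.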

Input Elements:

  • PROJECT_metadata.tsv (File). Metadata Table: one of the Metadata Tables generated at different steps of the workflow by the Download Metadata ENA program (PROJECT_ENA_metadata.tsv), the Merge Metadata program (PROJECT_merged_metadata.tsv) or the Filter Metadata program (PROJECT_filtered_metadata.tsv).
  • PROJECT_filter_file.tsv (File). Filter Table.

Output Elements:

  • PROJECT_filtered_metadata.tsv (File). Filtered Metadata Table.

The resulting PROJECT_filtered_metadata.tsv file is used in the next workflow step, the Check Metadata ENA program. Depending on your case, it can also be used in other workflow steps, including the Download Fastqs, Check Fastqs and Make Treatment Template programs. To find out which step comes next in your case, check the workflow diagram.

Arguments

Usage:

filter_metadata [-h] -t METADATA_TABLE -f FILTER_TABLE [-o OUTPUT_DIRECTORY] [-x] [-v]

Options:

Parameter               Description
-h, --help              Show help message and exit.
-t, --metadata_table    Metadata Table [Expected sep=TABS]. Indicate the path to the Metadata Table file.
-f, --filter_table      Filter Table [Expected sep=TABS]. Indicate the path to the Filter Table file.
-o, --output_directory  Output Directory (Optional). Indicate the path to the Output Directory. Output files will be created in the current directory if not indicated.
-x, --plain_text        Plain Text Mode (Optional). If indicated, it will enable Plain Text mode, and text will appear without colors.
-v, --version           Show program's version number and exit.
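The usage line and option descriptions above appear to follow the conventions of Python's argparse module. As a minimal sketch, the parser below accepts the same arguments; it is a hypothetical reconstruction for illustration, not the program's source, and the version string is a placeholder because the real one is not listed on this page.

```python
import argparse


def build_parser():
    """Hypothetical argparse parser mirroring the filter_metadata usage line."""
    parser = argparse.ArgumentParser(prog="filter_metadata")
    parser.add_argument("-t", "--metadata_table", required=True,
                        help="Metadata Table [Expected sep=TABS].")
    parser.add_argument("-f", "--filter_table", required=True,
                        help="Filter Table [Expected sep=TABS].")
    # Defaults to the current directory, as described in the options table.
    parser.add_argument("-o", "--output_directory", default=".",
                        help="Output Directory (Optional).")
    parser.add_argument("-x", "--plain_text", action="store_true",
                        help="Plain Text Mode (Optional): print text without colors.")
    # Placeholder: the real version string is not given on this page.
    parser.add_argument("-v", "--version", action="version",
                        version="%(prog)s <version placeholder>")
    return parser


if __name__ == "__main__":
    # Parse the first command listed under Examples instead of sys.argv.
    args = build_parser().parse_args(
        ["-t", "PRJEB10949_merged_metadata.tsv", "-f", "PRJEB10949_filterfile_example.tsv"]
    )
    print(args.metadata_table, args.filter_table, args.output_directory, args.plain_text)
```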

Examples

Commands:

  • Filter metadata with colored text stdout:
      filter_metadata -t PRJEB10949_merged_metadata.tsv -f PRJEB10949_filterfile_example.tsv
  • Filter metadata with plain text stdout:
      filter_metadata -t PRJEB10949_merged_metadata.tsv -f PRJEB10949_filterfile_example.tsv --plain_text
  • Filter metadata and save the results in a specified directory (here, Example):
      filter_metadata -t PRJEB10949_merged_metadata.tsv -f PRJEB10949_filterfile_example.tsv -o Example
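The commands above read the Filter Table from a tab-separated file whose values column holds Python lists. A small sketch for generating such a file from Python follows. build_filter_table is a hypothetical helper; its output can be redirected to a file of your choice (for example, python make_filter_table.py > my_filter_table.tsv, where both names are hypothetical) and then passed with -f.

```python
import csv
import io
import sys

HEADER = ["variable", "filter_type", "action", "NA_treatment", "values"]


def build_filter_table(filters):
    """Return tab-separated Filter Table text for (variable, type, action, NA, values) tuples.

    Hypothetical helper: values are written with repr(), which yields Python
    list literals such as [10000] or ['Brain', 'Heart'].
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    for variable, filter_type, action, na_treatment, values in filters:
        writer.writerow([variable, filter_type, action, na_treatment, repr(list(values))])
    return buffer.getvalue()


if __name__ == "__main__":
    sys.stdout.write(build_filter_table([
        ("read_count", "numerical", "greater_equal", "no", [10000]),
        ("tissue", "categorical", "drop", "keep", ["Brain", "Heart", "Muscle"]),
    ]))
```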

For a full, detailed example of dataset curation, see the Tutorial Full Example page.