pl-bulk-rename
is a ChRIS
ds plugin which copies files from an input directory to an
output directory under different names using regular expressions.
pl-bulk-rename
is a ChRIS plugin, meaning it can
run from either within ChRIS or the command-line.
Regular expression syntax is based on the regex crate. See https://docs.rs/regex/1.5.5/regex/#grouping-and-flags
To get started with local command-line usage, use Apptainer
(a.k.a. Singularity) to run pl-bulk-rename
as a container.
To print its available options, run:
singularity exec docker://fnndsc/pl-bulk-rename bulkrename --help
bulkrename
copies data from an input directory to an output directory.
Consider the data in examples/input
:
examples/input
├── a
│ ├── food.txt
│ └── log
├── b
│ ├── food.txt
│ └── log
└── c
├── food.txt
└── log
To rename every .txt
file to have the name of their parent directory:
bulkrename --filter '.*\.txt' \
--expression '^(.*?)/(.*?)\.txt$' \
--replace '$1.txt' \
examples/input examples/filewise
To rename the subdirectories a
, b
, c
, of the input directory to have a prefix pear_
:
bulkrename --filter '^[abc]$' \
--expression '([abc])' \
--replace 'pear_$1' \
examples/input examples/dirwise
Pracitcal walkthroughs of how to use pl-bulk-rename
in ChRIS feeds.
Factor levels (i.e. the independent variable) of a data experiment are represented in a ChRIS feed as parallel/sibling branches of a tree.
Consider this example data experiment:
- We create a feed using
pl-dircopy
which produces our input filessubject1.txt
,subject2.txt
,subject3.txt
, ... - We run 3 instances of a ds plugin
pl-experiment
which, for every*.txt
in its input directory, creates a resulting*.dat
in the output directory. The plugin instance IDs are 4, 5, 6. - Over all the results we want to perform aggregation and data visualization to arrive at a scientific conclusion.
Joining the branches is done using
pl-topologicalcopy
.
Since the outputs of pl-experiment
for each instance is the same
(something like subject1.dat
, subject2.dat
, subject3.dat
, ...)
we use --groupByInstance
so that the output of the feed-join looks like
pl-topologicalcopy/data
├── 4
│ ├── subject1.dat
│ ├── subject2.dat
│ └── subject3.dat
├── 5
│ ├── subject1.dat
│ ├── subject2.dat
│ └── subject3.dat
└── 6
├── subject1.dat
├── subject2.dat
└── subject3.dat
Next, we use pl-bulk-rename
to reorganize the results by subject,
which makes more sense in the context of our downstream analysis.
pl-bulk-rename/data
├── subject1
│ ├── plinst4.dat
│ ├── plinst5.dat
│ └── plinst6.dat
├── subject2
│ ├── plinst4.dat
│ ├── plinst5.dat
│ └── plinst6.dat
└── subject3
├── plinst4.dat
├── plinst5.dat
└── plinst6.dat
In the ChRIS user interface,
pl-bulk-rename
would be configured like this:
The whole feed for the experiment will look something like
O id=3 pl-dircopy --dir uploads/user/inputData
/ | \
/ | \
O | | id=4 pl-experiment --factor 0.05
| O | id=5 pl-experiment --factor 0.10
| | O id=6 pl-experiment --factor 0.15
\ | /
\ | /
O id=7 pl-topologicalcopy --plugininstances 4,5,6 --groupByInstance
|
O id=8 pl-bulk-rename --filter '.*\.dat' \
| --expression '^(\d+)/(.*)(\.dat)$' \
| --replace '$2/plinst$1$3'
|
O id=9 pl-aggregate-stats
|
O id=10 pl-plot-data
Continuing from above, you might want to rename the data files in each subdirectory.
inputdir/
├── subject1
│ ├── plinst4.dat
│ ├── plinst5.dat
│ └── plinst6.dat
├── subject2
│ ├── plinst4.dat
│ ├── plinst5.dat
│ └── plinst6.dat
└── subject3
├── plinst4.dat
├── plinst5.dat
└── plinst6.dat
These file names aren't too helpful. We want to rename them
to be descriptive of how they were produced instead of referring
to them by ChRIS plugin instance IDs,
e.g. rename plinst4.dat
to experiment_factor_0.05_result.dat
.
inputdir/
├── subject1
│ ├── experiment_factor_0.05_result.dat
│ ├── experiment_factor_0.10_result.dat
│ └── experiment_factor_0.15_result.dat
├── subject2
│ ├── experiment_factor_0.05_result.dat
│ ├── experiment_factor_0.10_result.dat
│ └── experiment_factor_0.15_result.dat
└── subject3
├── experiment_factor_0.05_result.dat
├── experiment_factor_0.10_result.dat
└── experiment_factor_0.15_result.dat
For each factor, call pl-bulk-rename
to map plinstN
to experiment_factor_M_result
.
O id=8
/ | \
/ | \
O | | id=11 pl-bulk-rename --filter '.*/plinst4\.dat$' \
| | | --expression '^(.*)/(.*)$' \
| | | --replace '$1/experiment_factor_0.05_result.dat'
| | |
| O | id=12 pl-bulk-rename --filter '.*/plinst5\.dat$' \
| | | --expression '^(.*)/(.*)$' \
| | | --replace '$1/experiment_factor_0.10_result.dat'
| | |
| | O id=13 pl-bulk-rename --filter '.*/plinst6\.dat$' \
| | | --expression '^(.*)/(.*)$' \
| | | --replace '$1/experiment_factor_0.15_result.dat'
\ | /
\ | /
O id=14 pl-topologicalcopy --plugininstances 11,12,13