
Commit

Merge branch 'dev' of github.com:biocore/qiita into multiple-inputs-add_default_workflow
antgonza committed Feb 20, 2024
2 parents f063ca0 + 57b84cf commit 94fa00b
Showing 4 changed files with 111 additions and 114 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/qiita-ci.yml
@@ -154,6 +154,8 @@ jobs:
echo "5. Setting up qiita"
conda activate qiita
+ # adapt environment_script for private qiita plugins from travis to github actions.
+ sed 's#export PATH="/home/travis/miniconda3/bin:$PATH"; source #source /home/runner/.profile; conda #' -i qiita_db/support_files/patches/54.sql
qiita-env make --no-load-ontologies
qiita-test-install
qiita plugins update
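
The added `sed` line rewrites the `environment_script` values stored in the patch so that plugin environments activate correctly under GitHub Actions instead of Travis. As a purely illustrative example of the substitution (the `activate qp-target-gene` value is made up; the real contents of `qiita_db/support_files/patches/54.sql` may differ), the same expression applied to a single line does this:

```bash
# Hypothetical input line, run through the exact substitution used above:
echo 'export PATH="/home/travis/miniconda3/bin:$PATH"; source activate qp-target-gene' \
  | sed 's#export PATH="/home/travis/miniconda3/bin:$PATH"; source #source /home/runner/.profile; conda #'
# prints: source /home/runner/.profile; conda activate qp-target-gene
```
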
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -13,7 +13,7 @@ Deployed on January 8th, 2024
* Workflow definitions can now use sample or preparation information columns/values to differentiate between them.
* Updated the Adapter and host filtering plugin (qp-fastp-minimap2) to v2023.12 addressing a bug in adapter filtering; [more information](https://qiita.ucsd.edu/static/doc/html/processingdata/qp-fastp-minimap2.html).
* Other fixes: [3334](https://github.com/qiita-spots/qiita/pull/3334), [3338](https://github.com/qiita-spots/qiita/pull/3338). Thank you @sjanssen2.
- * The internal Sequence Processing Pipeline is now using the human pan-genome reference, together with the GRCh38 genome + PhiX and CHM13 genome for human host filtering.
+ * The internal Sequence Processing Pipeline is now using the human pan-genome reference, together with the GRCh38 genome + PhiX and T2T-CHM13v2.0 genome for human host filtering.


Version 2023.10
20 changes: 10 additions & 10 deletions INSTALL.md
@@ -162,9 +162,9 @@ Navigate to the cloned directory and ensure your conda environment is active:
cd qiita
source activate qiita
```
- If you are using Ubuntu or a Windows Subsystem for Linux (WSL), you will need to ensure that you have a C++ compiler and that development libraries and include files for PostgreSQL are available. Type `cc` into your system to ensure that it doesn't result in `program not found`. The following commands will install a C++ compiler and `libpq-dev`:
+ If you are using Ubuntu or the Windows Subsystem for Linux (WSL), you will need to ensure that you have a C++ compiler and that the development libraries and include files for PostgreSQL are available. Type `cc` in your terminal to ensure that it doesn't result in `program not found`. If you use the GNU Compiler Collection, make sure both `gcc` and `g++` are available. The following commands will install a C++ compiler and `libpq-dev`:
```bash
- sudo apt install gcc # alternatively, you can install clang instead
+ sudo apt install gcc g++ # alternatively, you can install clang instead
sudo apt-get install libpq-dev
```
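
Before moving on, it can help to confirm the prerequisites are actually in place. This is an optional, illustrative check rather than part of the official instructions (on Debian/Ubuntu, `pg_config` is provided by `libpq-dev`):

```bash
# Optional sanity check: confirm compilers and PostgreSQL client headers exist.
cc --version  || echo "no C compiler found -- install gcc or clang"
g++ --version || echo "no C++ compiler found -- install g++ or clang++"
pg_config --version || echo "PostgreSQL development files missing -- install libpq-dev"
```
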
Install Qiita (this occurs through setuptools' `setup.py` file in the qiita directory):
@@ -178,7 +178,7 @@ At this point, Qiita will be installed and the system will start. However,
you will need to install plugins in order to process any kind of data. For a list
of available plugins, visit the [Qiita Spots](https://github.com/qiita-spots)
GitHub organization. Each of the plugins has its own installation instructions; we
- suggest looking at each individual .travis.yml file to see detailed installation
+ suggest looking at each individual .github/workflows/qiita-plugin-ci.yml file to see detailed installation
instructions. Note that the most common plugins are:
- [qtp-biom](https://github.com/qiita-spots/qtp-biom)
- [qtp-sequencing](https://github.com/qiita-spots/qtp-sequencing)
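
As a purely hypothetical sketch of what installing one of these plugins can look like (the authoritative steps live in each plugin's own `.github/workflows/qiita-plugin-ci.yml`; the `pip install .` step and the lack of further configuration here are assumptions):

```bash
# Hypothetical example only -- check the plugin's CI workflow for the real steps.
conda activate qiita
git clone https://github.com/qiita-spots/qtp-biom
cd qtp-biom
pip install .
# Most plugins also need a post-install configuration/registration step; see
# their own README or CI workflow for the exact command.
```
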
@@ -224,15 +224,15 @@ export REDBIOM_HOST=http://my_host.com:7379

## Configure NGINX and supervisor

- (NGINX)[https://www.nginx.com/] is not a requirement for Qiita development but it's highly recommended for deploys as this will allow us
- to have multiple workers. Note that we are already installing (NGINX)[https://www.nginx.com/] within the Qiita conda environment; also,
- that Qiita comes with an example (NGINX)[https://www.nginx.com/] config file: `qiita_pet/nginx_example.conf`, which is used in the Travis builds.
+ [NGINX](https://www.nginx.com/) is not a requirement for Qiita development but it's highly recommended for deploys as this will allow us
+ to have multiple workers. Note that we are already installing [NGINX](https://www.nginx.com/) within the Qiita conda environment; also,
+ that Qiita comes with an example [NGINX](https://www.nginx.com/) config file: `qiita_pet/nginx_example.conf`, which is used in the Travis builds.

- Now, (supervisor)[https://github.com/Supervisor/supervisor] will allow us to start all the workers we want based on its configuration file; and we
- need that both the (NGINX)[https://www.nginx.com/] and (supervisor)[https://github.com/Supervisor/supervisor] config files to match. For our Travis
+ Now, [supervisor](https://github.com/Supervisor/supervisor) will allow us to start all the workers we want based on its configuration file, and we
+ need the [NGINX](https://www.nginx.com/) and [supervisor](https://github.com/Supervisor/supervisor) config files to match. For our Travis
testing we create 3 workers: 21174 for the master and 21175-21176 as regular workers.
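
Once supervisor has started the workers and NGINX is running, a quick port check is an easy way to confirm the layout described above. The check below is an illustrative sketch, not part of the official instructions; the port numbers come from the surrounding text:

```bash
# Illustrative: the master worker (21174), the regular workers (21175-21176)
# and the NGINX front end (8383) should all be listening once everything is up.
ss -tln | grep -E ':(21174|21175|21176|8383)'
```
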

- If you are using (NGINX)[https://www.nginx.com/] via conda, you are going to need to create the NGINX folder within the environment; thus run:
+ If you are using [NGINX](https://www.nginx.com/) via conda, you will need to create the NGINX folder within the environment; thus run:

```bash
mkdir -p ${CONDA_PREFIX}/var/run/nginx/
@@ -256,7 +256,7 @@ Start the qiita server:
qiita pet webserver start
```

- If all the above commands executed correctly, you should be able to access Qiita by going in your browser to https://localhost:21174 if you are not using NGINX, or https://localhost:8383 if you are using NGINX, to login use `test@foo.bar` and `password` as the credentials. (In the future, we will have a *single user mode* that will allow you to use a local Qiita server without logging in. You can track progress on this on issue [#920](https://github.com/biocore/qiita/issues/920).)
+ If all the above commands executed correctly, you should be able to access Qiita by pointing your browser to https://localhost:21174 if you are not using NGINX, or https://localhost:8383 if you are. To log in, use `test@foo.bar` and `password` as the credentials. (Log in as `admin@foo.bar` with `password` to see the admin functionality. In the future, we will have a *single user mode* that will allow you to use a local Qiita server without logging in. You can track progress on this on issue [#920](https://github.com/biocore/qiita/issues/920).)
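
If you prefer to verify from the command line before opening a browser, a rough check like the following works; `-k` skips certificate verification on the assumption that your development certificate is self-signed:

```bash
# Illustrative: the portal should answer with an HTTP status line.
curl -k -I https://localhost:21174   # talking to the master worker directly
curl -k -I https://localhost:8383    # through NGINX, if you configured it
```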



201 changes: 98 additions & 103 deletions qiita_db/metadata_template/prep_template.py
@@ -893,117 +893,112 @@ def _get_predecessors(workflow, node):

# let's just keep one, let's give it preference to the one with the
# most total_conditions_satisfied
workflows = sorted(workflows, key=lambda x: x[0], reverse=True)[:1]
_, wk = sorted(workflows, key=lambda x: x[0], reverse=True)[0]
missing_artifacts = dict()
for _, wk in workflows:
missing_artifacts[wk] = dict()
for node, degree in wk.graph.out_degree():
if degree != 0:
continue
mscheme = _get_node_info(wk, node)
if mscheme not in merging_schemes:
missing_artifacts[wk][mscheme] = node
if not missing_artifacts[wk]:
del missing_artifacts[wk]
for node, degree in wk.graph.out_degree():
if degree != 0:
continue
mscheme = _get_node_info(wk, node)
if mscheme not in merging_schemes:
missing_artifacts[mscheme] = node
if not missing_artifacts:
# raises option b.
raise ValueError('This preparation is complete')

# 3.
for wk, wk_data in missing_artifacts.items():
previous_jobs = dict()
for ma, node in wk_data.items():
predecessors = _get_predecessors(wk, node)
predecessors.reverse()
cmds_to_create = []
init_artifacts = None
for i, (pnode, cnode, cxns) in enumerate(predecessors):
cdp = cnode.default_parameter
cdp_cmd = cdp.command
params = cdp.values.copy()

icxns = {y: x for x, y in cxns.items()}
reqp = {x: icxns[y[1][0]]
for x, y in cdp_cmd.required_parameters.items()}
cmds_to_create.append([cdp_cmd, params, reqp])

info = _get_node_info(wk, pnode)
if info in merging_schemes:
if set(merging_schemes[info]) >= set(cxns):
init_artifacts = merging_schemes[info]
break
if init_artifacts is None:
pdp = pnode.default_parameter
pdp_cmd = pdp.command
params = pdp.values.copy()
# verifying that the workflow.artifact_type is included
# in the command input types or raise an error
wkartifact_type = wk.artifact_type
reqp = dict()
for x, y in pdp_cmd.required_parameters.items():
if wkartifact_type not in y[1]:
raise ValueError(f'{wkartifact_type} is not part '
'of this preparation and cannot '
'be applied')
reqp[x] = wkartifact_type

cmds_to_create.append([pdp_cmd, params, reqp])

if starting_job is not None:
init_artifacts = {
wkartifact_type: f'{starting_job.id}:'}
else:
init_artifacts = {wkartifact_type: self.artifact.id}

cmds_to_create.reverse()
current_job = None
loop_starting_job = starting_job
for i, (cmd, params, rp) in enumerate(cmds_to_create):
if loop_starting_job is not None:
previous_job = loop_starting_job
loop_starting_job = None
else:
previous_job = current_job
if previous_job is None:
req_params = dict()
for iname, dname in rp.items():
if dname not in init_artifacts:
msg = (f'Missing Artifact type: "{dname}" in '
'this preparation; this might be due '
'to missing steps or not having the '
'correct raw data.')
# raises option c.
raise ValueError(msg)
req_params[iname] = init_artifacts[dname]
else:
req_params = dict()
connections = dict()
for iname, dname in rp.items():
req_params[iname] = f'{previous_job.id}{dname}'
connections[dname] = iname
params.update(req_params)
job_params = qdb.software.Parameters.load(
cmd, values_dict=params)

if params in previous_jobs.values():
for x, y in previous_jobs.items():
if params == y:
current_job = x
previous_jobs = dict()
for ma, node in missing_artifacts.items():
predecessors = _get_predecessors(wk, node)
predecessors.reverse()
cmds_to_create = []
init_artifacts = None
for i, (pnode, cnode, cxns) in enumerate(predecessors):
cdp = cnode.default_parameter
cdp_cmd = cdp.command
params = cdp.values.copy()

icxns = {y: x for x, y in cxns.items()}
reqp = {x: icxns[y[1][0]]
for x, y in cdp_cmd.required_parameters.items()}
cmds_to_create.append([cdp_cmd, params, reqp])

info = _get_node_info(wk, pnode)
if info in merging_schemes:
if set(merging_schemes[info]) >= set(cxns):
init_artifacts = merging_schemes[info]
break
if init_artifacts is None:
pdp = pnode.default_parameter
pdp_cmd = pdp.command
params = pdp.values.copy()
# verifying that the workflow.artifact_type is included
# in the command input types or raise an error
wkartifact_type = wk.artifact_type
reqp = dict()
for x, y in pdp_cmd.required_parameters.items():
if wkartifact_type not in y[1]:
raise ValueError(f'{wkartifact_type} is not part '
'of this preparation and cannot '
'be applied')
reqp[x] = wkartifact_type

cmds_to_create.append([pdp_cmd, params, reqp])

if starting_job is not None:
init_artifacts = {
wkartifact_type: f'{starting_job.id}:'}
else:
init_artifacts = {wkartifact_type: self.artifact.id}

cmds_to_create.reverse()
current_job = None
loop_starting_job = starting_job
for i, (cmd, params, rp) in enumerate(cmds_to_create):
if loop_starting_job is not None:
previous_job = loop_starting_job
loop_starting_job = None
else:
previous_job = current_job
if previous_job is None:
req_params = dict()
for iname, dname in rp.items():
if dname not in init_artifacts:
msg = (f'Missing Artifact type: "{dname}" in '
'this preparation; this might be due '
'to missing steps or not having the '
'correct raw data.')
# raises option c.
raise ValueError(msg)
req_params[iname] = init_artifacts[dname]
else:
req_params = dict()
connections = dict()
for iname, dname in rp.items():
req_params[iname] = f'{previous_job.id}{dname}'
connections[dname] = iname
params.update(req_params)
job_params = qdb.software.Parameters.load(
cmd, values_dict=params)

if params in previous_jobs.values():
for x, y in previous_jobs.items():
if params == y:
current_job = x
else:
if workflow is None:
PW = qdb.processing_job.ProcessingWorkflow
workflow = PW.from_scratch(user, job_params)
current_job = [
j for j in workflow.graph.nodes()][0]
else:
if workflow is None:
PW = qdb.processing_job.ProcessingWorkflow
workflow = PW.from_scratch(user, job_params)
current_job = [
j for j in workflow.graph.nodes()][0]
if previous_job is None:
current_job = workflow.add(
job_params, req_params=req_params)
else:
if previous_job is None:
current_job = workflow.add(
job_params, req_params=req_params)
else:
current_job = workflow.add(
job_params, req_params=req_params,
connections={previous_job: connections})
previous_jobs[current_job] = params
current_job = workflow.add(
job_params, req_params=req_params,
connections={previous_job: connections})
previous_jobs[current_job] = params

return workflow

