feat: add new covid_symptom__nlp_results_term_exists task #275

mikix · 2023-09-26T20:18:19Z

- Adds a new covid_symptom__nlp_results_term_exists task, which uses the "termexists" model for polarity checking cTAKES rather than the previous "negation" model. This task will largely be used to compare the performance of the two models.
- Rename some docker compose targets, like the etl-support profile into the covid-symptom profile (the thinking is that we'll have study-specific sets of services that you might want on or off, depending on what you're doing)
- Refactors some NLP code to use shared base classes.
- Remove covid_symptom__nlp_results from the default set of tasks. Study-specific tasks should have to be requested.

Checklist

- Consider if documentation (like in docs/) needs to be updated
- Consider if tests should be added

mikix · 2023-09-29T12:36:03Z

compose.yaml

-      - etl-support
-      - etl-support-gpu
+      - covid-symptom
+      - covid-symptom-gpu


OK notable change number 1: how do you feel about this direction?

I think I mentioned wanting to toy with this - not a global "run all the services I might ever need" but rather a more targeted "run the services for the studies I'm caring about right now".

It's a more complicated user story, but I feel like the "run everything" approach quickly falls over as we add studies.

You had mentioned anticipating moving to a more comprehensive meta-runner like Kubernetes or something, yeah? I'm not opposed, but in the meantime, I'm thinking something like this could be a happy medium.

mikix · 2023-09-29T12:39:06Z

cumulus_etl/etl/studies/covid_symptom/covid_tasks.py

-class CovidSymptomNlpResultsTask(tasks.EtlTask):
-    """Covid Symptom study task, to generate symptom lists from ED notes using NLP"""
+class BaseCovidSymptomNlpResultsTask(tasks.BaseNlpTask):
+    """Covid Symptom study task, to generate symptom lists from ED notes using cTAKES + a polarity check"""


Notable change number 2: A lot of this PR is refactoring, sorry.

I made a new BaseNlpTask for all NLP jobs (shared by the Hugging Face test task and the covid tasks), plus a base covid symptom task here that both of the (nearly identical) covid tasks inherit from.

There's no meaningful actual change in the refactoring, except that this covid base task can now pass in a different polarity model param to the NLP code. But the end result of the task is identical (though I did bump its task_version number because I also bumped the negation transformer to 0.6.1).

And the tests when you get to them are also refactored a bit - I added a new shared BaseEtlTest class for all the tests that care about running a full ETL and some refactoring to handle the changes from dropping the covid tests from the default list of ETL jobs (keep reading for that change).
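A minimal sketch of the class layout described above, assuming the polarity model is chosen per subclass through a class attribute. The `PolarityModel` enum, the `describe` helper, and the `CovidSymptomNlpResultsTermExistsTask` class name are illustrative assumptions, not names confirmed by this PR:

```python
from enum import Enum


class PolarityModel(Enum):
    """Hypothetical enum naming the two polarity models discussed here."""

    NEGATION = "negation"
    TERM_EXISTS = "termexists"


class BaseNlpTask:
    """Simplified stand-in for the shared NLP base task."""

    name = None


class BaseCovidSymptomNlpResultsTask(BaseNlpTask):
    """Shared covid symptom logic; concrete tasks only pick a polarity model."""

    polarity_model = None  # set by each concrete task below

    def describe(self) -> str:
        # Hypothetical helper, present only so the sketch has visible output.
        return f"{self.name} uses {self.polarity_model.value}"


class CovidSymptomNlpResultsTask(BaseCovidSymptomNlpResultsTask):
    name = "covid_symptom__nlp_results"
    polarity_model = PolarityModel.NEGATION


class CovidSymptomNlpResultsTermExistsTask(BaseCovidSymptomNlpResultsTask):
    # Class name is assumed; only the task name appears in the PR.
    name = "covid_symptom__nlp_results_term_exists"
    polarity_model = PolarityModel.TERM_EXISTS
```

With this shape, the two concrete tasks differ only in their name and the model they hand to the NLP code, which matches the "nearly identical" description.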

mikix · 2023-09-29T12:41:43Z

cumulus_etl/etl/tasks/factory.py

@@ -52,7 +54,6 @@ def get_default_tasks() -> list[type[AnyTask]]:
        ObservationTask,
        ProcedureTask,
        ServiceRequestTask,
-        covid_symptom.CovidSymptomNlpResultsTask,  # TODO: remove from default list at some point


Notable change number 3: dropping this default task. It feels natural to drop study-specific tasks from the default set. (I don't know if folks really even use the default vs listing the tasks they care about, but still.)
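To illustrate the effect of that removal, here is a rough sketch of how a default list and by-name selection might interact. The stand-in classes, the `get_all_tasks` and `get_selected_tasks` helpers, and the short task names `observation` and `procedure` are assumptions made for the example, not the project's actual factory code:

```python
class EtlTask:
    """Simplified stand-in for the real task base class."""

    name = ""


class ObservationTask(EtlTask):
    name = "observation"  # assumed task name


class ProcedureTask(EtlTask):
    name = "procedure"  # assumed task name


class CovidSymptomNlpResultsTask(EtlTask):
    name = "covid_symptom__nlp_results"


def get_all_tasks() -> list[type[EtlTask]]:
    """Every task that can be requested, including study-specific ones."""
    return [ObservationTask, ProcedureTask, CovidSymptomNlpResultsTask]


def get_default_tasks() -> list[type[EtlTask]]:
    """Tasks run when nothing is requested; study tasks are left out."""
    return [ObservationTask, ProcedureTask]


def get_selected_tasks(names=None) -> list[type[EtlTask]]:
    """Return the defaults, or exactly the tasks requested by name."""
    if not names:
        return get_default_tasks()
    by_name = {task.name: task for task in get_all_tasks()}
    unknown = [n for n in names if n not in by_name]
    if unknown:
        raise ValueError(f"Unknown task(s): {', '.join(unknown)}")
    return [by_name[n] for n in names]
```

Under these assumptions, a run with no task arguments skips the covid symptom task, while passing its name explicitly still selects it.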

mikix · 2023-09-29T12:43:03Z

cumulus_etl/etl/tasks/nlp_task.py

+from cumulus_etl.etl.tasks.base import EtlTask, OutputTable
+
+
+class BaseNlpTask(EtlTask):


This is the base NLP task I mentioned before. It basically just handles looping over the DocRefs and extracting the notes, with some boilerplate code for handling group fields if the subclass wants that.
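As a rough illustration of that loop, the sketch below walks a list of FHIR DocumentReference dictionaries and pulls out inline plain-text notes. The function names `extract_note_text` and `iter_notes` are hypothetical, and the sketch assumes notes are stored as base64 `data` on a `text/plain` attachment; the real task may also handle other content types or fetch attachments by URL:

```python
import base64
from typing import Iterable, Iterator, Optional, Tuple


def extract_note_text(docref: dict) -> Optional[str]:
    """Return the first inline plain-text attachment of a DocumentReference."""
    for content in docref.get("content", []):
        attachment = content.get("attachment", {})
        content_type = attachment.get("contentType", "")
        data = attachment.get("data")
        # FHIR stores inline attachment data as base64.
        if data and content_type.startswith("text/plain"):
            return base64.b64decode(data).decode("utf8")
    return None


def iter_notes(docrefs: Iterable[dict]) -> Iterator[Tuple[dict, str]]:
    """Yield (docref, note text) pairs, skipping docrefs without a usable note."""
    for docref in docrefs:
        text = extract_note_text(docref)
        if text is None:
            continue
        yield docref, text
```

A subclass would then only need to consume the yielded pairs and run its own NLP step on each note.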

mikix · 2023-09-29T12:44:29Z

pyproject.toml

-    "pyarrow < 13",
+    "pyarrow < 14",


Unrelated, just bumped it while I was in here because I saw they put out a 13.0.0 that dropped Python 3.7 support.

dogversioning · 2023-09-29T14:17:15Z

compose.yaml

-      - etl-support
-      - etl-support-gpu
+      - covid-symptom
+      - covid-symptom-gpu


yeah i don't hate this given the addition of a new task, and also that folks may not often be running this now.

dogversioning · 2023-09-29T14:37:49Z

docs/studies/covid-symptom.md

+There is also a second optional task `--task covid_symptom__nlp_results_term_exists`,
+which just uses a different polarity cNLP transformer (`termexists` rather than `negation`).
+You likely don't need both, but they may be interesting to compare.


Q: do we want to treat this as an in-development-style command? i.e. maybe it doesn't live here until we've decided to use it in a non-experimental way?

Good question.

So this task is not like, in-development or experimental in the sense of a WIP that's likely to change. Once this lands, I feel like it's a full, normal task. But it is "less than" a normal task because it's kind of a special case task to help Tim compare the new transformer's performance. But I guess all study tasks are kind of special case tasks like that.

But here's this task's current status in this PR:

1. is not run by default
2. can only be run by name (i.e. not tagged at all; --task-filter=covid_symptom or --task-filter=gpu do not hit it)
3. is documented on the study's docs

Is your thinking that 1 and 2 are good (i.e. "lowlighting" this task) and 3 should also not mention the task to lowlight it even further?

(I'm open to that, just trying to clarify what you are thinking about when marking experimental -- or do you want like active barriers like a flag --enable-experimental or something)
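The "not tagged at all" status can be pictured with a small sketch like the one below. The `tags` attribute, the `tasks_matching_filter` helper, the term-exists class name, and the assumption that the existing task carries the `covid_symptom` and `gpu` tags are all illustrative guesses rather than confirmed details of the codebase:

```python
class EtlTask:
    """Simplified stand-in for the real task base class."""

    name = ""
    tags = frozenset()


class CovidSymptomNlpResultsTask(EtlTask):
    name = "covid_symptom__nlp_results"
    tags = frozenset({"covid_symptom", "gpu"})  # assumed tags


class CovidSymptomNlpResultsTermExistsTask(EtlTask):
    name = "covid_symptom__nlp_results_term_exists"
    tags = frozenset()  # untagged, so no tag filter can reach it


ALL_TASKS = [CovidSymptomNlpResultsTask, CovidSymptomNlpResultsTermExistsTask]


def tasks_matching_filter(tag: str) -> list[str]:
    """Names of the tasks that a tag filter would select."""
    return [task.name for task in ALL_TASKS if tag in task.tags]
```

In this sketch, filtering on either tag returns only the original task, so the new one stays reachable by name alone.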

i think in my head, i'm drawing the line at 'this is something we are using internally while doing study validation' versus 'this is something a partner site might be explicitly asked to run' - and I think this is mostly a 3 level comment.

Fair yeah - I dropped mention from the docs 👍

mikix force-pushed the mikix/term-exists branch 3 times, most recently from 7625b54 to 61e9761 Compare September 29, 2023 12:24

mikix force-pushed the mikix/term-exists branch 5 times, most recently from 44cdbb4 to dc25c5e Compare September 29, 2023 13:31

mikix changed the title ~~WIP: feat: add new covid_symptom__nlp_results_term_exists task~~ feat: add new covid_symptom__nlp_results_term_exists task Sep 29, 2023

mikix marked this pull request as ready for review September 29, 2023 13:33

mikix force-pushed the mikix/term-exists branch from dc25c5e to c97b07f Compare September 29, 2023 14:06

dogversioning approved these changes Sep 29, 2023

mikix force-pushed the mikix/term-exists branch from c97b07f to 62aa293 Compare September 29, 2023 16:22

mikix merged commit de2bcb9 into main Sep 29, 2023
2 checks passed

mikix deleted the mikix/term-exists branch September 29, 2023 17:14
