Merge pull request #5 from jenojp/develop
Updated for issue #4, allow users to specify own negation dictionaries.
jenojp authored Aug 18, 2019
2 parents fed33e3 + 62af13b commit 2856b8e
Showing 11 changed files with 233 additions and 59 deletions.
3 changes: 2 additions & 1 deletion CONTRIBUTING.md
@@ -3,4 +3,5 @@
:tada: Thanks for your interest in this project :tada:

* Please submit an issue request for any bugs, feature requests, or questions.
* Feel free to fork the repo and submit a pull request.
* Feel free to fork the repo and submit a pull request.
* Please use [Black](https://github.com/ambv/black) to format code before submitting.
24 changes: 23 additions & 1 deletion README.md
@@ -2,7 +2,7 @@

# negspacy: negation for spaCy

[![Build Status](https://travis-ci.org/jenojp/negspacy.svg?branch=master)](https://travis-ci.org/jenojp/negspacy) [![Built with spaCy](https://img.shields.io/badge/made%20with%20❤%20and-spaCy-09a3d5.svg)](https://spacy.io) [![pypi Version](https://img.shields.io/pypi/v/negspacy.svg?style=flat-square)](https://pypi.org/project/negspacy/)
[![Build Status](https://travis-ci.org/jenojp/negspacy.svg?branch=master)](https://travis-ci.org/jenojp/negspacy) [![Built with spaCy](https://img.shields.io/badge/made%20with%20❤%20and-spaCy-09a3d5.svg)](https://spacy.io) [![pypi Version](https://img.shields.io/pypi/v/negspacy.svg?style=flat-square)](https://pypi.org/project/negspacy/) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg?style=flat-square)](https://github.com/ambv/black)

spaCy pipeline object for negating concepts in text. Based on the NegEx algorithm.

@@ -41,6 +41,28 @@ Steve Jobs True
Apple False
```
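
For context, a minimal sketch of a pipeline that produces output like the above; the sentence, the `ent_types` filter, and the `from negspacy.negation import Negex` import path are illustrative assumptions rather than text from this commit.

```python
import spacy
from negspacy.negation import Negex  # assumed import path

nlp = spacy.load("en_core_web_sm")
negex = Negex(nlp, ent_types=["PERSON", "ORG"])  # assumption: only negate these entity types
nlp.add_pipe(negex, last=True)

# "not" is a preceding negation, so "Steve Jobs" is negated;
# "but" terminates the scope, so "Apple" is left un-negated.
doc = nlp("She does not like Steve Jobs but likes Apple products.")
for e in doc.ents:
    print(e.text, e._.negex)
```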

Consider pairing with [scispacy](https://allenai.github.io/scispacy/) to find UMLS concepts in text and process negations.
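
A hedged sketch of that pairing, assuming a scispacy model such as `en_core_sci_sm` is installed; the sentence and expected flags come from this commit's test data.

```python
import spacy
from negspacy.negation import Negex  # assumed import path

nlp = spacy.load("en_core_sci_sm")  # assumption: scispacy biomedical model is installed
negex = Negex(nlp)  # no ent_types filter: every entity the model finds is checked
nlp.add_pipe(negex, last=True)

doc = nlp("Patient denies cardiovascular disease but has headaches.")
for e in doc.ents:
    print(e.text, e._.negex)  # per the tests: "cardiovascular disease" True, "headaches" False
```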

## NegEx Patterns

* **psuedo_negations** - phrases that are false triggers, ambiguous negations, or double negatives
* **preceeding_negations** - negation phrases that precede an entity
* **following_negations** - negation phrases that follow an entity
* **termination** - phrases that split a sentence into parts for the purposes of negation detection (e.g., "but"); see the sketch below
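
To make the categories concrete, a small sketch using sentences from this commit's test data; it assumes a model (e.g. scispacy's `en_core_sci_sm`) that recognizes these terms as entities, and the import path is an assumption.

```python
import spacy
from negspacy.negation import Negex  # assumed import path

nlp = spacy.load("en_core_sci_sm")  # assumption: a model that tags these terms as entities
negex = Negex(nlp)
nlp.add_pipe(negex, last=True)

# "unlikely" is a following negation, so "Alcoholism" is flagged as negated;
# "not ruled out" is a pseudo-negation (a false trigger), so "Smoking" is not.
doc = nlp("Alcoholism unlikely. Smoking not ruled out.")
for e in doc.ents:
    print(e.text, e._.negex)  # per the tests: Alcoholism True, Smoking False
```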

### Use your own patterns or view the patterns in use

Use your own patterns:
```python
nlp = spacy.load("en_core_web_sm")
negex = Negex(nlp, termination=["but", "however", "nevertheless", "except"])
```

View the patterns in use:
```python
patterns_dict = negex.get_patterns()
```
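
A brief follow-up sketch showing one way to inspect the returned dictionary, which maps each pattern type to its list of phrase patterns (per the `get_patterns` docstring added in this commit); the import path is an assumption.

```python
import spacy
from negspacy.negation import Negex  # assumed import path

nlp = spacy.load("en_core_web_sm")
negex = Negex(nlp)

for pattern_type, patterns in negex.get_patterns().items():
    print(pattern_type, len(patterns))
# keys: psuedo_patterns, preceeding_patterns, following_patterns, termination_patterns
```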

## Contributing
See the [contributing guidelines](https://github.com/jenojp/negspacy/blob/master/CONTRIBUTING.md).

Binary file modified docs/build/doctrees/environment.pickle
Binary file modified docs/build/doctrees/negspacy.doctree
11 changes: 10 additions & 1 deletion docs/build/html/genindex.html
@@ -38,11 +38,20 @@ <h3>Navigation</h3>
<h1 id="index">Index</h1>

<div class="genindex-jumpbox">
<a href="#N"><strong>N</strong></a>
<a href="#G"><strong>G</strong></a>
| <a href="#N"><strong>N</strong></a>
| <a href="#P"><strong>P</strong></a>
| <a href="#T"><strong>T</strong></a>

</div>
<h2 id="G">G</h2>
<table style="width: 100%" class="indextable genindextable"><tr>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="negspacy.html#negspacy.negation.Negex.get_patterns">get_patterns() (negspacy.negation.Negex method)</a>
</li>
</ul></td>
</tr></table>

<h2 id="N">N</h2>
<table style="width: 100%" class="indextable genindextable"><tr>
<td style="width: 33%; vertical-align: top;"><ul>
20 changes: 19 additions & 1 deletion docs/build/html/negspacy.html
@@ -42,7 +42,7 @@ <h2>Submodules<a class="headerlink" href="#submodules" title="Permalink to this
<span id="negspacy-negation-module"></span><h2>negspacy.negation module<a class="headerlink" href="#module-negspacy.negation" title="Permalink to this headline"></a></h2>
<dl class="class">
<dt id="negspacy.negation.Negex">
<em class="property">class </em><code class="sig-prename descclassname">negspacy.negation.</code><code class="sig-name descname">Negex</code><span class="sig-paren">(</span><em class="sig-param">nlp</em>, <em class="sig-param">ent_types=[]</em><span class="sig-paren">)</span><a class="headerlink" href="#negspacy.negation.Negex" title="Permalink to this definition"></a></dt>
<em class="property">class </em><code class="sig-prename descclassname">negspacy.negation.</code><code class="sig-name descname">Negex</code><span class="sig-paren">(</span><em class="sig-param">nlp</em>, <em class="sig-param">ent_types=[]</em>, <em class="sig-param">psuedo_negations=[]</em>, <em class="sig-param">preceeding_negations=[]</em>, <em class="sig-param">following_negations=[]</em>, <em class="sig-param">termination=[]</em><span class="sig-paren">)</span><a class="headerlink" href="#negspacy.negation.Negex" title="Permalink to this definition"></a></dt>
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">object</span></code></p>
<blockquote>
<div><p>A spaCy pipeline component which identifies negated tokens in text.</p>
@@ -54,9 +54,27 @@ <h2>Submodules<a class="headerlink" href="#submodules" title="Permalink to this
<dd class="field-odd"><ul class="simple">
<li><p><strong>nlp</strong> (<em>object</em>) – spaCy language object</p></li>
<li><p><strong>ent_types</strong> (<em>list</em>) – list of entity types to negate</p></li>
<li><p><strong>psuedo_negations</strong> (<em>list</em>) – list of phrases that cancel out a negation, if empty, defaults are used</p></li>
<li><p><strong>preceeding_negations</strong> (<em>list</em>) – negations that appear before an entity, if empty, defaults are used</p></li>
<li><p><strong>following_negations</strong> (<em>list</em>) – negations that appear after an entity, if empty, defaults are used</p></li>
<li><p><strong>termination</strong> (<em>list</em>) – phrases that “terminate” a sentence for processing purposes such as “but”. If empty, defaults are used</p></li>
</ul>
</dd>
</dl>
<dl class="method">
<dt id="negspacy.negation.Negex.get_patterns">
<code class="sig-name descname">get_patterns</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#negspacy.negation.Negex.get_patterns" title="Permalink to this definition"></a></dt>
<dd><p>returns phrase patterns used for various negation dictionaries</p>
<dl class="field-list simple">
<dt class="field-odd">Returns</dt>
<dd class="field-odd"><p><strong>patterns</strong> – pattern_type: [patterns]</p>
</dd>
<dt class="field-even">Return type</dt>
<dd class="field-even"><p>dict</p>
</dd>
</dl>
</dd></dl>

<dl class="method">
<dt id="negspacy.negation.Negex.negex">
<code class="sig-name descname">negex</code><span class="sig-paren">(</span><em class="sig-param">doc</em><span class="sig-paren">)</span><a class="headerlink" href="#negspacy.negation.Negex.negex" title="Permalink to this definition"></a></dt>
Binary file modified docs/build/html/objects.inv
2 changes: 1 addition & 1 deletion docs/build/html/searchindex.js

Some generated files are not rendered by default.

191 changes: 139 additions & 52 deletions negspacy/negation.py
@@ -16,69 +16,151 @@ class Negex:
spaCy language object
ent_types: list
list of entity types to negate
psuedo_negations: list
list of phrases that cancel out a negation; if empty, defaults are used
preceeding_negations: list
negations that appear before an entity; if empty, defaults are used
following_negations: list
negations that appear after an entity; if empty, defaults are used
termination: list
phrases that "terminate" a sentence for processing purposes, such as "but"; if empty, defaults are used
"""

def __init__(self, nlp, ent_types=[]):
def __init__(
self,
nlp,
ent_types=list(),
psuedo_negations=list(),
preceeding_negations=list(),
following_negations=list(),
termination=list(),
):
if not Span.has_extension("negex"):
Span.set_extension("negex", default=False, force=True)
psuedo_negations = [
"gram negative",
"no further",
"not able to be",
"not certain if",
"not certain whether",
"not necessarily",
"not rule out",
"not ruled out",
"not been ruled out",
"without any further",
"without difficulty",
"without further",
]
preceeding_negations = [
"absence of",
"declined",
"denied",
"denies",
"denying",
"did not exhibit",
"no sign of",
"no signs of",
"not",
"not demonstrate",
"patient was not",
"rules out",
"doubt",
"negative for",
"no",
"no cause of",
"no complaints of",
"no evidence of",
"versus",
"without",
"without indication of",
"without sign of",
"without signs of",
"ruled out",
]
following_negations = ["declined", "unlikely"]
termination = ["but", "however"]
if not psuedo_negations:
psuedo_negations = [
"gram negative",
"no further",
"not able to be",
"not certain if",
"not certain whether",
"not necessarily",
"not rule out",
"not ruled out",
"not been ruled out",
"without any further",
"without difficulty",
"without further",
]
if not preceeding_negations:
preceeding_negations = [
"absence of",
"declined",
"denied",
"denies",
"denying",
"did not exhibit",
"no sign of",
"no signs of",
"not",
"not demonstrate",
"patient was not",
"rules out",
"doubt",
"negative for",
"no",
"no cause of",
"no complaints of",
"no evidence of",
"versus",
"without",
"without indication of",
"without sign of",
"without signs of",
"ruled out",
]
if not following_negations:
following_negations = [
"declined",
"unlikely",
"was ruled out",
"were ruled out",
"was not",
"were not",
]
if not termination:
termination = [
"although",
"apart from",
"as there are",
"aside from",
"but",
"cause for",
"cause of",
"causes for",
"causes of",
"etiology for",
"etiology of",
"except",
"however",
"involving",
"nevertheless",
"origin for",
"origin of",
"origins for",
"origins of",
"other possibilities of",
"reason for",
"reason of",
"reasons for",
"reasons of",
"secondary to",
"source for",
"source of",
"sources for",
"sources of",
"still",
"though",
"trigger event for",
"which",
"yet",
]

# efficiently build spaCy matcher patterns
psuedo_patterns = list(nlp.tokenizer.pipe(psuedo_negations))
preceeding_patterns = list(nlp.tokenizer.pipe(preceeding_negations))
following_patterns = list(nlp.tokenizer.pipe(following_negations))
termination_patterns = list(nlp.tokenizer.pipe(termination))
self.psuedo_patterns = list(nlp.tokenizer.pipe(psuedo_negations))
self.preceeding_patterns = list(nlp.tokenizer.pipe(preceeding_negations))
self.following_patterns = list(nlp.tokenizer.pipe(following_negations))
self.termination_patterns = list(nlp.tokenizer.pipe(termination))

self.matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
self.matcher.add("Psuedo", None, *psuedo_patterns)
self.matcher.add("Preceeding", None, *preceeding_patterns)
self.matcher.add("Following", None, *following_patterns)
self.matcher.add("Termination", None, *termination_patterns)
self.matcher.add("Psuedo", None, *self.psuedo_patterns)
self.matcher.add("Preceeding", None, *self.preceeding_patterns)
self.matcher.add("Following", None, *self.following_patterns)
self.matcher.add("Termination", None, *self.termination_patterns)
self.keys = [k for k in self.matcher._docs.keys()]
self.ent_types = ent_types

def get_patterns(self):
"""
returns phrase patterns used for various negation dictionaries
Returns
-------
patterns: dict
pattern_type: [patterns]
"""
patterns = {
"psuedo_patterns": self.psuedo_patterns,
"preceeding_patterns": self.preceeding_patterns,
"following_patterns": self.following_patterns,
"termination_patterns": self.termination_patterns,
}
for pattern in patterns:
logging.info(pattern)
return patterns

def process_negations(self, doc):
"""
Find negations in doc and clean candidate negations to remove pseudo negations
@@ -98,7 +98,12 @@ def process_negations(self, doc):
list of tuples of terminating phrases
"""

if not doc.is_nered:
raise ValueError(
"Negations are evaluated for Named Entities found in text. "
"Your SpaCy pipeline does not included Named Entity resolution. "
"Please ensure it is enabled or choose a different language model that includes it."
)
preceeding = list()
following = list()
terminating = list()
39 changes: 38 additions & 1 deletion negspacy/test.py
@@ -1,3 +1,4 @@
import pytest
import spacy
from negation import Negex

@@ -30,13 +31,15 @@ def build_med_docs():
docs = list()
docs.append(
(
"Patient denies cardiovascular disease but has headaches. No history of smoking.",
"Patient denies cardiovascular disease but has headaches. No history of smoking. Alcoholism unlikely. Smoking not ruled out.",
[
("Patient", False),
("denies", False),
("cardiovascular disease", True),
("headaches", False),
("smoking", True),
("Alcoholism", True),
("Smoking", False),
],
)
)
@@ -53,6 +56,13 @@ def build_med_docs():
],
)
)

docs.append(
(
"Alcoholism was not the cause of liver disease.",
[("Alcoholism", True), ("liver disease", False)],
)
)
return docs


@@ -78,6 +88,33 @@ def test_umls():
assert (e.text, e._.negex) == d[1][i]


def test_no_ner():
nlp = spacy.load("en_core_web_sm", disable=["ner"])
negex = Negex(nlp)
nlp.add_pipe(negex, last=True)
with pytest.raises(ValueError):
doc = nlp("this doc has not been NERed")


def test_own_terminology():
nlp = spacy.load("en_core_web_sm")
negex = Negex(nlp, termination=["whatever"])
nlp.add_pipe(negex, last=True)
doc = nlp("He does not like Steve Jobs whatever he says about Barack Obama.")
assert doc.ents[1]._.negex == False


def test_get_patterns():
nlp = spacy.load("en_core_web_sm")
negex = Negex(nlp)
patterns = negex.get_patterns()
assert type(patterns) == dict
assert len(patterns) == 4


if __name__ == "__main__":
test()
test_umls()
test_bad_beharor()
test_own_terminology()
test_get_patterns()