Commit 7174588

authored
Merge pull request #13107 from adrianeboyd/chore/update-develop-from-master-v3.8-1
Update develop from master for v3.8
2 parents 4ec41e9 + 92f1d0a commit 7174588

Some content is hidden: large commits hide part of the diff by default, so not every changed file appears below.

60 files changed, +884 -810 lines changed

.github/workflows/tests.yml

Lines changed: 30 additions & 28 deletions
@@ -58,7 +58,7 @@ jobs:
 fail-fast: true
 matrix:
 os: [ubuntu-latest, windows-latest, macos-latest]
-python_version: ["3.11", "3.12.0-rc.2"]
+python_version: ["3.12"]
 include:
 - os: windows-latest
 python_version: "3.7"
@@ -68,6 +68,8 @@ jobs:
 python_version: "3.9"
 - os: windows-latest
 python_version: "3.10"
+- os: macos-latest
+python_version: "3.11"
 
 runs-on: ${{ matrix.os }}
 
@@ -115,22 +117,22 @@ jobs:
 - name: Test import
 run: python -W error -c "import spacy"
 
-# - name: "Test download CLI"
-# run: |
-# python -m spacy download ca_core_news_sm
-# python -m spacy download ca_core_news_md
-# python -c "import spacy; nlp=spacy.load('ca_core_news_sm'); doc=nlp('test')"
-# if: matrix.python_version == '3.9'
-#
-# - name: "Test download_url in info CLI"
-# run: |
-# python -W error -m spacy info ca_core_news_sm | grep -q download_url
-# if: matrix.python_version == '3.9'
-#
-# - name: "Test no warnings on load (#11713)"
-# run: |
-# python -W error -c "import ca_core_news_sm; nlp = ca_core_news_sm.load(); doc=nlp('test')"
-# if: matrix.python_version == '3.9'
+- name: "Test download CLI"
+run: |
+python -m spacy download ca_core_news_sm
+python -m spacy download ca_core_news_md
+python -c "import spacy; nlp=spacy.load('ca_core_news_sm'); doc=nlp('test')"
+if: matrix.python_version == '3.9'
+
+- name: "Test download_url in info CLI"
+run: |
+python -W error -m spacy info ca_core_news_sm | grep -q download_url
+if: matrix.python_version == '3.9'
+
+- name: "Test no warnings on load (#11713)"
+run: |
+python -W error -c "import ca_core_news_sm; nlp = ca_core_news_sm.load(); doc=nlp('test')"
+if: matrix.python_version == '3.9'
 
 - name: "Test convert CLI"
 run: |
@@ -154,17 +156,17 @@ jobs:
 python -m spacy train ner.cfg --paths.train ner-token-per-line-conll2003.spacy --paths.dev ner-token-per-line-conll2003.spacy --training.max_steps 10 --gpu-id -1
 if: matrix.python_version == '3.9'
 
-# - name: "Test assemble CLI"
-# run: |
-# python -c "import spacy; config = spacy.util.load_config('ner.cfg'); config['components']['ner'] = {'source': 'ca_core_news_sm'}; config.to_disk('ner_source_sm.cfg')"
-# PYTHONWARNINGS="error,ignore::DeprecationWarning" python -m spacy assemble ner_source_sm.cfg output_dir
-# if: matrix.python_version == '3.9'
-#
-# - name: "Test assemble CLI vectors warning"
-# run: |
-# python -c "import spacy; config = spacy.util.load_config('ner.cfg'); config['components']['ner'] = {'source': 'ca_core_news_md'}; config.to_disk('ner_source_md.cfg')"
-# python -m spacy assemble ner_source_md.cfg output_dir 2>&1 | grep -q W113
-# if: matrix.python_version == '3.9'
+- name: "Test assemble CLI"
+run: |
+python -c "import spacy; config = spacy.util.load_config('ner.cfg'); config['components']['ner'] = {'source': 'ca_core_news_sm'}; config.to_disk('ner_source_sm.cfg')"
+PYTHONWARNINGS="error,ignore::DeprecationWarning" python -m spacy assemble ner_source_sm.cfg output_dir
+if: matrix.python_version == '3.9'
+
+- name: "Test assemble CLI vectors warning"
+run: |
+python -c "import spacy; config = spacy.util.load_config('ner.cfg'); config['components']['ner'] = {'source': 'ca_core_news_md'}; config.to_disk('ner_source_md.cfg')"
+python -m spacy assemble ner_source_md.cfg output_dir 2>&1 | grep -q W113
+if: matrix.python_version == '3.9'
 
 - name: "Install test requirements"
 run: |
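
The matrix change in tests.yml drops "3.11" and "3.12.0-rc.2" from the main python_version list, so every OS now runs 3.12, and adds an include entry that keeps one macOS job on 3.11. Below is a simplified Python sketch of how GitHub Actions expands such a matrix, based on the documented rule that an include entry is merged into existing combinations only if it does not overwrite any original matrix value, and otherwise becomes a new job. expand_matrix is a hypothetical helper (not part of GitHub Actions or spaCy), and the example uses only the include entries fully visible in the diff, since the hunk hides some of them.

```python
from itertools import product
from typing import Dict, List


def expand_matrix(
    matrix: Dict[str, List[str]], include: List[Dict[str, str]]
) -> List[Dict[str, str]]:
    """Approximate GitHub Actions matrix expansion (simplified sketch)."""
    keys = list(matrix)
    # Cartesian product of the base matrix values.
    combos = [dict(zip(keys, values)) for values in product(*(matrix[k] for k in keys))]
    originals = [dict(c) for c in combos]  # original values must never be overwritten
    extra: List[Dict[str, str]] = []
    for entry in include:
        merged = False
        for combo, original in zip(combos, originals):
            # Mergeable only if every key is new or agrees with the original value.
            if all(original.get(k, v) == v for k, v in entry.items()):
                combo.update(entry)
                merged = True
        if not merged:
            extra.append(dict(entry))  # no compatible combination: becomes a new job
    return combos + extra


jobs = expand_matrix(
    {"os": ["ubuntu-latest", "windows-latest", "macos-latest"], "python_version": ["3.12"]},
    [
        {"os": "windows-latest", "python_version": "3.7"},
        {"os": "windows-latest", "python_version": "3.10"},
        {"os": "macos-latest", "python_version": "3.11"},
    ],
)
for job in jobs:
    print(job["os"], job["python_version"])
```

Because each visible include entry sets a python_version that differs from the base value "3.12", none of them can merge, so each becomes an extra job: six jobs in this reduced example.
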

LICENSE

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 The MIT License (MIT)
 
-Copyright (C) 2016-2022 ExplosionAI GmbH, 2016 spaCy GmbH, 2015 Matthew Honnibal
+Copyright (C) 2016-2023 ExplosionAI GmbH, 2016 spaCy GmbH, 2015 Matthew Honnibal
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

README.md

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ model packaging, deployment and workflow management. spaCy is commercial
 open-source software, released under the
 [MIT license](https://github.com/explosion/spaCy/blob/master/LICENSE).
 
-💫 **Version 3.6 out now!**
+💫 **Version 3.7 out now!**
 [Check out the release notes here.](https://github.com/explosion/spaCy/releases)
 
 [![tests](https://github.com/explosion/spaCy/actions/workflows/tests.yml/badge.svg)](https://github.com/explosion/spaCy/actions/workflows/tests.yml)

requirements.txt

Lines changed: 0 additions & 1 deletion
@@ -10,7 +10,6 @@ wasabi>=0.9.1,<1.2.0
 srsly>=2.4.3,<3.0.0
 catalogue>=2.0.6,<2.1.0
 typer>=0.3.0,<0.10.0
-pathy>=0.10.0
 smart-open>=5.2.1,<7.0.0
 weasel>=0.1.0,<0.4.0
 # Third party dependencies

setup.cfg

Lines changed: 0 additions & 1 deletion
@@ -56,7 +56,6 @@ install_requires =
 weasel>=0.1.0,<0.4.0
 # Third-party dependencies
 typer>=0.3.0,<0.10.0
-pathy>=0.10.0
 smart-open>=5.2.1,<7.0.0
 tqdm>=4.38.0,<5.0.0
 numpy>=1.15.0; python_version < "3.9"

spacy/__init__.py

Lines changed: 1 addition & 6 deletions
@@ -13,6 +13,7 @@
 from . import pipeline # noqa: F401
 from . import util
 from .about import __version__ # noqa: F401
+from .cli.info import info # noqa: F401
 from .errors import Errors
 from .glossary import explain # noqa: F401
 from .language import Language
@@ -76,9 +77,3 @@ def blank(
 # We should accept both dot notation and nested dict here for consistency
 config = util.dot_to_dict(config)
 return LangClass.from_config(config, vocab=vocab, meta=meta)
-
-
-def info(*args, **kwargs):
-from .cli.info import info as cli_info
-
-return cli_info(*args, **kwargs)

spacy/about.py

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 # fmt: off
 __title__ = "spacy"
-__version__ = "3.7.0"
+__version__ = "3.7.2"
 __download_url__ = "https://github.com/explosion/spacy-models/releases/download"
 __compatibility__ = "https://raw.githubusercontent.com/explosion/spacy-models/master/compatibility.json"

spacy/cli/__init__.py

Lines changed: 11 additions & 2 deletions
@@ -22,8 +22,17 @@
 from .package import package # noqa: F401
 from .pretrain import pretrain # noqa: F401
 from .profile import profile # noqa: F401
-from .train import train_cli # noqa: F401
-from .validate import validate # noqa: F401
+from .project.assets import project_assets # type: ignore[attr-defined] # noqa: F401
+from .project.clone import project_clone # type: ignore[attr-defined] # noqa: F401
+from .project.document import ( # type: ignore[attr-defined] # noqa: F401
+project_document,
+)
+from .project.dvc import project_update_dvc # type: ignore[attr-defined] # noqa: F401
+from .project.pull import project_pull # type: ignore[attr-defined] # noqa: F401
+from .project.push import project_push # type: ignore[attr-defined] # noqa: F401
+from .project.run import project_run # type: ignore[attr-defined] # noqa: F401
+from .train import train_cli # type: ignore[attr-defined] # noqa: F401
+from .validate import validate # type: ignore[attr-defined] # noqa: F401
 
 
 @app.command("link", no_args_is_help=True, deprecated=True, hidden=True)

spacy/cli/_util.py

Lines changed: 0 additions & 4 deletions
@@ -41,10 +41,6 @@
 run_command,
 )
 
-if TYPE_CHECKING:
-from pathy import FluidPath # noqa: F401
-
-
 SDIST_SUFFIX = ".tar.gz"
 WHEEL_SUFFIX = "-py3-none-any.whl"
 

spacy/cli/project/__init__.py

Whitespace-only changes.

spacy/cli/project/assets.py

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+from weasel.cli.assets import *

spacy/cli/project/clone.py

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+from weasel.cli.clone import *

spacy/cli/project/document.py

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+from weasel.cli.document import *

spacy/cli/project/dvc.py

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+from weasel.cli.dvc import *

spacy/cli/project/pull.py

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+from weasel.cli.pull import *

spacy/cli/project/push.py

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+from weasel.cli.push import *

spacy/cli/project/remote_storage.py

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+from weasel.cli.remote_storage import *

spacy/cli/project/run.py

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+from weasel.cli.run import *
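
Each new file under spacy/cli/project/ is a one-line shim that star-imports the matching weasel.cli module, so the old spacy.cli.project.* import paths keep resolving after the project commands were moved into the separate Weasel package; test_project_api_imports in spacy/tests/test_cli.py checks this for project_run. The sketch reproduces the pattern with two in-memory toy modules so it runs without spaCy or Weasel installed; upstream_cli_run and legacy_cli_run are made-up names, and the project_run body is a placeholder.

```python
import sys
import types

# Hypothetical stand-in for the upstream module (weasel.cli.run in the diff).
upstream = types.ModuleType("upstream_cli_run")
exec(
    "def project_run(project_dir, subcommand):\n"
    "    return (project_dir, subcommand)\n",
    upstream.__dict__,
)
sys.modules["upstream_cli_run"] = upstream

# Hypothetical legacy path: the module body is only a star import,
# mirroring spacy/cli/project/run.py.
legacy = types.ModuleType("legacy_cli_run")
exec("from upstream_cli_run import *", legacy.__dict__)
sys.modules["legacy_cli_run"] = legacy

from legacy_cli_run import project_run as via_legacy  # noqa: E402
from upstream_cli_run import project_run as via_upstream  # noqa: E402

# A star import binds the very same objects, so both paths share one function.
print(via_legacy is via_upstream)  # True
print(via_legacy("my_project", "train"))  # ('my_project', 'train')
```

Since the shim re-binds the upstream objects rather than wrapping them, identity checks and monkeypatching of the function object behave the same through either path.
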

spacy/displacy/render.py

Lines changed: 30 additions & 9 deletions
@@ -142,7 +142,25 @@ def render_spans(
 spans (list): Individual entity spans and their start, end, label, kb_id and kb_url.
 title (str / None): Document title set in Doc.user_data['title'].
 """
-per_token_info = []
+per_token_info = self._assemble_per_token_info(tokens, spans)
+markup = self._render_markup(per_token_info)
+markup = TPL_SPANS.format(content=markup, dir=self.direction)
+if title:
+markup = TPL_TITLE.format(title=title) + markup
+return markup
+
+@staticmethod
+def _assemble_per_token_info(
+tokens: List[str], spans: List[Dict[str, Any]]
+) -> List[Dict[str, List[Dict[str, Any]]]]:
+"""Assembles token info used to generate markup in render_spans().
+tokens (List[str]): Tokens in text.
+spans (List[Dict[str, Any]]): Spans in text.
+RETURNS (List[Dict[str, List[Dict, str, Any]]]): Per token info needed to render HTML markup for given tokens
+and spans.
+"""
+per_token_info: List[Dict[str, List[Dict[str, Any]]]] = []
+
 # we must sort so that we can correctly describe when spans need to "stack"
 # which is determined by their start token, then span length (longer spans on top),
 # then break any remaining ties with the span label
@@ -154,29 +172,35 @@ def render_spans(
 s["label"],
 ),
 )
+
 for s in spans:
 # this is the vertical 'slot' that the span will be rendered in
 # vertical_position = span_label_offset + (offset_step * (slot - 1))
 s["render_slot"] = 0
+
 for idx, token in enumerate(tokens):
 # Identify if a token belongs to a Span (and which) and if it's a
 # start token of said Span. We'll use this for the final HTML render
 token_markup: Dict[str, Any] = {}
 token_markup["text"] = token
-concurrent_spans = 0
+intersecting_spans: List[Dict[str, Any]] = []
 entities = []
 for span in spans:
 ent = {}
 if span["start_token"] <= idx < span["end_token"]:
-concurrent_spans += 1
 span_start = idx == span["start_token"]
 ent["label"] = span["label"]
 ent["is_start"] = span_start
 if span_start:
 # When the span starts, we need to know how many other
 # spans are on the 'span stack' and will be rendered.
 # This value becomes the vertical render slot for this entire span
-span["render_slot"] = concurrent_spans
+span["render_slot"] = (
+intersecting_spans[-1]["render_slot"]
+if len(intersecting_spans)
+else 0
+) + 1
+intersecting_spans.append(span)
 ent["render_slot"] = span["render_slot"]
 kb_id = span.get("kb_id", "")
 kb_url = span.get("kb_url", "#")
@@ -193,11 +217,8 @@ def render_spans(
 span["render_slot"] = 0
 token_markup["entities"] = entities
 per_token_info.append(token_markup)
-markup = self._render_markup(per_token_info)
-markup = TPL_SPANS.format(content=markup, dir=self.direction)
-if title:
-markup = TPL_TITLE.format(title=title) + markup
-return markup
+
+return per_token_info
 
 def _render_markup(self, per_token_info: List[Dict[str, Any]]) -> str:
 """Render the markup from per-token information"""

spacy/kb/__init__.py

Lines changed: 8 additions & 0 deletions
@@ -1,3 +1,11 @@
 from .candidate import Candidate, get_candidates, get_candidates_batch
 from .kb import KnowledgeBase
 from .kb_in_memory import InMemoryLookupKB
+
+__all__ = [
+"Candidate",
+"KnowledgeBase",
+"InMemoryLookupKB",
+"get_candidates",
+"get_candidates_batch",
+]
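
spacy/kb/__init__.py now declares __all__, the explicit list of names a package exports. It limits what from package import * binds, and tools such as pyflakes (F401) and mypy in no-implicit-reexport mode treat listed names as intentional re-exports rather than unused imports. The sketch builds two toy modules in memory to show the effect on a star import; toy_kb_plain and toy_kb_all are made-up names, and the class bodies are placeholders, not spaCy's KnowledgeBase.

```python
import sys
import types
from typing import List


def star_import_names(module_name: str, source: str) -> List[str]:
    """Build an in-memory module from source, star-import it, and list what arrives."""
    module = types.ModuleType(module_name)
    exec(source, module.__dict__)
    sys.modules[module_name] = module
    namespace: dict = {}
    exec(f"from {module_name} import *", namespace)
    return sorted(name for name in namespace if name != "__builtins__")


BODY = (
    "import os\n"  # an incidental import, not meant as public API
    "class Candidate: pass\n"
    "class KnowledgeBase: pass\n"
    "def get_candidates(kb, mention): return []\n"
)
without_all = star_import_names("toy_kb_plain", BODY)
with_all = star_import_names(
    "toy_kb_all", BODY + "__all__ = ['Candidate', 'KnowledgeBase', 'get_candidates']\n"
)
print(without_all)  # ['Candidate', 'KnowledgeBase', 'get_candidates', 'os']
print(with_all)  # ['Candidate', 'KnowledgeBase', 'get_candidates']
```

Without __all__, every public name in the module leaks out, including the incidental os import; with it, only the declared API does.
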

spacy/matcher/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -3,4 +3,4 @@
 from .matcher import Matcher
 from .phrasematcher import PhraseMatcher
 
-__all__ = ["Matcher", "PhraseMatcher", "DependencyMatcher", "levenshtein"]
+__all__ = ["DependencyMatcher", "Matcher", "PhraseMatcher", "levenshtein"]

spacy/pipeline/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -22,6 +22,7 @@
 __all__ = [
 "AttributeRuler",
 "DependencyParser",
+"EditTreeLemmatizer",
 "EntityLinker",
 "EntityRecognizer",
 "EntityRuler",

spacy/tests/doc/test_span.py

Lines changed: 9 additions & 0 deletions
@@ -731,3 +731,12 @@ def test_for_no_ent_sents():
 sents = list(doc.ents[0].sents)
 assert len(sents) == 1
 assert str(sents[0]) == str(doc.ents[0].sent) == "ENTITY"
+
+
+def test_span_api_richcmp_other(en_tokenizer):
+doc1 = en_tokenizer("a b")
+doc2 = en_tokenizer("b c")
+assert not doc1[1:2] == doc1[1]
+assert not doc1[1:2] == doc2[0]
+assert not doc1[1:2] == doc2[0:1]
+assert not doc1[0:1] == doc2

spacy/tests/doc/test_token_api.py

Lines changed: 9 additions & 0 deletions
@@ -294,3 +294,12 @@ def test_missing_head_dep(en_vocab):
 assert aligned_heads[0] == ref_heads[0]
 assert aligned_deps[5] == ref_deps[5]
 assert aligned_heads[5] == ref_heads[5]
+
+
+def test_token_api_richcmp_other(en_tokenizer):
+doc1 = en_tokenizer("a b")
+doc2 = en_tokenizer("b c")
+assert not doc1[1] == doc1[0:1]
+assert not doc1[1] == doc2[1:2]
+assert not doc1[1] == doc2[0]
+assert not doc1[0] == doc2
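
The new tests test_span_api_richcmp_other and test_token_api_richcmp_other require that comparing a Span or Token with an object of another type, or with a span or token from a different Doc, evaluates to False. A common pure-Python way to get that behaviour is for __eq__ to return NotImplemented for foreign types, so Python tries the other operand and finally falls back to an identity check. The classes below are hypothetical stand-ins: spaCy's real Span and Token are Cython extension types with their own __richcmp__, and this diff does not show which attributes they compare.

```python
from typing import Any, List


class ToyDoc:
    """Hypothetical stand-in for a Doc: just a list of words."""

    def __init__(self, words: List[str]) -> None:
        self.words = words


class ToyToken:
    def __init__(self, doc: ToyDoc, i: int) -> None:
        self.doc, self.i = doc, i

    def __eq__(self, other: Any) -> Any:
        if not isinstance(other, ToyToken):
            return NotImplemented  # unknown type: let Python try the other operand
        return self.doc is other.doc and self.i == other.i


class ToySpan:
    def __init__(self, doc: ToyDoc, start: int, end: int) -> None:
        self.doc, self.start, self.end = doc, start, end

    def __eq__(self, other: Any) -> Any:
        if not isinstance(other, ToySpan):
            return NotImplemented
        return self.doc is other.doc and (self.start, self.end) == (other.start, other.end)


doc1 = ToyDoc(["a", "b"])
doc2 = ToyDoc(["b", "c"])
# Mirrors the new tests: cross-type or cross-doc comparisons are simply False.
print(ToySpan(doc1, 1, 2) == ToyToken(doc1, 1))  # False
print(ToySpan(doc1, 1, 2) == ToySpan(doc2, 0, 1))  # False
print(ToyToken(doc1, 0) == doc2)  # False
print(ToySpan(doc1, 1, 2) == ToySpan(doc1, 1, 2))  # True
```

Returning NotImplemented, rather than False or raising, keeps the comparison symmetric with other types that do know how to compare themselves to these objects.
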

spacy/tests/test_cli.py

Lines changed: 5 additions & 6 deletions
@@ -12,7 +12,6 @@
 
 import spacy
 from spacy import about
-from spacy import info as spacy_info
 from spacy.cli import info
 from spacy.cli._util import parse_config_overrides, string_to_list, walk_directory
 from spacy.cli.apply import apply
@@ -193,9 +192,6 @@ def test_cli_info():
 raw_data = info(tmp_dir, exclude=[""])
 assert raw_data["lang"] == "nl"
 assert raw_data["components"] == ["textcat"]
-raw_data = spacy_info(tmp_dir, exclude=[""])
-assert raw_data["lang"] == "nl"
-assert raw_data["components"] == ["textcat"]
 
 
 def test_cli_converters_conllu_to_docs():
@@ -538,7 +534,6 @@ def test_string_to_list_intify(value):
 assert string_to_list(value, intify=True) == [1, 2, 3]
 
 
-@pytest.mark.skip(reason="Temporarily skip before 3.7 models are published")
 def test_download_compatibility():
 spec = SpecifierSet("==" + about.__version__)
 spec.prereleases = False
@@ -549,7 +544,6 @@ def test_download_compatibility():
 assert get_minor_version(about.__version__) == get_minor_version(version)
 
 
-@pytest.mark.skip(reason="Temporarily skip before 3.7 models are published")
 def test_validate_compatibility_table():
 spec = SpecifierSet("==" + about.__version__)
 spec.prereleases = False
@@ -1067,3 +1061,8 @@ def test_debug_data_trainable_lemmatizer_not_annotated():
 
 data = _compile_gold(train_examples, ["trainable_lemmatizer"], nlp, True)
 assert data["no_lemma_annotations"] == 2
+
+
+def test_project_api_imports():
+from spacy.cli import project_run
+from spacy.cli.project.run import project_run # noqa: F401, F811
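
With __version__ bumped to 3.7.2, the diff removes the temporary skips on test_download_compatibility and test_validate_compatibility_table. The visible assertion compares the minor version of about.__version__ with that of each listed model version. A minimal standard-library sketch of that check follows; minor_version and is_compatible_minor are hypothetical helpers, whereas the test itself calls spaCy's get_minor_version.

```python
import re
from typing import Optional


def minor_version(version: str) -> Optional[str]:
    """Return 'MAJOR.MINOR' from a version string such as '3.7.2', else None."""
    match = re.match(r"(\d+)\.(\d+)", version)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def is_compatible_minor(lib_version: str, model_version: str) -> bool:
    """True when both versions parse and share the same major.minor."""
    lib_minor = minor_version(lib_version)
    return lib_minor is not None and lib_minor == minor_version(model_version)


print(minor_version("3.7.2"))  # 3.7
print(is_compatible_minor("3.7.2", "3.7.1"))  # True
print(is_compatible_minor("3.7.2", "3.6.1"))  # False
```

A regex keeps the sketch dependency-free; production code would more robustly parse versions with packaging.version.Version, from the same packaging library that provides the SpecifierSet these tests already use.
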

spacy/tests/test_displacy.py

Lines changed: 21 additions & 1 deletion
@@ -2,7 +2,7 @@
 import pytest
 
 from spacy import displacy
-from spacy.displacy.render import DependencyRenderer, EntityRenderer
+from spacy.displacy.render import DependencyRenderer, EntityRenderer, SpanRenderer
 from spacy.lang.en import English
 from spacy.lang.fa import Persian
 from spacy.tokens import Doc, Span
@@ -468,3 +468,23 @@ def test_issue12816(en_vocab) -> None:
 # Verify that the HTML tag is still escaped
 html = displacy.render(doc, style="span")
 assert "&lt;TEST&gt;" in html
+
+
+@pytest.mark.issue(13056)
+def test_displacy_span_stacking():
+"""Test whether span stacking works properly for multiple overlapping spans."""
+spans = [
+{"start_token": 2, "end_token": 5, "label": "SkillNC"},
+{"start_token": 0, "end_token": 2, "label": "Skill"},
+{"start_token": 1, "end_token": 3, "label": "Skill"},
+]
+tokens = ["Welcome", "to", "the", "Bank", "of", "China", "."]
+per_token_info = SpanRenderer._assemble_per_token_info(spans=spans, tokens=tokens)
+
+assert len(per_token_info) == len(tokens)
+assert all([len(per_token_info[i]["entities"]) == 1 for i in (0, 3, 4)])
+assert all([len(per_token_info[i]["entities"]) == 2 for i in (1, 2)])
+assert per_token_info[1]["entities"][0]["render_slot"] == 1
+assert per_token_info[1]["entities"][1]["render_slot"] == 2
+assert per_token_info[2]["entities"][0]["render_slot"] == 2
+assert per_token_info[2]["entities"][1]["render_slot"] == 3
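
Issue #13056 comes from the old slot rule in render_spans: a starting span took concurrent_spans, the number of covering spans seen so far on that token (itself included). With the test's data, token 2 ("the") is covered by the Skill span over tokens 1-2 and the SkillNC span over tokens 2-4, and both land in slot 2, so they overlap. The new rule places a starting span one slot above the most recent span already on the stack (intersecting_spans[-1]). The sketch replays both rules on the test data. assign_render_slots is a hypothetical, simplified helper: it drops the HTML and kb_id/kb_url handling, and the full sort key and the reset of render_slot for non-covering spans are assumptions based on the code comments and context lines visible in the diff.

```python
from typing import Any, Dict, List


def assign_render_slots(
    tokens: List[str], spans: List[Dict[str, Any]], fixed: bool = True
) -> List[List[Dict[str, Any]]]:
    """Per token, list the covering spans' labels and render slots (simplified sketch).

    fixed=True applies the new intersecting_spans rule, fixed=False the old counter.
    """
    # Sort by start token, then longer spans first, then label.
    spans = sorted(
        (dict(s) for s in spans),  # copy so the caller's dicts are not mutated
        key=lambda s: (s["start_token"], -(s["end_token"] - s["start_token"]), s["label"]),
    )
    for s in spans:
        s["render_slot"] = 0
    per_token: List[List[Dict[str, Any]]] = []
    for idx, _token in enumerate(tokens):
        intersecting: List[Dict[str, Any]] = []  # spans covering this token so far
        entities: List[Dict[str, Any]] = []
        for span in spans:
            if span["start_token"] <= idx < span["end_token"]:
                if idx == span["start_token"]:
                    if fixed:
                        # New rule: one slot above the most recent covering span.
                        below = intersecting[-1]["render_slot"] if intersecting else 0
                        span["render_slot"] = below + 1
                    else:
                        # Old rule: number of covering spans seen so far, itself included.
                        span["render_slot"] = len(intersecting) + 1
                intersecting.append(span)
                entities.append({"label": span["label"], "render_slot": span["render_slot"]})
            else:
                span["render_slot"] = 0  # span does not cover this token: reset
        per_token.append(entities)
    return per_token


tokens = ["Welcome", "to", "the", "Bank", "of", "China", "."]
spans = [
    {"start_token": 2, "end_token": 5, "label": "SkillNC"},
    {"start_token": 0, "end_token": 2, "label": "Skill"},
    {"start_token": 1, "end_token": 3, "label": "Skill"},
]
new_slots = assign_render_slots(tokens, spans)
old_slots = assign_render_slots(tokens, spans, fixed=False)
print([e["render_slot"] for e in new_slots[2]])  # [2, 3]
print([e["render_slot"] for e in old_slots[2]])  # [2, 2]
```

Stacking on the topmost covering span, instead of counting, guarantees a new span never reuses a slot still occupied by a span that started earlier and is still open.
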
