Failed to load default rush_rules.tsv on Windows 10 (Traditional Chinese version) #2

@ivantyj

Calling medspacy.load() with the default config throws a UnicodeDecodeError.

(test_cxr2) λ python
Python 3.8.2 (tags/v3.8.2:7b3ab59, Feb 25 2020, 23:03:10) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import medspacy
>>> nlp = medspacy.load()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\ivantsai\.virtualenvs\test_cxr2\lib\site-packages\medspacy\util.py", line 100, in load
    nlp.add_pipe("medspacy_pyrush", config={"rules_path": pyrush_path})
  File "C:\Users\ivantsai\.virtualenvs\test_cxr2\lib\site-packages\spacy\language.py", line 801, in add_pipe
    pipe_component = self.create_pipe(
  File "C:\Users\ivantsai\.virtualenvs\test_cxr2\lib\site-packages\spacy\language.py", line 680, in create_pipe
    resolved = registry.resolve(cfg, validate=validate)
  File "C:\Users\ivantsai\.virtualenvs\test_cxr2\lib\site-packages\confection\__init__.py", line 728, in resolve
    resolved, _ = cls._make(
  File "C:\Users\ivantsai\.virtualenvs\test_cxr2\lib\site-packages\confection\__init__.py", line 777, in _make
    filled, _, resolved = cls._fill(
  File "C:\Users\ivantsai\.virtualenvs\test_cxr2\lib\site-packages\confection\__init__.py", line 849, in _fill
    getter_result = getter(*args, **kwargs)
  File "C:\Users\ivantsai\.virtualenvs\test_cxr2\lib\site-packages\PyRuSH\PyRuSHSentencizer.py", line 45, in __init__
    self.rush = RuSH(rules=rules_path, max_repeat=max_repeat, auto_fix_gaps=auto_fix_gaps)
  File "C:\Users\ivantsai\.virtualenvs\test_cxr2\lib\site-packages\PyRuSH\RuSH.py", line 84, in __init__
    self.fastner = FastCNER(rules, max_repeat)
  File "C:\Users\ivantsai\.virtualenvs\test_cxr2\lib\site-packages\PyFastNER\FastCNER.py", line 84, in __init__
    self.initiate(rules)
  File "C:\Users\ivantsai\.virtualenvs\test_cxr2\lib\site-packages\PyFastNER\FastCNER.py", line 96, in initiate
    io_utils = IOUtils(rule_str)
  File "C:\Users\ivantsai\.virtualenvs\test_cxr2\lib\site-packages\PyFastNER\IOUtils.py", line 30, in __init__
    self.read(rules, '\t')
  File "C:\Users\ivantsai\.virtualenvs\test_cxr2\lib\site-packages\PyFastNER\IOUtils.py", line 47, in read
    self.parse(csvfile, delimiter)
  File "C:\Users\ivantsai\.virtualenvs\test_cxr2\lib\site-packages\PyFastNER\IOUtils.py", line 55, in parse
    self.parse_iterator(spamreader)
  File "C:\Users\ivantsai\.virtualenvs\test_cxr2\lib\site-packages\PyFastNER\IOUtils.py", line 60, in parse_iterator
    for row in iterator:
UnicodeDecodeError: 'cp950' codec can't decode byte 0xe2 in position 4290: illegal multibyte sequence

It appears that the default codec, cp950, cannot decode the default rush_rules.tsv shipped with PyRuSH. As a workaround, I manually removed the special characters and replaced the default rush_rules.tsv with the cleaned copy. I still hope the devs can fix this.

The modified rush_rules.tsv is attached for anyone who hits the same problem. Replace the one located in your site-packages/resources with the file in rush_rules.zip.
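
For anyone trying to reproduce this without a Traditional Chinese Windows install, here is a minimal sketch of what I think is going on. It assumes the rules file is UTF-8 and contains a character such as a curly quote (U+201C), whose UTF-8 encoding starts with byte 0xe2. I have not confirmed which character sits at position 4290 of the real file. The file name rules_sample.tsv and its single row are made up for illustration and are not the actual PyRuSH rules.

```python
import csv
import os
import tempfile

# Hypothetical stand-in for rush_rules.tsv: one tab-separated row holding a
# curly quote (U+201C). In UTF-8 that character is the bytes e2 80 9c.
sample = "\u201c\t0\tstbegin\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rules_sample.tsv")
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(sample)

    # Reading with cp950 mimics what open() does by default on a
    # Traditional Chinese Windows locale when no encoding is passed.
    try:
        with open(path, encoding="cp950", newline="") as f:
            list(csv.reader(f, delimiter="\t"))
        failed = False
        error_text = ""
    except UnicodeDecodeError as exc:
        failed = True
        error_text = str(exc)

    # Passing the encoding explicitly reads the same file without error.
    with open(path, encoding="utf-8", newline="") as f:
        rows = list(csv.reader(f, delimiter="\t"))

print("cp950 read failed:", failed)
print(error_text)
print(rows)
```

If this is the cause, opening the rules file with an explicit encoding="utf-8" inside the library would avoid the problem without editing the file. Until then, running Python in UTF-8 mode (python -X utf8, or setting PYTHONUTF8=1, available since Python 3.7) might also work, since it makes open() default to UTF-8. I have not tested that against medspacy.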
