Problem with user defined dictionary #143

I am using SudachiPy via GiNZA and am trying to annotate the following sentences.

```
プロ野球の中日で選手、監督を務め、１月４日に70歳で死去した星野仙一氏をしのび、３日、名古屋市東区のナゴヤドームで行われた中日―楽天のオープン戦は追悼試合として開催された。
明治大の後輩、島内宏明外野手は「改めてすごい人だったんだなと思った」と話した。
```

My user dictionary contains the following lines, which match `明治` and `楽天` in the text above.
No other lines in the dictionary match any substring of these sentences.

```
楽天,1288,1288,100,楽天_4755-2018,名詞,固有名詞,組織,上場会社,*,*,RAKUTEN,楽天,*,*,*,*,*
明治,1288,1288,100,明治_2261-2009,名詞,固有名詞,組織,上場会社,*,*,MEIJI,明治,*,*,*,*,*
```

When I try to run the annotation with this configuration, I get the following error:

```
... 

  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/spacy/language.py", line 441, in __call__
    doc = self.make_doc(text)
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/spacy/lang/ja/__init__.py", line 281, in make_doc
    return self.tokenizer(text)
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/spacy/lang/ja/__init__.py", line 144, in __call__
    dtokens = self._get_dtokens(sudachipy_tokens)
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/spacy/lang/ja/__init__.py", line 182, in _get_dtokens
    ) for idx, token in enumerate(sudachipy_tokens) if len(token.surface()) > 0
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/spacy/lang/ja/__init__.py", line 182, in <listcomp>
    ) for idx, token in enumerate(sudachipy_tokens) if len(token.surface()) > 0
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/sudachipy/morpheme.py", line 36, in part_of_speech
    return self.list.grammar.get_part_of_speech_string(wi.pos_id)
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/sudachipy/dictionarylib/grammar.py", line 55, in get_part_of_speech_string
    return self.pos_list[pos_id]
IndexError: list index out of range
```
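The traceback ends at `self.pos_list[pos_id]`, so the failing morpheme carries a part-of-speech id that lies outside the POS list the grammar has loaded. As a starting point for debugging, here is a minimal sketch that parses the user dictionary rows and prints their POS columns, so they can be compared with what the system dictionary defines. The 18-field column layout and the helper name `pos_tuples` are assumptions for illustration, not part of SudachiPy's API.

```python
import csv
import io

# Assumed layout of a Sudachi user dictionary row (18 fields):
# 0 surface, 1 left id, 2 right id, 3 cost, 4 display form,
# 5-10 six-level part of speech, 11 reading, 12 normalized form,
# 13 dictionary form id, 14 split type, 15 A-unit split,
# 16 B-unit split, 17 unused
EXPECTED_FIELDS = 18
POS_SLICE = slice(5, 11)


def pos_tuples(csv_text):
    """Map each surface form to its POS tuple (hypothetical helper).

    Raises ValueError if a row does not have the assumed field count.
    """
    result = {}
    for line_no, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != EXPECTED_FIELDS:
            raise ValueError(
                "row %d has %d fields, expected %d"
                % (line_no, len(row), EXPECTED_FIELDS)
            )
        result[row[0]] = tuple(row[POS_SLICE])
    return result


USER_DICT = (
    "楽天,1288,1288,100,楽天_4755-2018,名詞,固有名詞,組織,上場会社,*,*,RAKUTEN,楽天,*,*,*,*,*\n"
    "明治,1288,1288,100,明治_2261-2009,名詞,固有名詞,組織,上場会社,*,*,MEIJI,明治,*,*,*,*,*\n"
)

for surface, pos in pos_tuples(USER_DICT).items():
    print(surface, ",".join(pos))
```

Both rows parse to the same POS tuple, so the check alone does not explain why only one of the two entries fails. It only rules out a malformed row.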

Could someone advise me as to what is causing this error please?

I am fairly certain that the sentence containing `明治` is causing the issue, because if I remove the second sentence, the annotation works fine. It therefore seems that `楽天` is being picked up by SudachiPy with the dictionary, but `明治` is not.

Why is this?
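A minimal sketch of how the failing sentence can be isolated, one line at a time. `find_failing_sentences` is a hypothetical helper, and `fake_annotate` is only a stand-in that mimics the reported failure. In practice, pass the loaded pipeline callable (for example the object returned by `spacy.load`) instead.

```python
def find_failing_sentences(annotate, text):
    """Call `annotate` on each non-empty line and collect the failures.

    Returns a list of (line, exception) pairs for the lines that raised.
    """
    failures = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            annotate(line)
        except Exception as exc:  # keep going so every line is checked
            failures.append((line, exc))
    return failures


def fake_annotate(sentence):
    # Stand-in for the real pipeline: raises for the sentence reported to fail.
    if "明治" in sentence:
        raise IndexError("list index out of range")
    return sentence


TEXT = (
    "中日―楽天のオープン戦は追悼試合として開催された。\n"
    "明治大の後輩、島内宏明外野手は「改めてすごい人だったんだなと思った」と話した。\n"
)

for line, exc in find_failing_sentences(fake_annotate, TEXT):
    print(type(exc).__name__, line)
```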
