
<unk> and <eps> #3

@chenguoguo

Hey guys, I finally got some spare time to look into this. Thanks a lot for putting this together!

I'm looking at the symbol tables for words and characters. I noticed that 0 is reserved for <eps> in words.txt but is used for <unk> in characters.txt. As a result, in the resulting SG.fst graph, the output side has separate <eps> and <unk> symbols, while the input side has a single symbol that mixes <eps> and <unk>. This is because OpenFST treats 0 as epsilon in all algorithms by default.

Shall we reserve 0 for <eps> as long as OpenFST is involved? This requires changes to both Athena and Athena-decoder. Correct me if I'm wrong, though. @tjadamlee @godjealous
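
To make the proposal concrete, here is a rough sketch of a check that could be run on both symbol tables before building the graph. It assumes the tables use the usual one `symbol index` pair per line text format. The sample contents and the helper names (`parse_symbol_table`, `epsilon_conflicts`, `reserve_epsilon`) are made up for illustration and are not taken from Athena or Athena-decoder.

```python
def parse_symbol_table(text):
    """Parse 'symbol index' lines into a dict (assumed text format)."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        symbol, index = line.split()
        table[symbol] = int(index)
    return table


def epsilon_conflicts(table, eps="<eps>"):
    """Return symbols other than eps that sit on index 0."""
    return [sym for sym, idx in table.items() if idx == 0 and sym != eps]


def reserve_epsilon(table, eps="<eps>"):
    """Return a copy of the table with eps on 0 and the rest renumbered from 1.

    The relative order of the other symbols is preserved.
    """
    if table.get(eps) == 0 and not epsilon_conflicts(table, eps):
        return dict(table)
    others = sorted((idx, sym) for sym, idx in table.items() if sym != eps)
    fixed = {eps: 0}
    for new_idx, (_, sym) in enumerate(others, start=1):
        fixed[sym] = new_idx
    return fixed


# Hypothetical sample tables, not the real files.
words = parse_symbol_table("<eps> 0\n<unk> 1\nhello 2\n")
chars = parse_symbol_table("<unk> 0\na 1\nb 2\n")

print(epsilon_conflicts(words))  # no conflict expected here
print(epsilon_conflicts(chars))  # <unk> collides with epsilon
print(reserve_epsilon(chars))
```

Note that renumbering shifts every character index by one, so anything that already depends on the old indices would have to change in step, which is why both repositories are affected.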

Labels: bug
