Currently, a sentence consisting of a single space is translated as "I know" in cs->en and as "Ano" in en->cs (you need to enter another non-empty sentence/word on the previous line).
I admit I should fix the NMT model (its training data), but meanwhile the web interface should never query the backend model with a whitespace-only sentence, I think.
I think the input sentences should be normalized with respect to (all Unicode) whitespace:
s/^\s+//;   # strip leading whitespace
s/\s+$//;   # strip trailing whitespace
s/\s+/ /g;  # collapse runs of whitespace into a single space
The current MT models will merge multiple spaces into a single space anyway. I am not sure about whitespace characters other than the plain space, but I doubt we have enough vertical tabs, thin/non-breaking spaces etc. in the training data, so it seems wiser to normalize such whitespace characters to a single space.
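For illustration, here is a minimal Python sketch of the suggested normalization and the whitespace-only guard. The function names (`normalize_sentence`, `sentences_for_backend`) are placeholders, not actual code from the web interface; the point is just that empty-after-normalization sentences never reach the backend model.

```python
import re

# \s is Unicode-aware for str in Python 3, so this also matches NBSP,
# thin spaces, vertical tabs, etc.
_WS = re.compile(r"\s+")


def normalize_sentence(sentence: str) -> str:
    """Trim the ends and collapse all runs of Unicode whitespace to one space."""
    return _WS.sub(" ", sentence).strip()


def sentences_for_backend(sentences):
    """Yield normalized sentences, skipping those that are whitespace-only."""
    for s in sentences:
        normalized = normalize_sentence(s)
        if normalized:
            yield normalized


# Example: the second "sentence" is whitespace-only and gets dropped.
print(list(sentences_for_backend(["Ahoj\u00a0\u00a0světe\t", " ", "Dobrý den"])))
# ['Ahoj světe', 'Dobrý den']
```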