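I noticed that when TokenTruncation.process() builds input_ids, it appends two end-of-sequence tokens after concatenating a and b.
Questions:
1. Why are two needed? What would happen with just one?
2. If I need a special token inside sentence a to separate its first and second parts, which one would be best? As far as I can see, the ChatGLM tokenizer's only special tokens are <eop>, <pad>, <sop>, <unk> and [MASK].
Thanks 🙏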
Either one or two works; the second one just reinforces the end-of-sequence marker.
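A minimal sketch of the concatenation discussed above, assuming a and b are already tokenized into lists of integer IDs. `build_input_ids`, `eos_id`, `num_eos` and `max_len` are hypothetical names, and the truncation strategy (trim the longer sequence first) is an assumption; this is not the project's actual TokenTruncation.process() code.

```python
from typing import List


def build_input_ids(a_ids: List[int], b_ids: List[int], eos_id: int,
                    max_len: int, num_eos: int = 2) -> List[int]:
    """Hypothetical helper: truncate a + b to fit, then append num_eos end tokens."""
    # Reserve room for the end tokens before truncating the content.
    budget = max_len - num_eos
    if budget < 0:
        raise ValueError("max_len must leave room for the end tokens")
    a, b = list(a_ids), list(b_ids)
    # Assumed strategy: drop one token at a time from the longer sequence (b on ties).
    while len(a) + len(b) > budget:
        if len(b) >= len(a):
            b.pop()
        else:
            a.pop()
    # The end token is simply repeated; num_eos=1 gives a single end marker.
    return a + b + [eos_id] * num_eos
```

Calling it with `num_eos=1` or `num_eos=2` only changes how many copies of the end token follow the content, which matches the reply that either count works.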
One.
Thanks a lot! What about the second question, then? Would it be better not to use a newline character?