
RuntimeError: The size of tensor a (538) must match the size of tensor b (512) at non-singleton dimension 1 #31

Open
j4ffle opened this issue Jun 8, 2022 · 0 comments


j4ffle commented Jun 8, 2022

I'm parsing conference-call transcripts and run into this error occasionally. I used NLTK to split the text into sentences and then passed those sentences to the classifier, following your example. It largely works, but I hit this issue. From what I've read, it arises when a sentence contains too many tokens (words). I manually inspected the input to find where the problem occurs, and identified an extra-long piece: it happens when a sentence contains many semicolons. I could split sentences on semicolons, but that doesn't seem quite right. Using word_tokenize from NLTK, the sentence has only 488 tokens, yet the error reports 538. How do you tokenize the words? I'm thinking of truncating the sentence before passing it to the model, but to do that accurately, I need to know how many tokens the model's tokenizer produces.

Is my assessment of why this is happening correct, and is there a better solution than truncating? Thanks.
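For reference, the count mismatch (488 NLTK words vs. 538 model tokens) is expected: BERT-style models tokenize into subword pieces, so rare words split into several tokens, and the special [CLS] and [SEP] tokens take two more positions out of the 512-position limit. A minimal sketch of the truncation idea, assuming a BERT-style 512-token cap (the `truncate_for_bert` helper below is illustrative, not part of this project):

```python
# Why the error happens: BERT-style models have a fixed position-embedding
# table of 512 slots, and the [CLS]/[SEP] special tokens occupy two of them.
# Subword tokenization splits rare words into pieces, so 488 NLTK "words"
# can easily become 538 model tokens, exceeding the limit.

MAX_LEN = 512  # BERT position-embedding limit, counting special tokens


def truncate_for_bert(token_ids, max_len=MAX_LEN):
    """Cap a token sequence, reserving two slots for [CLS] and [SEP]."""
    body = token_ids[: max_len - 2]  # keep at most max_len - 2 body tokens
    return ["[CLS]"] + body + ["[SEP]"]


# A 538-token sequence, the size from the error message:
long_sentence = list(range(538))
trimmed = truncate_for_bert(long_sentence)
print(len(trimmed))  # 512 -> now fits the model
```

If the classifier uses a Hugging Face tokenizer, the same effect comes built in: calling `tokenizer(text, truncation=True, max_length=512)` truncates with the model's own subword vocabulary, which is the accurate way to count and cap tokens rather than approximating with `word_tokenize`.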
