
Chap 15, pg 513 : ModuleNotFoundError: No module named 'torchdata.datapipes' #199

Open
Emmanuel-Ibekwe opened this issue Dec 24, 2024 · 14 comments



Emmanuel-Ibekwe commented Dec 24, 2024

```python
from torchtext.datasets import IMDB

train_dataset = IMDB(split='train')
test_dataset = IMDB(split='test')
```

I keep getting this error despite manually installing torchdata. When I tried installing the exact version of torchtext used in the chapter (0.10.0), pip did not recognize it as a valid version.

I can't find any solution to it online.


kostuyn commented Jan 4, 2025

@Emmanuel-Ibekwe I installed version 0.17.0 of the package and it works (on Colab):

```shell
!pip install portalocker --quiet
!pip install torchtext==0.17.0 --quiet
```

After installing, use the Runtime -> Restart runtime option in the Colab menu.

(The latest version of torchtext has a problem: pytorch/text#2272)

rasbt (Owner) commented Jan 4, 2025

@Emmanuel-Ibekwe It looks like you are right, and the PyTorch maintainers removed torchtext 0.10.0 from PyPI for some reason. The ch15 notebook here on GitHub should be updated to work with newer versions of torchtext, though, as @kostuyn mentioned; that would also require installing portalocker, as described above. Let us know if this still doesn't work.


Emmanuel-Ibekwe commented Jan 7, 2025

Thanks @rasbt and @kostuyn for the responses. I found out through ChatGPT (great tool) that the `datasets` package from the Hugging Face community includes the IMDB dataset, so I used that.
Using the `datasets` package, I got training and validation accuracies across epochs that differed from the ones in the text. The model overfitted: at some point both accuracies reached 100% and stayed there, yet the model performed terribly on the test dataset, with an accuracy of 68.5%.
Thanks one more time.

Edit: I built a custom dataset class with torch.utils.data to handle data loading for the IMDB dataset.
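For anyone reading along, a minimal sketch of the kind of custom dataset described above. It assumes records shaped like those returned by Hugging Face's `datasets.load_dataset("imdb")` (each a dict with `"text"` and `"label"` keys); a small in-memory list stands in for the real split so the sketch is self-contained, and `IMDBDataset` is a hypothetical name. A map-style dataset for `torch.utils.data.DataLoader` only needs `__len__` and `__getitem__`, so the same class body works if you subclass `torch.utils.data.Dataset` and pass the real split.

```python
class IMDBDataset:
    """Map-style dataset wrapping IMDB-style records.

    In practice, subclass torch.utils.data.Dataset and pass the object
    returned by datasets.load_dataset("imdb", split="train").
    """

    def __init__(self, records):
        self.records = records  # each record: {"text": ..., "label": ...}

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        record = self.records[idx]
        return record["label"], record["text"]


# Toy records standing in for the real IMDB split.
toy_records = [
    {"text": "A wonderful film.", "label": 1},
    {"text": "Dull and overlong.", "label": 0},
]
ds = IMDBDataset(toy_records)
print(len(ds))   # 2
print(ds[0])     # (1, 'A wonderful film.')
```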


rasbt commented Jan 8, 2025

Thanks for the feedback. Yes, I think the dataset would nowadays be easier to get from the `datasets` library, although the splits are different. I am surprised about the low test set accuracy, though. Both the training and validation accuracies were 100%? This is an interesting case of overfitting where the validation accuracy seems almost too good to be true (and the test accuracy unexpectedly bad).


Emmanuel-Ibekwe commented Jan 15, 2025

Yes sir, both the training and validation accuracies reached 100% at some point during training and stayed there until the last epoch. Here's a link to the repo containing the code, in case you want to take a look: https://github.com/Emmanuel-Ibekwe/Machine-learning-by-S.-Raschka-notebooks

I had commented out the training code and saved the model.

Please, if you are still interested in the code, copy the link manually with your mouse, because clicking on it just leads to a non-existent issue.

Also, sorry for the lack of comments and headings in the code (I hadn't cared much about them since it was basically for learning purposes). The training code is at the very bottom of the file, with the related code that builds up to it preceding it.


rasbt commented Jan 16, 2025

Thanks for sharing, but it seems the link doesn't work:

[attached screenshot]


Emmanuel-Ibekwe commented Jan 17, 2025

Good day sir. I finally figured out why it kept redirecting to the wrong address: GitHub seemed to be embedding the wrong URL in the link. I've fixed that.

It works now:
https://github.com/Emmanuel-Ibekwe/Machine-learning-by-S.-Raschka-notebooks

The training code is towards the bottom of the file. Sorry once again for the lack of comments and headings.


rasbt commented Jan 30, 2025

It looks like you are correctly splitting the training set into train and validation subsets. Honestly, I can't see why the accuracies would both be exactly 100% in your case. Sorry!
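For reference, the kind of train/validation split being discussed can be sketched as a seeded index shuffle; this is a minimal plain-Python version (`train_val_split` is a hypothetical helper, `torch.utils.data.random_split` would be the idiomatic torch equivalent, and 25,000 is the size of IMDB's train split).

```python
import random

def train_val_split(n_items, val_fraction=0.2, seed=1):
    """Return disjoint train/validation index lists.

    A seeded shuffle keeps the split reproducible, so the same examples
    land in the same subset across runs and cannot leak between them.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_val = int(n_items * val_fraction)
    return indices[n_val:], indices[:n_val]

train_idx, val_idx = train_val_split(25000)  # 25,000 = IMDB train-split size
print(len(train_idx), len(val_idx))  # 20000 5000
```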

Emmanuel-Ibekwe commented

Good day sir. Did you get something different? Did you use the Hugging Face dataset? I've long since moved past the chapter, though; I got the same accuracies after many iterations.
Thanks for the responses so far.


rasbt commented Feb 1, 2025

No worries, I was just looking at your code to see if there was anything suspicious that could explain why the training and validation accuracies would both be 100%. I couldn't find an issue (like an accidentally double-assigned variable). In any case, please don't worry about it and feel free to move on to the next chapter(s) :).


Emmanuel-Ibekwe commented Feb 3, 2025

Ok sir. Thank you for the help so far. :)
By the way, I'm done with the chapters I needed in the book (1 through part of 17).


Emmanuel-Ibekwe commented Feb 19, 2025

Good afternoon sir. How's it going? Sorry to disturb you. I would love it if you could help me resolve an issue I'm getting with code from the book NLP with Transformers by Lewis Tunstall et al. I tried creating an issue on that book's GitHub for a previous, similar issue, but I've had no response for over three weeks now. It's ok if you decline; I very much understand your busy schedule, and moreover it's not from your book.
This is the link to the Kaggle notebook: https://www.kaggle.com/code/immanuelibekwe/fork-of-nlp-chapter-4 (you only have to click the edit button to get the actual notebook).

The trainer.train() call of the Hugging Face Trainer API seems to run indefinitely, and neither the GPU nor the CPU is utilized while it runs. I searched online and queried ChatGPT, yet found nothing.

I had run into the Trainer API in your book, but because it took forever to run, and you had not yet introduced the free online GPU platforms at that point, I skipped it.


rasbt commented Feb 20, 2025

Sorry, unfortunately I don't currently have the capacity to help with other books.

Emmanuel-Ibekwe commented

Ok sir. I do understand. Thank you for your time.
