Warning and cannot download any submission files #11
Comments
Same here. It would be better if the authors gave us a direct download link for the dataset (e.g. on Google Drive) instead of making us scrape it ourselves.
Hi! Sorry for the wait. It's true that the current way of downloading the data is far from convenient; we are looking for alternatives that also comply with the Reddit terms of service. Let me look into the warning and why it's not downloading images, and I'll get back to you!
Hi! I finally had some time to look into this. Apparently Pushshift is migrating to a new infrastructure, and all data from before November is not yet available (see the official thread). They are also updating the API, which caused some issues in pmaw. I've already updated pmaw, but we will have to wait until Pushshift is completely online again. In the meantime, you can uncomment this line to scrape the latest post in the subreddit (although I still have to check whether the preprocessing scripts will need to be updated), or contact me by email. I'll post updates in this issue :)
Has anyone successfully downloaded this data? Is there a backup of this data available?
When I try to download the dataset, I get this warning:
/opt/conda/lib/python3.7/site-packages/pmaw/Request.py:263: UserWarning: 2000 items were not found in Pushshift
warnings.warn(f"{self.limit} items were not found in Pushshift")
and no images are downloaded :(
Downloading comments of 0 submission files
Getting images for:
[]
How can I solve this problem?
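Until Pushshift is back, one way to make this failure easier to spot is to capture the warning and stop early when nothing comes back, rather than continuing with an empty list of submissions. The snippet below is only a rough sketch using Python's standard `warnings` module. `fetch_with_check` and `fake_fetch` are hypothetical names, not part of pmaw or this repository, and `fake_fetch` merely imitates the warning from the log above in place of the real download call.

```python
import warnings


def fetch_with_check(fetch, *args, **kwargs):
    """Run `fetch` and return (results, messages of 'not found' warnings)."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones that were already shown once.
        warnings.simplefilter("always")
        results = list(fetch(*args, **kwargs))
    missing = [
        str(w.message)
        for w in caught
        if "not found in Pushshift" in str(w.message)
    ]
    return results, missing


def fake_fetch(limit):
    # Hypothetical stand-in for the real download call; it only mimics
    # the warning text seen in the log and returns no items.
    warnings.warn(f"{limit} items were not found in Pushshift")
    return []


results, missing = fetch_with_check(fake_fetch, 2000)
if not results:
    # Nothing was retrieved, so later steps (comments, images) have no input.
    print("No submissions returned:", missing)
```

With the real download call passed in place of `fake_fetch`, an empty `results` together with a non-empty `missing` would suggest that the data is unavailable upstream, not that the local setup is broken.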