NewGrounds embeds not captured #81
Comments
Can you e-mail me an instance of this? I hadn't seen that particular gallery structure. I'm not sure what to do about general embedded stuff (this is an issue with Patreon too). Scraping things like YouTube and imgur is a substantially complicated task, and it's a whole lot of work I'd like to avoid. I've thought about trying to do something like use JDownloader as an external tool for this sort of thing. Right now, I just ignore external links. |
I emailed you an example.
Are they included in the database? It would be nice to be able to comb the database for certain types of links that I know a CLI tool or JDownloader can handle. |
I mean, I try to save the contents of any text description, so.... maybe? A lot of it is hard because it's basically done with freeform text input. It'd be a pretty easy bit of SQL to dump every description from a specific user to a csv file for further poking, if you want.
|
CSV files are definitely workable, I just have to do regex matches, really. I assume there's no easy way to get a separate csv file for each individual artist without doing some Python scripting? |
The above query is for a single artist? You could either do python stuff, or use a bash script to dump to multiple files. The actual query can be one line, and you can pass a query and database to Possibly relevant: https://stackoverflow.com/questions/43295406/how-to-copy-to-multiple-csv-files-in-postgresql |
Yes I know, but doing hundreds of artists by hand isn't exactly ideal.
That was the idea. I'll probably automate it somehow, but a lot less competently.
Aha, perfect, a for-loop. |
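For anyone who would rather do the per-artist split in Python than in a shell loop, here is a minimal sketch. It assumes the descriptions have already been fetched as `(artist, title, description)` tuples; that row shape, the column names, and the helper names `safe_filename` and `write_per_artist_csv` are made up for illustration and are not taken from the project's schema or code.

```python
import csv
import re
from collections import defaultdict
from pathlib import Path


def safe_filename(artist: str) -> str:
    """Replace characters that are awkward in file names with underscores."""
    return re.sub(r"[^A-Za-z0-9._-]+", "_", artist).strip("_") or "unknown"


def write_per_artist_csv(rows, out_dir):
    """Write one CSV file per artist.

    `rows` is any iterable of (artist, title, description) tuples. This
    row shape is an assumption for the sketch, not the real schema.
    Returns a dict mapping each artist name to the path that was written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group rows by artist first so each file is opened exactly once.
    grouped = defaultdict(list)
    for artist, title, description in rows:
        grouped[artist].append((title, description))

    written = {}
    for artist, items in grouped.items():
        # Note: two artist names that sanitize to the same file name
        # would overwrite each other; this sketch does not handle that.
        path = out_dir / f"{safe_filename(artist)}.csv"
        with path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(["title", "description"])
            writer.writerows(items)
        written[artist] = path
    return written
```

The `csv` module takes care of quoting, so free-form descriptions containing commas, quotes, or newlines should survive a round trip.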
Whoops, didn't mean to close the entire issue. Also, now:
|
Oh nice thanks. I had a syntax error with the script you pasted above that I was going to ask you about, but this is a lot better. And since it's JSON, I can more easily iterate over this with jq. I did manage to learn how to properly get a shell script for git-bash from Git for Windows, though. Turns out I'm probably still better off using PowerShell, but it could come in handy later. |
Just now noticed, but only the initial pic in a 'series' post is captured; artists will often include extra pictures in the submission's "description". (Unfortunately these pictures are lower resolution than the initial one, so artists often ALSO put HQ links to them pointing to imgur.com or files.catbox.moe ...)
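As a rough illustration of the regex pass discussed earlier in the thread, the sketch below pulls links to a few hosts out of description text, either from a single string or from a CSV dump. The `description` column name, the `HOSTS` set (including `i.imgur.com`, which is my own addition), and the function names are assumptions for the example, not anything the scraper provides.

```python
import csv
import re
from urllib.parse import urlparse

# Hosts to keep. imgur.com and files.catbox.moe come from the issue text;
# i.imgur.com is an assumed extra for direct image links.
HOSTS = {"imgur.com", "i.imgur.com", "files.catbox.moe"}

# Deliberately loose URL pattern: stops at whitespace, quotes, angle
# brackets, and closing parentheses or square brackets.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_links(description, hosts=HOSTS):
    """Return URLs in `description` whose hostname is in `hosts`."""
    links = []
    for match in URL_RE.findall(description):
        # Free-form text often has punctuation glued to the end of a link.
        url = match.rstrip(".,;:!?")
        host = (urlparse(url).hostname or "").lower()
        if host in hosts:
            links.append(url)
    return links


def links_from_csv(fh, column="description"):
    """Yield matching links from every row of an open CSV file object."""
    for row in csv.DictReader(fh):
        yield from extract_links(row.get(column) or "")
```

The resulting list could then be written to a text file and handed to whatever downloader handles those hosts.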