Specify language in scraper so it doesn't look for the wrong thing #195

Closed
wants to merge 3 commits into from

@mrsrec (Contributor) commented Dec 6, 2023

This is a copy of #192, which was mistakenly closed when the fork containing the code was deleted.

Original text:

"Depending on a web server's configuration or location, Scratch may return a response in a language other than English, which breaks the scraper. This PR fixes that by explicitly requesting English.

This does not currently affect en.scratch-wiki.info, because that server doesn't send an Accept-Language header by default and Scratch doesn't choose the language based on IP address. However, it will affect other web servers, and it may affect en.scratch-wiki.info as well if either of those two things changes."

Closes #191

@mrsrec (Contributor, Author) commented Dec 6, 2023

Comment by @jacob-g "While I don't think this is likely to be a problem, I suppose it doesn't hurt to be safe. I'll validate this when I get the chance."


@@ -8,7 +8,9 @@ class ScratchUserCheck {

   private static function fetchProfile($username, &$isScratcher, &$joinedAt, &$error) {
     $url = sprintf(self::PROFILE_URL, $username);
-    $html = @file_get_contents($url);
+    $html = @file_get_contents($url, false, stream_context_create(
@jacob-g (Member) commented Mar 12, 2024

Make sure to wrap the options in an outer array. I believe it should be:

stream_context_create([
  "http" => ["header" => "Accept-Language: en\r\nCookie: scratchlanguage=en"]
])

Also a more minor detail: single quotes are preferred over double quotes unless you are specifically intending to use variable interpolation or escape sequences.
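For reference, here is a minimal sketch of the suggested call, using single quotes for the array keys and double quotes for the header value, since `\r\n` only becomes a real CRLF inside a double-quoted PHP string. The helper name `englishStreamContext` is hypothetical and not taken from the repository. To keep the sketch runnable offline, it inspects the context with `stream_context_get_options` instead of fetching a profile page.

```php
<?php
// Hypothetical helper (not from the repository): builds a stream context
// that asks the server for English via both a header and a cookie.
function englishStreamContext()
{
    return stream_context_create([
        'http' => [
            // Double quotes so "\r\n" is a real CRLF separating the
            // Accept-Language header from the Cookie header.
            'header' => "Accept-Language: en\r\nCookie: scratchlanguage=en",
        ],
    ]);
}

$context = englishStreamContext();

// Read the options back without making any network request.
$options = stream_context_get_options($context);
echo $options['http']['header'], "\n";

// In the scraper, the context would be passed as the third argument:
// $html = @file_get_contents($url, false, $context);
```

Passing the context as the third argument of `file_get_contents` (with `false` for `use_include_path`) matches the shape of the call in the diff above.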

@mrsrec (Contributor, Author) commented Mar 12, 2024

@jacob-g It might be web-server-specific. I've added your changes.

"http" => ["header" => "Accept-Language: en\r\nCookie: scratchlanguage=en"]
));
$html = @file_get_contents($url, false, stream_context_create([
'http' => ['header' => 'Accept-Language: en\r\nCookie: scratchlanguage=en']
Member

I should have been clearer about the single quotes: they're right for 'http' and 'header', but for the header value you do want double quotes, because it uses the \r and \n escape sequences.
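A small illustration of why the quote style matters here: in a single-quoted PHP string, `\r` and `\n` stay as literal backslash sequences, so the value contains no line break, and the two headers would presumably be sent as one malformed `Accept-Language` line with no separate `Cookie` header. The variable names below are illustrative only. The last line shows one way to keep single quotes for the fixed text while still getting a real CRLF.

```php
<?php
// Single quotes: \r and \n are NOT escape sequences, just backslash + letter.
$single = 'Accept-Language: en\r\nCookie: scratchlanguage=en';

// Double quotes: \r\n becomes a real carriage return + line feed.
$double = "Accept-Language: en\r\nCookie: scratchlanguage=en";

// Split on a real CRLF to see how many header lines each string yields.
$singleLines = explode("\r\n", $single);
$doubleLines = explode("\r\n", $double);

printf("single-quoted: %d header line(s)\n", count($singleLines)); // prints "single-quoted: 1 header line(s)"
printf("double-quoted: %d header line(s)\n", count($doubleLines)); // prints "double-quoted: 2 header line(s)"

// Alternative: single quotes for the literal text, a double-quoted CRLF between.
$mixed = 'Accept-Language: en' . "\r\n" . 'Cookie: scratchlanguage=en';
```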

@jacob-g (Member) commented Mar 12, 2024

Other than the quote thing, looks good now. Fix that and we're good to go.

@mrsrec (Contributor, Author) commented Mar 13, 2024

OK, now?

@mrsrec closed this by deleting the head repository on Aug 7, 2024.
Linked issue: Scraper doesn't specify language
2 participants