Fix search engine API response handling #22075
IMO, the problem is that |
I agree with the latter option. |
If we are willing to change the function signature then maybe it would be simpler to add a boolean argument that defaults to false and determines whether we want to manually escape `&quot;`:

    def retrieve_url(url: str, custom_headers: Mapping[str, Any] = {}, request_data: Optional[Any] = None, should_escape_quotes=False) -> str:
        # ...
        if should_escape_quotes:
            # Turn &quot; into a backslash-escaped quote before the generic entity decoding
            dataStr = dataStr.replace('&quot;', '\\"')
        dataStr = htmlentitydecode(dataStr)
        return dataStr

Thus allowing plugins to control this behavior. I don't think applying |
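A self-contained sketch of the flag proposed above, under stated assumptions: `decode_response` is a hypothetical stand-in for the tail end of `retrieve_url`, and the standard library's `html.unescape` stands in for the helper's `htmlentitydecode`. It only handles the named `&quot;` entity, not numeric forms such as `&#34;`.

```python
import html
import json


def decode_response(raw: str, should_escape_quotes: bool = False) -> str:
    """Hypothetical stand-in for the decoding step at the end of retrieve_url."""
    if should_escape_quotes:
        # Rewrite &quot; as a backslash-escaped quote so the text stays valid JSON
        # once the remaining entities are decoded.
        raw = raw.replace("&quot;", '\\"')
    # Assumption: html.unescape plays the role of the helper's htmlentitydecode.
    return html.unescape(raw)


RAW = '{"title": "Ubuntu 22.04.5 LTS (&quot;Jammy Jellyfish&quot;)"}'

# With the flag set, the decoded text is still valid JSON.
parsed = json.loads(decode_response(RAW, should_escape_quotes=True))
print(parsed["title"])  # Ubuntu 22.04.5 LTS ("Jammy Jellyfish")
```

With the default `should_escape_quotes=False`, the same input decodes to text with bare quotes inside the string, and `json.loads` raises `json.JSONDecodeError`, so existing callers keep the current behavior.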
For me, the need to decode HTML entities is a special case (although it may be more common than others), and in the general case it can retrieve data in arbitrary format (not only HTML), which should keep HTML entities as-is.
I would choose the first option for v5.0.x and v5.1.x, and the second one for v5.2.x and above. |
I agree it should be a togglable option, however I don't think it should be all-or-nothing. Almost any request to apibay.org yields JSON results containing HTML entities (at least Another, maybe better option would be to parse these entities later in the process, when using the JSON data in the UI. That would make more sense since parsing entities is just a matter of making data human-readable. |
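The "parse entities later" idea could look like the sketch below: the raw response is parsed as JSON first, and entities are decoded only in the resulting string values. `unescape_strings` is a hypothetical helper, not part of the qBittorrent helpers, and the sample payload is invented for illustration.

```python
import html
import json
from typing import Any


def unescape_strings(value: Any) -> Any:
    """Recursively decode HTML entities in the string values of parsed JSON."""
    if isinstance(value, str):
        return html.unescape(value)
    if isinstance(value, list):
        return [unescape_strings(item) for item in value]
    if isinstance(value, dict):
        # Keys are left untouched; only values are made human-readable.
        return {key: unescape_strings(item) for key, item in value.items()}
    return value


# Invented sample payload in the style described in this thread.
raw = '[{"name": "Ubuntu (&quot;Jammy&quot;) &amp; more", "seeders": "12"}]'
results = unescape_strings(json.loads(raw))
print(results[0]["name"])  # Ubuntu ("Jammy") & more
```

Because decoding happens after `json.loads`, the quotes produced from `&quot;` can no longer break the JSON syntax.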
Just a side note. Let's limit the change to >= 5.1.x only. Backporting doesn't seem like a good idea. |
Why, considering that existing plugins are not supposed to be affected? |
The fastest way to fix #22074 is for the plugin to fetch the web data by itself (duplicate/copy the code) and not rely on qbt helpers. Either backporting to v5.0 or releasing v5.1 will still take a lot of time to reach users. |
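A minimal sketch of what a plugin fetching the data by itself might look like, using only the standard library. `fetch_json`, its `timeout` parameter, and the `User-Agent` value are assumptions for illustration, not the code from the actual plugin. The point is that the body goes to `json.loads` without any HTML entity decoding in between.

```python
import json
import urllib.request
from typing import Any


def fetch_json(url: str, timeout: float = 10.0) -> Any:
    """Hypothetical plugin-local fetch that bypasses the shared helpers."""
    # Assumption: a generic User-Agent header; the real plugin may send others.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        text = response.read().decode(charset, errors="replace")
    # No entity decoding here, so &quot; stays inside the JSON strings
    # and the payload remains syntactically valid.
    return json.loads(text)
```

Entities such as `&quot;` then survive as literal text in the parsed values and can be decoded per field afterwards.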
@biskweet |
Created a new pull request here on the qbittorrent/search-plugins repo. |
## Following

- [this issue](qbittorrent/qBittorrent#22074) on the main qBittorrent repository
- and the discussion on [this subsequent pull request](qbittorrent/qBittorrent#22075)

Here is the fix for the piratebay search engine. A gist of the code is available [here](https://gist.github.com/biskweet/f06ff7b260ef1ce3a31d27ac1a9edcbf) for testing.

## Recalling the problem:

The apibay.org API returns weird JSON that causes the piratebay search engine to crash when handling its response. If some search results contain `"` (quotation mark) characters, the server escapes them by replacing `"` with `&quot;` HTML entities in order to still provide a syntactically valid JSON response. While this is not incorrect, it would be best if apibay.org returned properly escaped quotes, i.e. using backslashes.

When handling the response data, functions [`retrieve_url`](https://github.com/LightDestory/qBittorrent-Search-Plugins/blob/master/src/helpers.py#L75-L117) and [`htmlentitydecode`](https://github.com/LightDestory/qBittorrent-Search-Plugins/blob/master/src/helpers.py#L75-L117) blindly unescape all entities, thereby corrupting previously valid JSON. As a consequence, `json.loads` crashes. For example:

```json
{ "title": "Ubuntu 22.04.5 LTS (&quot;Jammy Jellyfish&quot;)" }
```

becomes

```json
{ "title": "Ubuntu 22.04.5 LTS ("Jammy Jellyfish")" }
```

## Solution proposed

We no longer use the shared `retrieve_url` helper. Instead, I created a dedicated `retrieve_url` function (which is almost a copy-paste of the original) that fixes the problem by manually escaping quotes *before* unescaping the rest of the data.

PR #331.
Added manual escaping of HTML entities before automatic replacement.
Now, `"Ubuntu 22.04.5 LTS (&quot;Jammy Jellyfish&quot;)"` becomes `"Ubuntu 22.04.5 LTS (\"Jammy Jellyfish\")"` instead of `"Ubuntu 22.04.5 LTS ("Jammy Jellyfish")"`.
Closes #22074.