Add a test for gallery RSS content #3676
Branch force-pushed from af46402 to 0d3819a.
Update on the test's failure to parse the enclosures in the generated RSS: this seems to be caused by a problem in the RSS parser I chose, "rss_parser". I do not understand how its parsing of enclosures can ever work. I filed yet another verbose issue there, but I suspect the situation there is nontrivial, so I don't have much hope of seeing it resolved soon. Maybe I made a mistake by selecting someone's hobby RSS parser rather than something that is actually usable today. (I searched for "python rss parser"; it was late and I clicked the top result.) The current plan is to switch to another RSS parser, maybe "feedparser".
Update on the CI failure in Python 3.11 on windows-latest: the image timestamp is off by one hour, and UK clocks sprang forward an hour (from GMT to BST) on Sunday night. Is it a timezone/daylight saving bug?
I fixed the new test's failure to parse RSS enclosures by switching RSS parsing library, from rss_parser to feedparser. It seems to work now. The only wrinkle is that it doesn't seem to automatically return the gallery 'description' field (which is empty in our data, but I'd like to assert about it so that later we could fill it with something, like the gallery description text).
If the issue is with dates and times only, it might be worth ignoring them somehow (e.g. by removing the date/time field altogether).
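A minimal sketch of that idea: strip the volatile date keys from each parsed item before comparing. The helper name `without_dates`, the key names in `VOLATILE_KEYS`, and the timestamp values are hypothetical, chosen only for illustration; the real test's field names may differ.

```python
# Hypothetical key names for the date/time fields of a parsed feed item.
VOLATILE_KEYS = {"published", "updated"}


def without_dates(item: dict, volatile=VOLATILE_KEYS) -> dict:
    """Return a copy of `item` with the date/time keys removed."""
    return {k: v for k, v in item.items() if k not in volatile}


# Illustrative values: two items that differ only in their timestamp.
actual = {"title": "tesla_tower1_lg", "published": "2000-01-01T01:00:00Z"}
expected = {"title": "tesla_tower1_lg", "published": "2000-01-01T00:00:00Z"}

# The one-hour difference no longer fails the comparison.
assert without_dates(actual) == without_dates(expected)
```

The trade-off is that the test then says nothing at all about dates, so a genuine date bug would also go unnoticed.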
I have a few outstanding questions, marked 'TODO', to resolve in code review.
The publish date on the webp image is now different, since we rebased onto a change in the preceding branch which correctly populates this EXIF field on the new webp file. This affects the order of the items in the RSS feed, and the title of the RSS feed. (Possibly that last part is a bug, but it is not today's problem.)
The use of zip here was faulty. Zip stops iterating at the length of the shortest given iterable. So if the actual RSS feed contained initial correct items, but then ended early (including the degenerate case of being totally empty), then the zip would iterate over the number of items in 'parsed.feed', comparing them to the initial items of 'expected_items', and then would silently stop iterating and the test would erroneously pass. Using zip_longest guards against that. Adding a default value to `to_dict()`'s use of getattr, and a message to the assertion at the end of test_gallery_rss produces much nicer error messages on failure.
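A small illustration of the pitfall, using plain lists instead of the parsed feed. `compare_items` and `MISSING` are hypothetical names, not the test's actual code; the sentinel has a readable `repr` so that the failure message stays informative.

```python
from itertools import zip_longest


class _Missing:
    """Sentinel marking the position where one sequence ran out."""

    def __repr__(self):
        return "<missing>"


MISSING = _Missing()


def compare_items(actual, expected):
    """Compare two sequences pairwise, failing if their lengths differ."""
    for index, (got, want) in enumerate(
        zip_longest(actual, expected, fillvalue=MISSING)
    ):
        assert got == want, f"item {index}: got {got!r}, expected {want!r}"


expected = ["a", "b", "c"]
truncated = ["a", "b"]

# zip() silently stops after two pairs, so no mismatch is ever seen:
assert all(got == want for got, want in zip(truncated, expected))

# zip_longest() pads the short side with the sentinel, so the gap is caught:
try:
    compare_items(truncated, expected)
except AssertionError as exc:
    print(exc)  # item 2: got <missing>, expected 'c'
```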
The previous library, rss_parser, doesn't successfully parse enclosures (dhvcc/rss-parser#37). I switched to feedparser and changed the test to use its format for the returned parsed data structure. I still need to tell it to grab the (custom?) gallery description field, but everything else seems to work.
The new test now makes assertions about the content of the enclosure (a data structure about linked items, i.e. the images in this case) in each RSS feed item. This will enable us, in a later commit, to test the fallback to a different MIME type when a file isn't recognized.
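For reference, an RSS 2.0 `<enclosure>` element carries three attributes: `url`, `length` (in bytes) and `type` (the MIME type). The sketch below reads them with the standard library's `xml.etree.ElementTree` rather than feedparser, purely to keep it dependency-free; the feed fragment, URL and length are illustrative and not taken from the demo site's real feed.

```python
import xml.etree.ElementTree as ET

# Illustrative RSS fragment, not copied from the generated gallery feed.
RSS = """<rss version="2.0"><channel>
  <title>Demo gallery</title>
  <item>
    <title>tesla_tower1_lg</title>
    <enclosure url="https://example.com/galleries/demo/tesla_tower1_lg.jpg"
               length="12345" type="image/jpeg"/>
  </item>
</channel></rss>"""


def enclosures(rss_text):
    """Yield (url, length, mime_type) for every <enclosure> in every <item>."""
    root = ET.fromstring(rss_text)
    for item in root.iter("item"):
        for enc in item.findall("enclosure"):
            yield enc.get("url"), int(enc.get("length")), enc.get("type")


for url, length, mime in enclosures(RSS):
    print(url, length, mime)
```

The `type` attribute is the field a MIME-type fallback test would assert on.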
The new gallery RSS test can get access to the 'description' field in the feed XML. It is a field labelled 'subtitle' in the parsed feed data. Delete the TODO related to this item. Delete some debug printing. Add a TODO wondering whether the gallery description should be blank, or should be populated with the content from the gallery index.txt.
Use new variable BUILDTIME to explicitly show that the RSS output feed.updated field derives its value from the time the `nikola build` was done.
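A rough sketch of the kind of check this enables, assuming the feed's build date is written in RFC 822 format (as RSS 2.0 specifies) and that `BUILDTIME` is captured just before the build runs. Here the build is simulated by formatting the current time, and the test's real comparison may be stricter or looser than this.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Assumption: record the time immediately before running the build.
BUILDTIME = datetime.now(timezone.utc)

# Stand-in for the build writing an RFC 822 date such as <lastBuildDate>.
last_build_date = format_datetime(datetime.now(timezone.utc))

updated = parsedate_to_datetime(last_build_date)

# RFC 822 dates carry whole seconds only, so drop microseconds before comparing.
assert updated >= BUILDTIME.replace(microsecond=0)
print(last_build_date)
```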
Branch force-pushed from 107a057 to 2f43d3c.
(e.g. on Windows, comparison is case-insensitive, and forward slashes are converted into backslashes)
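This normalization is what `os.path.normcase` does on Windows. The standard library's `ntpath` module applies the Windows rules on any OS, so the difference can be shown from Linux or macOS too; the path itself is just an example.

```python
import ntpath
import posixpath

path = "Galleries/Demo/Tesla2_LG.jpg"

# Windows rules: lowercase everything and turn "/" into "\".
print(ntpath.normcase(path))     # galleries\demo\tesla2_lg.jpg
# POSIX rules: the path is returned unchanged.
print(posixpath.normcase(path))  # Galleries/Demo/Tesla2_LG.jpg
```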
I pushed some test changes which give better error messages. It turns out the test failure due to an off-by-one-hour timestamp isn't a daylight saving problem at all. It is caused by the gallery's …
A pure guess: might it be nikola/plugins/task/galleries.py:547, in `get_excluded_images()`: `excluded_image_list = ["{0}/{1}".format(gallery_path, i) for i in excluded_image_name_list]`? The …
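A sketch of how that guess would play out, using `ntpath` to mimic Windows on any OS. Two assumptions are labeled here and in the code: that `gallery_path` already contains backslashes on Windows, and that the paths being checked against the exclusion list are built with `os.path.join`. Neither is confirmed by the comment above.

```python
import ntpath

# Assumption: on Windows, gallery_path already uses backslashes.
gallery_path = "galleries\\demo"
excluded_image_name_list = ["tesla2_lg.jpg"]

# The pattern quoted above always joins with a forward slash.
excluded_image_list = ["{0}/{1}".format(gallery_path, i) for i in excluded_image_name_list]

# Assumption: the image path being checked is built with os.path.join
# (ntpath.join on Windows), which uses a backslash.
image_path = ntpath.join(gallery_path, "tesla2_lg.jpg")

print(excluded_image_list[0])             # galleries\demo/tesla2_lg.jpg
print(image_path)                         # galleries\demo\tesla2_lg.jpg
print(image_path in excluded_image_list)  # False: the exclusion never matches
```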
The newly-added test fails because the gallery image tesla2_lg.jpg should be excluded from the gallery - it is mentioned in exclude.meta. But on Windows, it does not get excluded, so the gallery RSS feed includes one erroneous extra item.
OK, so, this is interesting. The 'tentative fix' commit, above, did fix the previous test failure, i.e. that the 'id' entry of each item in the RSS feed was unexpected. On Linux/macOS, it is:
but on Windows, it was:
i.e. with single backslashes (the above line shows them escaped, as the failing test output does). The 'tentative fix' commit makes galleries.py generate the same 'id' on Windows as it does elsewhere, i.e. it uses os.path.join instead of …

Also, now with that fix in place, there is a new, similar failure:
The same chain of reasoning applies: I'm pretty certain the Windows version is not a valid URL, but it works because browsers helpfully massage the backslashes into forward slashes and resolve the resulting URL. So probably the ultimately right thing to do is to fix the URL generation on Windows? But in the absence of strong advice from the project owners, this PR should probably be conservative: preserve the current behavior, and tweak the test to demonstrate it (with a "TODO" comment in the test saying that I think this might be a bug). Thoughts welcome. More as it happens. Thanks, people!
Prepare test to diagnostically output the actual results on Windows, so that subsequent commits can modify the test to expect the actual current behavior.
The URL generation on Windows is probably buggy and should be fixed, even if that would mean some minor regressions in places depending on the buggy behavior (e.g. RSS feed GUIDs). We usually don’t pay much attention to pull requests being small or very self-contained.
Due to a probable bug, images in the gallery exclude.meta are not excluded on Windows. The test now expects this, to demonstrate the current behavior.
...instead of dict literals, for consistency with the other larger dicts in the same test.
Branch force-pushed from 9dd464c to 10d1fe5.
Windows RSS item ids now have the same form as on other OSes, i.e. 'galleries/demo/tesla_tower1_lg.jpg'. Formerly these used backslashes on Windows, which caused a bug where excluded images (listed in exclude.meta) were not excluded because their id didn't match. This means I've removed a big ugly TODO from the test, which no longer expects these excluded items to appear in the RSS feed on Windows. This is a re-application of the 'tentative fix' that was applied and reverted earlier in this PR.
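For illustration, `pathlib`'s pure path classes give the forward-slash form regardless of the OS the code runs on. This is one way to produce such ids, not necessarily how the fix in galleries.py is written.

```python
from pathlib import PurePosixPath, PureWindowsPath

windows_id = "galleries\\demo\\tesla_tower1_lg.jpg"

# Convert an existing Windows-style path: as_posix() rewrites the separators.
posix_id = PureWindowsPath(windows_id).as_posix()
print(posix_id)  # galleries/demo/tesla_tower1_lg.jpg

# Or build it from components, which gives the same string on any OS.
built_id = str(PurePosixPath("galleries", "demo", "tesla_tower1_lg.jpg"))
print(built_id == posix_id)  # True
```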
Branch force-pushed from 7899cad to ad03124.
Branch force-pushed from 20aa60a to 66da315.
Thank you for the advice! This is now fixed, and I believe this PR is complete and, AFAIK, can be merged.
nikola/nikola.py
@@ -1926,7 +1927,8 @@ def path(self, kind, name, lang=None, is_link=False, **kwargs):
 else:
 return link
 else:
-return os.path.join(*path)
+# URLs should always use forward slash separators, even on Windows
+return pathlib.PurePosixPath(*path)
This should return a string, not a pathlib.Path instance.
Ahar, fair point, thank you. I was curious about why this wasn't causing problems in the tests, so I had a quick look around and it seems like almost every call to this function is using the return value as a component passed to os.path.join(), which accepts pathlib.Paths. So that explains why the problem isn't visibly manifesting. But yes, you are right, future callers to this (and maybe the existing one in Nikola.link()) might barf on pathlib.Paths. So I'll convert it to str, and swallow my idealism over this fix not requiring a test change.
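A small demonstration of both halves of that reasoning: `os.path.join()` accepts any `os.PathLike`, so passing a path object through it works, while code that expects a `str` fails. `join_url_parts` is a hypothetical stand-in for the final branch of `Nikola.path()`, which takes more arguments in reality.

```python
import os
import pathlib


def join_url_parts(*path):
    """Hypothetical stand-in for the last branch of Nikola.path()."""
    # URLs should always use forward slash separators, even on Windows.
    # str() turns the pure path object back into a plain string.
    return str(pathlib.PurePosixPath(*path))


p = pathlib.PurePosixPath("galleries", "demo")

# os.path.join() accepts os.PathLike objects, so callers like this keep working:
print(os.path.join("output", p))

# ...but callers that treat the value as a str break:
try:
    _ = "/" + p
except TypeError as exc:
    print("TypeError:", exc)
print(hasattr(p, "startswith"))  # False: path objects have no str methods

# Returning a str avoids both problems:
url = join_url_parts("galleries", "demo", "index.html")
print(url, "/" + url)  # galleries/demo/index.html /galleries/demo/index.html
```

One design note: `PurePosixPath` also normalizes its input (it collapses repeated slashes and drops a trailing slash), whereas `"/".join(path)` would not, so the two are not interchangeable for every caller.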
Pull Request Checklist
Description
I added a new test, which asserts about the content of the samplesite gallery RSS feed.
This PR advances issue #3671, by completing the checklist item "Add a test of gallery RSS content".
I believe this PR is complete and mergeable.
`excluded_image_list`, causing the `exclude.meta` file not to work properly; hence the first image in the gallery was not the expected one, and had an unexpected timestamp. This is all fixed in this PR, and discussed below.