Improve social media preview #3163

Draft
wants to merge 2 commits into
base: develop
10 changes: 8 additions & 2 deletions integreat_cms/api/v3/social_media_headers.py
@@ -40,13 +40,19 @@

 def get_excerpt(content: str) -> str:
     """
-    Correctly escapes, truncates and normalizes the content of the page to display in a search result
+    Correctly escapes, truncates and normalizes the content of the page to display in a search result.
+    Then apply some crazy string operations to handle several edge cases.
     :param content: The content of the page
     :return: A page excerpt containing the first 100 characters of "raw" content
     """
-    return unescape(strip_tags(content))[:100].replace("\n", " ").replace("\r", "")
+    stripped_content = unescape(
+        strip_tags(content.replace("\n", " ").replace("\r", "").replace("<br>", " "))
+    )
+    if len(stripped_content) <= 100:
+        return stripped_content.strip().replace("  ", " ")
Contributor

Why do we need this line? :)

Member Author

@svenseeberg svenseeberg Oct 31, 2024

For example

Hallo<br>
Welt

will evaluate to "Hallo  Welt" (with two spaces between the words), because both the newline and the br tag are replaced with a space in line 51.
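
To make the example concrete, here is a minimal sketch of the replacement chain from the diff applied to that input. Django's `strip_tags` is left out on the assumption that no tags remain once `<br>` has been replaced; the variable names are illustrative only.

```python
from html import unescape

content = "Hallo<br>\nWelt"

# Same replacements as in the diff. strip_tags is omitted here (assumption:
# no tags are left after "<br>" has been replaced).
normalized = unescape(
    content.replace("\n", " ").replace("\r", "").replace("<br>", " ")
)

# The newline and the <br> tag each became one space, so there are two in a row.
space_count = normalized.count(" ")

# The line under discussion collapses that pair into a single space.
collapsed = normalized.strip().replace(" " * 2, " ")
print(repr(normalized), space_count, repr(collapsed))
```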

Collaborator

Wouldn't re.sub(r"\s+", " ", stripped_content.strip()) work better in this case? Who says we only have two spaces?
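
A small sketch of the difference, assuming the text ends up with three consecutive spaces (for instance two `<br>` tags followed by a newline). The variable names are made up for illustration.

```python
import re

# Three spaces in a row between the two words.
text = "Hallo" + " " * 3 + "Welt"

# One pass of str.replace only merges non-overlapping pairs of spaces,
# so a run of three spaces shrinks to two, not one.
single_pass = text.strip().replace(" " * 2, " ")

# The regex collapses any run of whitespace to exactly one space.
regex_based = re.sub(r"\s+", " ", text.strip())

print(repr(single_pass), repr(regex_based))
```

Note that `\s+` also matches tabs and any remaining line breaks, which may or may not be wanted here.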

+    return stripped_content[:100].strip().rsplit(' ', 1)[0]+" ..."
Collaborator

Suggested change
-    return stripped_content[:100].strip().rsplit(' ', 1)[0]+" ..."
+    return stripped_content[:100].strip().rsplit(' ', 1)[0]+" "
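
For reference, a sketch of how the truncation expression behaves on two edge cases. `truncate_at_word` is a hypothetical helper that wraps the expression from the diff with the `" ..."` suffix; it is not part of the PR.

```python
def truncate_at_word(text: str, limit: int = 100, suffix: str = " ...") -> str:
    """Hypothetical helper wrapping the truncation expression from the diff."""
    if len(text) <= limit:
        return text
    # Cut at the limit, trim the edges, then drop the last space-delimited token.
    return text[:limit].strip().rsplit(" ", 1)[0] + suffix


# Case 1: the cut lands exactly on a word boundary (20 complete words fit),
# yet the last complete word is still dropped, leaving 19 words.
on_boundary = truncate_at_word("word " * 30)

# Case 2: no space inside the first 100 characters, so rsplit has nothing to
# split on and the result is longer than the limit once the suffix is added.
no_space = truncate_at_word("a" * 150)

print(len(on_boundary), len(no_space))
```

So the expression always drops the last space-delimited token of the 100-character slice, even when the cut happens to fall on a word boundary, and it keeps the whole slice when it contains no space at all.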



def get_region_title(region: Region, page_title: str) -> str: