Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(search): clean markdown elements in search contents #2457

Merged
merged 13 commits into from
Sep 21, 2024

Conversation

Koooooo-7
Copy link
Member

@Koooooo-7 Koooooo-7 commented Jun 27, 2024

Summary

Changes

  • Process the content before store it into indexDB, hence we don't need parse it every times.
    TODO In v5+: move it as a async job instead of handing main thread too long, especially with large contents. (it looks fine in our site for now)
  • Adaption to use marked v13+ with pure new renderer rewrite.
    • Remove every markdown element stylings.
    • Remove the helper ?> !> of docsify either.
  • Copied functions instead import to reduce the package size (import it will block the build min optimize since it is over 500kb).
  • Hardcode the ... to matched contents as truncation surroundings.
  • Test cases for the changes.

Snapshot (before -> after)

Screenshot1
Screenshot2

Related issue, if any:

What kind of change does this PR introduce?

Bugfix

For any code change,

  • Related documentation has been updated, if needed
  • Related tests have been added or updated, if needed

Does this PR introduce a breaking change?

Yes
No

Tested in the following browsers:

  • Chrome
  • Firefox
  • Safari
  • Edge

Copy link

vercel bot commented Jun 27, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
docsify-preview ✅ Ready (Inspect) Visit Preview 💬 Add feedback Sep 19, 2024 5:33am

Copy link
Member

@jhildenbiddle jhildenbiddle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to be working but there are a few issues:

  1. There is a noticeable delay when display search results. The delay Is very noticeable using Safari and less noticeable using Chrome (macOS 14.5). This appears to be a result of storing raw search data, then doing a lot of text processing on the data each time a search query is performed. Why not do the processing before storing the search data so we only have to do that processing one time and not before rending searching results?
  2. E2E tests are failing. This may be result of bringing the branch up to date with develop.

CC: @sy-records

@paulhibbitts
Copy link
Collaborator

paulhibbitts commented Jul 19, 2024

Thanks so much @Koooooo-7 for working on this! I've also tried the latest Preview in this thread, which I assume has this change included, and noticed two things:

  1. The matching text snippet does not have ellipses (...) at the start of the text when truncated which can reduce the readers understanding of the truncation going on. For example, the fifth result returned when searching for class displayed erty 'classList' of null (#1527) (d6df2b8), closes...

  2. The highlighting of the found text is on longer happening, which may be an understandable side effect of the merged v5 style updates etc.

I hope the above helps.
Paul

@Koooooo-7
Copy link
Member Author

Hi @jhildenbiddle @paulhibbitts ---
Thx for the points.
I think the performance issue does need to resolve.
I will sync with @sy-records after the #2464 storage layer change merged to do the new storage adaption and performance refine .

@Koooooo-7
Copy link
Member Author

This appears to be a result of storing search data unmodified, then doing a lot of text processing every a search query is performed. Why not do the processing while retrieving the search data and store the result so we only have to do basic text matches on search queries are performed?

Make sense.
I think we could store the formatted data in storage and simple format the search content to do the retrieve instead of format it every time.

@Koooooo-7
Copy link
Member Author

@paulhibbitts

  1. The matching text snippet does not have ellipses (...) at the start of the text when truncated which can reduce the readers understanding of the truncation going on. For example, the fifth result returned when searching for class displayed erty 'classList' of null (#1527) (d6df2b8), closes...
  1. The highlighting of the found text is on longer happening, which may be an understandable side effect of the merged v5 style updates etc.

thx for the nice catch, notes the styling issue. 👌

@Koooooo-7 Koooooo-7 marked this pull request as draft July 20, 2024 08:35
@Koooooo-7 Koooooo-7 added this to the 5.x milestone Jul 20, 2024
Copy link
Collaborator

@paulhibbitts paulhibbitts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for the update @Koooooo-7 ! The display of preceding ellipses look good 🙂

I've done some more testing, and have a few comments/questions:

  1. Empty ellipses (......) are being displayed when searching for items matching only a Header and no immediate content below. For example, search for "Headings" which is on the UI Kit page. If no content within ellipses perhaps do not display ellipses/content at all?
  2. Should we include Markdown image paths/names? For example, search for "icon.svg"?

@Koooooo-7
Copy link
Member Author

Koooooo-7 commented Aug 1, 2024

  1. Empty ellipses (......) are being displayed when searching for items matching only a Header and no immediate content below. For example, search for "Headings" which is on the UI Kit page. If no content within ellipses perhaps do not display ellipses/content at all?

Nice catch! I didn't aware that there may have a empty search content, I will update it when there is empty content, no ... display.

---- Updated

  1. Should we include Markdown image paths/names? For example, search for "icon.svg"?

For now, I keep the images path and names/titles meta for searching, although we can not see it in the content directly.

@paulhibbitts
Copy link
Collaborator

Awesome @Koooooo-7 , looks good! Thank you very much 🙏🏼

@paulhibbitts paulhibbitts self-requested a review August 1, 2024 16:08
sy-records
sy-records previously approved these changes Sep 18, 2024
Copy link
Member

@sy-records sy-records left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sy-records
Copy link
Member

There seems to be a problem with diacritics.

image

@sy-records sy-records self-requested a review September 18, 2024 04:13
@Koooooo-7
Copy link
Member Author

Koooooo-7 commented Sep 18, 2024

There seems to be a problem with diacritics.

I checked the previews behavior, it is different from v4 result since last year. Which has a pure wrong result highlight for cafe.

Current behavior in this PR more looks like a "patch" to correct search contents, but we still need figure it out when and why the search content changed.


Update:

There is a potential issue for the postContent and handlePostContent, the handlePostContent may have large size than the postContent after being formatted (e.g " to &quot, size up to 5 times), which makes the substring in wrong place, will get a fix on it.
Because we have the content format function changes, the behavior is still changed than v4+ .

@Koooooo-7
Copy link
Member Author

ping @sy-records

@sy-records sy-records requested a review from a team September 19, 2024 10:58
@Koooooo-7 Koooooo-7 merged commit 95901eb into develop Sep 21, 2024
9 checks passed
@Koooooo-7 Koooooo-7 deleted the search-filter-md-pure branch September 21, 2024 11:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants