refactor(DisplayComponent): strip HTML using Jsoup, add Apache Commons Text for XML entity unescaping #218

Update `DisplayComponent` to strip HTML with Jsoup, and add the Apache Commons Text library to unescape XML entities. Using dedicated libraries for both steps makes text processing more reliable.
phodal committed Aug 7, 2024
1 parent 50f3c8b commit 53f3969
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 2 additions & 0 deletions build.gradle.kts
@@ -434,6 +434,8 @@ project(":") {
// token count
implementation("com.knuddels:jtokkit:1.0.0")

implementation("org.apache.commons:commons-text:1.12.0")

// junit
testImplementation("io.kotest:kotest-assertions-core:5.7.2")
testImplementation("junit:junit:4.13.2")
@@ -30,9 +30,7 @@ class DisplayComponent(question: String) : JEditorPane() {
}

private fun stripHtmlAndUnescapeXmlEntities(input: String): String {
// Strip HTML tags with Jsoup
val text = Jsoup.parse(input).text()
// Decode XML entities with Apache Commons Text
return StringEscapeUtils.unescapeXml(text)
}
}
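
For illustration, the two-step pipeline in the hunk above can be tried in isolation. This is a minimal sketch, not the project's code: it assumes Jsoup is already on the classpath alongside the `commons-text` dependency added in `build.gradle.kts`, it lifts the function out of `DisplayComponent` to top level, and the sample inputs are hypothetical. Note that `Jsoup.parse(...).text()` already decodes HTML entities once, so the `unescapeXml` step mainly matters for input that was escaped twice.

```kotlin
import org.apache.commons.text.StringEscapeUtils
import org.jsoup.Jsoup

// Standalone copy of the function from the diff (top-level here for demonstration only).
fun stripHtmlAndUnescapeXmlEntities(input: String): String {
    // Step 1: parse the input as HTML and keep only its text content.
    // Jsoup also decodes HTML entities once during this step.
    val text = Jsoup.parse(input).text()
    // Step 2: decode any XML entities that remain in the extracted text.
    return StringEscapeUtils.unescapeXml(text)
}

fun main() {
    // Hypothetical inputs, not taken from the project.
    val samples = listOf(
        "<p>Hello <b>world</b></p>", // plain markup
        "<p>a &amp;lt; b</p>"        // double-escaped "<"
    )
    for (sample in samples) {
        println(stripHtmlAndUnescapeXmlEntities(sample))
    }
}
```

The first sample yields `Hello world`; the second yields `a < b`, because Jsoup turns `&amp;lt;` into `&lt;` and Commons Text then turns `&lt;` into `<`.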
