Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* begin feed refactor
* ui updates
* keep icon url
* debounce feed invalidation
* sanitize fields
* rm id [skip ci]
* add more fields
* add line [skip ci]
* fetch column
* fix favicon url
* save favicon and render
* more required fields
* rename module
* add saved to stats
* count saved in total
* 404 styles
* improve styles and rm local entry store
* debounce stats invalidation
* rename classes
* render html and display title
* rename query state hook
* scrape wip
* add scraped_at field
* libxml
* sudo
* initial readability port
* add scraping job
* check content_html
* fix check
* trim and retry urls
* ui tweaks
* scrape favicons
* use b64
* readme [skip ci]
Showing 65 changed files with 3,044 additions and 1,200 deletions.
This file was deleted.
@@ -0,0 +1,25 @@
use ammonia::Builder;
use std::collections::HashSet;

const REMOVE_TAGS: [&str; 1] = ["article"];

// Sanitize HTML input, allowing only safe elements
pub fn extract_html(src: &str) -> String {
    Builder::default().rm_tags(HashSet::from(REMOVE_TAGS)).clean(src).to_string()
}

#[cfg(test)]
mod test {
    use super::*;

    #[test]
    fn it_keeps_only_safe_elements() {
        let src = r#"<article><p>Some body text that we <em>want</em> to keep.</p><p class="read-more">[<a href="https://example.com">Read More</a>]</p><script>alert("gotcha")</script><style>body { display: none }</style></article>"#;

        let parsed = extract_html(src);
        assert_eq!(
            parsed,
            r#"<p>Some body text that we <em>want</em> to keep.</p><p>[<a href="https://example.com" rel="noopener noreferrer">Read More</a>]</p>"#
        );
    }
}
@@ -0,0 +1,10 @@
mod stylistic;
pub use stylistic::extract_stylistic_html;

mod text;
pub use text::extract_text;

mod html;
pub use html::extract_html;

// TODO: use `.url_relative(UrlRelative::RewriteWithBase(...))` with ammonia and pass in site URL to rewrite relative URLs
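The TODO above proposes having ammonia rewrite relative URLs against the site URL. As a rough illustration of what that rewriting amounts to, here is a std-only sketch. `resolve_href` is a hypothetical helper, not part of ammonia or of this commit, and it assumes the base looks like `scheme://host/optional/path`. It covers only absolute, protocol-relative, root-relative, and simple path-relative links; `..` segments, bare queries or fragments, and scheme-only links such as `mailto:` are deliberately left out. In the module itself, the TODO suggests this would be delegated to ammonia's `url_relative` option rather than written by hand.

```rust
/// Hypothetical helper for illustration only; it is not part of ammonia or of
/// this commit. Resolves `href` against `base` for a few simple cases.
/// Assumes `base` looks like "https://host/optional/path".
fn resolve_href(base: &str, href: &str) -> String {
    // Case 1: `href` already carries a scheme, so it is absolute; keep it.
    if href.contains("://") {
        return href.to_string();
    }

    // Case 2: protocol-relative href ("//cdn.example/x") reuses the base scheme.
    if href.starts_with("//") {
        let scheme = base.split("://").next().unwrap_or("https");
        return format!("{scheme}:{href}");
    }

    // Locate the end of the origin (scheme + host) inside `base`.
    let host_start = base.find("://").map_or(0, |i| i + 3);
    let path_start = base[host_start..]
        .find('/')
        .map_or(base.len(), |i| host_start + i);
    let origin = &base[..path_start];

    // Case 3: root-relative href ("/about") replaces the whole base path.
    if href.starts_with('/') {
        return format!("{origin}{href}");
    }

    // Case 4: path-relative href ("img/a.png") replaces the last path segment.
    match base[path_start..].rfind('/') {
        Some(i) => format!("{}{}", &base[..path_start + i + 1], href),
        None => format!("{origin}/{href}"),
    }
}
```

With a base of `https://example.com/blog/post.html`, a root-relative `/about` resolves to `https://example.com/about`, and a path-relative `img/a.png` resolves to `https://example.com/blog/img/a.png`. A real implementation would want a proper URL parser instead of string slicing.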