Strip script and style tags through ::clean() method instead of preg_replace

Huge tags can lead to a failure of preg_replace, thus erasing the whole
fetched content.

Fixes wallabag/wallabag#5847

Signed-off-by: Kevin Decherf <kevin@kdecherf.com>
Kdecherf authored and j0k3r committed Jun 13, 2022
1 parent 0c0653d commit 6689f19
Showing 1 changed file with 3 additions and 4 deletions.
7 changes: 3 additions & 4 deletions src/Readability.php
@@ -108,10 +108,6 @@ class Readability implements LoggerAwareInterface
protected $useTidy;
// raw HTML filters
protected $pre_filters = [
- // remove obvious scripts
- '!<script[^>]*>(.*?)</script>!is' => '',
- // remove obvious styles
- '!<style[^>]*>(.*?)</style>!is' => '',
// remove spans as we redefine styles and they're probably special-styled
'!</?span[^>]*>!is' => '',
// HACK: firewall-filtered content
@@ -366,6 +362,9 @@ public function prepArticle(\DOMNode $articleContent): void

$this->logger->debug($this->lightClean ? 'Light clean enabled.' : 'Standard clean enabled.');

+ $this->clean($articleContent, 'style');
+ $this->clean($articleContent, 'script');
+
$this->cleanStyles($articleContent);
$this->killBreaks($articleContent);
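
The failure mode described in the commit message can be illustrated with a small standalone sketch. It is not the library's code: `removeTags()` is a hypothetical helper that only approximates a DOM-based cleanup, the 200000-character script body is an arbitrary size, and the PCRE limits are lowered on purpose so the regex path is likely to fail on a small input. When `preg_replace()` hits such a limit it returns `null`, which would discard the whole document, whereas removing nodes from the parsed DOM does not depend on regex backtracking.

```php
<?php
// Hypothetical helper for illustration; NOT the actual Readability::clean().
function removeTags(DOMElement $root, string $tag): void
{
    $nodes = $root->getElementsByTagName($tag);
    // The node list is live, so walk it backwards while removing nodes.
    for ($i = $nodes->length - 1; $i >= 0; --$i) {
        $node = $nodes->item($i);
        $node->parentNode->removeChild($node);
    }
}

// A page with one very large inline script (size chosen arbitrarily).
$html = '<html><body><p>Article text</p><script>'
    . str_repeat('x', 200000)
    . '</script></body></html>';

// Assumption: limits are lowered here only to make a failure easy to trigger.
ini_set('pcre.jit', '0');
ini_set('pcre.backtrack_limit', '1000');

$viaRegex = preg_replace('!<script[^>]*>(.*?)</script>!is', '', $html);
if ($viaRegex === null) {
    // preg_replace() reports an error by returning null, so the content is lost.
    // preg_last_error_msg() requires PHP 8.0 or later.
    echo 'preg_replace failed: ', preg_last_error_msg(), PHP_EOL;
}

// DOM-based removal: parse once, then drop the unwanted elements.
libxml_use_internal_errors(true); // silence parser warnings on sloppy HTML
$doc = new DOMDocument();
$doc->loadHTML($html);
removeTags($doc->documentElement, 'script');
removeTags($doc->documentElement, 'style');

$cleaned = $doc->saveHTML($doc->documentElement);
echo $cleaned, PHP_EOL;
```

Callers that keep a regex-based filter would need to check the return value for `null` before overwriting the fetched content; moving the stripping into the DOM pass, as this commit does, avoids that check.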
