Allow parsing with HTML5 (#162)
* Asserted on broken behaviour with HTML4 parsing

This commit just captures existing behaviour, because in the next commit
I'm going to make this configurable.

* Allow parsing with HTML5

Nokogiri is on the path to parsing with HTML5 by default:
sparklemotion/nokogiri#2331

But there are some things they still need to do. For those of us who
want to opt in to HTML5 parsing, I've added an option for it. This will
prevent the gem from messing with the structure of the HTML
(specifically, prematurely closing <a> tags that wrap table elements).
jesseduffield authored Nov 11, 2024
1 parent 430a4b6 commit 75c07ad
Showing 6 changed files with 49 additions and 2 deletions.
10 changes: 10 additions & 0 deletions README.md
@@ -324,6 +324,16 @@ Get stats for a campaign
 AhoyEmail.stats("my-campaign")
 ```

+## HTML5 Parsing
+
+By default, this gem uses Nokogiri's HTML 4 parser to rewrite href attributes for the `utm_params` and `track_clicks` features. This can cause link tags to be prematurely closed if they were wrapping table elements, because doing so violates the HTML 4 spec.
+
+To use HTML5 parsing instead, set this in an initializer:
+
+```ruby
+AhoyEmail.html5 = true
+```
+
 ## History

 View the [changelog](https://github.com/ankane/ahoy_email/blob/master/CHANGELOG.md)
4 changes: 3 additions & 1 deletion lib/ahoy_email.rb
@@ -24,7 +24,7 @@
 require_relative "ahoy_email/engine" if defined?(Rails)

 module AhoyEmail
-  mattr_accessor :secret_token, :default_options, :subscribers, :invalid_redirect_url, :track_method, :api, :preserve_callbacks, :save_token
+  mattr_accessor :secret_token, :default_options, :subscribers, :invalid_redirect_url, :track_method, :api, :preserve_callbacks, :save_token, :html5
   mattr_writer :message_model

   self.api = false
@@ -79,6 +79,8 @@ module AhoyEmail

   self.save_token = false

+  self.html5 = false
+
   self.subscribers = []

   self.preserve_callbacks = []
10 changes: 9 additions & 1 deletion lib/ahoy_email/processor.rb
@@ -53,7 +53,7 @@ def track_links
       if html_part?
         part = message.html_part || message

-        doc = Nokogiri::HTML::Document.parse(part.body.raw_source)
+        doc = parse_message(part.body.raw_source)
         doc.css("a[href]").each do |link|
           uri = parse_uri(link["href"])
           next unless trackable?(uri)
@@ -92,6 +92,14 @@ def track_links
       end
     end

+    def parse_message(raw_source)
+      if AhoyEmail.html5
+        Nokogiri::HTML5.parse(raw_source)
+      else
+        Nokogiri::HTML::Document.parse(raw_source)
+      end
+    end
+
     def html_part?
       (message.html_part || message).content_type =~ /html/
     end
4 changes: 4 additions & 0 deletions test/internal/app/mailers/utm_params_mailer.rb
@@ -20,6 +20,10 @@ def nested
     mail_html('<a href="https://example.org"><img src="image.png"></a>')
   end

+  def nested_table
+    mail_html('<a href="https://example.org"><table></table></a>')
+  end
+
   def multiple
     mail_html('<a href="https://example.org">Test</a>')
   end
6 changes: 6 additions & 0 deletions test/test_helper.rb
@@ -85,6 +85,12 @@ def with_save_token
       yield
     end
   end
+
+  def with_html5
+    AhoyEmail.stub(:html5, true) do
+      yield
+    end
+  end
 end

 class ActionDispatch::IntegrationTest
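
The `with_html5` helper relies on Minitest's `stub`, which overrides `AhoyEmail.html5` only for the duration of the block. The same set-and-restore idea can be sketched without any test framework. This is a swapped-in technique, not what the commit does: it assigns and restores the setting instead of stubbing the reader, and `FakeConfig` and `with_setting` are hypothetical names introduced for illustration.

```ruby
# Stand-in for the AhoyEmail module; only the html5 flag is modeled here.
module FakeConfig
  class << self
    attr_accessor :html5
  end
  self.html5 = false
end

# Hypothetical helper: temporarily override a setting and restore the
# previous value afterwards, even if the block raises.
def with_setting(config, name, value)
  original = config.public_send(name)
  config.public_send("#{name}=", value)
  yield
ensure
  config.public_send("#{name}=", original)
end

with_setting(FakeConfig, :html5, true) do
  puts "inside block: #{FakeConfig.html5}"
end
puts "after block: #{FakeConfig.html5}"
```

Restoring in an `ensure` clause matters for tests: a failing assertion inside the block raises, and without the restore the flag would leak into later tests.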
17 changes: 17 additions & 0 deletions test/utm_params_test.rb
@@ -30,6 +30,23 @@ def test_nested
     assert_body '<img src="image.png"></a>', message
   end

+  # When nokogiri parses with html5, it allows an <a> tag to wrap a <table> tag
+  def test_nested_table_html5
+    with_html5 do
+      message = UtmParamsMailer.nested_table.deliver_now
+      assert_body "utm_medium=email", message
+      assert_body '<table></table></a>', message
+    end
+  end
+
+  # When nokogiri parses with html4, it disallows an <a> tag to wrap a <table> tag,
+  # and closes the <a> tag before the <table> tag
+  def test_nested_table_html4
+    message = UtmParamsMailer.nested_table.deliver_now
+    assert_body "utm_medium=email", message
+    assert_body '</a><table></table>', message
+  end
+
   def test_multiple
     message = UtmParamsMailer.multiple.deliver_now
     assert_body "utm_campaign=second", message