Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add parse benchmark #104

Merged
merged 1 commit into from
Jan 6, 2024
Merged

Add parse benchmark #104

merged 1 commit into from
Jan 6, 2024

Conversation

naitoh
Copy link
Contributor

@naitoh naitoh commented Jan 6, 2024

I want to improve the parsing process and would like to add a parsing benchmark.

The benchmark process just parses the XML from beginning to end.

Since performance differs depending on whether YJIT is ON or OFF, both are measured.

@naitoh naitoh marked this pull request as draft January 6, 2024 07:57
rexml.gemspec Outdated Show resolved Hide resolved
benchmark/parse.yaml Outdated Show resolved Hide resolved
benchmark/parse.yaml Outdated Show resolved Hide resolved
@naitoh naitoh marked this pull request as ready for review January 6, 2024 12:42
@naitoh naitoh requested a review from kou January 6, 2024 12:42
Rakefile Outdated
benchmark_tasks << "benchmark:#{name}"

case name
when /\Aparse/, "shift"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
when /\Aparse/, "shift"
when /\Aparse/

'dom' : REXML::Document.new(xml).elements.each("root/child") {|_|}
'sax' : REXML::Parsers::SAX2Parser.new(xml).parse
'pull' : |
p = REXML::Parsers::PullParser.new(xml)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using parser for variable name because we have p method?

Suggested change
p = REXML::Parsers::PullParser.new(xml)
parser = REXML::Parsers::PullParser.new(xml)

'pull' : |
p = REXML::Parsers::PullParser.new(xml)
while p.has_next?
res = p.pull
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that we don't need to assign the return value:

Suggested change
res = p.pull
p.pull

Comment on lines 31 to 32
n_elements = Integer(ENV.fetch("N_ELEMENTS", "10000"), 10)
n_attributes = Integer(ENV.fetch("N_ATTRIBUTES", "10"), 10)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With these configurations, we take 10+ minutes for one rake benchmark: https://github.com/ruby/rexml/actions/runs/7431570516/job/20222471649?pr=104

Can we reduce benchmark time? Or do we need these numbers for meaningful benchmark?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I set this value because I need to execute XML with a large number of elements, but the execution time is too long, so I will reduce the number of elements.

Copy link
Contributor Author

@naitoh naitoh Jan 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kou kou merged commit 72a26d6 into ruby:master Jan 6, 2024
39 checks passed
@kou
Copy link
Member

kou commented Jan 6, 2024

Thanks!

@naitoh naitoh requested a review from kou January 6, 2024 22:59
@naitoh naitoh deleted the add_parse_benchmark branch January 6, 2024 22:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants