Skip to content

Conversation

@toddsundsted
Copy link
Contributor

LibXML.xmlFreeNode(node) on line 60 is never called because node.value.doc is a LibXML::Doc* not a LibXML::Node* and so the comparison always evaluates to false. The equality method in Comparable only kicks in if the receiver and the one parameter are the same type. This PR casts doc to the correct type of the comparison.

The following spec demonstrates the problem. If you adjust the number of iterations, the memory allocated scales linearly with iterations and never plateaus. Unlinking nodes by iterating over children and xpath nodes both demonstrate the problem.

require "spec"
require "xml"

describe "XML memory management" do
  it "does not leak parser memory" do
    xml = %Q|<root>#{"<child/>" * 10}</root>|

    GC.collect
    baseline_rss = get_rss_kb

    1_000_000.times do
      doc = XML.parse(xml)
      root = doc.root.not_nil!
      root.children.each(&.unlink)
    end

    # 1_000_000.times do
    #   doc = XML.parse(xml)
    #   doc.xpath_nodes("//child").each(&.unlink)
    # end

    GC.collect
    final_rss = get_rss_kb
    growth = final_rss - baseline_rss

    p growth: growth

    (growth < 50_000).should be_true
  end
end

private def get_rss_kb : Int64
  {% if flag?(:darwin) %}
    `ps -o rss= -p #{Process.pid}`.strip.to_i64
  {% else %}
    File.read("/proc/self/status").lines.each do |line|
      return line.split[1].to_i64 if line.starts_with?("VmRSS:")
    end
    0_i64
  {% end %}
end

@Blacksmoke16 Blacksmoke16 added kind:bug A bug in the code. Does not apply to documentation, specs, etc. performance topic:stdlib:serialization labels Nov 25, 2025
@straight-shoota
Copy link
Member

straight-shoota commented Nov 25, 2025

Both your PRs are reusing the same master branch on your fork, which killed #16406.
It's better to create dedicated branches for each change 😅

In order to avoid further disruptions, we should keep this PR as is and reopen a new PR with the changes from #16406 (36c769a) on a different branch.

Copy link
Contributor

@ysbaddaden ysbaddaden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Urgh, I wish a linter would notice "always true" and "always false" comparisons and warn about them.

Thank you @toddsundsted 🙇

@straight-shoota straight-shoota added this to the 1.19.0 milestone Nov 25, 2025
@ysbaddaden
Copy link
Contributor

ysbaddaden commented Nov 25, 2025

I could reopen #16406 from the github pr refs. See #16414.

@straight-shoota
Copy link
Member

This is an easy trap. I'm wondering if Pointer#== should always compare the address, regardless of the pointer type. They're partially identical because they point to the same start address. The exact memory size may differ between types, though. And interpretation differs anyway. 🤔
But it's not about the objects themselves, it's about pointers to them. And pointers are primarily defined by their address.

@toddsundsted
Copy link
Contributor Author

Both your PRs are reusing the same master branch on your fork, which killed #16406. It's better to create dedicated branches for each change 😅

Sorry! Got careless.

@toddsundsted
Copy link
Contributor Author

toddsundsted commented Nov 25, 2025

ugh again :-( this is the last time, I swear!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind:bug A bug in the code. Does not apply to documentation, specs, etc. performance topic:stdlib:serialization

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants