Proposal: Limited mutability API #311
Comments
As a Gumbo user, I'm divided on this. Gumbo is a fantastic HTML5 parser. Is there sufficient motivation behind the mutability API to make it equally fantastic? Will the next step be to add a serializer? If so, will that be fantastic? If the answer to those questions is yes, then this seems like an exciting possibility. But @nostrademons has presented some strong arguments against going down that path here and in other issues, and I find them convincing even though I also stand to benefit from a mutability API. This seems like a potentially vast scope expansion for Gumbo, and something that might be better left to other projects (either existing or new). As @nostrademons points out, there are a number of third-party bindings that provide DOM APIs, and those projects benefit from being able to focus on that problem while Gumbo focuses on parsing.
To clarify a few things:
But will work just fine with any global allocator, even a custom one. And of course will work with the simple malloc, realloc, and free.
As will any change to the memory allocator, including the arena proposal. We are talking about v1.0 here, not v0.9.5. It is really the change to memory allocation that will require changes to the Options. Furthermore, the changes to support fragment_parsing, templates, adding rtc, and the like all create similar backwards-incompatibility issues, but they need to be done at some point and are now in the master tree. That is why we are proposing this for v1.0.0, not v0.9.5.
Actually, if they want to walk the tree at all, they need to be up to speed on all of the vector and attribute calls and understand how they work. So it is really not much new (or shouldn't be). No new functions are being proposed; the request is only to expose existing ones.
Based on @nostrademons's recent statistical studies on the 60K+ websites, given the very low number of attributes and child nodes, being O(n) is really a moot point if 99.5% of the time n is 1 or 2. Some other points based on an earlier response:
Hope this clarifies things a bit. I realize the discussion at #295 is very very long, but it can make for interesting reading!
My 2c: I think some work could be done on the current workarounds to make them more practical without adding this to Gumbo. Gumbo is amazing because it does one thing really well and interoperates quite well with other tools that might have a wider scope. So I'm all in favor of keeping Gumbo as simple as possible, and maybe focusing on speed and other optimizations instead.
@rgrove Are there any DOM implementations in C that you know of? Which projects do you know of that offer good DOM APIs that work in conjunction with this project?
I believe the maintainer of gumbo wrote a converter from gumbo's parse tree to a libxml tree/dom.
On Mar 3, 2017, at 1:26 PM, Ryan Grove wrote:
@lastmjs Sorry, C isn't my area of expertise so I'm not sure what the landscape looks like for DOM-related projects in C. I currently use Gumbo via Nokogumbo in Ruby, which provides bindings with Nokogiri.
Based on the discussion in #295. Both Sigil and GitHub have a need to make small local modifications to the parse tree before reserializing it out. This is currently very difficult because of the number of pointers that must be kept in sync, the possibility of introducing memory leaks by not updating them, and the need to pass a GumboParser around for the allocator.
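To make the "pointers that must be kept in sync" concern concrete, here is a minimal C sketch of detaching one child from a parse-tree-like node. `SketchNode` and the `sketch_*` functions are hypothetical stand-ins invented for illustration, not Gumbo's types or API; they only mirror the general shape of a node with a parent pointer, a cached index within its parent, and a growable child array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a parse-tree node; NOT Gumbo's real types.
 * It mirrors only the bookkeeping that matters here. */
typedef struct SketchNode {
  struct SketchNode *parent;
  size_t index_within_parent;   /* cached position in parent->children */
  struct SketchNode **children; /* growable array of child pointers */
  size_t length;
  size_t capacity;
} SketchNode;

/* Allocates an empty, detached node. Returns NULL on allocation failure. */
SketchNode *sketch_node_new(void) {
  return calloc(1, sizeof(SketchNode));
}

/* Appends child to parent, growing the array as needed. Returns 0 on success. */
int sketch_append_child(SketchNode *parent, SketchNode *child) {
  if (parent->length == parent->capacity) {
    size_t new_cap = parent->capacity ? parent->capacity * 2 : 2;
    SketchNode **grown = realloc(parent->children, new_cap * sizeof *grown);
    if (!grown) return -1;
    parent->children = grown;
    parent->capacity = new_cap;
  }
  child->parent = parent;
  child->index_within_parent = parent->length;
  parent->children[parent->length++] = child;
  return 0;
}

/* Detaches child from its parent. Three things must stay consistent:
 * the array contents, the array length, and every later sibling's index. */
void sketch_remove_child(SketchNode *child) {
  SketchNode *parent = child->parent;
  if (!parent) return;
  size_t i = child->index_within_parent;
  assert(i < parent->length && parent->children[i] == child);
  /* Close the gap left by the removed child. */
  memmove(&parent->children[i], &parent->children[i + 1],
          (parent->length - i - 1) * sizeof *parent->children);
  parent->length--;
  /* Renumber the siblings that shifted down: linear in their count. */
  for (size_t j = i; j < parent->length; j++) {
    parent->children[j]->index_within_parent = j;
  }
  child->parent = NULL;
  child->index_within_parent = 0;
}

/* Frees a node and its whole subtree. */
void sketch_free(SketchNode *node) {
  if (!node) return;
  for (size_t i = 0; i < node->length; i++) sketch_free(node->children[i]);
  free(node->children);
  free(node);
}
```

Skipping any one of those updates leaves the tree inconsistent, and the detached node becomes the caller's to free, which is where a leak can creep in.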
Concrete proposal
Current workaround
We currently recommend that people who want mutation wrap the whole parse tree in an API of their choice, mutate that, and then serialize it out. Gumbo's API is simple enough that a tree-walker can be written in a page or so of code, and tree traversal time is negligible compared to parse time (~1%). Several outside bindings have DOM APIs already, e.g. lua-gumbo, gumbo-libxml, and the html5lib and BeautifulSoup adaptors that come with the main distribution.
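As a rough illustration of that workaround, the sketch below copies a read-only tree into a separate wrapper tree with sibling links, which can then be edited freely. `SrcNode` stands in for whatever the parser returns and `DomNode` for the wrapper; both, along with `dom_from_src` and `dom_free`, are assumptions made up for this example rather than part of Gumbo or any of the bindings mentioned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical read-only source node (a stand-in for parser output). */
typedef struct SrcNode {
  const char *tag;
  const struct SrcNode *const *children;
  size_t length;
} SrcNode;

/* Hypothetical mutable wrapper node. Linked siblings avoid cached indices. */
typedef struct DomNode {
  char *tag; /* owned copy, so the source tree can be freed afterwards */
  struct DomNode *parent;
  struct DomNode *first_child;
  struct DomNode *last_child;
  struct DomNode *next_sibling;
} DomNode;

/* Frees a wrapper node and its whole subtree. */
void dom_free(DomNode *node) {
  if (!node) return;
  DomNode *child = node->first_child;
  while (child) {
    DomNode *next = child->next_sibling;
    dom_free(child);
    child = next;
  }
  free(node->tag);
  free(node);
}

/* Recursively copies src into a new wrapper subtree under parent.
 * Returns NULL on allocation failure, freeing anything built so far. */
DomNode *dom_from_src(const SrcNode *src, DomNode *parent) {
  assert(src != NULL);
  DomNode *node = calloc(1, sizeof *node);
  if (!node) return NULL;
  size_t n = strlen(src->tag) + 1;
  node->tag = malloc(n);
  if (!node->tag) {
    free(node);
    return NULL;
  }
  memcpy(node->tag, src->tag, n);
  node->parent = parent;
  for (size_t i = 0; i < src->length; i++) {
    DomNode *child = dom_from_src(src->children[i], node);
    if (!child) {
      dom_free(node);
      return NULL;
    }
    /* Append to the end of the sibling list. */
    if (node->last_child) node->last_child->next_sibling = child;
    else node->first_child = child;
    node->last_child = child;
  }
  return node;
}
```

The walk visits each source node once. Because the wrapper owns its own strings and links, it can be mutated and serialized without touching the parser's allocator.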
Benefits
If this is useful to you, you'll probably know it immediately. :-) But enumerating them:
There is a partial branch demonstrating some of these changes at vmg/development.
Drawbacks
Compromise solutions
Comment with a +1 or -1, or any additional comments or considerations.