
cyisfor commented Feb 10, 2016

I wanted to pull out all the relevant nodes in my document, modify them without any further lookup, and then just clone the modified document. Something like collector.next.attr("href", nextLink) and then return collector.doc.clone(). But collector.next = collector.doc.root.querySelector("...") does a full search of the entire document tree. I can narrow that down by running querySelector on "head" or similar, but it still sweeps over the same tree in multiple passes, many times, for no reason.

But just writing a parser wouldn't cut it either, since I want to work with nested elements, e.g. a <p id="navigation"> that contains <a href="next">, <a href="prev"> and so on. I could reinvent the wheel and keep seven billion ifInThisParticularNestedParagraphCalledNavigation flags, or I could match these elements while the dom module is parsing them. If you can get the current element right before onClose() runs, all of its child elements have already been parsed, so you can treat it like any other element in the document.

So what I'm aiming at is a "push" DOM builder that fires off events as it builds. I don't know if it's any good; I just thought I'd be able to use it, and it would cut out a couple dozen querySelector() calls that didn't need to be independent walks of the document tree.
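To make the idea concrete, here is a minimal sketch of a push-style builder, written in Python on top of the standard library's html.parser rather than the D dom module this PR targets. PushBuilder, Element and the on_close callback are hypothetical names for illustration. The callback fires when an element's end tag is seen, so all of its children are already attached, which is the "right before onClose()" moment described above.

```python
from html.parser import HTMLParser

# Elements that never have an end tag or children in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Element:
    """Bare-bones element node (text nodes are skipped to keep this short)."""

    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []


class PushBuilder(HTMLParser):
    """Builds a tree and calls on_close(element) as soon as each element is complete."""

    def __init__(self, on_close):
        super().__init__()
        self.root = Element("#document", [])
        self.stack = [self.root]
        self.on_close = on_close

    def handle_starttag(self, tag, attrs):
        el = Element(tag, attrs, self.stack[-1])
        self.stack[-1].children.append(el)
        if tag in VOID:
            self.on_close(el)  # complete immediately: it can have no children
        else:
            self.stack.append(el)

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag (and anything still
        # open inside it). Stray end tags, e.g. </br>, are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                while len(self.stack) > i:
                    self.on_close(self.stack.pop())  # its children are attached
                return


page = ('<html><body><p id="navigation">'
        '<a href="prev">Prev</a> <a href="next">Next</a>'
        '</p></body></html>')
found, order = {}, []


def collect(el):
    order.append(el.tag)
    # When <p id="navigation"> closes, both links are already its children,
    # so no separate querySelector-style walk of the tree is needed.
    if el.tag == "p" and el.attrs.get("id") == "navigation":
        for child in el.children:
            if child.tag == "a":
                found[child.attrs["href"]] = child


builder = PushBuilder(collect)
builder.feed(page)
builder.close()
print(order)  # ['a', 'a', 'p', 'body', 'html']
```

Because children always close before their parent, a handler can match on a container and read its already-built subtree in the same pass that builds the document.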

user added 15 commits February 8, 2016 05:17
document.clone() was horribly annoying when I tried to use it, when all I wanted was to clone one node based on another, or just clone a node by itself.

An even more sophisticated version would automatically clone nodes passed to appendChild, rather than erroring out when they belong to a different document.
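A rough sketch of that appendChild behavior, in Python with hypothetical Node and Document classes (not the dom module's API). Instead of raising on a cross-document append, the node is deep-copied into the target document, similar in spirit to the W3C DOM's Document.importNode. Parent pointers are omitted to keep it short.

```python
class Node:
    """Hypothetical element node that records which document owns it."""

    def __init__(self, tag, document):
        self.tag = tag
        self.document = document
        self.attrs = {}
        self.children = []

    def append_child(self, child):
        # Rather than raising when child belongs to another document,
        # append a deep copy owned by this node's document.
        if child.document is not self.document:
            child = self.document.import_node(child)
        self.children.append(child)
        return child


class Document:
    def create_element(self, tag):
        return Node(tag, self)

    def import_node(self, node):
        """Deep-copy node and its subtree into this document; the original is untouched."""
        clone = Node(node.tag, self)
        clone.attrs = dict(node.attrs)
        for child in node.children:
            clone.children.append(self.import_node(child))
        return clone


src, dst = Document(), Document()
link = src.create_element("a")
link.attrs["href"] = "next"
link.append_child(src.create_element("span"))

body = dst.create_element("body")
copied = body.append_child(link)  # no error: the link is cloned into dst
```

Returning the appended node lets the caller keep working with the copy that actually lives in the destination document.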
Parsing two pages of output to find a single line number wasn't fun, so I made my own unit tester by ripping off dunit, since I didn't know whether I should add dunit as a dependency of this package.
There's a unit testing framework that takes the AST of the source and rewrites "assert(A==B)" into "assertEqual(A,B)". This is not that framework. I am not going to do that.
Now that I can actually read the output, I can start fixing the expected values to match what they actually turn out to be. Also, there was an error on line 332 that probably should have had a stack trace, so I made my stupid unit tester throw a dedicated exception instead of blanket-catching all Throwable.
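The "dedicated exception" approach can be sketched in a few lines. This is a Python illustration, not the D tester from the PR; CheckFailure, assert_equal and run_tests are hypothetical names. Only failed comparisons are caught and reported as one-liners with the caller's line number; any other error escapes with its full traceback.

```python
import inspect


class CheckFailure(Exception):
    """Raised only by assert_equal, so the runner can tell failed checks from real errors."""


def assert_equal(actual, expected):
    if actual != expected:
        caller = inspect.stack()[1]  # the check function that made the comparison
        raise CheckFailure(f"line {caller.lineno}: expected {expected!r}, got {actual!r}")


def run_tests(tests):
    """Run each check; return one short message per failed comparison."""
    failures = []
    for test in tests:
        try:
            test()
        except CheckFailure as e:
            failures.append(f"{test.__name__}: {e}")
        # Anything else (KeyError, TypeError, ...) is deliberately not caught,
        # so it escapes with its full traceback instead of becoming a one-liner.
    return failures


def check_upper():
    assert_equal("abc".upper(), "ABC")


def check_wrong_expectation():
    assert_equal("a".upper(), "a")  # the expected value needs fixing


print(run_tests([check_upper, check_wrong_expectation]))
```

Reporting both values makes it easy to update an expectation to what the code actually produces, while a genuine crash stays loud.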
and explaining my madness
It fires events as it builds the DOM, allowing you to manipulate the end product, or examine pieces of it, without an extra pass over the document tree for every transformation.
But that's a big difference from structs...
I could do this with way less syntax and keep the DOMBuilder a struct. Just have to implement my own custom object inheritance system! No problem!
There's no way to tell which functions parseHTML requires, short of failing to provide them and dying with an error, then reading errors that can't be machine-parsed and manually adding stubs for the missing functions.

There's no interface or abstract class for structs, and no inheritance, so parseHTML just has to duplicate its code for every possible builder. Maybe classes would be a good idea after all...
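For comparison, the class-based alternative floated here might look like the following Python sketch: one base class declares every callback with a no-op default, so the required set is readable in one place and a subclass overrides only what it needs. ParseHandler, parse and the on_* callback names are hypothetical, not the library's actual handler interface; the arities just mirror the "0, 1 or 2 strings" situation mentioned in the next commit.

```python
class ParseHandler:
    """Hypothetical base class listing every callback the parser may call.

    Each default does nothing, so a subclass overrides only what it needs,
    and reading this class answers "what does the parser require?" up front.
    """

    def on_document_end(self) -> None:  # 0 string arguments
        pass

    def on_open(self, tag: str) -> None:  # 1 string argument
        pass

    def on_text(self, text: str) -> None:
        pass

    def on_attribute(self, name: str, value: str) -> None:  # 2 string arguments
        pass

    def on_close(self, tag: str) -> None:
        pass


def parse(events, handler: ParseHandler) -> None:
    """Toy stand-in for a real parser: replays pre-tokenized (kind, *strings) events."""
    for kind, *args in events:
        getattr(handler, "on_" + kind)(*args)


class CloseRecorder(ParseHandler):
    """Overrides a single callback; all the others fall back to the base class."""

    def __init__(self):
        self.closed = []

    def on_close(self, tag):
        self.closed.append(tag)


recorder = CloseRecorder()
parse([("open", "p"), ("attribute", "id", "navigation"), ("text", "hi"),
       ("close", "p"), ("document_end",)], recorder)
print(recorder.closed)  # ['p']
```

The trade-off is the one the commits wrestle with: the base class makes the contract explicit, at the cost of moving from value-type structs to classes with dynamic dispatch.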
parseHTML requires some functions to take 0, 1 or 2 HTMLStrings as arguments, and there's no way to find out which up front. So, abstract out the stub logic, and add helpers that build argument lists and type/argument lists.
First, take a block and replace @super@ with the generated super-call to the wrapper. Then put that block in place of @block@ in the stub template. Now everything can be a stub, with less room for error!
This is such a hack. Maybe we should use classes after all. But it works! You don't even have to make sure you typed the right superfunction for a given function.
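The two-step substitution described in the last two commits can be sketched as a plain string generator. This Python version is illustrative only: the template text, the wrapped receiver and the callback names are hypothetical; only the @super@ and @block@ placeholders and the HTMLString argument type come from the commit messages. The emitted text is D-like but has not been checked against a compiler.

```python
STUB_TEMPLATE = "void @name@(@params@) {\n@block@\n}"
DEFAULT_BLOCK = "    @super@"


def make_stub(name, arity, block=DEFAULT_BLOCK):
    """Generate a forwarding stub for a callback taking `arity` HTMLStrings."""
    args = [f"a{i}" for i in range(arity)]
    params = ", ".join(f"HTMLString {a}" for a in args)  # type/argument list
    super_call = f"wrapped.{name}({', '.join(args)});"   # plain argument list
    # Step 1: put the generated super-call where @super@ is in the block.
    body = block.replace("@super@", super_call)
    # Step 2: put that block where @block@ is in the stub template.
    return (STUB_TEMPLATE.replace("@name@", name)
                         .replace("@params@", params)
                         .replace("@block@", body))


print(make_stub("onText", 1))
print(make_stub("onEnd", 0, "    finished = true;\n    @super@"))
```

Since the forwarding call is generated from the same name and argument list as the signature, a custom block can add its own logic without anyone typing the superfunction by hand.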
2 participants