Migrate to Construct 2.10 #477
Thanks, this is a very large change, PTAL at the CI test failures... Can you clarify how this is required to support DWARFv5 stuff?
Some of the form types mentioned here require [...]. Will investigate the CI failures.
d6ea94c to ab6714e
CI and unit test issues have been resolved.
Overall very nice! Still thinking about this; left some initial comments for now.
    return 0


class EmbeddableStruct(Struct):
So the newer construct versions removed these? I wonder if they offer any alternatives. Writing these custom structs always felt like a kludge.
Unfortunately, there are no replacements, as mentioned here: construct/construct#1027 (comment)
I did, however, manage to remove RepeatUntilExcluding and StreamOffset.
We're long overdue on this :-) One thing that is slightly annoying is that we're adding more special construct utils/parsers instead of removing existing ones! @sevaa - please take a look as well, when you have some time. @SupremeMortal did you have a chance to measure performance? I want to make sure we're not regressing here.
Move away from an internal copy of Construct to the latest version in order to support DWARF v5 required data types.
ab6714e to 0bc0fe9
I have not measured performance properly yet, but on my system the unit tests now take around 9.1 seconds on average, compared to 8.3 seconds before. There are some optimisations that could improve this, which I will investigate.
Running [...]
On general grounds, I welcome not forking vendor packages gratuitously :) That said, I'll take a closer look.
Hey now, it wasn't gratuitous :-) That said, now seems like a fine time to relinquish the fork.
My view on where things stand now:
No rush on any of this, just thought a status update from my end would be useful.
I'm unsure if the performance will be good enough. I tested with a 1.5GB ELF file with DWARF v5 debug info, and it took 20 seconds with this PR compared to 10 seconds without, which is quite a large performance regression. I also tried optimising structs with the new lazy-loaded structures in construct 2.10, but that ended up being unusable for most of the structs in this codebase.
@SupremeMortal thanks for the update! This is indeed concerning. pyelftools doesn't excel at performance in general, so making it 2x slower for large inputs is a hard pill to swallow. @arekbulski cc-ing you, given the discussion in #180
So a new convention to learn. Doesn't look too terrible. It did seem unnatural to me all along that parsers needed to have a name attached to them just in case the parser is embedded in a struct. As for compiling, we could start by storing the basic building blocks (trivial DWARF forms, nondynamic headers, etc) in compiled form and see where it takes us.
What do you mean by "compiling" here? Can you elaborate?
Compilation is a feature in Construct.
@eliben Are we abandoning this? I'd rather not.
@sevaa I don't have the capacity to spend too much time on this, personally. The 2x performance penalty seems like a problem to me, so it will have to be addressed. WDYT?
I have some bandwidth currently and wanted to revisit this. There have been several releases of Construct since; maybe they've done something about performance, or maybe we could apply some tweaks on our side. I don't see @SupremeMortal publishing any changes since the initial commit. If he looked at the performance issues, the work was not mature enough to publish, and in any case the fork needs to be merged with the recent changes to the master. If @SupremeMortal doesn't respond in a few days, I could fork the fork and sort of take over.
FYI, for me, with this PR's fork and the latest construct, the performance difference with the master is not that great: the master is ~2% faster for a firehose parsing of DWARF CU/DIEs in a subset of our corpus. That's two percent, not times two.
Interesting. I wonder what benchmark @SupremeMortal is alluding to in #477 (comment). 2% is definitely tolerable!
Status update. I have merged the branch with the master. Autotests pass.
I've slapped together a primitive performance test that isolates the parsing: it reads all files under testfiles_for_dwarfdump into memory, then parses them into CUs and DIEs under the timer (the I/O is deliberately not timed). I've added [...]
UPDATE: construct has a built-in parser called [...]
The latest result on said performance test is 3.1 sec vs 2.7 sec in the current master. That's about a 15% slowdown. The previous test run, the one with a 2% discrepancy, was, I'm afraid, marred by debugger interference. I might get more mileage out of compile() if something is done about the compiling of Embed and lambdas.
One rather counterintuitive result: since DIE is the most parsed structure in DWARF (at least in the firehose scenario), I tried prebuilding a parser for the attribute sequence of an abbreviation and reusing that in the DIE parser, but that only slowed down the benchmark. Even weirder, when I compiled the parser, it slowed down the benchmark even more.
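A rough sketch of what such a parsing-only timing harness could look like, using only the standard library. This is not the actual test from the branch: the helper names (`load_into_memory`, `time_parsing`, `slowdown_percent`) and the `parse` callable are assumptions made for illustration.

```python
import io
import time


def load_into_memory(paths):
    """Read every file fully so disk I/O stays outside the timed region."""
    blobs = []
    for path in paths:
        with open(path, "rb") as f:
            blobs.append(f.read())
    return blobs


def time_parsing(blobs, parse):
    """Return the seconds spent calling parse() on in-memory streams."""
    start = time.perf_counter()
    for blob in blobs:
        parse(io.BytesIO(blob))
    return time.perf_counter() - start


def slowdown_percent(candidate_seconds, baseline_seconds):
    """Relative slowdown of the candidate versus the baseline, in percent."""
    return (candidate_seconds / baseline_seconds - 1.0) * 100.0


# Figures quoted in the comment above: 3.1 sec on the branch, 2.7 sec on master.
print(f"{slowdown_percent(3.1, 2.7):.1f}% slower")
```

With pyelftools, the `parse` callable would presumably open the stream with `ELFFile`, call `get_dwarf_info()`, and walk `iter_CUs()` and each CU's `iter_DIEs()`. Running the harness outside a debugger and repeating it a few times helps avoid the kind of interference mentioned above.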
Superseded by #548 |
Move away from an internal copy of Construct to the latest version in order to support new DWARFv5 data types.