-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Doccursor #217
Doccursor #217
Conversation
At the risk of duplication, add more dedicated _get_header_line() to make it easier to understand what is needed where.
We want to wrap these in an object, but we'll try to do so incrementally. By moving this code into our own `doccursor` module, it makes sense that we import `CursorKind` and `TokenKind` from it instead of from Clang's library despite them being the same by design.
Okay, I'm not very far in reviewing, but I'm super uneasy about |
That may well be one of those things that didn't cross my sleep deprived mind 😅 Can be fixed by passing it to the instance's scope though, all the copies should still point to the same dict, no waste either way I believe. I literally did it that way because I was thinking it made it clearer there was only one. |
Yeah, I looked through the code, and I think it's a very small change to pass the dict around. |
I haven't looked into all the details yet, but from a high level, compared to my approach:
All in all, I like this better than my approach (and with the future addition of multiple classes, much better) and obviously much better than what we currently have. Please fix the Nice work! 👍 |
Oh, when I first saw the caching I immediately thought "Premature optimization is the root of all evil" 😁 |
462fe31
to
055100f
Compare
As a 1st step, we make a _very_ shallow wrapper around the Clang cursor. There is of course no point to it yet, but with this simple trick we are now using `DocCursor`s everywhere past `parse()`. The only effect of this commit is to worsen performance very slightly due to needless invocation of path specific code on all cursors. Running the full test suit is now ~9% slower on my machine. We shall revisit this later when it actually matters.
Another baby step, which is again mostly cosmetic. In doing this, we can remove plenty of the attributes we defined just to get cursor compatibility. We may want to expose some of this again later, but likely through a better API. For now we simplify the code as much as possible. After we should start thinking of how this will play out with docstring and improving these methods / attributes to provide a nice API for the new cursor.
We want to start improving the API in ways that will no longer mimic the Clang's cursor one. We need to prevent (e.g.) our cursor `type` property turns out to mean something different than Clang's own namesake (spoiler: it will!). There might be a few cases where the result might be the same, but that will not be obvious most of the times and it may still pose a performance issue anyway. As a rule, we stick to using the Clang cursor everywhere we can.
We want to do better than this later on, but for now we need to simplify the parser so that we focus instead on API alone with docstring as the biggest consumer. In order to do that the easy way, we will start by providing an API that reflects the current docstring API, then pass the cursor to docstring directly simplifying its own API and finally isolating the parser from docstring entirely. The idea is _not_ to mimic Clang's cursor any more! Any similarity in attributes and property names is coincidental from now on. By this point we've also made up the lost performance from the stub commit!
055100f
to
5aee698
Compare
Thanks for the quick review. Pushed some fixes, including the
Well, the thought did cross my mind. That's why I put more effort in the commit message trying to find out exactly why we should care than actually implementing it xD That said, you did much the same thing in your branch if I read correctly. Though instead of lazy caching you just called the methods ahead of time on |
7e44f65
to
59cc22b
Compare
I read through this commit by commit, and didn't spot any issues. I'm getting pretty tired though, so I'll get back to it still with fresh eyes. It occurred to me the caching might be possible to implement using a decorator, and googling found https://docs.python.org/dev/library/functools.html#functools.lru_cache. I'm not insisting on switching to a decorator now but I'm also undecided whether to merge or drop the caching at this point. I'll sleep on it. |
That looks pretty cool! I'm tired myself, but I'll look into it. It would simplify the code for sure, even if it's not that complicated to begin with. It should also have the added benefit of not being tied to a doccursor instance if I understand it right. So, as long as it's the same base cursor (same hash), the cache would still work. E.g. iterating over the children multiple times currently creates new doccursors with an empty cache every time, but this should still work as the cursors would be identical every time. |
This on top fixes #201. I can send this separately, along with a test, but FYI.
Edit: It occurs to me this could lead to infinite recursion, but I'm not sure yet if it's just theoretical. |
Looks like there are pitfalls in using this on methods, need to investigate: https://www.youtube.com/watch?v=sVjtp6tGo0g I'm thinking let's leave that for future improvement. |
This doesn't need to be part of this PR, but I can add it if you prefer it that way. |
I can handle it afterwards. It was just FYI, in case you see any problems with it. |
Had a quick look. The pitfall is a bit unfortunate, but thinking about it with a fresh mind it's obvious it had to be that way and not the way I was positing it would work. Still, he volunteers a solution that may still be worthwhile versus the current implementation. Let's leave that commit out for now then since it's not an issue yet, not even potentially (cursor not yet exposed to events and we know exactly how many times we call these things). I'll take the opportunity to think if there is a way to get the global cache working the way we would want, allowing multiple passes over cursors. Maybe we can recover the cursor instance from the cache instead of generating a new one on |
I didn't, until you mentioned infinite recursions xD That said, this all should be reworked as docstring can now be simplified and it's possible that |
By the way, I had never looked into how decorators worked and with that short video it all makes sense now, so thanks for that. It's a useful thing to know, I wish I had looked into it sooner ;) |
Oh, I completely agree. The whole |
@BrunoMSantos Okay. I'm prepared to merge this as-is, or with |
Great! But leave the cache commit out then. I'll have a better look at it and push another PR later, no need to rush it. |
Umm, would you mind doing that and force pushing the branch, please? I mean I can of course do it myself and push a separate branch and merge a separate pull request, but I'd rather it was your pull request that I merge. |
59cc22b
to
2ad9dcd
Compare
Finally, we get to make the parser completely oblivious to the rendering layer! Hopefully docstring and doccursor can now evolve to something better. Closes jnikula#180. Though a lot of the potential is still to be realized.
2ad9dcd
to
0cd1f59
Compare
I forget this is not like email workflow :P |
@BrunoMSantos One thing I noticed that bugs me a little. I think |
Ah yes, I intended to mention it in the PR for us to discuss it, but apparently forgot. I'm still expecting to address it, but I felt what I had was enough of an improvement already and I'm not sure I want A cursor having no comment is already fine I think. Both by |
I guess it needs special casing somewhere, and the debate is where to have it. Perhaps the answer depends on what's easiest for implementing #166.
I've got a branch implementing some of this, but it's stale now and needs rebasing. It also stalled, because currently we only recurse for things that have a comment, but recursing for everything opens up the struct vs. member can of worms again for things like:
We get the cursor for both Anyway, getting a bit off-topic here... |
Based on an old attempt by me and on #206.
I think this is worthwhile merging already, minus any review concern, so I'm submitting it as such. Given the amount of changes I wouldn't be surprised if it needs some iterations, but I'm quite happy with it already.
I did not go as far as the original in simplifying docstring. But that is independent work which shouldn't block this I think.
Anyway, I took a couple of early commits from #206 with maybe some minor changes. The rest kind of took a different approach, but with plenty of inspiration drawn from the original, which forced me to find ways around issues I wouldn't have tackled otherwise. So thanks for that @jnikula :)
-->8--
On to real concerns:
One possible concern is with 88bc569 and f46d74f, which are simple, but a bit chunkier than I'd like. I can look into breaking those down if need be. When I bundled those changes together, they were simpler, but then tiny little things had to be bolted on as I revisited earlier commits...
The biggest architectural difference is with the single cursor object, which I named
DocCursor
(goes withDocstring
I guess).Originally, I didn't consider merging both ideas and had already invested too much on the current implementation, but I did realize I could have something like this to better group related code:
I tried going that way, but it would take too much work rebasing though, so I'd rather attempt that later instead of rewriting everything.
One thing I'm quite certain though: I love how the parser turned out with the single cursor type. It did require a tiny trick with the
_COMMENTS
variable and an init function, but I think it's well worth it. I also had to address the domain change onextern "C" {}
differently to make it look nice. I'm quite stoked with the end result though. Those bits of magic are all in e9c1eee.Another difference is in how I handled the multiple calls to certain big methods. The original initialized things on construction, I implemented a caching mechanism which I find pretty elegant and should do the job even better on account of being lazier :)
Finally, another thing I'm unsure about is returning
None
from cursor properties that don't apply to a given cursor or throw a meaningful message. Not a big deal to let it go for now I think.