-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
match beginning and end of line correctly #3575
base: master
Are you sure you want to change the base?
Conversation
6a8e479
to
c0ba33e
Compare
This "padded regex" approach feels hacky. Maybe it is reliable (I haven't analyzed it carefully yet), but how about a straightforward approach instead: search not just within the given range ( |
That doesn't work because |
Ok, fair point. |
926e411
to
2e3bc77
Compare
I've force-pushed a new version:
I've also rebased the PR so that one can test it together with the changes to the search functionality done in previous PRs. |
Thanks, LGTM (apart from my above comment about comments, and the fact that it's better to squash such fix-up changes into one commit, but we can squash them later). |
Side note: looks like we can deduplicate the code in |
I've combined Moreover, the code at the beginning of One might think of making EDIT: We might also want the first argument to |
When |
8bd0cac
to
3e16d5b
Compare
LGTM, but I've just noticed that it still doesn't fix the issue in the case of So I guess we also need a similar fix in (Unless we treat this as the intended behavior, but then it's inconsistent with |
I was thinking of modifying |
@JoeKar what do you think about this PR? Maybe any ideas of an alternative approach?
I'm fine either way. Although it seems reasonable to address it in this PR, since it's rather a part of the same issue, and the way to fix it is probably just to adopt the same approach as in |
Indeed it is hacky, but it seems to fit the needs via the help of regular expressions on its own. Otherwise we've to solve it with additional code dropping (parts of) results. |
Well, yes, we could compile the padded regexps in Not even mentioning the fact that we compile the same regex every time when calling |
Hmm... Throughout the entire loop, That should also address your concern about compiling twice, since even when we call |
Several questions and comments about the extension to
|
Sounds good. 👍 |
As a first step, I've implemented @dmaluka's suggestion to remove |
Maybe... Hard to say without seeing an actual implementation. IIUC it is not directly related to the issue addressed by this PR, it would be just an extra improvement?
I wouldn't worry too much about that. We already changed
Yep, and that seems inevitable. And I imagine we wouldn't need to do that per line, we can just call
I'm not sure. A file may be large, the number of matches may be huge, do we really want to store and return them all and then process them, if we can instead just call
Maybe. Again, hard to discuss without an actual implementation. |
LGTM. |
Here is a modification of EDIT: Calls of the form |
e91bc2e
to
7943952
Compare
I've force-pushed a slightly modified version. It works the same, but the code is cleaner now. |
So please squash the last 2 commits into one (to let reviewers review the relevant version, not waste time on the outdated one). |
7943952
to
3822986
Compare
|
||
from := Loc{0, i}.Clamp(start, end) | ||
to := Loc{charCount, i}.Clamp(start, end) | ||
matches := b.findAll(search, from, to) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still, what is the advantage of using findAll()
to remember all matches in some temporary storage to handle them afterwards, instead of just using findDown()
and handling each match right away?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One could do that. I wrote the code with the idea in mind to have some public FindAll
function at a later point. That code would be based on the current findAll
. One could change findAll
to some findAllFunc
that takes a callback function as argument (simialr to ReplaceAllFunc
). This way the problem of creating temporary lists would disappear. (That would also be a good idea for a public function.) Currently findAll
is only called for the first and last lines, so that performance doesn't really matter. That would change of course if we used this function for all lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somehow I missed the email notification about your comments a week ago...
One could do that. I wrote the code with the idea in mind to have some public
FindAll
function at a later point. That code would be based on the currentfindAll
. One could changefindAll
to somefindAllFunc
that takes a callback function as argument (simialr toReplaceAllFunc
).
Still, what for? Why not keep things simple?
Currently
findAll
is only called for the first and last lines, so that performance doesn't really matter.
A single line may be huge. (Although that's an edge case, and currently micro handles huge lines very poorly anyway, for unrelated reasons, due to the dumb data structure with O(n) access, which IMO we should fix some day, because that's a shame.)
But my point is not so much about performance but about simplicity. We have a chance to try to make the logic not just more correct but at the same time simpler, by simply calling findDown()
in a loop for the entire range, instead of complicating the code with more special cases, so why not take this chance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to call MultipleReplace
only once in ReplaceRegex
with all deltas at once (as done in master and this PR), then I believe we cannot deal with each match right away (unless we modify MultipleReplace
). The problem is that multiple deltas on a single line only work if the last (= rightmost) change comes first. If we simply want to call findDown
repeatedly over the whole buffer, then we have to store all matches in a list (currently done in findAll
) and then process this list backwards (done in ReplaceRegex
). (My understanding is that there is no efficient way in Go to prepend to a slice.)
Apart from this list reversal, this would indeed lead to fairly simple code. Padded repexps would be compiled for almost every call to findDown
. If one doesn't want that, then one could create the padded regexps before calling findDown
.
Please let me know how you want to proceed.
EDIT: Modifying the processing of deltas should be easy to do (in ExecuteTextEvent
). As far as I can tell, MultipleReplace
is the only place where a TextEvent
with multiple deltas is created, and ReplaceRegex
is the only function calling MultipleReplace
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to call
MultipleReplace
only once inReplaceRegex
with all deltas at once (as done in master and this PR), then I believe we cannot deal with each match right away (unless we modifyMultipleReplace
).
Indeed, good point. Thanks for explanations.
I've analyzed all this in more detail and now it seems to me that the current approach in this PR is already as simple and practical as it can be without further refactoring of EventHandler (which we can do separately, if we want to).
Padded repexps would be compiled for almost every call to
findDown
.
Yeah, that's also an argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you be interested in a separate PR that modifies ExecuteTextEvent
? In a separate commit I could modify the search code (precompile padded regexps and use the findDown
method only in ReplaceRegex
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you be interested in a separate PR that modifies
ExecuteTextEvent
?
If it goes well, why not?
ca6a634
to
19f3a0d
Compare
19f3a0d
to
df1711d
Compare
I've noticed that we used an incorrect correction to the start and end of a match in the case of a padded regexp. Previously we were dealing with incomplete characters if the padding matched a combining character. (For the end we could probably do without the fix because the combining characters come after the base character, but for the start it's important.) |
df1711d
to
4aac963
Compare
4aac963
to
4f5806e
Compare
When searching with
buf:FindNext
, the start of the search region currently always matches^
and the end of the region$
, no matter if they are the indeed the start or end of a line. Similar problems exist with other empty-string pattern like\b
. The goal of this PR is to fix this. This is done by modifying the search pattern for the first and last lines (prepending and/or appending.
.Code common to
findDown
andfindUp
has been moved to separate functions. One could tweak the specific way it's done, but maybe it's better to first get some feedback if you are willing to go in this direction at all.