toke.c: S_intuit_more: Add more commentary #23708
Open
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This function is described in its comments as 'terrifying', and by its original author, Larry Wall, as "truly awful". As a result, it has been mostly untouched since its introduction in 1993. That means it has not been updated as new language features have been added.
As an example, it does not know about lexical variables, so the code it has for globals just doesn't work on the vast majority of modern day coding practices.
Another example is it knows nothing of UTF-8, and as a result simply changing the input encoding from Latin1 to UTF-8 can result in its outcome being the opposite result.
And it is buggy.
A few years ago, I set out to try to understand it. I added commentary and simplified some overly complicated expressions, but left its behavior unchanged.
Now, I set out to make some changes, and found many more issues than I had earlier. This commit adds commentary about those. Hopefully this will lead to some discussion and a consensus on the way forward.