toke.c: S_intuit_more: Add more commentary #23708

khwilliamson · 2025-09-14T17:48:19Z

This function is described in its comments as 'terrifying', and by its original author, Larry Wall, as "truly awful". As a result, it has been mostly untouched since its introduction in 1993. That means it has not been updated as new language features have been added.

For example, it does not know about lexical variables, so the code it has for globals just doesn't work on the vast majority of modern-day code.

Another example is that it knows nothing of UTF-8; as a result, simply changing the input encoding from Latin1 to UTF-8 can flip its outcome.

And it is buggy.

A few years ago, I set out to try to understand it. I added commentary and simplified some overly complicated expressions, but left its behavior unchanged.

Now, I set out to make some changes, and found many more issues than I had earlier. This commit adds commentary about those. Hopefully this will lead to some discussion and a consensus on the way forward.

This set of changes does not require a perldelta entry.


This function is described in its comments as 'terrifying', and by its original author, Larry Wall, as "truly awful". As a result, it has been mostly untouched since its introduction in 1993. That means it has not been updated as new language features have been added.

For example, it does not know about lexical variables, so the code it has for globals just doesn't work on the vast majority of modern-day code.

Another example is that it knows nothing of UTF-8; as a result, simply changing the input encoding from Latin1 to UTF-8 can flip its outcome.

And it is buggy.

An example of how hard this can be to get right is this fairly common use in our test suite:

    [$A-Z]

That looks like a character class matching 27 characters. But wait, what if there exists a $A and a parameterless subroutine 'Z'? Then this could instead be an expression for a subscript.

A few years ago, I set out to try to understand it. I added commentary and simplified some overly complicated expressions, but left its behavior unchanged.

Now, I set out to make some changes, and found many more issues than I had earlier. This commit adds commentary about those. Hopefully this will lead to some discussion and a consensus on the way forward.

khwilliamson force-pushed the intuit_more_commentary branch from 4c36ccf to f0a5a44 Compare September 15, 2025 12:30

khwilliamson referenced this pull request Sep 15, 2025

intuit_more: no need to copy before keyword check

56f81af

That also avoids crashing on overrun.

khwilliamson force-pushed the intuit_more_commentary branch 2 times, most recently from d1fffd6 to 3407c5f Compare September 22, 2025 22:03

khwilliamson force-pushed the intuit_more_commentary branch from 3407c5f to c77f0b2 Compare September 24, 2025 13:33

khwilliamson mentioned this pull request Sep 24, 2025

Initial overhaul of S_intuit_more #23764

Open
