fix formatting of block quotes with **/ closing characters #748

wylieconlon · 2024-06-14T21:05:05Z

This is a common bug with regex-based tokenizers. The previous "MIDDLE" regex was capturing too many characters, which caused expressions like /** comment **/ to be treated as individual operators instead of a comment.

The fix implemented here is to use a stack-based tokenizer to parse nested comments instead. This appears to work as expected for TransactSQL dialects according to the reference doc.
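For readers skimming the thread, here is a minimal sketch of the depth-counting idea, not the code from this PR. The function name `scanNestedComment` and its signature are assumptions made for illustration. It walks the input character by character, so a closing run such as `**/` is handled without any regex.

```typescript
// Hypothetical helper, not the implementation from this PR.
// Returns the nested block comment that starts at `start`, or null when no
// comment starts there or the comment is never closed.
function scanNestedComment(input: string, start: number): string | null {
  if (!input.startsWith('/*', start)) {
    return null;
  }
  let depth = 0;
  let i = start;
  while (i < input.length) {
    if (input.startsWith('/*', i)) {
      depth++; // entering a (possibly nested) comment
      i += 2;
    } else if (input.startsWith('*/', i)) {
      depth--; // leaving one nesting level
      i += 2;
      if (depth === 0) {
        return input.slice(start, i); // outermost comment is closed
      }
    } else {
      i++; // ordinary character inside the comment
    }
  }
  return null; // ran out of input before the comment was closed
}
```

With this sketch, `scanNestedComment('/** comment **/ SELECT 1', 0)` yields the whole comment as a single piece of text rather than a run of operator characters.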

Closes #747

codesandbox-ci · 2024-06-14T21:06:05Z

This pull request is automatically built and testable in CodeSandbox.

To see build info of the built libraries, click here or the icon next to each commit SHA.

nene · 2024-06-18T09:04:10Z

Hey, thanks for the patch. 👍 👍 👍

It's rare to get a fix for a non-trivial issue like this.

I'll take some time to read the code (I'm super tired right now), but will most likely merge this in.

wylieconlon · 2024-06-18T15:27:31Z

Please let me know if you have any feedback about the implementation! Thank you for making it easy to contribute :)

nene

Hey, after reviewing this fix, I decided to make a simpler one instead. See #751

I found this solution to be a bit too complex to read and understand. Wrote some comments about the problems I see with this code.

I did end up using the test you wrote, and added you to the list of contributors accordingly.

Thanks.

nene · 2024-06-20T06:32:59Z

src/lexer/NestedComment.ts

-      } else if ((match = this.matchSection(MIDDLE, input))) {
-        result += match;
+        result += '*/';
+        i++;


Modifying the loop variable of a for-loop breaks my expectations. With a for-loop my expectation is that the i++ at the top means that at every step of the loop we always increment the index by one. When more complex behavior is needed, I think a while-loop would be a better choice.
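To illustrate the point, here is a small sketch of the while-loop style, where every branch states how far the index advances. The helper `findCommentDelimiters` is hypothetical and exists only for this example; it is not taken from the PR or from #751.

```typescript
// Hypothetical helper for illustration only; not part of the PR.
// Collects the positions of all "/*" and "*/" delimiters in the input.
function findCommentDelimiters(input: string): { opens: number[]; closes: number[] } {
  const opens: number[] = [];
  const closes: number[] = [];
  let i = 0;
  while (i < input.length) {
    if (input.startsWith('/*', i)) {
      opens.push(i);
      i += 2; // consume both characters of the delimiter
    } else if (input.startsWith('*/', i)) {
      closes.push(i);
      i += 2; // consume both characters of the delimiter
    } else {
      i += 1; // any other character
    }
  }
  return { opens, closes };
}
```

Because there is no `i++` in a loop header, the reader does not have to combine an implicit increment with an extra one hidden in the body.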

nene · 2024-06-20T06:35:27Z

src/lexer/NestedComment.ts

+        } else if (nestLevel < 0) {
+          return null;
+        }
+      } else if (input[i] === '/' && input[i + 1] === '*') {


Here we match the start of a comment with one kind of code. But above we use another kind of code, this.matchSection(START, input), to do the same thing.

If the code does the same thing, it should look the same as well. Or there should be some explanation as to why one needs to do the same thing differently.

Here, with the majority of the logic rewritten from regexes to plain char comparisons, that matchSection() method is really a leftover from the old implementation, and its existence will confuse future readers.

nene · 2024-06-20T06:42:21Z

src/lexer/NestedComment.ts

+        if (nestLevel === 0) {
+          this.lastIndex = i;
+          return [result];
+        } else if (nestLevel < 0) {


When will it happen that nestLevel becomes negative?

I don't think this ever happens.
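A sketch of the reasoning, assuming the scan only starts at an opening /* and stops as soon as the level returns to 0. The helper `depthTrace` is hypothetical and only records the level after each delimiter; it is not code from the PR.

```typescript
// Hypothetical helper for illustration; `depthTrace` is not a name from the PR.
// Records the nesting depth after each delimiter, stopping once it is back at 0.
function depthTrace(input: string): number[] {
  const trace: number[] = [];
  if (!input.startsWith('/*')) {
    return trace; // assumption: scanning only begins at an opening delimiter
  }
  let depth = 0;
  let i = 0;
  while (i < input.length) {
    if (input.startsWith('/*', i)) {
      depth++;
      trace.push(depth);
      i += 2;
    } else if (input.startsWith('*/', i)) {
      depth--;
      trace.push(depth);
      i += 2;
      if (depth === 0) {
        break; // the comment is complete; later "*/" are never examined
      }
    } else {
      i++;
    }
  }
  return trace;
}
```

Under those assumptions the level is only decremented while it is at least 1, so a stray trailing `*/` is never reached and the recorded values stay at 0 or above.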

fix formatting of block quotes with **/ closing characters

5c37ecf

nene reviewed Jun 20, 2024


nene closed this Jun 20, 2024
