Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix formatting of block quotes with **/ closing characters #748

Conversation

wylieconlon
Copy link

This is a common bug with Regex-based tokenizers. The previous regex was capturing too many characters in the "MIDDLE" regex, which was causing expressions like /** comment **/ to be treated as individual operators instead of a comment.

The fix implemented here is to use a stack-based tokenizer which to parse nested comments instead. This appears to work as expected for TransactSQL dialects according to the reference doc.

Closes #747

Copy link

This pull request is automatically built and testable in CodeSandbox.

To see build info of the built libraries, click here or the icon next to each commit SHA.

@nene
Copy link
Collaborator

nene commented Jun 18, 2024

Hey, thanks for the patch. 👍 👍 👍

It's rare to get a fix for a non-trivial issue like this.

I'll take some time to read the code (I'm super tired right now), but will most likely merge this in.

@wylieconlon
Copy link
Author

Please let me know if you have any feedback about the implementation! Thank you for making it easy to contribute :)

Copy link
Collaborator

@nene nene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, after reviewing this fix, I decided to make a simpler one instead. See #751

I found this solution to be a bit too complex to read and understand. Wrote some comments about the problems I see with this code.

I did end up using the test you wrote, and included you accordingly to list of contributors.

Thanks.

} else if ((match = this.matchSection(MIDDLE, input))) {
result += match;
result += '*/';
i++;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modifying the loop variable of a for-loop breaks my expectations. With a for-loop my expectation is that the i++ at the top means that at every step of the loop we always increment the index by one. When more complex behavior is needed, I think a while-loop would be a better choice.

} else if (nestLevel < 0) {
return null;
}
} else if (input[i] === '/' && input[i + 1] === '*') {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we match the start of a comment with one kind of code. But above we use another kind of code this.matchSection(START, input) to do the same thing.

If the code does the same thing, it should better look the same as well. Or there should be some explanation as to why one needs to do the same thing differently.

Here, with the majority of the logic rewritten from regexes to plain char comparisons, that matchSection() method is really a leftover from old implementation and its existence will confuse the future reader.

if (nestLevel === 0) {
this.lastIndex = i;
return [result];
} else if (nestLevel < 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When will it happen that nestLevel becomes negative?

I don't think this ever happens.

@nene nene closed this Jun 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FORMATTING] TransactSQL incorrectly formats some block comments, leading to broken statements
2 participants