Support for TSX #409
Maybe I am wrong here but I would expect |
I think the rule exists but it's not referenced by any other rule because it gets filtered out for TSX dialect: https://github.com/tree-sitter/tree-sitter-typescript/blob/198e2ea43d1c4ddd76ee883f4eae15f4201cd241/common/define-grammar.js#L214-L221 And I guess tree-sitter is smart enough to filter out unused productions as |
Thanks for this proposal! Before getting into the details of it, I have to say I'm a little surprised that TSX is not simply a superset of TypeScript? I tried finding a reference for TSX but couldn't find one quickly. If it should be a superset, this is something we should try to get fixed in the grammar. Now for the proposal. I'm in favor of trying to get TSX supported, but I'm not sure about the right approach here. You already mention the problem with changed line numbers and resulting debugging. Besides that, I think there are two more aspects that are not optimal:
My ideal solution would be something along the lines of tree-sitter/tree-sitter-graph#144, where we could split up the spec into multiple files, controlling which parts are used in toplevel files for the two dialects. However, it's unlikely that we can spend time implementing that any time soon. I think we can change your solution a bit to overcome the most important downsides.
Text manipulation like this is not ideal, but I think it would work, not affect the runtime code or the debugging capabilities, and be self-contained in this crate. /cc @dcreager Do you think this is an acceptable solution?
Support the TSX dialect of TypeScript. Because TSX is almost a superset of TypeScript, it makes sense to reuse the .tsg file. However, it requires a bit of preprocessing to handle the differences.
Preprocesses the .tsg file into typescript and tsx specific .tsg files. The preprocessing is done in the build script. Add a --dialect|-d option to the CLI to select the dialect.
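A minimal, std-only sketch of how the two dialects might be modelled and parsed from a `--dialect` argument. The PR itself later switches to clap::ValueEnum for this; the `Dialect` type, its method names, and the error text below are assumptions made for illustration. Only the dialect names and the `stack-graphs-{dialect}.tsg` file naming come from this thread.

```rust
use std::str::FromStr;

/// The two grammar dialects discussed in this thread (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Dialect {
    TypeScript,
    Tsx,
}

impl Dialect {
    /// Name as it would appear on the command line and in file names.
    pub fn name(self) -> &'static str {
        match self {
            Dialect::TypeScript => "typescript",
            Dialect::Tsx => "tsx",
        }
    }

    /// File name of the preprocessed TSG for this dialect.
    pub fn tsg_file_name(self) -> String {
        format!("stack-graphs-{}.tsg", self.name())
    }
}

impl FromStr for Dialect {
    type Err = String;

    /// Parses the value given to a `--dialect` style option.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "typescript" => Ok(Dialect::TypeScript),
            "tsx" => Ok(Dialect::Tsx),
            other => Err(format!("unknown dialect: {}", other)),
        }
    }
}
```

With this in place, a value such as "tsx" can be turned into a `Dialect` with `"tsx".parse::<Dialect>()`, and an unknown name yields an error instead of a panic.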
@hendrikvanantwerpen Thank you for the suggestions. I updated the PR with the build script preprocessing as you suggested. Please note that although the .tsg file now loads for the TSX dialect, it doesn't successfully process .tsx files. Unlike the JavaScript TSG file, it's missing all the jsx_element related stanzas (next up for me to do).
This is starting to look very nice! I left some comments below, to iron out some details and simplify the code a little, but overall I'm happy with the direction.
fn main() {
    let out_dir = std::env::var_os("OUT_DIR").unwrap();
    for dialect in DIALECTS {
        let input = std::fs::File::open(TSG_SOURCE).unwrap();
Use expect instead of unwrap to give a little context to the resulting error.
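As a rough illustration of that suggestion, the sketch below wraps the open call in a helper whose panic message says what failed. The helper name `open_tsg_source` and the message text are assumptions, not code from the PR.

```rust
use std::fs::File;

/// Hypothetical helper: opens the TSG source, or panics with a message
/// that says which step failed (the I/O error is appended by `expect`).
pub fn open_tsg_source(path: &str) -> File {
    File::open(path).expect("failed to open the TSG source file")
}
```

In a build script a panic is an acceptable way to fail, so the only change from `unwrap` is that the build output names the failing step next to the underlying I/O error.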
        let input = std::fs::File::open(TSG_SOURCE).unwrap();

        let out_filename = Path::new(&out_dir).join(format!("stack-graphs-{dialect}.tsg"));
        let output = std::fs::File::create(out_filename).unwrap();
Idem.
        let out_filename = Path::new(&out_dir).join(format!("stack-graphs-{dialect}.tsg"));
        let output = std::fs::File::create(out_filename).unwrap();

        preprocess(input, output, dialect).unwrap();
Idem.
- Remove dialect stack
- Better error handling
- Use clap::ValueEnum
- Append _typescript to {try}_language_configuration
I left a few final nits, but otherwise this is good to go!
let directive_start = Regex::new(r";\s*#dialect\s+(\w+)").unwrap();

// Matches: ; #end
let directirve_end = Regex::new(r";\s*#end").unwrap();
nit:
- let directirve_end = Regex::new(r";\s*#end").unwrap();
+ let directive_end = Regex::new(r";\s*#end").unwrap();
filter = Some(dialect == directive);
output.write_all(line.as_bytes())?;
} else if directirve_end.is_match(line) {
idem:
- } else if directirve_end.is_match(line) {
+ } else if directive_end.is_match(line) {
}

filter = Some(dialect == directive);
output.write_all(line.as_bytes())?;
I would keep the directives out of the generated file where they have already been applied.
output.write_all(line.as_bytes())?;
I did this for two reasons:
- Since I process a full line at a time there's a chance that a comment / directive follows valid TSG code:
(function) { ...} ; #dialect tsx
always writing out the line ensures that the valid code is not lost.
- When looking at the generated code, I found it convenient to still see the directives as they highlight dialect specific sections.
My thinking was that in the generated code the directives are already processed, and them not being there makes that obvious.
I see your point about the comment coming after some other code. (And this is always the trouble with text-based preprocessing instead of something properly supported in the language---but that's the cost of going for an expedient solution here ;).) My expectation with markers in pairs like this is that they operate on blocks of lines. A tricky example with keeping the lines would be:
; #dialect tsx
(function) {
...
} ; #end
If this is processed with the current algorithm, it would leave an unmatched }, I think. I think treating them as operating on lines simplifies things. If you're worried that people may make mistakes, you can add an extra check that the part of the line before the matched directive matches [\s;]+, or throw an error that a directive cannot be on the same line as code.
you're correct that I accounted for one edge case but not the mirror image of it. I'll add a check to ensure that directives are not mixed with code. And then I won't output the directives.
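The behaviour agreed on here (directives act on whole lines, are rejected when mixed with code, and are left out of the generated file) could look roughly like the std-only sketch below. It is not the PR's implementation, which uses regexes and reader/writer handles. The `DirectiveError` and `Directive` types, the error messages, and the string-based signature are assumptions, and the sketch only inspects the text after the first `;` on a line.

```rust
/// Error for a misplaced directive; `line` is 1-based (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct DirectiveError {
    pub line: usize,
    pub message: &'static str,
}

enum Directive<'a> {
    Start(&'a str), // `; #dialect NAME`
    End,            // `; #end`
}

/// Returns the text before the first `;` and the directive after it, if any.
fn parse_directive(line: &str) -> Option<(&str, Directive<'_>)> {
    let (code, comment) = line.split_once(';')?;
    let comment = comment.trim();
    if comment == "#end" {
        return Some((code, Directive::End));
    }
    let name = comment.strip_prefix("#dialect")?;
    // Require whitespace between `#dialect` and the dialect name.
    if !name.starts_with(char::is_whitespace) {
        return None;
    }
    Some((code, Directive::Start(name.trim())))
}

/// Keeps shared lines and the blocks for `dialect`; drops directive lines.
pub fn preprocess(source: &str, dialect: &str) -> Result<String, DirectiveError> {
    let mut output = String::new();
    // None: outside a block. Some(keep): inside a `#dialect` block.
    let mut filter: Option<bool> = None;
    for (index, line) in source.lines().enumerate() {
        let err = |message: &'static str| DirectiveError { line: index + 1, message };
        match parse_directive(line) {
            Some((code, _)) if !code.trim().is_empty() => {
                return Err(err("directive must be on its own line"));
            }
            Some((_, Directive::Start(name))) => {
                if filter.is_some() {
                    return Err(err("nested #dialect"));
                }
                filter = Some(name == dialect);
            }
            Some((_, Directive::End)) => {
                if filter.is_none() {
                    return Err(err("#end without #dialect"));
                }
                filter = None;
            }
            None => {
                if filter.unwrap_or(true) {
                    output.push_str(line);
                    output.push('\n');
                }
            }
        }
    }
    Ok(output)
}
```

Dropping lines shifts the line numbers of the generated file relative to the source. Emitting an empty line for each dropped one would keep them aligned, at the cost of blank runs in the output.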
}

filter = None;
output.write_all(line.as_bytes())?;
Idem:
output.write_all(line.as_bytes())?;
@@ -17,9 +17,10 @@ pub mod tsconfig;
 pub mod util;

 /// The stacks graphs tsg path for this language.
-pub const STACK_GRAPHS_TSG_PATH: &str = "src/stack-graphs.tsg";
+pub const STACK_GRAPHS_TSG_PATH: &str = "./stack-graphs.tsg";
- pub const STACK_GRAPHS_TSG_PATH: &str = "./stack-graphs.tsg";
+ pub const STACK_GRAPHS_TSG_PATH: &str = "src/stack-graphs.tsg";
This should be relative to the crate root for VSCode to handle it correctly when running cargo test, e.g.
good catch!
pub const STACK_GRAPHS_TSG_TS_SOURCE: &str = include_str!(concat!(env!("OUT_DIR"), "/stack-graphs-typescript.tsg"));
pub const STACK_GRAPHS_TSG_TSX_SOURCE: &str = include_str!(concat!(env!("OUT_DIR"), "/stack-graphs-tsx.tsg"));
- pub const STACK_GRAPHS_TSG_TS_SOURCE: &str = include_str!(concat!(env!("OUT_DIR"), "/stack-graphs-typescript.tsg"));
- pub const STACK_GRAPHS_TSG_TSX_SOURCE: &str = include_str!(concat!(env!("OUT_DIR"), "/stack-graphs-tsx.tsg"));
+ const STACK_GRAPHS_TSG_TS_SOURCE: &str = include_str!(concat!(env!("OUT_DIR"), "/stack-graphs-typescript.tsg"));
+ const STACK_GRAPHS_TSG_TSX_SOURCE: &str = include_str!(concat!(env!("OUT_DIR"), "/stack-graphs-tsx.tsg"));
No need to expose the generated file paths.
no problem but it is currently public: https://github.com/github/stack-graphs/blob/main/languages/tree-sitter-stack-graphs-typescript/rust/lib.rs#L22
You're completely right. I thought this was exporting the path, but it's exporting the source. In that case, yes, keep it public!
- Also fixes error msgs to use 1-based line nums
@hendrikvanantwerpen if you want, I can squash the commits. The first one has that dependency on "askama" so not great if you ever run |
Thanks for your contribution!
If you can share, I'd love to hear what you are using stack graphs for. We're always curious where this gets applied :)
We're interested in analyzing JS/TS apps in their entirety. As such, a single real world app typically has a good mix of all 4: JS/JSX/TS/TSX. |
Support the TSX dialect of TypeScript.
Because TSX is almost a superset of TypeScript, it makes sense to reuse the .tsg file. However, it requires a bit of preprocessing to handle the differences.
I'm posting this PR as a straw-man to solicit input on how to best proceed. tree-sitter supports TSX as a dialect of TypeScript. While the two dialects are very similar, TSX does not support type assertions in the form of:
Therefore the .tsg file fails to run for TSX as it has a (type_assertion) tree-sitter query.
In this patch I went with using the "askama" templating engine to strip out the offending stanza before feeding it into the TSG parser. The biggest issue with this approach is that it messes up the line numbers, making debugging difficult. Other alternatives include:
Curious to hear what others think would be the best approach.