Parse function breaks when there’s a line ending in a string #100

Open
ariasuni opened this issue Mar 20, 2018 · 5 comments

Comments

@ariasuni

In: tree = parser.parse('var i = "test\nvalue"')
Illegal character '"' at 1:8 after LexToken(EQ,'=',1,6)
Illegal character '"' at 1:19 after LexToken(ID,'value',1,14)

In: tree.to_ecma()
'var i = test;\nvalue;'

The behavior is the same with \r.

@metatoaster

metatoaster commented Apr 18, 2018

No, never mind: if an actual newline character occurs inside a string token, Node.js rejects it too.

$ cat | node
var i = "test
value"
[stdin]:1
var i = "test
        ^^^^^

SyntaxError: Invalid or unexpected token

ES5 (which is what slimit supports) doesn't have multiline strings like Python does, so fortunately for the parsers, this is a genuine syntax error in the provided ES5 script, which the parser correctly reported.

However, if you meant an escape sequence representing the newline, this will work (note the raw string prefix r):

>>> from slimit.parser import Parser
>>> print(Parser().parse(r'var i = "test\nvalue"').to_ecma())
var i = "test\nvalue";
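For anyone unsure why the r prefix matters, here is a minimal sketch in plain Python (no slimit needed; the variable names plain and raw are just for illustration). Without the prefix, Python itself converts \n into a real newline before the parser ever sees the text, so the JavaScript string literal ends up spanning two lines:

```python
# Without the r prefix, Python converts \n into a real newline character,
# so the JavaScript source handed to the parser spans two lines.
plain = 'var i = "test\nvalue"'

# With the r prefix, the backslash and the "n" stay as two separate
# characters, which is a valid JavaScript escape sequence.
raw = r'var i = "test\nvalue"'

print(repr(plain))
print(repr(raw))
print(len(plain), len(raw))  # the raw version is one character longer
```

Only the second form is a single-line JavaScript statement.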

@ariasuni
Author

Well, I had this problem when trying to scrape information out of working JavaScript code on a high-traffic website.

@metatoaster

Can you please provide the link to the example that choked?

@metatoaster

metatoaster commented Apr 19, 2018

Anyway, I do see what you mean: I had mistakenly used my patched version of slimit, which correctly reports this as a parsing error. The correct behavior for that input is to throw a SyntaxError exception, which my patched version (and calmjs.parse) does. The ECMA-262 specification defines this as invalid syntax in section 7.8.4 (specifically "A line terminator character cannot appear in a string literal" at the bottom of that section, where a "line terminator" includes newline characters).

To make things clearer, this is the input JavaScript with the invalid syntax:

var i = "test
value"

Assume that input is assigned to program in the following Python code:

>>> from slimit.parser import Parser
>>> parser = Parser()
>>> node = parser.parse(program)
Illegal character '"' at 1:8 after LexToken(EQ,'=',1,6)
Illegal character '"' at 1:19 after LexToken(ID,'value',1,14)
>>> print(node.to_ecma())
var i = test;
value;

This changed the program entirely: slimit erroneously parsed the whole input without raising an error and produced an incorrect AST. This is where my initial confusion lay (I had used that output as input, and only afterwards noticed the quotes in the original input). The correct behavior is implemented in calmjs.parse, which correctly processes this as a syntax error:

>>> from calmjs.parse import es5
>>> es5(program)
Traceback (most recent call last):
...
calmjs.parse.exceptions.ECMASyntaxError: Illegal character '"' at 1:9 after '=' at 1:7
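If you are stuck on unpatched slimit, a rough pre-check along the lines of the section 7.8.4 rule could flag such inputs before parsing. The following is only a sketch under simplifying assumptions: find_raw_line_terminator is a hypothetical helper, not part of slimit or calmjs.parse, and it does not recognise comments or regex literals, so a quote character inside those will confuse it.

```python
# ES5 LineTerminator code points: LF, CR, LINE SEPARATOR, PARAGRAPH SEPARATOR.
LINE_TERMINATORS = "\n\r\u2028\u2029"


def find_raw_line_terminator(source):
    """Return the index of the first line terminator found inside a
    string literal, or None if there is none.

    Simplification: comments and regex literals are not recognised, so a
    quote character inside them is treated as the start of a string.
    """
    quote = None  # the opening quote character while inside a string literal
    i = 0
    while i < len(source):
        ch = source[i]
        if quote is None:
            if ch in "\"'":
                quote = ch
        elif ch == "\\":
            # Skip the escaped character. A backslash directly followed by
            # a line terminator is a legal line continuation in ES5.
            if source[i + 1:i + 3] == "\r\n":
                i += 1
            i += 1
        elif ch in LINE_TERMINATORS:
            return i
        elif ch == quote:
            quote = None
        i += 1
    return None


program = 'var i = "test\nvalue"'
print(find_raw_line_terminator(program))                   # offset of the newline
print(find_raw_line_terminator(r'var i = "test\nvalue"'))  # None
```

A non-None result means the input contains the invalid construct discussed above, so whatever slimit returns for it should not be trusted.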

@ariasuni
Author

Well, I’m probably mistaken: this should have been a non-working JavaScript extract among the working ones, because I have the same kind of error with newlines inside strings in my web browser’s console.
