
Commit

Merge pull request #105 from planetarypy/104-end-statement
Remove request of next token after end-statement
rbeyer authored Aug 11, 2022
2 parents 7821f85 + 8f34f0d commit 6b872ff
Showing 3 changed files with 36 additions and 11 deletions.
.github/workflows/python-test.yml (2 changes: 1 addition & 1 deletion)
@@ -12,7 +12,7 @@ jobs:
     strategy:
       matrix:
         os: [ubuntu-latest, macos-latest]
-        python-version: [3.6, 3.7, 3.8, 3.9]
+        python-version: ['3.6', '3.7', '3.8', '3.9', '3.10']
         # install-target: ['.', '.[allopts]']
 
     steps:
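
A likely reason the versions are now quoted: YAML resolves an unquoted 3.10 as the float 3.1, so the matrix would silently ask for Python 3.1. A quick illustration in Python (assuming PyYAML is installed; Actions workflow files are YAML, so the same scalar resolution applies):

import yaml

# Unquoted 3.10 is a YAML float and truncates to 3.1:
print(yaml.safe_load("python-version: [3.6, 3.10]"))
# -> {'python-version': [3.6, 3.1]}

# Quoted versions survive as strings:
print(yaml.safe_load("python-version: ['3.6', '3.10']"))
# -> {'python-version': ['3.6', '3.10']}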
HISTORY.rst (12 changes: 12 additions & 0 deletions)
@@ -30,6 +30,18 @@ and the release date, in year-month-day format (see examples below).
 Not Yet Released
 ----------------
 
+Fixed
++++++
+* The parser was requesting the next token after an end-statement, even
+  though nothing was done with this token (in the future it could be a
+  comment that should be processed). In the very rare case where all of
+  the "data" bytes in a file with an attached PVL label (like a .IMG or
+  .cub file) happen to decode as UTF text with no whitespace characters,
+  that next token takes an unacceptable amount of time to return, if it
+  returns at all. The parser no longer requests additional tokens once
+  an end-statement is identified (Issue 104).
+
+
 1.3.1 (2022-02-05)
 ------------------
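
The changelog entry above is easiest to see from the user's side. Below is a minimal sketch of the edge case, assuming a build that includes this fix; pvl.loads is pvl's string-parsing entry point, trailing bytes after the end-statement are ignored (as they are for attached labels), and the exact repr varies by version:

import pvl

# A label followed by "data" that happens to decode as one long run of
# non-whitespace characters. Before this fix, the lexer tried to grow a
# single lexeme across all of it; now parsing stops at the end-statement
# and returns promptly.
label = "a = 1\nEND\n" + "x" * 1_000_000

print(pvl.loads(label))
# -> PVLModule([('a', 1)])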
pvl/parser.py (33 changes: 23 additions & 10 deletions)
@@ -496,16 +496,29 @@ def parse_end_statement(self, tokens: abc.Generator) -> None:
f'"{end}"'
)

try:
t = next(tokens)
if t.is_WSC():
# maybe process comment
return
else:
tokens.send(t)
return
except LexerError:
pass
# The following commented code was originally put in place to deal
# with the possible future situation of being able to process
# the possible comment after an end-statement.
# In practice, an edge case was discovered (Issue 104) where "data"
# after an END statement *all* properly converted to UTF with no
# whitespace characters. So this request for the next token
# resulted in lexing more than 100 million "valid characters"
# and did not return in a prompt manner. If we ever enable
# processing of comments, we'll have to figure out how to handle
# this case. An alternate to removing this code is to leave it
# but put in a limit on the size that a lexeme can grow to,
# but that implies an additional if-statement for each character.
# This is the better solution for now.
# try:
# t = next(tokens)
# if t.is_WSC():
# # maybe process comment
# return
# else:
# tokens.send(t)
# return
# except LexerError:
# pass
except StopIteration:
pass

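
The comment's rejected alternative (keeping the lookahead but capping how large a lexeme may grow) would look roughly like the sketch below. It is hypothetical: the loop shape, is_delimiter(), MAX_LEXEME_LEN, and the exception class are illustrative assumptions, not pvl's actual lexer internals.

MAX_LEXEME_LEN = 100_000


class LexemeTooLongError(ValueError):
    """Raised when a lexeme exceeds the configured cap."""


def is_delimiter(c: str) -> bool:
    # Assumption: whitespace ends a lexeme (pvl's real grammar also
    # treats certain reserved characters as delimiters).
    return c.isspace()


def lex_one(chars) -> str:
    """Accumulate characters into one lexeme, stopping at a delimiter."""
    lexeme = []
    for c in chars:
        if is_delimiter(c):
            break
        lexeme.append(c)
        # The extra per-character if-statement the comment above says
        # this approach would cost:
        if len(lexeme) > MAX_LEXEME_LEN:
            raise LexemeTooLongError(
                f"lexeme exceeded {MAX_LEXEME_LEN} characters"
            )
    return "".join(lexeme)

A cap like this would make the Issue 104 input fail fast with a clear error rather than lexing 100 million characters, at the cost of a branch per character; the commit removes the lookahead instead.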
