Improve performance of number parsing #11228

Open
radeusgd opened this issue Oct 1, 2024 · 5 comments
Labels
--low-performance -libs Libraries: New libraries to be implemented

Comments

@radeusgd
Member

radeusgd commented Oct 1, 2024

Currently the NumberParser relies on a really complicated Regex that is not as efficient as we want.

We want to modify it to use a custom solution that should be more efficient.
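As a rough illustration of what a "custom solution" could look like, here is a minimal single-pass scanner for signed integers with an optional thousands separator. This is only a sketch: the function name `parse_int`, the separator default, and the grouping rule (1 to 3 digits before the first separator, exactly 3 after each one) are assumptions made for the example, not the behaviour of the actual NumberParser.

```python
from typing import Optional


def parse_int(text: str, thousands: str = ",") -> Optional[int]:
    """Parse an optionally signed integer in one left-to-right pass.

    Hypothetical helper: returns None when the text is not a valid number.
    """
    i, n = 0, len(text)
    negative = False
    if i < n and text[i] in "+-":
        negative = text[i] == "-"
        i += 1

    value = 0
    total_digits = 0
    digits_in_group = 0  # digits seen since the last separator (or the start)
    seen_separator = False

    while i < n:
        ch = text[i]
        if "0" <= ch <= "9":
            value = value * 10 + (ord(ch) - ord("0"))
            total_digits += 1
            digits_in_group += 1
        elif ch == thousands:
            # Assumed rule: 1-3 digits before the first separator,
            # exactly 3 digits between later separators.
            if digits_in_group == 0 or digits_in_group > 3:
                return None
            if seen_separator and digits_in_group != 3:
                return None
            seen_separator = True
            digits_in_group = 0
        else:
            return None
        i += 1

    if total_digits == 0:
        return None
    if seen_separator and digits_in_group != 3:
        return None
    return -value if negative else value
```

Each character is inspected once and no backtracking is needed, which is the main reason a scanner like this may be cheaper than a complicated regex.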

TODO:

  • what are the goals exactly?
  • do we have benchmarks that measure the performance, so that we can compare? Parsing a column of Text values to Integer? CSV reading?
@radeusgd radeusgd self-assigned this Oct 1, 2024
@github-project-automation github-project-automation bot moved this to ❓New in Issues Board Oct 1, 2024
@radeusgd radeusgd added -libs Libraries: New libraries to be implemented --low-performance labels Oct 1, 2024
@jdunkerley
Member

Better integration of problems with the new Parser:

  • Propagate the message from NumberParseFailure in some way.
  • Make error reporting friendlier: it would be great to record somewhere (perhaps in a SeparatorParseResult) which character caused the 'invalid separators' failure.
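One way to carry the offending character through to the error message is to return it as part of the check result. The sketch below is only an illustration: `SeparatorCheck`, its fields, and `check_separators` are hypothetical names invented for this example and do not reflect the real SeparatorParseResult or NumberParseFailure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SeparatorCheck:
    """Hypothetical result type; not the project's SeparatorParseResult."""

    valid: bool
    bad_char: Optional[str] = None   # the character that caused the failure
    position: Optional[int] = None   # its index in the input text

    def message(self) -> str:
        if self.valid:
            return "ok"
        return f"Invalid separator {self.bad_char!r} at position {self.position}"


def check_separators(text: str, allowed: str = ",.") -> SeparatorCheck:
    """Return the first character that is neither a digit nor an allowed separator."""
    for position, ch in enumerate(text):
        if "0" <= ch <= "9" or ch in allowed:
            continue
        if position == 0 and ch in "+-":
            continue  # a leading sign is accepted in this sketch
        return SeparatorCheck(False, ch, position)
    return SeparatorCheck(True)
```

With this shape, the caller can include `message()` in the reported problem instead of a generic 'invalid separators' text.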

@jdunkerley
Member

Exponential Notation:

  • 1E+3
  • 1E ==> 1 with symbol E
  • 1,000,000.456E6 ==> 1,000,000.456 and stop at E
  • 23E6 ==> 23000000

The rule was 0 <= x < 10. In exponent mode, allow a single decimal point separator and a wider range than before (i.e. 0 <= x < 1000).
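One possible reading of the notes above can be sketched as follows. Several things here are assumptions made for the example: that x refers to the mantissa, that a thousands separator in the mantissa disables exponent mode, that ',' is the thousands separator, and the name `parse_with_exponent` itself. The function returns the parsed value together with any unparsed trailing text.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional, Tuple


def parse_with_exponent(
    text: str, mantissa_limit: int = 1000
) -> Optional[Tuple[Decimal, str]]:
    """Hypothetical sketch: returns (value, trailing text) or None if invalid."""
    e_pos = text.upper().find("E")
    mantissa_text = text if e_pos < 0 else text[:e_pos]
    rest = "" if e_pos < 0 else text[e_pos:]

    # Only digits, sign, decimal point and ',' may appear in the mantissa.
    if not mantissa_text or any(ch not in "0123456789.,+-" for ch in mantissa_text):
        return None
    try:
        # Simplification: thousands separators are dropped without validation.
        mantissa = Decimal(mantissa_text.replace(",", ""))
    except InvalidOperation:
        return None  # e.g. more than one decimal point

    exponent_text = rest[1:]
    digits = exponent_text[1:] if exponent_text[:1] in ("+", "-") else exponent_text
    exponent_ok = digits.isascii() and digits.isdigit()

    exponent_mode = (
        rest != ""
        and exponent_ok
        and "," not in mantissa_text          # assumed: no grouping in exponent mode
        and abs(mantissa) < mantissa_limit    # assumed: the range applies to the mantissa
    )
    if exponent_mode:
        return mantissa * Decimal(10) ** int(exponent_text), ""
    # Otherwise stop at E and hand it back as a trailing symbol.
    return mantissa, rest
```

Under these assumptions the four examples from the notes behave as listed: `1E+3` and `23E6` are scaled by the exponent, while `1E` and `1,000,000.456E6` stop at the E.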

The merged version will be just enough to get it working again; it has not been deeply tested.

@jdunkerley
Member

Add a benchmark for parsing a single column and parsing 300 columns.
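The shape of such a benchmark might look like the sketch below. It is written with Python's `timeit` purely for illustration; the real benchmark would live in the project's own benchmark suite. The helper names, the row count, and the use of the built-in `int` as a stand-in parser are all assumptions.

```python
import random
import timeit
from typing import Callable, List


def make_columns(n_columns: int, n_rows: int, seed: int = 0) -> List[List[str]]:
    """Generate columns of integer-like text values (deterministic for a given seed)."""
    rng = random.Random(seed)
    return [
        [str(rng.randint(-10**9, 10**9)) for _ in range(n_rows)]
        for _ in range(n_columns)
    ]


def parse_columns(
    columns: List[List[str]], parse: Callable[[str], int] = int
) -> List[List[int]]:
    """Parse every cell; `int` stands in for the parser under test."""
    return [[parse(cell) for cell in column] for column in columns]


def benchmark(n_columns: int, n_rows: int, repeat: int = 3) -> float:
    """Return the best wall-clock time in seconds over `repeat` runs."""
    columns = make_columns(n_columns, n_rows)
    return min(timeit.repeat(lambda: parse_columns(columns), number=1, repeat=repeat))


if __name__ == "__main__":
    for n_columns in (1, 300):
        seconds = benchmark(n_columns, n_rows=1000)
        print(f"{n_columns} column(s): {seconds:.4f} s")
```

Running the single-column and 300-column cases side by side helps separate per-value parsing cost from any per-column setup cost.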

@jdunkerley
Member

ToDos:

  • Redundant separator spacing check.

Projects
Status: 📤 Backlog
Development

No branches or pull requests

2 participants