Feature request: read more rows to infer column type #56
Comments
Please, this is also my issue. Can we get a bit more extended feedback here? By the way, as far as I can see, if a blank cell appears in a column and the other cells are int (for example), the inferred type is obj, when it should be int option (in this example). I share here my workaround based on
as well as
Usage examples
@giuliohome when I looked, the number of lines was easy to find in the code. IIRC it was something like 25 or 50.
Hi @jackfoxy
@giuliohome I recall other typing issues. You probably found one I experienced. I filed 3 issues, all on the same day, which addressed the most egregious problems I was having. I think it was not long after this I dismantled all the Excel processing and banned Excel file submission in our company. That's how I solved the problem.
@jackfoxy thanks for sharing your experience and feel free to unsubscribe from the issue. Really, I was writing to the maintainers of this project.
Description
I don't know how many rows the type provider (TP) reads before attempting to infer column types, but it is not enough. Type inference is frequently incorrect because the TP did not see (for instance) a blank cell further down the column, which should make it an option type, or a decimal number further down the column, which means the type should not be int.
Conduct some experiments to find a larger number of rows to read for type inference, but not so many that it hurts performance too much. More rows are better.
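To make the failure mode concrete, here is a minimal sketch in Python of what sample-based inference looks like. It is not the type provider's actual code: the function `infer_column_type`, the parameter `sample_rows`, the type names, and the column data are all made up for illustration, and the sample size of 50 is only a guess at the real limit.

```python
from typing import Optional, Sequence


def infer_column_type(cells: Sequence[Optional[object]], sample_rows: int) -> str:
    """Infer a column type by looking only at the first `sample_rows` cells.

    Hypothetical helper: a blank cell is modelled as None, and the result is
    a type name such as "int" or "decimal option".
    """
    sample = cells[:sample_rows]
    has_blank = any(c is None for c in sample)
    values = [c for c in sample if c is not None]

    if not values:
        base = "string"
    elif all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        base = "int"
    elif all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        base = "decimal"
    else:
        base = "string"

    # A blank anywhere in the sample makes the column optional.
    return f"{base} option" if has_blank else base


# Hypothetical column: 60 integers, then a blank cell and a decimal further down.
column = list(range(60)) + [None, 2.5]

print(infer_column_type(column, sample_rows=50))           # int
print(infer_column_type(column, sample_rows=len(column)))  # decimal option
```

With the small sample the blank cell and the decimal are never seen, so the column is typed as a plain int. Reading the whole column gives the type the data actually needs, at the cost of scanning every row.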
Known workarounds
ForceString - not a nice alternative