You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In cc7fdb2, I added a stopgap measure for interpreting dates in files being imported. As of that commit, the code checks known date columns for any instance of / and, if one is found, assumes that the file has dates in standard Excel format of mm/dd/yyyy. If one is not found, it assumes that the dates in the file are already correctly formatted as YYYY-MM-DD. Naturally, those are not the only two options!
The correct way to do it, as I discussed with @kfogel, is to check each date to see if it's obvious what the format is. If it is, then record that. If, after checking the dates, all of them are in the same format, then change them all to YYYY-MM-DD (or leave them as they are, if they already are in that format). If, on the other hand, the dates are inconsistently formatted, then reformat each one according to our best guess. If some are still ambiguous (e.g. 01/01/2015 could be mm/dd/yyyy or dd/mm/yyyy), then assume mm/dd/yyyy, since that is the Excel standard.
The text was updated successfully, but these errors were encountered:
I think that's a shorthand version of the real algorithm? Let's spell it out:
For incoming Excel files, we need to unambiguously determine the actual date represented by each date string in a date column. (Internally, we'll store dates in YYYY-MM-DD format, but that's always been the case. IOW this issue isn't about reformatting, it's just about interpreting -- we're not changing how the dates are represented in the Excel file, we're just deciding how we determine what those date strings mean.)
Loop over all the dates in the Excel file. For each date, if it is unambiguous, then obviously interpret it as that date and make a note that one more date of that format has been seen (whatever it was, YYYY/MM/DD or M/D/YYYY or D/M/YY or whatever, etc). If the date is ambiguous, then hold on to it for now -- we'll fix it later.
Once the initial loop is done, we have the following things: a bunch of dates whose meaning is known exactly (the unambiguous ones), a bunch of dates whose meaning is not yet known (the ambiguous ones), and a list of date formats we have seen (the ones signaled from the unambiguous dates) with a count of how often each format was seen.
If that list of seen formats associated with their counts is zero elements long, then assume M/D/YYYY, because that's the excel standard, and handle M/D/YY in the obvious way, of course. (By the way I believe Excel does not pad with leading zeros for M and D when they are a single digit, FWIW.)
If that list of seen formats is exactly one element long, then we know what format to use for interpreting the ambiguous cases, so use it :-).
If the list of seen formats is two or more elements long, then if any of those formats are M/D/YYYY (in which I also include M/D/YY, as per above), use that, because it seems to be the Excel default. If none of them are, then just pick any seen format that is not less popular than any of other seen formats, and assume that format for the ambiguous dates. This last case is highly unlikely ever to happen, but if it does, we can handle it.
In cc7fdb2, I added a stopgap measure for interpreting dates in files being imported. As of that commit, the code checks known date columns for any instance of
/
and, if one is found, assumes that the file has dates in standard Excel format of mm/dd/yyyy. If one is not found, it assumes that the dates in the file are already correctly formatted as YYYY-MM-DD. Naturally, those are not the only two options!The correct way to do it, as I discussed with @kfogel, is to check each date to see if it's obvious what the format is. If it is, then record that. If, after checking the dates, all of them are in the same format, then change them all to YYYY-MM-DD (or leave them as they are, if they already are in that format). If, on the other hand, the dates are inconsistently formatted, then reformat each one according to our best guess. If some are still ambiguous (e.g.
01/01/2015
could be mm/dd/yyyy or dd/mm/yyyy), then assume mm/dd/yyyy, since that is the Excel standard.The text was updated successfully, but these errors were encountered: