Description
When you upload a CSV file in the corpus form, Textcavator extracts column info from the file. The code for this uses the pandas library to parse the file, while the CSVReader is based on Python's standard csv library. The CSVReader is the one responsible for extracting the content, so a file can appear to be parsed as intended in step 2 of the form (data upload) but not be parsed correctly in step 4 (indexing).
This also relates to #1998: if you show a data preview based on the pandas output, it may not match the output in step 4.
Suggested solution: rewrite backend/addcorpus/json_corpora/csv_field_info.py to use the csv library instead of pandas.
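A rough sketch of what csv-based column detection could look like, to illustrate the idea. This is not the existing code: the function names (`column_info`, `infer_column_type`), the type labels (`integer`, `float`, `text`), and the `sample_size` parameter are all hypothetical, and the real module would need to return whatever structure the corpus form expects.

```python
import csv
import io
import re
from itertools import islice

# Hypothetical type labels; the real field types used by the form may differ.
INTEGER_PATTERN = re.compile(r"[+-]?\d+")
FLOAT_PATTERN = re.compile(r"[+-]?(\d+\.\d*|\.\d+|\d+)")


def infer_column_type(values):
    """Guess a coarse type from the raw strings that csv.DictReader returns."""
    # DictReader fills missing trailing cells with None; skip those and blanks.
    non_empty = [v for v in values if v]
    if not non_empty:
        return "text"
    if all(INTEGER_PATTERN.fullmatch(v) for v in non_empty):
        return "integer"
    if all(FLOAT_PATTERN.fullmatch(v) for v in non_empty):
        return "float"
    return "text"


def column_info(csv_file, delimiter=",", sample_size=100):
    """Map each column name to a guessed type, using only the csv library.

    Reads at most `sample_size` rows so large uploads are not fully parsed.
    """
    reader = csv.DictReader(csv_file, delimiter=delimiter)
    rows = list(islice(reader, sample_size))
    fieldnames = reader.fieldnames or []
    return {
        name: infer_column_type([row[name] for row in rows])
        for name in fieldnames
    }


sample = io.StringIO('id,title,score\n1,Hello,0.5\n2,"World, again",1.25\n')
print(column_info(sample))
```

Because the values stay as the strings that the csv library produced, anything shown in step 2 would come from the same kind of parse that the CSVReader performs in step 4, provided both use the same delimiter and dialect settings.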
Alternative solutions:
- Try to find a configuration for pandas and/or csv so the output in these steps is always consistent. Precarious.
- Rewrite the CSVReader in ianalyzer_readers so it's based on pandas instead of csv. Not preferred because that class is already mission-critical, and using pandas would not improve it.