Corpus form: inconsistency between CSV sniffer and CSV reader #2003

When you upload a CSV file in the corpus form, Textcavator extracts column information from the file. The code that does this uses the `pandas` library to parse the file, while the `CSVReader` is based on the `csv` module from the Python standard library. Since the `CSVReader` is what actually extracts the content, a file can appear to be parsed as intended in step 2 of the form (data upload) and still be parsed incorrectly in step 4 (indexing).

This also relates to #1998: if the form shows a data preview based on the pandas output, it may not match what is actually indexed in step 4.

Suggested solution: rewrite `backend/addcorpus/json_corpora/csv_field_info.py` to use the `csv` library instead of `pandas`.
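A minimal sketch of what a `csv`-based version could look like. The function name `get_csv_field_info`, its parameters and the returned dictionary are hypothetical and do not reflect the current interface of `csv_field_info.py`. The idea is to use only the standard `csv` module: guess the delimiter with `csv.Sniffer` (limited to a few candidate characters) unless one is given, read the header, and report rows whose length does not match the header, since those are the rows that would not be read as intended during indexing. `csv.Sniffer.sniff` raises `csv.Error` if it cannot determine a delimiter; the sketch lets that propagate.

```python
import csv
import io
from typing import IO, Dict, List, Optional


def get_csv_field_info(
    file: IO[str],
    delimiter: Optional[str] = None,
    sample_size: int = 64 * 1024,
) -> Dict[str, object]:
    """Extract column info from a CSV file using only the csv module.

    Hypothetical sketch: name, signature and return shape are assumptions.
    """
    if delimiter is None:
        # Guess the delimiter from a sample, restricted to common candidates.
        sample = file.read(sample_size)
        file.seek(0)
        delimiter = csv.Sniffer().sniff(sample, delimiters=",;\t|").delimiter

    reader = csv.reader(file, delimiter=delimiter)
    header = next(reader, [])

    # Rows with a different number of fields than the header indicate the
    # file will not be read as intended, so report them to the user.
    malformed_lines: List[int] = []
    n_rows = 0
    for row in reader:
        if not row:
            continue  # csv.reader yields [] for blank lines; ignore them
        n_rows += 1
        if len(row) != len(header):
            # line_num is the line in the file where this row ends
            malformed_lines.append(reader.line_num)

    return {
        "delimiter": delimiter,
        "columns": header,
        "n_rows": n_rows,
        "malformed_lines": malformed_lines,
    }


if __name__ == "__main__":
    data = io.StringIO('title,date\n"Hello, world",2020\n\nBad row\n')
    print(get_csv_field_info(data, delimiter=","))
```

Whatever delimiter (and other dialect settings) this step detects would then need to be passed on to the `CSVReader` configuration, so that steps 2 and 4 rely on the same parser with the same settings; how that is wired into the corpus definition is left open here.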

Alternative solutions:
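- Try to find a configuration for pandas and/or csv so the output in these steps is always consistent. Precarious: two separate parsers would have to stay aligned, including through future changes to either library.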
- Rewrite the `CSVReader` in `ianalyzer_readers` so it's based on `pandas` instead of `csv`. Not preferred, because that class is already mission-critical, and using pandas would not improve it.
