Skip to content
ortnever edited this page Mar 1, 2021 · 40 revisions

Tutorial

Check validity of your CSV with: https://zazuko.com/csv-validate/

Basic Requirements

  • All CSVs must be in UTF8 encoding (otherwise special characters woll look weird). We recommend Notepad on Windows to detect and change encoding[^footnote-utf8].
  • Make sure the CSVs are in correct format (or use Data Type string which is also the default)
    • Date: Data must be in xs:date format (example: 2001-01-31)
    • DateTime: Data must be in xs:datetime format (example: 2001-01-31T17:30:00)
    • String: If the string contains the seperator character or quotes then it must be enclosed in quotes ("Hans, Heiri"). Quotes within a string must be doubled ("Hans ""Johnny"" Müller")
    • Float, Decimal: Decimal character is ., thousands separators are not supported.
  • CSV formal integrity can be checked by https://zazuko.com/csv-validate/ (which has quite some problems with different separator characters though)
    • tbd
      • Excel Export Details?
      • Coordinates (need to be WGS84)

Exporting from Excel

  • Excel uses your system defaults for CSV export and there is no way you could specify the separator character during export[^footnote-excel-delimiter]
    • While Cube Creator can basically handle different field delimiters the "Replace CSV" feature can't. So you might make sure you've got the right one from the beginning.
  • Make sure your dates and times are formatted correctly as the column format fill be used for export
    • Date format is yyyy-mm-dd
    • Time format is hh:mm:dd ()
    • DateTime format is yyyy-mm-ddThh:mm:ss[^footnote-excel-mm]

Missing Values

* are in general to be avoided (identifiers are a must, cross check with danis tutorial)
* need to be an empty field (no special string or similar allowed)
* non symmetric (e.g. missing values starting from 2010 for only one station)

Multilinguality

* if a dimension is nominal (names) the 4 languages need to be provided in the csv if necessary to show them correctly.
* metadata can be added in all languages in the tool

Excel-Anleitung_Umwandlung_einer_Kreuztabelle_in_eine_flache_Tabelle.docx

Replacing a CSV with a new one

Any CSV can be replaced by a new one, if the following conditions are met

  • Separator Character remains the same
  • All columns from the old CSV are still present (case sensitive)

In other words, the following changes are allowed

  • Additional columns
  • Different order of columns
  • Different row content

Open questions

Fabian

  • What are the constraints about columns names (first line of the .csv): space allowed (it seems not - there were problems when creating links between tables) ? accents allowed (French "é" /german "ö") ?
  • Clarify the maximum .csv size, on each environment (int/prod), and what is the way to handle bigger files (we already have bigger files from Thomas Bettler: 70 MB for "Jahrbuch Wald & Holz 2020", 240 MB for "Forestry_E_All_Data_(Normalized).csv")

Veronique :

  • you should mention that if format requirements are not met, the CSV will still be imported . However, the data type cannot be specified correctly during the mapping. Could you explain the consequences if the data type can't be specified. Consequences on the use of the data by Visualize?
  • Boolean data format must be true/false?
  • Missing values : are to be avoid but...it's very common to have missing values in our data. It is therefore important to explain here what happens at the cube level and/or in Visualize in the case of missing values. Should we use the "Default value" field (available under "Edit column mapping") in case of missing values? If yes, please explain how.
  • non symmetric (e.g. missing values starting from 2010 for only one station) : Could you explain how to deal with this case.

Footnotes

[^footnote-utf8]: The easiest way to tell if it's UTF8 is to open the file in notepad.exe and click "Save as". Whatever encoding is proposed is the actual encoding of your CSV. [^footnote-excel-delimiter]: How to change the delimiter (on Windows) is described here: https://www.koskila.net/how-to-change-the-delimiter-when-exporting-csv-from-excel/ [^footnote-excel-mm]: Don't ask how Excel tells the difference between "mm" and "mm"

Clone this wiki locally