Events.beta.csv format is imperfect #506

atruskie · 2021-07-15T04:28:26Z

Actual behaviour:

The new CSV output format has some problems:

event duration is not included
event frequency bands are not included
sometimes profiles are not included (like for the boobook)
FileName is the segmented file name
ResultStartSeconds and EventStartSeconds are duplicates
SegmentStartSeconds and SegmentDurationSeconds are verbose and duplicate ResultMinute
All floating point numbers should be truncated but aren't

Expected behavior:

The above not to happen.

How to reproduce this bug:

Run a multi recogniser, investigate the results

Additional Details

AP

Version: v21.7.0.4

Some example data: all.txt

towsey · 2021-07-16T01:18:29Z

Fantastic that you are dealing with this. I have been meaning to log it myself as an issue. Is it possible for events where appropriate to include additional info? For example, for oscillation events to include the oscillation rate and for harmonics to include the interval. And also to include score where one is available.

atruskie · 2021-07-16T01:58:34Z

Great question.

In short: not really.

CSV is great when all the events have the same shape/type of data. The reason for most of the above issues is that we output the results based on the base class, which is EventCommon I think, and it lacks the end/low/high properties.

I think, given our nature, it's safe to try and output those extra columns. But if any of that data is missing, we'll get a lot of sparse columns.

But for even more specific events, we'll definitely end up with a lot of sparse columns. For a recogniser that produces only oscillation events, most rows would have the oscillation column filled. But for a multi-recogniser case, most rows would have an empty oscillation column.

To achieve the flexibility we want here, we need to be able to encode arbitrary data structures, which is what the JSON output is for. Each object inside a JSON result can have whatever properties we'd like it to have.
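The trade-off can be illustrated with a minimal sketch in Python. The event rows and the OscillationRate and HarmonicInterval field names are hypothetical, chosen only for illustration; they are not AP's actual output schema. The CSV writer has to use one fixed header (the union of all fields), so rows lacking a field get empty cells, while each JSON object carries only its own properties.

```python
import csv
import io
import json

# Hypothetical events; field names are illustrative, not AP's actual schema.
events = [
    {"EventStartSeconds": 1.5, "Score": 0.8, "OscillationRate": 12.0},
    {"EventStartSeconds": 4.0, "Score": 0.6, "HarmonicInterval": 250.0},
    {"EventStartSeconds": 9.2, "Score": 0.9},
]


def to_csv(rows):
    """Write rows as CSV; missing fields become empty (sparse) cells."""
    # CSV needs a single header, so take the union of keys in first-seen order.
    columns = list(dict.fromkeys(key for row in rows for key in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Write rows as JSON; each object keeps only the properties it has."""
    return json.dumps(rows, indent=2)


csv_text = to_csv(events)
json_text = to_json(events)
print(csv_text)
print(json_text)
```

In the CSV output every row pays for every column, which is why a multi-recogniser run would produce mostly empty cells for event-specific fields.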

Both of these formats are inefficient for their own reasons, and each has strengths over the other.

I think I want to make the CSV useful and dense by default for the common case. And leave the JSON for outputting complex data.

towsey · 2021-07-27T03:58:52Z

The additional info I would like to add is not complex, i.e. just scalars. It could be done by adding another one or two properties to EventCommon called Score1 and Score2 that would be in addition to the existing Score property. You could then add an event property such as periodicity by assigning it to one of the score fields. The documentation would describe what information was provided in each of the score fields. The trouble is that if we wait for JSON parsers etc., it will be a long time and more difficult for the user.

atruskie · 2021-11-14T22:42:00Z

Assigning data to columns with generic names is not something we will do. Descriptive names are vital to people understanding what data they're looking at.
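As a rough illustration of the concern, here is a small Python sketch. The rows, the EventType values, and the SCORE1_MEANING lookup are all hypothetical and not part of AP. With a generic Score1 column, a reader needs an external lookup to know what each value means; with descriptive names, the row explains itself.

```python
# Hypothetical rows using the proposed generic column; not AP's actual schema.
generic_rows = [
    {"EventType": "Oscillation", "Score": 0.8, "Score1": 12.0},
    {"EventType": "Harmonic", "Score": 0.6, "Score1": 250.0},
]

# The out-of-band documentation a reader would need to interpret Score1.
SCORE1_MEANING = {
    "Oscillation": "OscillationRate",
    "Harmonic": "HarmonicInterval",
}


def to_descriptive(row):
    """Return a copy of row with Score1 renamed to a descriptive column."""
    renamed = {key: value for key, value in row.items() if key != "Score1"}
    if "Score1" in row:
        renamed[SCORE1_MEANING[row["EventType"]]] = row["Score1"]
    return renamed


descriptive_rows = [to_descriptive(row) for row in generic_rows]
for row in descriptive_rows:
    print(row)
```

The same number, 12.0 or 250.0, sits in the same generic column but means something different per event type, which is the ambiguity descriptive names avoid.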

atruskie added the enhancement label Jul 15, 2021
