dataset: add RaCCooNS by Frank and Aumeistere #961

SiQube · 2025-02-20T13:23:17Z

resolves #954

requires #989

codecov · 2025-02-20T13:27:27Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (af69b1c) to head (155b572).

Additional details and impacted files

@@            Coverage Diff            @@
##              main      #961   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           87        88    +1     
  Lines         3720      3742   +22     
  Branches       638       638           
=========================================
+ Hits          3720      3742   +22


SiQube · 2025-02-21T13:58:30Z

There is additional raw gaze data. Unfortunately, the files are ASCII, and we currently only support CSV, TSV and Feather data; see #962.

dkrako · 2025-02-25T19:01:16Z

Isn't this just documentation-related? ToyDatasetEyeLink contains only ASCII data, and you should be able to load it without issues.

SiQube · 2025-02-25T19:09:40Z

Yes, but we need the right parsing criteria. I'd like to do this in a separate PR and reopen an issue. Maybe someone from the original authors can help?

dkrako · 2025-02-25T19:22:21Z

Ah, I see, so it's not the EyeLink format then? Can you provide some example lines?

SiQube · 2025-02-27T14:56:27Z

The files are actually EyeLink ASCII files. However, I am not entirely sure how the trials are split, and I don't want to propagate mistakes. Maybe we can set up a meeting with Stefan Frank to discuss how the trials were split? CC @izaskr

izaskr · 2025-02-27T15:03:46Z

Alright, let me check. Will get back to you with this.

SiQube · 2025-03-03T08:20:43Z

from @izaskr:

There's a variable TRIAL_INDEX that is increased by 1 at the end of each trial.
If you look for this in the text files, you will find the separations between consecutive trials.
I hope that answers David's question!

I'll implement it and then we can merge the dataset to pymovements.
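The splitting rule described above can be sketched as follows. This is a minimal illustration, not the parser that pymovements uses: it assumes the trial variable appears as a message of the form `MSG <timestamp> !V TRIAL_VAR TRIAL_INDEX <n>` and that sample lines start with a numeric timestamp. Since the message is written at the end of a trial, every sample seen before it is assigned to the trial that the message closes.

```python
import re
from collections.abc import Iterable

# Assumed message format; the exact wording in the RaCCooNS ASC files may differ.
TRIAL_INDEX_MSG = re.compile(r'^MSG\s+\d+\s+!V\s+TRIAL_VAR\s+TRIAL_INDEX\s+\d+')
# Sample lines in EyeLink ASC files start with a numeric timestamp.
SAMPLE_LINE = re.compile(r'^\d+\s')


def split_samples_by_trial(lines: Iterable[str]) -> dict[int, list[str]]:
    """Group sample lines by a 0-based trial counter.

    TRIAL_INDEX is written at the end of each trial, so every sample seen
    before such a message belongs to the trial that the message closes.
    """
    trials: dict[int, list[str]] = {}
    current = 0
    for line in lines:
        if TRIAL_INDEX_MSG.match(line):
            current += 1  # the message closes the current trial
        elif SAMPLE_LINE.match(line):
            trials.setdefault(current, []).append(line)
    return trials
```

The sketch counts messages instead of reading their values. As a result, any samples recorded between trials would be attributed to the following trial, which would need checking against the real files.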

saeub · 2025-03-05T09:51:57Z

@SiQube I'll take care of the ASCII parsing (hopefully today)

saeub · 2025-03-05T19:31:27Z

I see three problems:

1. Participant IDs in TSV files are numbers, but not all participant IDs in ASC filenames are in a number format (e.g. 001_2).
2. Trial IDs in the ASC files are offset by 1 compared to the TSV files, meaning the user won't be able to join the event and gaze frames.
3. Trial variable messages (!V TRIAL_VAR) are at the end of the trial, meaning we can't add info like stimulus ID to the gaze dataframe.

Regarding 2., I guess we can't really fix that right now? @SiQube (One option would be to allow passing functions with patterns in from_asc(), where a function takes the match.groupdict() and returns a dictionary of column values; but this would not be possible in a YAML definition, right?) For the moment, I just used a different column name trial_index0 in the gaze frame to make clear that it's a 0-based index.

For 3., I created an issue #990.
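The option mentioned for problem 2 (passing functions together with patterns, where each function takes `match.groupdict()` and returns a dictionary of column values) could look roughly like this. This is a hypothetical API shape for discussion only; `parse_messages`, `to_tsv_trial`, the message format and the direction of the offset (TSV trial ID = ASC index + 1) are all assumptions.

```python
import re
from typing import Any, Callable

# Hypothetical API shape (not part of pymovements): each pattern is paired
# with a converter that receives match.groupdict() and returns column values.
Converter = Callable[[dict[str, str]], dict[str, Any]]

# Assumed message format for the trial index; the real ASC wording may differ.
TRIAL_INDEX_PATTERN = re.compile(
    r'MSG\s+(?P<timestamp>\d+)\s+!V\s+TRIAL_VAR\s+TRIAL_INDEX\s+(?P<trial_index>\d+)'
)

# Assumption: TSV trial IDs equal the ASC trial index plus one.
TRIAL_ID_OFFSET = 1


def to_tsv_trial(groups: dict[str, str]) -> dict[str, Any]:
    """Convert regex groups into column values aligned with the TSV files."""
    return {
        'time': int(groups['timestamp']),
        'trial_id': int(groups['trial_index']) + TRIAL_ID_OFFSET,
    }


def parse_messages(
    lines: list[str],
    patterns: list[tuple[re.Pattern[str], Converter]],
) -> list[dict[str, Any]]:
    """Apply the first matching pattern per line and collect converted rows."""
    rows: list[dict[str, Any]] = []
    for line in lines:
        for pattern, convert in patterns:
            match = pattern.match(line)
            if match is not None:
                rows.append(convert(match.groupdict()))
                break  # first matching pattern wins
    return rows
```

As noted above, a Python callable like `to_tsv_trial` cannot be expressed in a YAML definition, so this approach would only work for definitions written in code.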

SiQube · 2025-03-05T21:27:15Z

Maybe we can discuss this with Frank and @izaskr?

dkrako · 2025-03-06T13:15:22Z

> Regarding 2., I guess we can't really fix that right now? @SiQube (One option would be to allow passing functions with patterns in from_asc(), where a function takes the match.groupdict() and returns a dictionary of column values; but this would not be possible in a YAML definition, right?) For the moment, I just used a different column name trial_index0 in the gaze frame to make clear that it's a 0-based index.

I have the feeling that we will probably need some custom loading functions for specific datasets that we want to add in the future. This way we can stay flexible for cases like this where we need to postprocess data.

This is a bit of a bummer, but we really can't avoid it, as datasets vary in their data standards.
We can still keep DatasetDefinition classes for such cases. There we can add a custom_post_processing() method or so.
This way #914 wouldn't be blocked.

Also #352 (adding mat-file support) could benefit from custom pre/post processing functions.
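A dataset-specific post-processing hook of the kind proposed above might be sketched like this. `DatasetDefinitionSketch` and `RaCCooNSSketch` are hypothetical stand-ins that operate on plain row dictionaries rather than on pymovements' actual classes or dataframes; the 1-based `trial_id` derived from `trial_index0` is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import Any

Row = dict[str, Any]


@dataclass
class DatasetDefinitionSketch:
    """Hypothetical stand-in for a dataset definition with a post-processing hook."""

    name: str
    trial_columns: list[str] = field(default_factory=list)

    def custom_post_processing(self, gaze_rows: list[Row]) -> list[Row]:
        # Default: no dataset-specific changes.
        return gaze_rows


@dataclass
class RaCCooNSSketch(DatasetDefinitionSketch):
    name: str = 'RaCCooNS'

    def custom_post_processing(self, gaze_rows: list[Row]) -> list[Row]:
        # Assumption: precomputed events use 1-based trial IDs, so derive them
        # from the 0-based index parsed from the ASC files.
        return [{**row, 'trial_id': row['trial_index0'] + 1} for row in gaze_rows]
```

Keeping the default hook as a no-op means that only datasets with irregular data need to override it, while all other definitions stay purely declarative.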

saeub · 2025-03-06T13:26:33Z

@SiQube, should we wait to merge this until we have a solution for parsing trial variables (#990) and custom post-processing functions (#961 (comment))? Or do you want to merge this now and improve it later?

(I think the dataset is perfectly usable as it is now, just some of the information about the stimulus is missing, and the gaze and precomputed event frames don't match up.)

SiQube · 2025-03-06T13:29:07Z

I don't mind merging it ASAP and fixing it later; we can move one of the comments to a new issue. Using pymovements for this dataset is still valuable (I think), since downloading and preprocessing work (for most of the data).

dkrako · 2025-03-06T13:29:18Z

I'm in favor of merging this PR without the additional trial info and improving on this later, once we have solved the underlying issues.

Also, adding a custom_post_processing() to a DatasetDefinition wouldn't help much when a user simply uses gaze.from_asc() to load a single file. #997 could help with this though.

dkrako · 2025-03-06T13:41:10Z

We should probably also mention these issues in the docstring of the dataset.

dkrako

Alright, let's merge this.

I just have two comments to make the classes clickable in the documentation (following #986).

@SiQube, you can resolve the comments without changing the code if you like, as they probably won't be relevant anymore once #914 is merged.

src/pymovements/datasets/raccoons.py

docs/source/bibliography.bib

dkrako

Can you add a YAML definition?

SiQube · 2025-03-30T09:16:58Z

@izaskr, can you check with Stefan why the dataset is down?
@izaskr, the paper states a screen resolution of 1920x1018 pixels; I assume it's a typo and it should be 1920x1080.
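A quick arithmetic check supports the typo assumption: reducing each resolution to its simplest ratio shows that 1920x1080 is the standard 16:9 format, while 1920x1018 reduces to an unusual 960:509. This is only circumstantial evidence (a non-standard value could also describe a usable display area), so confirming with the authors is still the safer route.

```python
from math import gcd


def aspect_ratio(width: int, height: int) -> tuple[int, int]:
    """Reduce a resolution to its simplest width:height ratio."""
    divisor = gcd(width, height)
    return width // divisor, height // divisor


print(aspect_ratio(1920, 1080))  # (16, 9)
print(aspect_ratio(1920, 1018))  # (960, 509)
```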

add to public_datasets.yaml

saeub · 2025-03-30T10:16:26Z

src/pymovements/datasets/raccoons.py

+        },
+    )
+
+    trial_columns: list[str] = field(default_factory=lambda: [])


@SiQube, I just noticed that I forgot to add trial_index0 to the trial columns here. However, I'm not sure we should add it at all, since it only exists in the gaze dataframe and not in the event dataframe.

Maybe trial_columns="trial_index0" should only be added to the custom_read_kwargs for gaze?
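The concern about adding trial_index0 globally can be illustrated with a small check. The `custom_read_kwargs` layout below, including the key names `gaze` and `precomputed_events`, is an illustrative assumption rather than a confirmed pymovements structure. The sketch shows that configuring the trial column only for gaze avoids referencing a column that the event files do not contain.

```python
from typing import Any

# Hypothetical per-file-type read options; the key names are illustrative.
# Only the gaze reader gets trial_columns, because trial_index0 is parsed
# from the ASC files and does not exist in the precomputed event files.
custom_read_kwargs: dict[str, dict[str, Any]] = {
    'gaze': {'trial_columns': ['trial_index0']},
    'precomputed_events': {},
}


def missing_trial_columns(read_kwargs: dict[str, Any], columns: list[str]) -> list[str]:
    """Return the configured trial columns that a loaded frame does not contain."""
    return [name for name in read_kwargs.get('trial_columns', []) if name not in columns]
```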

saeub · 2025-03-30T10:30:53Z

Also, the parser is getting conflicting information regarding the eye tracker version:

The ASC files say it's EYELINK II CL v4.594 Jul 6 2012, which is translated to EyeLink 1000 in our parsing logic (pymovements/src/pymovements/gaze/_utils/parsing.py, lines 508 to 509 in 913c372):

    elif float(version_number) < 5:
        model = 'EyeLink 1000'

The experiment metadata (and the paper) says it should be EyeLink 1000 Plus.

Is anyone familiar with these version mappings? (maybe @theDebbister, since she wrote the metadata parsing code?)
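For reference, the quoted branch can be reproduced in isolation as follows. This sketch covers only the `< 5 → EyeLink 1000` rule shown in the excerpt. The real parsing code has further branches that are not reproduced here, so `guess_model` is a hypothetical helper, not pymovements' implementation.

```python
import re

# Matches the version number after " v", e.g. "v4.594" in "EYELINK II CL v4.594".
VERSION_PATTERN = re.compile(r'\bv(?P<version>\d+(?:\.\d+)?)')


def guess_model(version_line: str) -> str:
    """Map an ASC version header to a model name (only the '< 5' rule)."""
    match = VERSION_PATTERN.search(version_line)
    if match is None:
        return 'unknown'
    version_number = match.group('version')
    if float(version_number) < 5:
        return 'EyeLink 1000'
    return 'unknown (not covered by this sketch)'
```

For the RaCCooNS header `EYELINK II CL v4.594 Jul 6 2012`, this rule yields EyeLink 1000, which matches what the parser reports and conflicts with the EyeLink 1000 Plus stated in the metadata.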

theDebbister · 2025-03-31T13:14:57Z

Regarding the version numbers, they are explained here at the bottom of the answer thread: https://www.sr-research.com/support/thread-8853.html

That would basically mean the metadata is wrong, since the version is 4.x, right? I will quickly ask Stefan to be sure.

Update: Stefan will look into it

SiQube · 2025-04-07T14:05:00Z

@izaskr @theDebbister any update?

theDebbister · 2025-04-08T06:50:40Z

Yes, Stefan said it is possible that the paper is wrong and they used the EyeLink 1000 instead of 1000 Plus. It seems that our device detection is still correct.


dataset: add RaCCooNS by Frank and Aumeistere

248ffc1

SiQube requested review from dkrako and prassepaul as code owners February 20, 2025 13:23

github-actions bot added the dataset label Feb 20, 2025

SiQube added 2 commits February 20, 2025 14:30

update docs

79bce43

add gaze data

8758e69

SiQube enabled auto-merge (squash) February 20, 2025 14:07

Merge branch 'main' into RaCCooNS

7d47393

Fix encoding for precomputed events and reading measures

fe89ad0

saeub mentioned this pull request Mar 5, 2025

Add encoding argument to from_asc() #989

Merged

14 tasks

saeub added 2 commits March 5, 2025 21:18

Parse gaze files

e9acc5c

Merge remote-tracking branch 'upstream/main' into RaCCooNS

2b10779

saeub disabled auto-merge March 5, 2025 20:26

saeub marked this pull request as draft March 5, 2025 20:27

Merge branch 'main' into RaCCooNS

e36a568

saeub marked this pull request as ready for review March 6, 2025 13:26

Merge branch 'main' into RaCCooNS

c6de72c

dkrako mentioned this pull request Mar 6, 2025

support custom pre or post load functions for specific datasets #998

Open

dkrako approved these changes Mar 6, 2025

src/pymovements/datasets/raccoons.py (outdated, resolved)

src/pymovements/datasets/raccoons.py (outdated, resolved)

dkrako enabled auto-merge (squash) March 6, 2025 14:31

dkrako reviewed Mar 6, 2025

docs/source/bibliography.bib (outdated, resolved)

dkrako requested changes Mar 17, 2025


dkrako marked this pull request as draft March 26, 2025 13:35

auto-merge was automatically disabled March 26, 2025 13:35
Pull request was converted to draft

SiQube added 2 commits March 30, 2025 08:38

Merge remote-tracking branch 'origin/main' into RaCCooNS

c17b73d

wait for raccoons to be back online again

8d222d7

add experiment

de897f3

add to public_datasets.yaml

saeub reviewed Mar 30, 2025


saeub mentioned this pull request Apr 9, 2025

feat: use locale encoding by default in from_asc() #1084

Merged

14 tasks

SiQube added 2 commits April 17, 2025 02:38

hotfix: citation key

2917bc6

fix eyetracker model

c1b8a8d

add functional test files

SiQube force-pushed the RaCCooNS branch from 850078d to c1b8a8d Compare April 17, 2025 00:56

SiQube marked this pull request as ready for review April 17, 2025 00:56

Merge branch 'main' into RaCCooNS

155b572

SiQube enabled auto-merge (squash) April 17, 2025 00:56
