Only the text and not the numerical code gets parsed with the FV19TOTA table (and maybe others, too) #24

emilBeBri · 2020-04-22T19:18:44Z

Hi, judging by the two other bug reports, no one seems to be maintaining the package anymore, but this might be helpful for others, so here goes:

There is an issue with unique identifiers on, for example, the 'FV19TOTA' table. To reproduce:

# libs
library(dkstat)
library(data.table)


# table
f1 <- setDT(dst_get_data(table = 'FV19TOTA', VALRES='*', OMRÅDE='*', Tid='*', lang='da', format='CSV'))

# not unique
uniqueN(f1) == nrow(f1)
nrow(f1) - uniqueN(f1) # 24,707 non-unique rows... quite a lot

f1_dups <- f1[duplicated(f1)]

I wrote to DST (Statistics Denmark), thinking it was a problem with the data. They replied (translated from Danish):

Hi Emil

The duplicates occur because the data is retrieved without the "codes" that always accompany the texts (for example for the areas), and there are several polling places with the same name, so if you only look at the texts, there will be duplicates. For example, there are several "Assens" and several "Bedsted", etc., but if you extract the data with "Code and text", they will be distinct rows.

This may also partly answer your question about "levels" for area (polling places), since the code indicates the level. It may be documented further in the accompanying "Statistikdokumentation" (though I have not checked).

Statistics Denmark is not involved in the R package you are using, so we cannot readily help with it, but perhaps there is a parameter for "Codes/texts"?

So the problem seems to be that only the text, and not the numeric code, is parsed for rows at the lowest level. At the higher levels the codes do come through.
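To illustrate the point in DST's reply, here is a minimal sketch in base R using made-up toy data (the codes and values below are hypothetical, not real FV19TOTA figures). It shows how rows that differ only by code become duplicates once the code is dropped, and how combining code and text makes them distinct again.

```r
# Toy data, NOT real FV19TOTA figures: two polling places share the name "Assens".
votes <- data.frame(
  code  = c("420001", "420002", "787001"),  # hypothetical area codes
  text  = c("Assens", "Assens", "Bedsted"),
  tid   = c("2019", "2019", "2019"),
  value = c(100, 100, 250),
  stringsAsFactors = FALSE
)

# Text only: the two "Assens" rows are indistinguishable.
text_only  <- votes[, c("text", "tid", "value")]
n_dup_text <- sum(duplicated(text_only))

# Code and text combined: every row is distinct again.
votes$area    <- paste(votes$code, votes$text)
code_and_text <- votes[, c("area", "tid", "value")]
n_dup_code    <- sum(duplicated(code_and_text))
```

Comparing `n_dup_text` with `n_dup_code` mirrors the `uniqueN(f1) == nrow(f1)` check above, just on a table small enough to inspect by eye.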

This also happens with the statsDK package; see mikkelkrogsholm/statsDK#3.


aleksanderbl29 · 2024-11-28T13:04:11Z

Hi @emilBeBri
Thank you for raising this issue.

I have implemented changes that should fix the behaviour in cases such as FV22TOTA.
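For readers curious about the general idea, here is a rough sketch of one way such a fallback could work. It is not the package's actual implementation: the helper names `has_label_overlap` and `choose_presentation` are hypothetical, the metadata shape (one data frame of `id`/`text` pairs per variable) is an assumption, `"CodeAndValue"` is taken from the title of the linked commit, and `"Value"` is only a placeholder for the text-only default.

```r
# Hypothetical helper: TRUE when any variable has two ids sharing one text label.
has_label_overlap <- function(variables) {
  any(vapply(variables, function(v) anyDuplicated(v$text) > 0, logical(1)))
}

# Hypothetical helper: fall back to code plus text only when labels collide.
choose_presentation <- function(variables) {
  if (has_label_overlap(variables)) "CodeAndValue" else "Value"
}

# Toy metadata with made-up ids, one data frame per variable (assumed shape).
meta <- list(
  "OMRÅDE" = data.frame(id = c("1", "2", "3"),
                        text = c("Assens", "Assens", "Bedsted"),
                        stringsAsFactors = FALSE),
  "Tid"    = data.frame(id = "2019", text = "2019", stringsAsFactors = FALSE)
)

presentation <- choose_presentation(meta)
```

Checking the labels before requesting data keeps the plain text output for tables where names are already unique, and only adds codes where they are needed to tell rows apart.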

The new version of the package with these fixes should be on r-universe later today. This issue will be automatically closed by the PR.


emilBeBri mentioned this issue Apr 22, 2020

Only the text and not the numerical code gets parsed with the FV19TOTA table (and maybe others, too) mikkelkrogsholm/statsDK#3

Open

aleksanderbl29 mentioned this issue Nov 28, 2024

Improve error handling #39

Closed

aleksanderbl29 added a commit that referenced this issue Nov 28, 2024

Use CodeAndValue when overlaps are detected in any columns

ccd6d16

Fixes #24

aleksanderbl29 linked a pull request Nov 28, 2024 that will close this issue

Fix #24 #40

Merged

aleksanderbl29 closed this as completed in #40 Nov 28, 2024

aleksanderbl29 added a commit that referenced this issue Nov 28, 2024

Merge pull request #40 from rOpenGov/24-FV22TOTA

7570d7d

Fix #24
