Hi - looking at the two other bug reports, no one seems to be maintaining the package anymore, but anyway, might be helpful for others so here goes:
There is an issue with unique identifiers on, for example, the 'FV19TOTA' table, like so:
# libs
library(dkstat)
library(data.table)
# table
f1 <- setDT(dst_get_data(table = 'FV19TOTA', VALRES='*', OMRÅDE='*', Tid='*', lang='da', format='CSV'))
# not unique
uniqueN(f1) == nrow(f1)
nrow(f1) - uniqueN(f1) # 24,707 non-unique rows... quite a lot
f1_dups <- f1[duplicated(f1)]
I wrote to DST, thinking it was a problem with the data. They replied (translated from Danish):
Hi Emil
The duplicates arise because the data is extracted without the "codes" that always accompany the texts (for example, for the areas), and there are several polling places with the same name, so if you look only at the texts there will be duplicates. For example, there are several "Assens" and several "Bedsted", etc., but if you extract the data with "Code and text", they will be distinct rows.
This perhaps also partly answers your question about "levels" for area (polling places), since the code indicates the level. It may be documented further in the accompanying "Statistikdokumentation" (I have not checked, though).
Statistics Denmark is not involved in the R package you are using, so we cannot readily help with it, but perhaps there is a parameter for "codes/texts"?
So the problem seems to be that only the text, but not the numeric code, is parsed for the rows at the lowest level. The codes do come through at the higher levels, though.
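To illustrate what DST describes, here is a minimal base R sketch on toy data. The codes, year, and values below are made up for illustration and are not real FV19TOTA values; only the place names come from the DST reply. It shows how rows that differ only by code collapse into duplicates once the code column is dropped.

```r
# Toy data: hypothetical codes and values, not taken from FV19TOTA.
polling <- data.frame(
  code  = c("1001", "1002", "2001"),
  text  = c("Assens", "Assens", "Bedsted"),
  tid   = c(2019, 2019, 2019),
  value = c(100, 100, 250),
  stringsAsFactors = FALSE
)

# Text only: the two "Assens" rows become indistinguishable.
text_only <- polling[, c("text", "tid", "value")]
n_dups_text <- sum(duplicated(text_only))

# Code and text: every row is distinct again.
n_dups_code <- sum(duplicated(polling))

print(c(text_only = n_dups_text, code_and_text = n_dups_code))
```

With only the text column the two "Assens" rows are counted as duplicates; with the code kept they are not, which matches the behaviour seen in `f1` above.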
This also happens with the statsDK-package, link here