Generate pseudo_id for practice #290
I am trying to generate dummy data that includes a variable indicating the GP practice for each patient. I am using the function patients.registered_practice_as_of() with returning="pseudo_id", and so far I have tried specifying a normally distributed integer as the expectation. I am also using the Measures() function to generate data at weekly level. The result of using the normally distributed expectation is that the pseudo_ids vary from week to week, whereas in reality we would expect the same set of IDs each week. For example, we would expect every GP practice to have at least some consultations for respiratory tract infections each week. The only alternative I can see is to specify the "category" option for return_expectations, although this would require entering hundreds of individual ratios, one for each practice. Is there any other way to generate pseudo_id so that there are many different IDs (hundreds or even thousands) and each ID value appears each week?
There are a couple of possible workarounds for this at the moment. One is to build the category ratios for return_expectations programmatically, without having to write them all out:

return_expectations={"category": {"ratios": {c: 1/1000 for c in range(1,1001)}}}

This will give reproducible categories that are 1-1000. The unfortunate quirk is that the data type this returns is categorical, whereas in the real data the practice ID would be an int. However, this is not an issue if you're using the CSV output format, as that obscures the data types anyway. If you're using the feather or dta output formats, you might need to convert to an int in the dummy data.
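As a rough sketch of the same idea, the ratios can be built by a small helper so the number of practices is easy to change. The helper name practice_ratios and the variable names are hypothetical and not part of cohortextractor; only the "category"/"ratios" structure comes from the answer above.

```python
import math


def practice_ratios(n_practices):
    """Return {practice_id: ratio} with equal ratios for IDs 1..n_practices.

    Hypothetical helper for illustration only.
    """
    return {
        practice_id: 1 / n_practices
        for practice_id in range(1, n_practices + 1)
    }


# Build the expectations dict in the shape used in the answer above.
ratios = practice_ratios(1000)
return_expectations = {"category": {"ratios": ratios}}

# Sanity checks: one entry per practice, and the ratios sum to
# (approximately) 1, allowing for floating-point rounding.
assert len(ratios) == 1000
assert math.isclose(sum(ratios.values()), 1.0)
```

The resulting dict would then be passed as the return_expectations argument of patients.registered_practice_as_of(). If the dummy data is later loaded with pandas from a feather or dta file, something like df["practice"].astype(int) is one possible way to do the int conversion, where the column name "practice" is an assumption.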