Generate pseudo_id for practice #290

SRW612 · 2021-06-22T08:52:39Z

I am trying to generate dummy data that includes a variable indicating the GP practice for each patient. I am using the function patients.registered_practice_as_of() to return pseudo_id. So far I have tried specifying a normally distributed integer like this:

returning="pseudo_id",
return_expectations={"rate": "universal", "int": {"distribution": "normal", "mean": 1500, "stddev": 50}},

I am also using the Measures() function to generate data at weekly level. But the result of using the normally distributed expectation is that the pseudo_ids vary from week to week, whereas in reality we would expect the same set of IDs each week. For example, we would expect every GP practice to have at least some consultations for respiratory tract infections each week.

The only alternative I can see is to specify the "category" option for return_expectations, although this would require entering hundreds of individual ratios, one for each practice.

Is there any other way to generate pseudo_id to create a variable where there are many different IDs (hundreds or even thousands) and each ID value appears each week?

alexwalkerepi · 2021-06-22T14:22:02Z

Maintainer

There are a couple of possible workarounds for this at the moment.

The first is a neat bit of Python that lets you specify lots of categories in the return_expectations without having to write them all out:

return_expectations={"category": {"ratios": {c: 1/1000 for c in range(1,1001)}}}

This will give a fixed, reproducible set of categories from 1 to 1000, each with equal weight. The unfortunate quirk is that the returned data type would be categorical, whereas in the real data the practice ID would be an int. However, this is not an issue if you're using the CSV output format, as that obscures the data types anyway. If you're using the feather or dta output formats, you might need to convert to an int in the dummy data.
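If it helps to check the dictionary before passing it in, here is a minimal sketch in plain Python. The helper name uniform_ratios is made up for illustration and is not part of any library. It only shows what the comprehension above produces: one key per practice ID, with equal ratios that add up to 1 within floating-point tolerance.

```python
import math


def uniform_ratios(n_practices):
    """Hypothetical helper: equal ratio for each practice ID in 1..n_practices."""
    return {practice_id: 1 / n_practices for practice_id in range(1, n_practices + 1)}


ratios = uniform_ratios(1000)

# Same shape as the one-liner in the answer above.
return_expectations = {"category": {"ratios": ratios}}

# Sanity checks: 1000 distinct IDs, running from 1 to 1000, ratios summing to ~1.
assert len(ratios) == 1000
assert min(ratios) == 1 and max(ratios) == 1000
assert math.isclose(sum(ratios.values()), 1.0)
```

Changing the argument gives a different number of practices, e.g. uniform_ratios(300) for 300 IDs.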

The other option would be to continue using the int and accept that the results will look odd in the dummy data; they should be okay in the real data.
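On the data-type point for the category approach, the following is a rough sketch of why CSV output hides the difference, using only the standard library. The CSV content and the column names patient_id and practice are invented for illustration and are not taken from a real extract. Every field read back from a CSV is text, so the practice ID has to be cast to int explicitly either way.

```python
import csv
import io

# Hypothetical dummy-data extract; column names and values are made up.
dummy_csv = io.StringIO("patient_id,practice\n1,17\n2,1000\n3,17\n")

rows = list(csv.DictReader(dummy_csv))

# The csv module returns every field as a string, so whether the value
# started life as a category or an int is no longer visible.
assert all(isinstance(row["practice"], str) for row in rows)

# Cast to int explicitly before grouping or joining on practice ID.
practice_ids = [int(row["practice"]) for row in rows]
print(practice_ids)
```

For feather or dta output the types are preserved, so the equivalent cast would need to happen on the loaded table instead.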

SRW612 Jun 22, 2021
Author

Thanks Alex, I'm sure I can make one of those options work. And very useful to know that the real practice IDs are ints.
