Commit 840c2b8

Create hes_apc_diagnosis.md
1 parent 3452ca9 commit 840c2b8

1 file changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
---
layout: default
title: HES APC - Diagnosis
parent: HES APC
grand_parent: Curated Assets
nav_order: 1
permalink: /curated_assets/hes_apc/hes_apc_diagnosis
---

# HES APC - Diagnosis

<a href="https://github.com/BHFDSC/hds_curated_assets/blob/main/D08-hes_apc.py" class="btn btn-primary fs-5 mb-4 mb-md-0 mr-2" target="_blank">View code on GitHub</a>

The *hes_apc_diagnosis* table is curated from the latest archived version of the HES APC table (hes_apc_all_years_archive). The output is a long-format table in which each row represents a single three-digit (DIAG_3_01, …, DIAG_3_20) or four-digit (DIAG_4_01, …, DIAG_4_20) ICD-9 or ICD-10 diagnosis code associated with a specific individual and hospital episode. Non-alphanumeric characters and trailing Xs are removed from the diagnosis codes, and rows where the code is null or an empty string are dropped, so that only non-empty codes are retained. The resulting table has 10 columns: 6 identifier columns (person ID, episode key, episode start date, episode end date, admission date and discharge date) and 4 columns capturing the diagnosis code and its position (diag_column, code, diag_digits, diag_position).

- **diag_column**: the name of the original diagnosis column (e.g., DIAG_3_01)
- **code**: the cleaned ICD diagnosis code
- **diag_digits**: whether the ICD code is the three- or four-digit version (3 or 4)
- **diag_position**: the position of the diagnosis code within the episode (1 to 20)
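
The reshaping and cleaning rules described above can be illustrated with a small plain-Python sketch. This is not the curation code itself (that is the PySpark script linked above and may differ in detail). The names `clean_code`, `to_long` and `DIAG_COLUMN`, the order of the two cleaning steps, the assumption that only upper-case trailing Xs are stripped, and the input values are all hypothetical and chosen for illustration only.

```python
import re

# Matches the wide diagnosis column names, e.g. DIAG_3_01 or DIAG_4_20.
DIAG_COLUMN = re.compile(r"DIAG_([34])_(\d{2})")


def clean_code(raw):
    """Return the cleaned code, or None if nothing usable is left."""
    if raw is None:
        return None
    code = re.sub(r"[^A-Za-z0-9]", "", raw)  # remove non-alphanumeric characters
    code = re.sub(r"X+$", "", code)          # remove trailing Xs (upper case assumed)
    return code or None                      # empty string is treated like null


def to_long(episode, id_columns):
    """Turn one wide episode record (a dict) into long-format rows."""
    rows = []
    for column, raw in episode.items():
        match = DIAG_COLUMN.fullmatch(column)
        if match is None:
            continue  # not a diagnosis column
        code = clean_code(raw)
        if code is None:
            continue  # null or empty codes are dropped
        row = {name: episode[name] for name in id_columns}
        row.update(
            diag_column=column,
            code=code,
            diag_digits=int(match.group(1)),
            diag_position=int(match.group(2)),
        )
        rows.append(row)
    return rows


# Illustrative input only; not real HES data.
episode = {
    "person_id": "A",
    "epikey": 178954263574,
    "DIAG_4_01": "H25.8",
    "DIAG_4_02": "I10X",
    "DIAG_4_03": "",
    "DIAG_4_04": None,
}
long_rows = to_long(episode, id_columns=["person_id", "epikey"])
```

With this input, `long_rows` holds two rows (codes `H258` and `I10`, at positions 1 and 2); the empty and null diagnosis columns produce no rows.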
20+
21+
22+
## Example
23+
24+
| person_id | epikey | epistart | epiend | admidate | disdate | diag_column | code | diag_digits | diag_position |
25+
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
26+
| A | 178954263574 | 2021-06-24 | 2021-06-24 | 2021-06-24 | 2021-06-24 | DIAG_3_01 | H25 | 3 | 1 |
27+
| A | 178954263574 | 2021-06-24 | 2021-06-24 | 2021-06-24 | 2021-06-24 | DIAG_3_02 | H40 | 3 | 2 |
28+
| A | 178954263574 | 2021-06-24 | 2021-06-24 | 2021-06-24 | 2021-06-24 | DIAG_3_03 | H53 | 3 | 3 |
29+
| A | 178954263574 | 2021-06-24 | 2021-06-24 | 2021-06-24 | 2021-06-24 | DIAG_4_01 | H258 | 4 | 1 |
30+
| A | 178954263574 | 2021-06-24 | 2021-06-24 | 2021-06-24 | 2021-06-24 | DIAG_4_02 | H402 | 4 | 2 |
31+
| A | 178954263574 | 2021-06-24 | 2021-06-24 | 2021-06-24 | 2021-06-24 | DIAG_4_03 | H533 | 4 | 3 |
32+
| B | 559478246553 | 2020-10-07 | 2020-10-08 | 2020-10-07 | 2020-10-08 | DIAG_3_01 | T85 | 3 | 1 |
33+
| B | 559478246553 | 2020-10-07 | 2020-10-08 | 2020-10-07 | 2020-10-08 | DIAG_3_02 | Y84 | 3 | 2 |

The table is saved to the DSA schema **dsa_391419_j3w9t_collab**. The table name ends with the archive date (archived_on_date) in the format **YYYY_MM_DD**.

{: .highlight-title }
> Table Name
>
> hds_curated_assets__hes_apc_diagnosis_archived_on_date
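
As a minimal sketch of the naming convention, the full table identifier can be assembled from the schema, the fixed prefix and an archive date. The helper `archived_table_name` and the constant names are hypothetical and exist only for illustration; whether a table exists for a given date depends on which archived versions have actually been produced.

```python
from datetime import date

DSA = "dsa_391419_j3w9t_collab"
TABLE_PREFIX = "hds_curated_assets__hes_apc_diagnosis"


def archived_table_name(archived_on: date) -> str:
    """Build '<schema>.<prefix>_YYYY_MM_DD' for a given archive date."""
    return f"{DSA}.{TABLE_PREFIX}_{archived_on:%Y_%m_%d}"


# Archive date taken from the October 2024 version referenced in this page.
table_name = archived_table_name(date(2024, 10, 1))
```

Formatting the date with `%Y_%m_%d` keeps the zero-padded month and day that the **YYYY_MM_DD** pattern requires.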

The code below loads the October 2024 version of the hes_apc_diagnosis table using PySpark:

{% highlight markdown %}
```python
import pyspark.sql.functions as f

# `spark` is assumed to be an existing SparkSession
dsa = 'dsa_391419_j3w9t_collab'
hes_apc_diagnosis = spark.table(f'{dsa}.hds_curated_assets__hes_apc_diagnosis_2024_10_01')
```
{% endhighlight %}
