Map mutation gene symbols to Entrez IDs (#12)
* In mutation data, map gene symbol to Entrez ID based on combination of chromosome and gene symbol.

* Create Python script that includes gene symbol -> Entrez ID conversion for mutation data.

* Move mutation gene symbol -> Entrez ID mapping to a separate notebook.

* Include mutation mapping .tsv file

* Include small data matrices.

* Added comments and removed some string concatenation.
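
A minimal sketch of the mapping idea described above: look up the Entrez ID by the pair (chromosome, gene symbol) rather than by symbol alone, so a symbol reported on an unexpected chromosome is not mapped. This is not the code from the commit, whose notebook diff is not rendered here. The column names (`chromosome`, `symbol`, `entrez_gene_id`), the helper names, and the toy records are assumptions made for illustration; only the two gene/Entrez pairs are real.

```python
import csv
import io

# Hypothetical mapping table. The column names are assumed for illustration;
# the .tsv file added in this commit may be laid out differently.
MAPPING_TSV = (
    "chromosome\tsymbol\tentrez_gene_id\n"
    "17\tTP53\t7157\n"
    "12\tKRAS\t3845\n"
)


def load_symbol_map(handle):
    """Build a {(chromosome, symbol): entrez_gene_id} lookup from a TSV handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        (row["chromosome"], row["symbol"]): int(row["entrez_gene_id"])
        for row in reader
    }


def map_mutations(mutations, symbol_map):
    """Return copies of the records with an added 'entrez_gene_id' key.

    The value is None when the (chromosome, symbol) pair is not in the map.
    """
    mapped = []
    for record in mutations:
        key = (record["chromosome"], record["symbol"])
        mapped.append({**record, "entrez_gene_id": symbol_map.get(key)})
    return mapped


symbol_map = load_symbol_map(io.StringIO(MAPPING_TSV))

# Toy mutation records (hypothetical, not taken from the TCGA data).
mutations = [
    {"sample_id": "sample_a", "chromosome": "17", "symbol": "TP53"},
    {"sample_id": "sample_b", "chromosome": "X", "symbol": "TP53"},
]
mapped = map_mutations(mutations, symbol_map)
```

Keying on the pair keeps the lookup a plain dictionary access while leaving unmatched records visible as `None`, so they can be counted or inspected before being dropped.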
clairemcleod authored and gwaybio committed Aug 10, 2016
1 parent 02a306a commit e6a7fcf
Showing 11 changed files with 124,920 additions and 1,163 deletions.
231 changes: 186 additions & 45 deletions 2.TCGA-process.ipynb
100644 → 100755

Large diffs are not rendered by default.

92 changes: 46 additions & 46 deletions data/subset/expression-matrix-all-genes.tsv

Large diffs are not rendered by default.

1 change: 0 additions & 1 deletion data/subset/expression-matrix-all-samples.tsv
@@ -4256,7 +4256,6 @@ TCGA-E8-A3X7-01 0 8.05 8.19 9.49 5.06 10.9 8.27 10.8 7.01 9.11 3.23 6.07 0 5.62
TCGA-E8-A413-01 0 8.15 8.44 9.42 6 10.6 9.58 10.3 7.56 8.67 2.42 5.49 0 6.32 13.6
TCGA-E8-A414-01 0 7.97 8.4 9.7 5.71 10.5 9.63 10.8 7.28 8.57 2.99 5.66 0 6.97 12.8
TCGA-E8-A415-01 0 7.94 7.22 9.44 5.22 10.4 9.28 10.6 7.2 8.99 2 5.65 0 6.91 13.2
TCGA-E8-A416-01 0 7.81 6.63 9.39 4.26 10.5 9.77 8.77 7.11 9.2 2.25 6.27 0 7 13.2
TCGA-E8-A417-01 0 7.76 7.69 9.92 3.35 10.7 9.48 10.4 7.42 8.46 1.31 5.77 0 6.39 13.2
TCGA-E8-A418-01 0 8.01 7.01 9.43 5.4 10.4 9.3 10.7 7.13 9.03 1.85 1.85 0 5.69 14.4
TCGA-E8-A419-01 0 8.32 6.98 9.5 3.61 10.4 9.4 10.8 8.05 9 2.13 4 0 6.45 13.3
92 changes: 46 additions & 46 deletions data/subset/expression-matrix-small.tsv
@@ -1,51 +1,51 @@
sample_id 1421 5203 5818 9875 10675 10919 23262 23467 54941 79622 147746 255167 284123 646851 728689
TCGA-25-1313-01 0.835 9.73 9.4 10.3 8.65 12.2 9.46 10.4 6.63 7.48 3.01 5.73 0 6.25 13.3
TCGA-22-4593-01 0 9.53 13.6 10.9 4.56 10.7 9.45 9.86 5.4 9.49 1.76 4.26 0 5.07 14
TCGA-2G-AALW-01 5.57 8.92 9.7 10.4 4.79 11.1 9.47 8.83 7.83 9.24 2.73 5.12 0 6.34 13.8
TCGA-3G-AB0O-01 0 8.63 9.64 9.04 1.99 11.3 9.39 4.77 8.33 10.5 0 4.85 0 5.38 13.3
TCGA-3N-A9WD-06 0 8.93 9.15 9.5 6.45 10.5 9.25 8.12 8.31 9.41 2.18 3.14 0 4.77 13.8
TCGA-41-5651-01 6.15 9.24 11.5 10.1 12.5 11.3 9.38 11.7 6.3 10.4 2.83 5.67 0 5.45 13
TCGA-44-6776-01 0 7.96 9.08 10.4 6.16 10.8 8.63 5.46 8.32 9.4 3.27 2.59 0 5.18 13.9
TCGA-44-A47A-01 0 7.8 11.2 10.6 6.59 10.4 9.37 7.03 6.9 8.54 2.61 2.35 0 10.2 13.8
TCGA-60-2721-01 0 7.69 14.1 10.8 3.2 11 9.12 7.89 7.01 9.03 1.74 3.88 0 5.76 13.3
TCGA-A2-A3XZ-01 0 8.16 10.4 10.1 6.19 11.5 8.71 8.33 7.87 9.49 2.98 1.02 0 6.19 13.7
TCGA-A3-3378-01 0 7.52 8.24 9.8 3.14 10.1 10.4 6.7 7.88 8.13 1.99 0 0 7.27 12.6
TCGA-AA-3697-01 0 9.3 10.9 10.7 3.45 11.3 9.68 7.54 7.3 9.42 0 0 0 5.47 13.7
TCGA-BA-4077-01 0 7.6 12.9 9.88 3.7 9.97 9.08 9.51 6.49 9.51 0.973 4.5 0 2.54 13
TCGA-BJ-A2NA-01 0 8.05 7.85 9.76 4.48 10.3 9.31 10.5 7.73 8.46 2.51 4.96 0 5.75 14.4
TCGA-49-4487-01 0 9.51 8.45 10.6 7.17 10.4 8.71 8.18 7.19 9.49 2.6 0.882 0 4.89 13.5
TCGA-59-2363-01 5.11 9.3 10.7 9.46 6.01 12.1 9.73 8.94 6 8.47 2.86 6.33 0 7.61 13
TCGA-69-8253-01 0 9.16 11.5 10.9 5.44 10.1 9.2 5.94 7.45 8.57 1.28 1.84 0 5.26 13.8
TCGA-78-8662-01 0 7.75 10.6 11 9.06 11.6 9.02 4.43 5.42 10.3 4.51 1.58 0 6.79 13.7
TCGA-86-A4P8-01 0 7.5 8.11 9.1 6.32 10.4 9.83 7.61 9.32 7.95 3.13 3.78 0 6.19 12.6
TCGA-A7-A2KD-01 0 8.31 10.4 10.6 4.28 10.3 9.95 9.51 7.18 9.4 3.55 2.24 0 8.27 12.8
TCGA-AN-A0FX-01 0 8.39 10.4 10.3 3.06 10.9 9.93 11.2 7.16 9.03 1.54 1.38 0 6.51 13
TCGA-B6-A0WS-01 0 7.85 9.44 10.2 3.31 10.5 9.86 11.1 9.32 9.72 2.15 3.4 0 6.27 13.4
TCGA-B6-A0WV-01 0 8.46 9.57 9.96 6.73 9.97 11.4 10.6 6.09 9.17 1.7 0.926 0 7.27 12.8
TCGA-BA-4076-01 0 8.67 13.1 9.52 3.65 9.69 9.23 7.51 6.12 8.85 2.64 4.2 0 5.65 13.1
TCGA-BH-A0HP-01 0 8.19 10.3 9.62 6.28 10.9 10.1 11 9.18 9.18 2.54 2.08 0 6.65 13.2
TCGA-BH-A1ES-06 0 8.85 8.82 9.51 2.31 10.6 9.73 7.13 6.48 8.81 1.29 3.84 0 6.66 12.9
TCGA-BP-4961-01 0 7.87 9.24 9.67 2.19 9.98 10.5 7.7 8.53 8.2 2.35 3.33 0 4.08 13.6
TCGA-BR-6454-01 0 9.6 11.3 10.7 3.34 11 8.77 7.84 7.4 9.11 0.65 7.09 0 4.26 13.1
TCGA-BR-A4IU-01 0 8.68 10.4 10.3 4.29 10.2 9.51 9.73 8.95 7.76 3.79 5.28 0 5.68 12.6
TCGA-C8-A1HK-01 0 8.54 10.2 10.3 4.36 11 10 10.6 8.03 9.42 1.32 0 0 7.6 12.7
TCGA-CR-6474-01 0 8.9 13 10.4 2.57 9.73 8.62 10.4 6.67 7.71 1.86 1.73 0 3.57 13.7
TCGA-CR-7385-01 0 8.19 13.3 10.8 5.9 10.8 9.69 6.51 6.8 9.94 1.61 2.14 0 2.69 13.8
TCGA-CX-7219-01 0 7.96 13.1 11 4.25 10.5 9.86 7.21 6.81 8.49 2.37 2.83 0 4.45 13.7
TCGA-D6-6823-01 0 8.69 13.7 11.1 5.26 10.3 8.5 7.52 7.08 8.34 2.01 2.43 0 2.87 14
TCGA-DD-AAE9-01 0 8.05 5.38 8.27 6.29 10.9 8.6 1.09 7.38 9.51 1.09 10.6 0 0 12.9
TCGA-DS-A3LQ-01 0 8.53 12.5 9.76 5.34 10.5 9.02 5.74 4.75 10.8 0.686 6.77 0 2.89 13.7
TCGA-DU-A76L-01 1.87 9.12 10.6 9.96 10.9 10.7 9.46 10.9 6.91 9.11 1.87 2.47 0 5.81 13
TCGA-DW-7838-01 0 7.8 8.64 9.06 1.41 9.18 10 9.02 7.12 9.2 0.498 1.41 0 5.7 13.5
TCGA-E9-A5FK-01 0 8.9 10.3 8.84 5.7 11 8.58 8.35 7.88 9.95 3.46 1.04 0 4.74 13.9
TCGA-ED-A7PZ-01 0 8.88 6.91 9 1.68 11.5 8.48 2.43 2.93 11.1 2.57 3.99 0.633 2.11 14.9
TCGA-EJ-8472-01 0 8.4 10.3 9.84 6.39 10.4 9.13 10.2 7.5 8.89 2.43 3.72 0 6.07 13
TCGA-EP-A3JL-01 0 9.21 8.69 8.4 3.79 10 8.82 5.43 6.55 10.3 2.47 10.9 0 1.33 13.3
TCGA-ET-A39O-01 0 8.26 7.64 9.66 4.07 10.3 9.46 11.1 7.81 8.35 2.59 1.97 0 4.58 14.1
TCGA-F4-6857-01 0 9.75 9.52 10.2 2.97 10.7 9.49 8.46 7.64 8.77 1.83 2.8 0 6.63 14
TCGA-FG-5965-01 0.943 7.68 10.4 10.9 13.6 11.3 8.97 12.2 6.45 9.74 3.63 4.61 0 7.18 12.3
TCGA-G3-A3CH-01 0 7.81 9.07 9.32 6.08 9.43 9.09 10.6 9.58 9.44 3.11 10.9 0 1.54 13.1
TCGA-G3-A3CJ-01 0 8.34 6.95 8.94 3.11 10 8.68 2.48 3.95 10.5 1.34 10.5 0 1.34 13.9
TCGA-GN-A8LN-01 0 8.74 8.71 11.1 8.92 11.2 10.2 6.73 7.09 8.9 1.19 0 0 6.96 14.5
TCGA-HT-7472-01 0 8.01 10.8 10.7 12.4 11.2 9.19 11.3 7.59 9.32 1 2 0 7.9 12.8
TCGA-HU-A4GC-01 0 9.38 9.56 9.2 2.96 10.4 9.23 6.29 8.83 8.67 0.778 0.441 0 5.52 12.9
TCGA-IB-A6UG-01 1.05 8.15 10.1 9.94 4 10.2 8.99 8.95 7.27 8.26 2.08 3.18 0 5.98 13.4
TCGA-L6-A4EP-01 0 8.48 9.49 9.02 4.65 10.9 8.87 10.8 7.09 9.03 3.98 4.81 0 6.31 14.5
TCGA-LT-A5Z6-01 0 7.96 11.4 9.11 6.43 11.6 9.54 11.5 6.19 9.2 1.36 1.62 0 8.58 12
TCGA-OL-A5DA-01 0 8.02 9.71 9.46 4.07 12.1 9.36 8.46 7.33 9.3 3.15 1.41 0 8.03 13.5
TCGA-OR-A5LA-01 0 9.32 7.1 10.3 3.54 9.64 9.36 10.8 6.94 10.2 0.916 0.916 0 5.19 14.8
TCGA-PJ-A5Z9-01 0 8.98 8.18 6.98 2.62 10.3 8.14 8.81 4.52 10.2 0.719 0.719 0 7.32 13.8
TCGA-QH-A6CY-01 2.3 8.69 11.1 10.1 13.7 11.2 8.67 12.7 7.33 10.1 6.05 6.31 0 7.06 13.1
TCGA-SR-A6MP-01 0 9.11 11.5 8.98 6.21 10.2 10.3 9.71 7.94 9.55 2.82 8.4 0 4.26 13.7
TCGA-V4-A9F0-01 0 7.77 10.2 12.1 7.37 11.3 9.05 9.09 6.41 9.73 4.48 2.99 0 2.71 15.1
TCGA-VD-AA8T-01 0 8.65 9.19 12.4 7.96 11 10.4 9.35 5.58 8.76 4.98 3.7 0 3.78 14.9
TCGA-VW-A8FI-01 1.13 8.45 9.09 9.42 13 11.5 9.44 11.1 6.11 10.1 2.53 4.33 0 7.17 12.6
TCGA-WC-A885-01 0 7.74 9.66 11.2 4.73 12.6 6.71 9.28 2.4 9.56 2.66 0 0 1.65 16
TCGA-YL-A8SA-01 0 7.3 9.44 10 6.02 10.7 9.2 9.74 6.58 9 3.54 2.54 0 6.1 13.4
TCGA-ZF-A9RE-01 0 8.53 14.4 10.9 4.82 11.1 9.46 5.39 6.39 9.13 0.616 2.53 0 5.09 13.1
TCGA-ZH-A8Y1-01 0 8.56 10.5 9.95 2.73 10.7 9.58 10.2 6.56 8.27 2.6 5.77 0 7.37 13.7
TCGA-C4-A0EZ-01 0 9.29 11 9.64 6.46 12 9.79 12.7 4.43 10.2 0.794 3.5 0 7.64 13.8
TCGA-C5-A1MI-01 0 9.49 10.2 10.3 6.52 11 10.4 7.89 6.59 9.14 2.25 6.03 0 4.65 13.4
TCGA-C5-A1ML-01 0 9.03 12.9 9.93 6.81 10.3 9.93 10.1 5.84 8.9 2.08 7.21 0 4.95 12.5
TCGA-D3-A3MO-06 0 8.68 9.82 10.3 6.76 10.9 10.2 7.13 9.57 8.8 2.25 0 0 6.83 13.7
TCGA-DC-4745-01 0 9.5 10.9 9.3 2.35 9.66 9.58 4.57 5.36 9.95 0 1.5 0 4.27 13.8
TCGA-DU-A7T6-01 1.47 8.82 11.7 11.2 11.7 11.9 9.31 11.5 6.63 9.68 4.22 4.15 0 4.49 14.1
TCGA-E9-A3QA-01 0 8.24 10.4 9.9 4.61 11.4 8.63 10.4 7.67 9.52 2.79 4.35 0 5.82 13.8
TCGA-EA-A43B-01 0 8.71 12 9.98 6.97 12.4 8.05 10.1 6.38 9.86 1.61 0 0 4.51 14.1
TCGA-EJ-A7NJ-01 0 7.64 10.4 10.5 4.58 10.2 9.4 10.9 6.54 9.45 3.11 8 0 5.52 12.6
TCGA-EK-A2RO-01 0 8.86 11.5 9.5 6.34 10.5 8.49 6.68 5.28 9.75 1.04 1.63 0 4.19 13.3
TCGA-EL-A3D0-01 0 8.97 7.77 8.93 4.61 10.9 8.68 9.83 7.53 7.97 3.76 4.67 0 5.69 13.8
TCGA-ER-A2NE-06 0 7.73 8.71 11.4 4.99 10.8 8.7 4.96 9.63 9.59 1.32 0 0 3.59 13.6
TCGA-FG-6688-01 0 7.45 9.87 10.5 11.4 10.6 8.73 10.5 7.52 9.62 1.69 5.55 0 7.65 12.8
TCGA-FG-A70Z-01 3.74 7.9 10.7 10.4 13.9 11 8.5 13.3 5.85 10.3 4.42 5.06 0 6.04 12.4
TCGA-HC-7079-01 0 7.85 10.8 10.7 3.52 10.5 9.8 9.39 8.92 8.45 2.89 5.29 0 5.05 12.7
TCGA-HC-8266-01 0 7.78 10.3 10.3 7.12 10.7 9.29 10.4 8.72 9.19 2.3 5.62 0 5.42 12.9
TCGA-HN-A2OB-01 0 7.66 10.9 10.2 4.28 11.2 9.77 8.17 8.76 8.9 2.34 3.9 0 7.15 13.1
TCGA-HT-A4DS-01 3.01 8.54 9.96 8.95 13.6 11.3 7.99 12.5 6.22 11.3 4.9 2.88 0 5.91 14.3
TCGA-HT-A616-01 2.13 8.53 11.1 9.68 13.2 11.3 8.93 13.2 5.16 9.91 7.08 6.03 0 6.47 13
TCGA-IB-A5SO-01 0.828 8.3 9.22 9.52 4.81 10.1 9.04 8.81 7.34 8.34 3 5.78 0 6.26 13.1
TCGA-KS-A4I7-01 0 8.07 7.79 9.22 6.3 10.6 9.02 10.5 7.13 8.48 2.58 4.79 0 5.64 13.3
TCGA-L5-A8NW-01 0 8.24 13.6 10.5 4.64 10.2 10.5 6.96 8.25 6.74 1.35 2.42 0 5.73 13.1
TCGA-N8-A4PN-01 0.488 8.27 13.1 11.6 4.23 12 9.45 10.2 5.81 9.27 5.7 7.64 2.54 7.11 13.4
TCGA-QU-A6IP-01 1.08 8.22 10.1 9.96 5.57 10.8 9.24 10.9 7.88 9.52 3.23 5.6 0 6.04 13.2
TCGA-RY-A843-01 1.88 8.18 10.4 9.55 13.6 11.1 8.93 12.5 6.3 9.87 5.42 5.48 0 6.47 12.6
TCGA-TM-A84F-01 1.37 8.22 9.53 9.72 11.7 11.1 8.85 10.9 7.2 10.1 3.06 4.36 0 7.53 13
TCGA-TQ-A7RW-01 0 7.71 10.6 10.9 10.9 11.3 9.53 10.9 8.03 9.06 0.586 1 0 7.41 13.5
TCGA-VM-A8CE-01 0.486 8.01 11.2 10.2 13.7 11.6 8.77 10.8 7.21 9.44 1.59 4.22 0 7.74 13
TCGA-WA-A7GZ-01 0 8.86 14 11.3 4.7 10.8 8.37 9.69 5.2 9.44 2.41 3.06 0 5.27 14.3
TCGA-WE-AAA3-06 0 7.34 8.71 11 7.26 11.3 8.68 7.54 8.53 9.69 1.25 0.942 0 5.74 13.6
TCGA-WK-A8XO-01 0.672 8.35 9.36 9.83 4.97 10.9 9.04 10.4 7.26 9.5 3.55 3.31 0 5.67 13.5
102 changes: 51 additions & 51 deletions data/subset/mutation-matrix-all-genes.tsv

Large diffs are not rendered by default.
