Render toc-less
github-actions[bot] committed Sep 10, 2024
1 parent 4a17e8d commit 3c3d958
Showing 31 changed files with 54 additions and 164 deletions.
107 changes: 15 additions & 92 deletions docs/no_toc/05-data-visualization.md
@@ -56,101 +56,59 @@ To create a histogram, we use the function [`sns.displot()`](https://seaborn.pyd


``` python
plt.figure()
sns.displot(data=metadata, x="Age")
```

<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-3-1.png" width="244" />

``` python
plt.show()
```

<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-3-2.png" width="480" /><img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-3-3.png" width="672" />
<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-3-1.png" width="200%" /><img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-3-2.png" width="200%" />

(The `plt.figure()` and `plt.show()` functions are used to render the plots on the website, but you don't need to use them in your exercises.)

A common parameter to consider when making a histogram is the bin size. You can specify the bin width via the `binwidth` argument, or the number of bins via the `bins` argument.


``` python
plt.figure()
sns.displot(data=metadata, x="Age", binwidth = 10)
```

<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-4-7.png" width="244" />

``` python
plt.show()
```

<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-4-8.png" width="480" /><img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-4-9.png" width="672" />
<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-4-5.png" width="200%" /><img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-4-6.png" width="200%" />

Our histogram also works for categorical variables, such as "Sex".


``` python
plt.figure()
sns.displot(data=metadata, x="Sex")
```

<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-5-13.png" width="244" />

``` python
plt.show()
```

<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-5-14.png" width="480" /><img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-5-15.png" width="672" />
<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-5-9.png" width="200%" /><img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-5-10.png" width="200%" />

**Conditioning on other variables**

Sometimes, you want to examine a distribution, such as Age, conditional on another variable, such as Sex: what does the distribution of Age look like for Female, for Male, and for Unknown? There are several ways of doing it. First, you could color the bars by group, using the `hue` input argument:


``` python
plt.figure()
sns.displot(data=metadata, x="Age", hue="Sex")
```

<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-6-19.png" width="306" />

``` python
plt.show()
```

<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-6-20.png" width="590" /><img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-6-21.png" width="672" />
<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-6-13.png" width="200%" /><img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-6-14.png" width="200%" />

It is rather hard to tell the groups apart from the coloring alone. So, we place the bars for each category side by side via the `multiple="dodge"` input argument:


``` python
plt.figure()
sns.displot(data=metadata, x="Age", hue="Sex", multiple="dodge")
```

<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-7-25.png" width="306" />

``` python
plt.show()
```

<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-7-26.png" width="590" /><img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-7-27.png" width="672" />
<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-7-17.png" width="200%" /><img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-7-18.png" width="200%" />

Lastly, as an alternative to using colors to display the conditioning variable, we could make a subplot for each of its values via `col="Sex"` or `row="Sex"`:


``` python
plt.figure()
sns.displot(data=metadata, x="Age", col="Sex")
```

<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-8-31.png" width="745" />

``` python
plt.show()
```

<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-8-32.png" width="1440" /><img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-8-33.png" width="672" />
<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-8-21.png" width="200%" /><img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-8-22.png" width="200%" />

You can find a lot more details about distributions and histograms in [the Seaborn tutorial](https://seaborn.pydata.org/tutorial/distributions.html).

@@ -160,17 +118,10 @@ To visualize two continuous variables, it is common to use a scatterplot or a li


``` python
plt.figure()
sns.relplot(data=expression, x="KRAS_Exp", y="EGFR_Exp")
```

<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-9-37.png" width="244" />

``` python
plt.show()
```

<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-9-38.png" width="480" /><img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-9-39.png" width="672" />
<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-9-25.png" width="200%" /><img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-9-26.png" width="200%" />

To condition on other variables, plotting features are used to distinguish the values of the conditioning variable:

@@ -186,65 +137,37 @@ Let's merge `expression` and `metadata` together, so that we can examine KRAS an
``` python
expression_metadata = expression.merge(metadata)

plt.figure()
sns.relplot(data=expression_metadata, x="KRAS_Exp", y="EGFR_Exp", hue="PrimaryOrMetastasis")
```

<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-10-43.png" width="317" />

``` python
plt.show()
```

<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-10-44.png" width="629" /><img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-10-45.png" width="672" />
<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-10-29.png" width="200%" /><img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-10-30.png" width="200%" />
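
The plot above depends on `expression.merge(metadata)` lining up the rows of the two tables. As a rough sketch of what that call does when no extra arguments are given, here are two hypothetical miniature tables; the shared key column `ModelID` and all values are assumptions for illustration, not the course data:

```python
import pandas as pd

# Hypothetical miniature tables; "ModelID" is an assumed shared key column.
expression = pd.DataFrame({
    "ModelID": ["A", "B", "C"],
    "KRAS_Exp": [1.2, 3.4, 2.2],
    "EGFR_Exp": [0.5, 2.1, 1.7],
})
metadata = pd.DataFrame({
    "ModelID": ["A", "B", "D"],
    "PrimaryOrMetastasis": ["Primary", "Metastatic", "Primary"],
})

# With no `on=` argument, merge joins on the columns both tables share.
# The default inner join keeps only keys present in both tables.
expression_metadata = expression.merge(metadata)
merged_ids = expression_metadata["ModelID"].tolist()
```

Rows whose key appears in only one table ("C" and "D" here) are dropped, so it can be worth comparing the row count before and after a merge.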

Here is the scatterplot with different shapes:


``` python
plt.figure()
sns.relplot(data=expression_metadata, x="KRAS_Exp", y="EGFR_Exp", style="PrimaryOrMetastasis")
```

<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-11-49.png" width="317" />

``` python
plt.show()
```

<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-11-50.png" width="629" /><img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-11-51.png" width="672" />
<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-11-33.png" width="200%" /><img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-11-34.png" width="200%" />

You can also try plotting with `size="PrimaryOrMetastasis"` if you like. None of these seems very effective at distinguishing the two groups, so we will try subplot faceting as we did for the histogram:


``` python
plt.figure()
sns.relplot(data=expression_metadata, x="KRAS_Exp", y="EGFR_Exp", col="PrimaryOrMetastasis")
```

<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-12-55.png" width="744" />

``` python
plt.show()
```

<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-12-56.png" width="1440" /><img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-12-57.png" width="672" />
<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-12-37.png" width="200%" /><img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-12-38.png" width="200%" />

You can also condition on multiple variables by assigning a different variable to each conditioning option:


``` python
plt.figure()
sns.relplot(data=expression_metadata, x="KRAS_Exp", y="EGFR_Exp", hue="PrimaryOrMetastasis", col="AgeCategory")
```

<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-13-61.png" width="1074" />

``` python
plt.show()
```

<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-13-62.png" width="2069" /><img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-13-63.png" width="672" />
<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-13-41.png" width="200%" /><img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-13-42.png" width="200%" />
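
As a self-contained sketch of the same idea, the block below conditions on three variables at once: `hue`, `style`, and `col`. The table is synthetic, and the category labels (including the `Sex` column added to it) are assumptions chosen only so the snippet runs on its own:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the merged `expression_metadata` table (assumption).
rng = np.random.default_rng(2)
n = 120
expression_metadata = pd.DataFrame({
    "KRAS_Exp": rng.normal(4.0, 1.0, size=n),
    "EGFR_Exp": rng.normal(3.0, 1.5, size=n),
    "PrimaryOrMetastasis": np.tile(["Primary", "Metastatic"], n // 2),
    "AgeCategory": np.repeat(["Adult", "Pediatric"], n // 2),
    "Sex": np.tile(["Female", "Male", "Unknown"], n // 3),
})

# Color encodes one variable, marker shape a second, and subplot column a third.
grid = sns.relplot(
    data=expression_metadata,
    x="KRAS_Exp",
    y="EGFR_Exp",
    hue="PrimaryOrMetastasis",
    style="Sex",
    col="AgeCategory",
)

# One subplot per AgeCategory value, laid out in a single row.
grid_shape = grid.axes.shape
```

Each added encoding makes the plot harder to read, so in practice two conditioning variables is often a comfortable upper limit.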

You can find a lot more details about relational plots such as scatterplots and lineplots [in the Seaborn tutorial](https://seaborn.pydata.org/tutorial/relational.html).

@@ -283,13 +206,13 @@ exp_plot = sns.relplot(data=expression, x="KRAS_Exp", y="EGFR_Exp")
exp_plot.set(xlabel="KRAS Expression", ylabel="EGFR Expression", title="Gene expression relationship")
```

<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-14-67.png" width="244" />
<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-14-45.png" width="244" />

``` python
plt.show()
```

<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-14-68.png" width="480" /><img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-14-69.png" width="672" />
<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-14-46.png" width="480" /><img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-14-47.png" width="672" />

You can change the color palette by adding the `palette` input argument to any of the plots. You can explore available color palettes [here](https://www.practicalpythonfordatascience.com/ap_seaborn_palette):

@@ -300,13 +223,13 @@ sns.displot(data=metadata, x="Age", hue="Sex", multiple="dodge", palette=sns.col
)
```

<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-15-73.png" width="306" />
<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-15-51.png" width="306" />

``` python
plt.show()
```

<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-15-74.png" width="590" /><img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-15-75.png" width="672" />
<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-15-52.png" width="590" /><img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-15-53.png" width="672" />

## Exercises

2 changes: 1 addition & 1 deletion docs/no_toc/About.md
@@ -51,7 +51,7 @@ These credits are based on our [course contributors table guidelines](https://ww
## collate en_US.UTF-8
## ctype en_US.UTF-8
## tz Etc/UTC
## date 2024-09-09
## date 2024-09-10
## pandoc 3.1.1 @ /usr/local/bin/ (via rmarkdown)
##
## ─ Packages ───────────────────────────────────────────────────────────────────
2 changes: 1 addition & 1 deletion docs/no_toc/about-the-authors.html
@@ -386,7 +386,7 @@ <h1>About the Authors<a href="about-the-authors.html#about-the-authors" class="a
## collate en_US.UTF-8
## ctype en_US.UTF-8
## tz Etc/UTC
## date 2024-09-09
## date 2024-09-10
## pandoc 3.1.1 @ /usr/local/bin/ (via rmarkdown)
##
## ─ Packages ───────────────────────────────────────────────────────────────────