Portfolio Docs #3395

Merged 3 commits on Jul 24, 2024. Showing changes from 2 commits.
78 changes: 65 additions & 13 deletions in `docs/publishing/sections/2_static_files.md`
@@ -57,20 +57,72 @@ subprocess.run([

```python
# ... (preceding lines collapsed in the diff view)
])
```

Alternatively, run the following in the terminal:

```bash
# Execute the notebook in place
jupyter nbconvert --to notebook --execute --inplace my_notebook.ipynb

# Convert the notebook to HTML without code cells or input/output prompts
jupyter nbconvert --to html --no-input --no-prompt my_notebook.ipynb
```

You can also convert a Jupyter Notebook to PDF for distribution in a few different ways. You might wonder why we don't suggest simply doing `File -> Save and Export Notebook As -> PDF`. We don't recommend this method because it leaves all your code cells visible, which usually isn't desirable.

All of the commands below should be run in the <b>terminal</b>.

- Converting directly to PDF produces a very academic look, similar to a LaTeX document.

```bash
# Convert your original notebook
jupyter nbconvert --to pdf my_notebook.ipynb
```

- `nbconvert` also has configuration options. [Read about them here.](https://nbconvert.readthedocs.io/en/latest/config_options.html)

```bash
# Hide all the code cells by adding --no-input
jupyter nbconvert --to pdf --no-input my_notebook.ipynb
```

- For a less academic look, you can convert your notebook to HTML first and then use `weasyprint` to turn the HTML into a PDF. This can add blank pages, typically at the beginning of the PDF. You will need to remove them manually, for example in Adobe Acrobat.

```bash
# Make sure to install `weasyprint`
pip install WeasyPrint

# Execute NB
jupyter nbconvert --to notebook --execute --inplace my_notebook.ipynb

# Convert NB to HTML then to PDF
jupyter nbconvert --to html --no-input --no-prompt my_notebook.ipynb

# Convert to PDF
weasyprint my_notebook.html my_notebook.pdf
```
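
The same three steps can also be scripted from Python with `subprocess.run`, in the same style as the HTML conversion shown earlier. This is a minimal sketch. It assumes `jupyter` and `weasyprint` are on your `PATH`, and that nbconvert writes `my_notebook.html` next to the notebook, which is its default. The function names `pdf_commands` and `notebook_to_pdf` are hypothetical.

```python
import subprocess
from pathlib import Path


def pdf_commands(notebook: str) -> list[list[str]]:
    """Return the terminal commands above as argument lists, in order."""
    html_file = str(Path(notebook).with_suffix(".html"))
    pdf_file = str(Path(notebook).with_suffix(".pdf"))
    return [
        # Execute the notebook in place so its outputs are up to date
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", notebook],
        # Export to HTML without code cells or input/output prompts
        ["jupyter", "nbconvert", "--to", "html", "--no-input", "--no-prompt", notebook],
        # Render the exported HTML file as a PDF
        ["weasyprint", html_file, pdf_file],
    ]


def notebook_to_pdf(notebook: str) -> None:
    """Run each command in turn, stopping at the first one that fails."""
    for command in pdf_commands(notebook):
        subprocess.run(command, check=True)
```

Calling `notebook_to_pdf("my_notebook.ipynb")` runs all three steps. Because of `check=True`, a failing step raises `subprocess.CalledProcessError` instead of silently leaving an out-of-date PDF behind.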

- Some assignments require you to rerun the same notebook for different values and save each resulting notebook as a PDF. This combines parameterization with `papermill` and the `weasyprint` steps above. You can reference the code that was used to generate the CSIS scorecards [here](https://github.com/cal-itp/csis-metrics/blob/main/project_prioritization/metrics_summaries/run_papermill.py). This script iterates over [this notebook](https://github.com/cal-itp/csis-metrics/blob/main/project_prioritization/metrics_summaries/sb1_scorecard.ipynb) to produce 50+ PDF files, one for each nominated project.

Briefly, the script above does the following:

- Automates the naming of the new PDF files by removing punctuation that isn't allowed in file names.
- Saves each notebook as an HTML file.
- Converts the HTML files to PDF.
- Saves each PDF to its district's folder in our Google Cloud Storage (GCS) bucket.
- Deletes irrelevant files.
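
The file-naming and per-district folder steps can be sketched as below. This is not the code from `run_papermill.py`. It is a minimal illustration that assumes a hypothetical bucket path `gs://my-bucket/scorecards` and zero-padded district folder names. Uploading the file itself would still need a GCS client library.

```python
import re


def safe_filename(name: str) -> str:
    """Remove characters other than letters, digits, whitespace, hyphens, and
    underscores, then replace each run of whitespace with one underscore."""
    cleaned = re.sub(r"[^\w\s-]", "", name)
    return re.sub(r"\s+", "_", cleaned.strip())


def pdf_destination(project: str, district: int,
                    bucket: str = "gs://my-bucket/scorecards") -> str:
    """Build a per-district GCS path for a project's PDF (hypothetical layout)."""
    return f"{bucket}/district_{district:02d}/{safe_filename(project)}.pdf"
```

For example, `pdf_destination("Main St. & 1st Ave: Bus Lanes", 7)` returns `"gs://my-bucket/scorecards/district_07/Main_St_1st_Ave_Bus_Lanes.pdf"`.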

- Here are some tips and tricks for converting notebooks to HTML before converting them to PDF.

- Any formatting should be done in HTML/CSS first.

- To create page breaks, add as many `<br>` tags as you need to a <b>Markdown</b> cell to push the content that follows onto a new page.

```html
<br>
<br>
<br>
<br>
<br>
```

- Follow the writing, rounding, and visualization ideas outlined in [Getting Notebooks Ready for the Portfolio](https://docs.calitp.org/data-infra/publishing/sections/4_notebooks_styling.html) section.
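
WeasyPrint lays out the exported HTML using CSS, so page size, margins, and hard page breaks can also be set in CSS. The sketch below inserts a `<style>` block into the HTML file that `nbconvert` produces, before you run `weasyprint`. The `page-break` class name and the margin values are assumptions for illustration. WeasyPrint supports both CSS `@page` rules and `break-before: page`.

```python
from pathlib import Path

# Print styles: US Letter pages with 0.75in margins, plus a class that
# starts the element on a new page.
PRINT_CSS = """
@page { size: letter; margin: 0.75in; }
.page-break { break-before: page; }
"""


def add_print_css(html: str, css: str = PRINT_CSS) -> str:
    """Insert a <style> block just before the first </head> tag."""
    return html.replace("</head>", f"<style>{css}</style>\n</head>", 1)


def style_html_file(path: str) -> None:
    """Rewrite an exported HTML file in place with the print styles added."""
    file = Path(path)
    html = file.read_text(encoding="utf-8")
    file.write_text(add_print_css(html), encoding="utf-8")
```

Run `style_html_file("my_notebook.html")` between the `nbconvert` and `weasyprint` steps. In the notebook, a code cell such as `display(HTML('<div class="page-break"></div>'))` marks where a new page should begin. Depending on how the nbconvert template nests cell outputs, a forced break inside an output area may not always be honored, so check the resulting PDF.
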
105 changes: 56 additions & 49 deletions in `docs/publishing/sections/4_notebooks_styling.md`
@@ -12,42 +12,47 @@ We want all the content in our portfolio to be consistent. Below are guidelines
## Narrative

- Narrative content can be done in Markdown cells or code cells.

- Markdown cells should be used when there are no variables to inject.
- Code cells should be used to write narrative whenever variables constructed from f-strings are used.

- You can inject f-strings into plain Markdown (not headings) by calling `display(Markdown())` in a code cell.

```python
from IPython.display import Markdown, display

display(Markdown(f"The value of {variable} is {value}."))
```

- Use f-strings to fill in variables and values instead of hard-coding them.

- Turn anything that runs in a loop or relies on a function into a variable.

- Use functions to grab those values for a specific entity (operator, district), rather than hard-coding the values into the narrative.

```python
n_routes = (
    df[df.organization_name == operator]
    .route_id.nunique()
)

n_parallel = (
    df[(df.organization_name == operator) & (df.parallel == 1)]
    .route_id.nunique()
)

display(
    Markdown(
        f"**Bus routes in service: {n_routes}**"
        "<br>**Parallel routes** to State Highway Network (SHN): "
        f"**{n_parallel} routes**"
    )
)
```

- Stay away from loops if you need to use headers.

- You will need to create Markdown cells for headers or else JupyterBook will not build correctly. For parameterized notebooks, this is an acceptable trade-off.
- For unparameterized notebooks, you may want to use `display(HTML())`.
- Caveat: Using `display(HTML())` means you'll lose the table of contents navigation in the top right corner in the JupyterBook build.
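
The two counts and the sentence can be wrapped in one function, so every operator's narrative comes from the same code. This is a minimal sketch. It assumes a DataFrame with the `organization_name`, `route_id`, and `parallel` columns used above. `route_summary` and the sample data are hypothetical.

```python
import pandas as pd


def route_summary(df: pd.DataFrame, operator: str) -> str:
    """Return the Markdown narrative for one operator's routes."""
    operator_df = df[df.organization_name == operator]
    n_routes = operator_df.route_id.nunique()
    n_parallel = operator_df[operator_df.parallel == 1].route_id.nunique()
    return (
        f"**Bus routes in service: {n_routes}**"
        "<br>**Parallel routes** to State Highway Network (SHN): "
        f"**{n_parallel} routes**"
    )


# Hypothetical sample data: three routes for "Metro", two of them parallel
routes = pd.DataFrame({
    "organization_name": ["Metro", "Metro", "Metro", "Other"],
    "route_id": ["1", "2", "3", "9"],
    "parallel": [1, 1, 0, 1],
})
summary = route_summary(routes, "Metro")
```

In a notebook cell, render the result with `display(Markdown(summary))`. For the sample data it reads `**Bus routes in service: 3**<br>**Parallel routes** to State Highway Network (SHN): **2 routes**`.
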
@@ -78,11 +83,11 @@ These are a set of principles to adhere to when writing the narrative content in
## Standard Names

- GTFS data in our warehouse stores information on operators, routes, and stops.
- Analysts should reference route name and Caltrans district the same way across analyses.
- Caltrans District: 7 should be referred to as `07 - Los Angeles`
- Between `route_short_name`, `route_long_name`, `route_desc`, which one should be used to describe `route_id`? Use `shared_utils.portfolio_utils`, which relies on regular expressions, to select the most human-readable route name.
- Use [`shared_utils.portfolio_utils`](https://github.com/cal-itp/data-analyses/blob/main/_shared_utils/shared_utils/portfolio_utils.py) to help you grab the right names to use. Sample code below.
```python
from shared_utils import portfolio_utils

route_names = portfolio_utils.add_route_name()
# ... (remaining lines collapsed in the diff view)
```

@@ -98,8 +103,8 @@ These are a set of principles to adhere to when writing the narrative content in

It's important to make our content as user-friendly as possible. Here are a few things to consider.

- Use a color palette that is color-blind friendly. There is no standard palette, so use your best judgement. There are many palettes to choose from online, such as [these from the University of California, Santa Barbara](https://www.nceas.ucsb.edu/sites/default/files/2022-06/Colorblind%20Safe%20Color%20Schemes.pdf).
- Add tooltips to your visualizations.
- Add `.interactive()` to the end of `Altair` charts so viewers can zoom and pan.
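
One way to cover all three points in a single chart is sketched below. It uses the Okabe-Ito palette, a widely used colorblind-safe qualitative palette that is not a Cal-ITP standard. Altair is imported inside the function so the palette list can be reused without it. `accessible_bar_chart` is a hypothetical helper.

```python
# Okabe-Ito colorblind-safe qualitative palette (eight colors)
OKABE_ITO = [
    "#E69F00",  # orange
    "#56B4E9",  # sky blue
    "#009E73",  # bluish green
    "#F0E442",  # yellow
    "#0072B2",  # blue
    "#D55E00",  # vermillion
    "#CC79A7",  # reddish purple
    "#000000",  # black
]


def accessible_bar_chart(df, x: str, y: str, color: str):
    """Bar chart with a colorblind-safe palette, tooltips, and zoom/pan."""
    import altair as alt  # imported here so OKABE_ITO works without Altair

    return (
        alt.Chart(df)
        .mark_bar()
        .encode(
            x=x,
            y=y,
            color=alt.Color(color, scale=alt.Scale(range=OKABE_ITO)),
            tooltip=[x, y, color],
        )
        .interactive()
    )
```

The palette has eight colors, so it suits charts with up to eight categories.
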

## Headers
@@ -108,7 +113,7 @@ It's important to make our content as user-friendly as possible. Here are a few

Header levels must move consecutively in Markdown cells, or the parameterized notebook will not generate. No skipping levels!

```markdown
# Notebook Title
## First Section
## Second Section
```

@@ -117,7 +122,7 @@ Headers must move consecutively in Markdown cells or the parameterized notebook

To get around the consecutive-header requirement, you can use `display(HTML())`.

```python
from IPython.display import HTML, display

display(HTML("<h1>First Header</h1>"))
display(HTML("<h3>Next Header</h3>"))
```

@@ -136,7 +141,7 @@ Markdown cells of the <i>H1</i> type creates the titles of our website, not the

## Last Checks

Your notebook is ready to be published. However, it never hurts to do a few final checks.

- All your values are formatted properly. Currencies should have $ and percentages should have %.
- The titles of your visualizations make sense and are capitalized correctly.
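
For the first check, small formatting helpers keep currencies and percentages consistent across a notebook. This is a minimal sketch using Python's format specifiers. The function names and the default of one decimal place for percentages are assumptions.

```python
def format_currency(amount: float) -> str:
    """Dollar sign, thousands separators, no cents: 1234567.8 -> '$1,234,568'."""
    return f"${amount:,.0f}"


def format_percent(share: float, decimals: int = 1) -> str:
    """Convert a 0-1 share to a percentage string: 0.256 -> '25.6%'."""
    return f"{share:.{decimals}%}"
```

These pair naturally with the f-string narrative, e.g. `display(Markdown(f"Total cost: {format_currency(cost)}"))`.
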
@@ -149,31 +154,30 @@ Your notebook is all ready to be published. However, it never hurts to double ch

If you plan to rerun the same Jupyter Notebook over a set of different parameters, you need to set up your Jupyter Notebook in a particular way.

### Packages to include

Copy and paste the code block below into every notebook for the portfolio. Order matters: `%%capture` hides the cell's output and, like any cell magic, <b>must</b> be the first line of the cell.

```python
%%capture

import warnings
warnings.filterwarnings('ignore')

import calitp_data_analysis.magics

# All your other packages go here
import pandas as pd
import utils
```

### Capturing Parameters

When parameterizing a notebook, there are 2 places where the parameter must be injected. Let's say you want to run your notebook twelve times, once for each of the twelve Caltrans districts. The column `district` is the parameter.

#### Header

The first Markdown cell <b>must</b> include the parameters to inject. Using the example above, you could set your header Markdown cell to `# District {district} Analysis`, which generates the title `District 1 Analysis` for District 1.

Please note:

@@ -185,39 +189,42 @@ Please note:

![header format](../assets/section4_image1.png)

#### Code Cell

You will need to create two separate code cells that take on the parameter. Let's use `district` as an example parameter once again.

- <b>Code Cell #1</b>

- Add in your parameter and set it equal to any valid value.

- Comment out the cell.

- This is how your code cell must look.

```python
# district = "4"
```

- Turn on the parameter tag: select the code cell, go to the upper right-hand corner -> click on the gears -> go to "Cell Tags" -> Add Tag + -> add a tag called "parameters" -> click on the new "parameters" tag to make sure a checkmark shows up and the tag turns dark gray.

![parameters tag](../assets/section4_image2.png)

- <b>Code Cell #2</b>

- Enter the same parameter, without assigning a value, on the line below `%%capture_parameters`, which goes at the top of the cell.
- Your code cell must look <b>exactly</b> like the example below, or else your notebook won't parameterize.

> **Member:** afaik it is technically possible to have more code/comments in one of these cells, just as long as %%capture_parameters remains at the very top of the cell. For example speedmaps has 2 of these, with an extra line of code: https://github.com/cal-itp/data-analyses/blob/main/ca_transit_speed_maps/speedmaps.ipynb
>
> Perhaps we should rephrase this to a recommendation?
>
> **Contributor Author:** Yes I'll rephrase this. Thanks for checking this out
>
> **Contributor Author:** I just updated it! Could you check it out again? Thanks. @edasmalchi

- Do not add any additional lines of code.
- Do not add commented out code such as `# Comment this back in` in this cell.

```python
%%capture_parameters
district
```
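
Once the `parameters` tag is in place, papermill (mentioned above for the CSIS scorecards) can run the notebook once per district. Papermill injects the new values in a cell placed right after the cell tagged `parameters`. The sketch below is a standalone illustration, not the portfolio's own build tooling. The output naming scheme and the `district_runs` and `run_all_districts` helpers are hypothetical. `district` is passed as a string to match `# district = "4"` above.

```python
from pathlib import Path


def district_runs(template: str, districts=range(1, 13)) -> list[tuple[str, dict]]:
    """Pair each district with an output notebook path and its parameters."""
    stem = Path(template).stem
    return [
        (f"{stem}_district_{d:02d}.ipynb", {"district": str(d)})
        for d in districts
    ]


def run_all_districts(template: str) -> None:
    """Execute the template notebook once per district with papermill."""
    import papermill as pm  # imported here so district_runs works without papermill

    for output_path, parameters in district_runs(template):
        pm.execute_notebook(template, output_path, parameters=parameters)
```

`run_all_districts("my_notebook.ipynb")` would write `my_notebook_district_01.ipynb` through `my_notebook_district_12.ipynb`. Each of these can then go through the HTML and PDF steps described in the static files section.
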

#### If you're using a heading, you can either use HTML or capture the parameter and inject

- HTML - this option works when you run your notebook locally.
```python
from IPython.display import HTML

display(HTML(f"<h3>Header with {variable}</h3>"))
```