
# Security {#scaling-security}

```{r, include = FALSE}
source("common.R")
```

Most Shiny apps are deployed behind a company firewall, and since you can generally assume that your colleagues aren't going to try to hack your app[^scaling-security-1], you don't need to think about security.
If, however, your app contains data that only some of your colleagues should be able to access, or you want to expose your app to the public, you will need to spend some time on security.
When securing your app, there are two main things to protect:

[^scaling-security-1]: If you can't assume that, you have bigger problems!
    That said, some companies do have a "zero-trust" model, so you should double check with your IT team.

-   Your data: you want to make sure an attacker can't access any sensitive data.

-   Your compute resources: you want to make sure an attacker can't mine bitcoin or use your server as part of a spam farm.

Fortunately your job is made a little easier because security is a team sport.
Whoever deploys your app is responsible for security **between** apps, ensuring that app A can't access the code or data in app B, and can't steal all the memory and compute power on the server.
Your responsibility is the security **within** your app, making sure that an attacker can't abuse your app to achieve their ends.
This chapter covers the basics of securing your Shiny app, broken down into securing your data and securing your compute resources.

If you're interested in learning a little more about security and R in general, I highly recommend Colin Gillespie's entertaining and educational useR! 2019 talk, "[R and Security](https://www.youtube.com/watch?v=5odJxZj9LE4)".

```{r setup}
library(shiny)
```

## Data

The most sensitive data is stuff like personally identifying information (PII), regulated data, credit card data, health data, or anything else that would be a legal nightmare for your company if it were made public.
Fortunately, most Shiny apps don't deal with those types of data[^scaling-security-2], but there is an important type of data you do need to worry about: passwords.
You should never include passwords in the source code of your app.
Instead, either put them in environment variables or, if you have many, use the [config](https://github.com/rstudio/config) package.
Either way, make sure that they are never committed to source control by adding the appropriate files to `.gitignore`.
I also recommend documenting how a new developer can get the appropriate credentials.
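
As a minimal sketch of the environment-variable approach, the helper below reads a secret at run time and fails loudly if it is missing.
The helper name `get_secret()` and the variable name `MYAPP_DB_PASSWORD` are hypothetical, chosen only for illustration; the `Sys.setenv()` call stands in for whatever mechanism your deployment uses to set the variable.

```{r}
# Hypothetical helper: fetch a required secret from an environment variable,
# failing loudly if it is missing or empty.
get_secret <- function(name) {
  value <- Sys.getenv(name, unset = NA)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable `", name, "` is not set.", call. = FALSE)
  }
  value
}

# For illustration only: in a real deployment the variable is set outside the
# code, so the value never appears in the source.
Sys.setenv(MYAPP_DB_PASSWORD = "not-a-real-password")
db_password <- get_secret("MYAPP_DB_PASSWORD")
```

During development, one option is to set the variable in a project-level `.Renviron` file, which R reads at startup, and to list that file in `.gitignore`.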

[^scaling-security-2]: If your app does work with these types of data, it's imperative that you partner with a software engineer with security expertise.

Alternatively, you may have data that is user-specific.
If you need to **authenticate** users, i.e. identify them through a user name and password, never attempt to roll a solution yourself.
There are just too many things that might go wrong.
Instead, you'll need to work with your IT team to design a secure access mechanism.
You can see some best practices at <https://solutions.posit.co/secure-access/auth/kerberos/index.html> and <https://solutions.posit.co/connections/db/best-practices/deployment/>.
Note that code within `server()` is isolated so there's no way for one user session to see data from another.
The only exception is if you use caching --- see Section \@ref(cache-scope) for details.
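
To make the isolation of `server()` concrete, here is a minimal sketch; the names `app_settings` and `notes`, and the input `note`, are hypothetical.
Objects created outside `server()` are created once per R process and are visible to every session that process serves, while objects created inside `server()` are created afresh for each session.

```{r}
library(shiny)

# Created once when the app starts: shared by every session in this R process,
# so it should never hold user-specific data.
app_settings <- list(max_rows = 100)

server <- function(input, output, session) {
  # Created each time a new session starts: private to that session.
  notes <- reactiveVal(character())

  # Append each submitted note to this session's own vector.
  observeEvent(input$note, {
    notes(c(notes(), input$note))
  })
}
```

Keeping anything user-specific inside `server()`, as `notes` is here, means it is never visible to other sessions.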

Finally, note that Shiny inputs use client-side validation, i.e. the checks for valid input are performed by JavaScript in the browser, not by R.
This means it's possible for a knowledgeable attacker to send values that you don't expect.
For example, take this simple app:

```{r, eval = FALSE}
secrets <- list(
  a = "my name",
  b = "my birthday",
  c = "my social security number", 
  d = "my credit card"
)

allowed <- c("a", "b")
ui <- fluidPage(
  selectInput("x", "x", choices = allowed),
  textOutput("secret")
)
server <- function(input, output, session) {
  output$secret <- renderText({
    secrets[[input$x]]
  })
}
```

You might expect that a user could access my name and birthday, but not my social security number or credit card details.
But a knowledgeable attacker can open up a JavaScript console in their browser and run `Shiny.setInputValue("x", "c")` to see my SSN.
So to be safe, you need to check all user inputs from your R code:

```{r}
server <- function(input, output, session) {
  output$secret <- renderText({
    req(input$x %in% allowed)
    secrets[[input$x]]
  })
}
```

I deliberately didn't create a user-friendly error message: the only time you'd see it is if you're trying to break the app, and we don't need to help out an attacker.
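
The same server-side check is worth applying to every input that selects something sensitive.
One way to make it reusable is a small predicate such as the hypothetical `is_allowed()` below, which also rejects values that aren't a single string, since an attacker isn't limited to the types the UI would normally send.
This is a sketch, not part of Shiny itself.

```{r}
# Hypothetical helper: TRUE only if `x` is a single, non-missing string
# that appears in the allow-list.
is_allowed <- function(x, allowed) {
  is.character(x) && length(x) == 1 && !is.na(x) && x %in% allowed
}

is_allowed("a", c("a", "b"))
is_allowed("c", c("a", "b"))
is_allowed(c("a", "c"), c("a", "b"))
```

Inside a reactive or render function you could then write `req(is_allowed(input$x, allowed))` before touching the data.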

## Compute resources

It's hopefully obvious that the following app is very dangerous, because it allows the user to run any R code they want.
They could delete important files, modify data, or send confidential data back to the user of the app.

```{r}
ui <- fluidPage(
  textInput("code", "Enter code here"),
  textOutput("results")
)
server <- function(input, output, session) {
  output$results <- renderText({
    eval(parse(text = input$code))
  })
}
```

In general, the combination of `parse()` and `eval()` is a big warning sign for any Shiny app[^scaling-security-3]: they instantly make your app vulnerable.
Similarly, you should never `source()` an uploaded `.R` file, or `rmarkdown::render()` an uploaded `.Rmd`.
But these cases are pretty obvious, and are unlikely to be a source of real problems.

[^scaling-security-3]: The only exception is if they don't involve user-supplied data in any way.

The bigger challenge arises because there are a number of functions that `parse()`, `eval()`, or both, in a way that you're not aware of.
Here are the most common:

-   **Model formulas**.
    It's possible to construct a model that executes arbitrary R code:

    ```{r}
    df <- data.frame(x = 1:5, y = runif(5))
    mod <- lm(y ~ {print("Hi!"); x}, data = df)
    ```

    This makes it difficult to safely allow a user to define their own models.

-   **Glue labels**.
    The glue package provides a powerful way to create strings from data:

    ```{r}
    title <- "foo"
    number <- 1
    glue::glue("{title}-{number}")
    ```

    But `glue()` evaluates anything inside of `{}`:

    ```{r}
    glue::glue("{title}-{print('Hi'); number}")
    ```

    If you want to allow a user to supply a glue string to generate a label, instead use `glue::glue_safe()` which only looks up variable names and doesn't evaluate code:

    ```{r, error = TRUE}
    glue::glue_safe("{title}-{number}")
    glue::glue_safe("{title}-{print('Hi'); number}")
    ```

-   **Variable transformation**.
    There's no way to safely allow a user to provide code snippets to transform a variable for dplyr or ggplot2.
    You might expect they'll write `log10(x)` but they could write `{print("Hi"); log10(x)}`.

    This also means that you should never use the older `ggplot2::aes_string()` with user supplied input.
    Instead, stick with the techniques in Chapter \@ref(action-tidy).
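
One way to sidestep the problems in the list above is to let users pick from names you control rather than supply code.
The sketch below is one possible arrangement, not a complete solution: the names `transforms`, `check_choice()`, `apply_transform()`, `safe_formula()`, and `demo_df` are all hypothetical, and `safe_formula()` assumes the data frame has syntactic column names.
Every user-supplied string is checked against an allow-list before it is used.

```{r}
# Hypothetical allow-list: the only transformations the app offers.
transforms <- list(identity = identity, log10 = log10, sqrt = sqrt)

# Check that `x` is a single string found in `choices`; error otherwise.
check_choice <- function(x, choices) {
  stopifnot(is.character(x), length(x) == 1, x %in% choices)
  x
}

# Apply a named transformation to a named column.
apply_transform <- function(data, var, transform) {
  var <- check_choice(var, names(data))
  transform <- check_choice(transform, names(transforms))
  transforms[[transform]](data[[var]])
}

# Build a one-predictor formula from validated column names.
safe_formula <- function(data, response, predictor) {
  response <- check_choice(response, names(data))
  predictor <- check_choice(predictor, names(data))
  reformulate(predictor, response = response)
}

demo_df <- data.frame(x = c(1, 10, 100), y = c(2, 4, 7))
apply_transform(demo_df, "x", "log10")
lm(safe_formula(demo_df, "y", "x"), data = demo_df)
```

A string like `{print("Hi"); log10(x)}` is not in either allow-list, so it is rejected before any code is constructed from it.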

The same problem can occur with SQL.
For example, if you construct SQL with `paste()`, e.g.:

```{r}
find_student <- function(name) {
  paste0("SELECT * FROM Students WHERE name = ('", name, "');")
}
find_student("Hadley")
```

An attacker can provide a malicious username:[^scaling-security-4]

[^scaling-security-4]: <https://xkcd.com/327/>

```{r}
find_student("Robert'); DROP TABLE Students; --")
```

This looks a bit odd, but it's a valid SQL query in three parts:

-   `SELECT * FROM Students WHERE name = ('Robert');` finds a student with name Robert.

-   `DROP TABLE Students;` deletes the `Students` table (!!).

-   `--');` is a comment: everything after `--` is ignored, which prevents the leftover `');` from turning into a syntax error.

To avoid this problem, never generate SQL strings with `paste()`; instead use a system that automatically escapes user input (like [dbplyr](https://dbplyr.tidyverse.org)), or use `glue::glue_sql()`:

```{r}
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
find_student <- function(name) {
  glue::glue_sql("SELECT * FROM Students WHERE name = ({name});", .con = con)
}
find_student("Robert'); DROP TABLE Students; --")
```

It's a little hard to tell at first glance, but this is safe: `glue_sql()` quotes the value and doubles the embedded `'` (SQL's equivalent of `\'` is `''`), so the query returns all rows of the `Students` table where the name is literally "Robert'); DROP TABLE Students; --".
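
Another common option, not covered above, is a parameterised query, where the SQL text and the user-supplied value are sent to the database separately.
The sketch below assumes an in-memory SQLite database with a small made-up `Students` table; `demo_con` and `find_student_param()` are hypothetical names, and the `?` placeholder is the syntax RSQLite accepts (other backends may use a different one).

```{r}
demo_con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(demo_con, "Students", data.frame(name = c("Hadley", "Robert")))

# Hypothetical variant of find_student() that runs the query directly.
# The value is passed via `params`, never pasted into the SQL string.
find_student_param <- function(con, name) {
  DBI::dbGetQuery(
    con,
    "SELECT * FROM Students WHERE name = ?",
    params = list(name)
  )
}

found <- find_student_param(demo_con, "Hadley")
attack <- find_student_param(demo_con, "Robert'); DROP TABLE Students; --")
table_survived <- DBI::dbExistsTable(demo_con, "Students")
DBI::dbDisconnect(demo_con)

nrow(found)
nrow(attack)
table_survived
```

Because the malicious string is only ever treated as a value to compare against, it matches no rows and the table is left intact.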