---
title: "Predicting lending rates with Databricks and tidymodels"
format:
html:
css: styles.css
code-fold: true
code-summary: Show the code
code-links:
- text: "GitHub repository"
icon: github
target: blank
href: https://github.com/posit-marketing/finance-app/
- text: "Model pin"
icon: pin
target: blank
href: https://pub.demo.posit.team/public/lend-fit-model-pin/
- text: "Vetiver API"
icon: arrow-left-right
target: blank
href: https://pub.demo.posit.team/public/lend-fit-vetiver-api/
- text: "Shiny app"
icon: window
target: blank
href: https://pub.demo.posit.team/public/lend-rate-prediction-app/
title-block-banner: banner.png
toc: true
filters:
- shinylive
resource_files:
- banner.png
---
Machine learning algorithms are reshaping financial decision-making, changing how the industry understands and manages financial risk. By analyzing vast amounts of data, these advanced algorithms deliver predictive insights that drive informed decisions and expedite client service. One example is in the consumer credit market, where accurately predicting lending rates is critical for customer acquisition and retention.
By analyzing historical data, machine learning makes it possible to offer potential clients personalized lending rates quickly. This reduces the chance of losing customers to faster alternatives while minimizing loan-associated risk. Efficient yet thorough application processing helps maintain a competitive edge in the market.
Financial analysts using Databricks can harness its performance and robust data governance capabilities, particularly when working with common datasets stored in Delta Lake. Analysts can use [ODBC](https://odbc.r-dbi.org/) to link with a SQL warehouse or [sparklyr](https://spark.posit.co/deployment/databricks-connect.html) to interface with a Databricks cluster.
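For illustration only, a sparklyr connection to a Databricks cluster might look like the sketch below (we use ODBC for the rest of this post). The cluster ID is a placeholder, and the sketch assumes the `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables are set and that the Databricks Connect backend for sparklyr (the pysparklyr package) is installed.
```{r}
#| label: sparklyr-sketch
#| eval: false
#| code-fold: false
# Illustrative sketch only: connect to a Databricks cluster with sparklyr.
# Assumes DATABRICKS_HOST and DATABRICKS_TOKEN are set in the environment.
library(sparklyr)
sc <- spark_connect(
  cluster_id = "your-cluster-id",  # placeholder: your Databricks cluster ID
  method = "databricks_connect"
)
```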
Once the data is accessible, the [tidymodels framework of packages](https://www.tidymodels.org/) offers modeling and machine learning capabilities. Analysts can use these tools in managed environments with their preferred IDE in [Posit Workbench](https://posit.co/products/enterprise/workbench/) and deploy and share their models on [Posit Connect](https://posit.co/products/enterprise/connect/). This integration combines the data governance strengths of Databricks with the powerful tooling offered by Posit, making analysts' work more productive and streamlined.
In this article, we will use publicly accessible loan applicant data from [LendingClub](https://www.lendingclub.com/) to create a machine learning model.^[Thank you to James Andersen for the blog post "Predicting Lending Rates: An Intro to AWS Machine Learning," which served as a starting point for this post.] Our goal is to develop a personalized lending rate prediction model tailored to individual criteria.
## Databricks Delta Lake connection to RStudio
For those with access to Databricks, add the table to your catalog by running the following:
```sql
CREATE TABLE lendingclub USING com.databricks.spark.csv OPTIONS(path 'dbfs:/databricks-datasets/lending-club-loan-stats/LoanStats_2018Q2.csv', header "true");
```
Let's start by loading the packages that will be integral to our workflow.
```{r}
#| label: setup
#| message: false
# Importing data
library(DBI)
# Wrangling packages
library(dplyr)
library(tidyr)
library(lubridate)
library(stringr)
# Visualization packages
library(dbplot)
library(ggplot2)
# Modeling packages
library(ranger)
library(rsample)
library(recipes)
library(parsnip)
library(workflows)
library(yardstick)
# MLOps
library(pins)
library(plumber)
library(vetiver)
```
As previously mentioned, analysts have several options for accessing Databricks data. In this walk-through, we'll demonstrate connectivity using the [odbc package](https://odbc.r-dbi.org/). The Posit Solutions site provides detailed guidance on using the new [odbc `databricks()` function](https://solutions.posit.co/connections/db/databases/databricks/index.html#using-the-new-odbcdatabricks-function). Essentially, we need to store a Databricks token and host URL in our R environment. The function itself requires only the HTTP path to the SQL warehouse.
```{r}
#| label: odbc-connection
# Sys.setenv("DATABRICKS_HOST" = "your-databricks-host.com")
# Sys.setenv("DATABRICKS_TOKEN" = "your-databricks-token")
con <-
dbConnect(odbc::databricks(), httpPath = Sys.getenv("HTTP_PATH"))
```
This code snippet establishes a connection and stores it in the `con` object. From there, we can use dplyr and dbplyr to navigate to our table. We retrieve the `lendingclub` table using `tbl()` and store it in an object named `lendingclub_dat`.
```{r}
#| label: get-tbl-data
lendingclub_dat <-
tbl(con,
dbplyr::in_catalog("hive_metastore", "default", "lendingclub"))
```
We can use the [dbplot package](https://edgararuiz.github.io/dbplot/) to generate a ggplot without transferring the data into R. This visualization displays the distribution of the variable we aim to predict, the lending rate. Lending rates are plotted on the x-axis, while the count of loans at each rate appears on the y-axis. The majority of loans have a lending rate of less than 20%.
```{r}
#| label: dbplot-int-rate
#| fig-asp: 0.618
#| warning: false
lendingclub_dat |>
mutate(int_rate = as.numeric(REGEXP_REPLACE(int_rate, "%", ""))) |>
db_compute_bins(int_rate, binwidth = 0.5) |>
ggplot() +
geom_col(
aes(x = int_rate, y = count),
fill = "#1B909E",
color = "#1B909E",
alpha = 0.4
) +
labs(
title = "Distribution of lending rate",
x = "Lending rate",
y = "Count"
) +
theme_minimal()
```
## Data cleaning and feature engineering
The `lendingclub_dat` object resembles a typical data frame, but it is actually a reference to a remote SQL table. With the [dbplyr package](https://dbplyr.tidyverse.org/), we can use dplyr syntax to query and manipulate these SQL tables directly from R.
Certain columns in `lendingclub_dat` are imported as characters instead of their intended numeric type. With dplyr's `mutate()` function, we can convert them to the correct type. The tabs below show the conversion of columns whose names start with `annual`, written in both dplyr and SQL syntax. The `show_query()` function reveals the underlying SQL query, highlighting how concise dplyr's syntax is compared to the more verbose SQL.
::: {.panel-tabset}
## R syntax
```{r}
#| label: dplyr-syntax
#| code-fold: false
#| eval: false
tbl(con,
dbplyr::in_catalog("hive_metastore", "default", "lendingclub")) |>
mutate(across(c(starts_with("annual")), ~ as.numeric(.)))
```
## SQL syntax
```{r}
#| label: sql-syntax
#| code-fold: false
tbl(con,
dbplyr::in_catalog("hive_metastore", "default", "lendingclub")) |>
mutate(across(c(starts_with("annual")), ~ as.numeric(.))) |>
show_query()
```
:::
Now we can use dplyr to clean up the remaining variables and create supplementary variables that could influence lending rates. Running these operations in the database before pulling the data into R with `collect()` is significantly faster.
```{r}
#| label: pre-collect-data-clean
lendingclub_dat <- tbl(
con,
dbplyr::in_catalog("hive_metastore", "default", "lendingclub")
) |>
mutate(
# Convert columns into numeric
across(
c(
starts_with("annual"), starts_with("dti"), starts_with("inq"), starts_with("mo"), starts_with("mths"), starts_with("num"), starts_with("open"), starts_with("percent"), starts_with("pct"), starts_with("revol"), starts_with("tot"), "all_util", "il_util", "tax_liens", "loan_amnt", "installment", "pub_rec_bankruptcies", "num_tl_120dpd_2m", "bc_util", "max_bal_bc", "bc_open_to_buy", "acc_open_past_24mths", "avg_cur_bal", "delinq_2yrs", "pub_rec"
),
~ as.numeric(.)
),
# Calculate loan to income statistic
loan_to_income = if_else(
application_type == "Individual",
loan_amnt / annual_inc,
loan_amnt / annual_inc_joint
),
# Calculate percentage of monthly income the installment payment represents
installment_pct_inc = if_else(
application_type == "Individual",
installment / (annual_inc / 12),
installment / (annual_inc_joint / 12)
),
# Calculate an adjusted debt-to-income ratio that includes the new loan amount
adjusted_dti = if_else(
application_type == "Individual",
(loan_amnt + tot_cur_bal) / (annual_inc),
(loan_amnt + tot_cur_bal) / (annual_inc_joint)
),
# Calculate utilization on installment accounts excluding mortgage balance
il_util_ex_mort = if_else(
total_il_high_credit_limit > 0,
total_bal_ex_mort / total_il_high_credit_limit,
0
),
# Fill debt to income joint with individual debt to income where missing
dti_joint = coalesce(dti_joint, dti),
# Fill annual income joint with individual annual income where missing
annual_inc_joint = coalesce(annual_inc_joint, annual_inc)
) |>
collect()
```
After running `collect()`, the data is a local data frame in R, giving us access to R functions that are not available in SQL. This lets us further refine and clean the data as needed.
```{r}
#| label: post-collect-data-clean
lendingclub_dat_clean <- lendingclub_dat |>
mutate(
# Missing values for these columns seem most appropriate to fill with zero
across(
c(
"inq_fi", "dti", "all_util", "percent_bc_gt_75", "il_util", "avg_cur_bal", "all_util", "il_util", "inq_last_6mths", "inq_last_12m", "num_tl_120dpd_2m", "open_il_12m", "open_il_24m", "open_rv_12m", "open_rv_24m"
),
~ replace_na(., 0)
),
# Missing values for these columns seem most appropriate to fill with column max
across(
c(
"mo_sin_old_il_acct", "mths_since_last_major_derog", "mths_since_last_delinq", "mths_since_recent_bc", "mths_since_last_record", "mths_since_rcnt_il", "mths_since_recent_bc", "mths_since_recent_bc_dlq", "mths_since_recent_inq", "mths_since_recent_revol_delinq", "mths_since_recent_revol_delinq"
),
~ replace_na(., max(., na.rm = TRUE))
),
# Remove percent sign
int_rate = as.numeric(str_remove(int_rate, "%")),
# Remove percent sign
revol_util = as.numeric(str_remove(revol_util, "%")),
# Create variable for earliest line of credit
earliest_cr_line = parse_date_time2(
paste("01", earliest_cr_line, sep = "-"), "dmy",
cutoff_2000 = 50L
),
# Calculate time since earliest line of credit
age_earliest_cr = interval(
as.Date(earliest_cr_line),
as.Date(today())
) %/% days(1),
# Trim whitespace from the loan term
term = trimws(term),
# Convert remaining character columns to factors
across(where(is.character), .fns = as.factor)
)
```
With our data cleaned up, it's time to examine the variables we intend to include in the model. We create vectors for conveniently referencing variable categories later if necessary. Then, we compile all the desired columns into one vector named `all_vars` for use in our model.
```{r}
#| label: create-variable-vectors
applicant_numeric <- c(
"annual_inc", "dti", "age_earliest_cr", "loan_amnt", "installment"
)
applicant_text <- c("emp_title", "title")
applicant_categorical <- c("application_type", "emp_length", "term")
credit_numeric <- c(
"acc_open_past_24mths", "avg_cur_bal", "bc_open_to_buy", "bc_util",
"delinq_2yrs", "open_acc", "pub_rec", "revol_bal", "tot_coll_amt",
"tot_cur_bal", "total_acc", "total_rev_hi_lim", "num_accts_ever_120_pd",
"num_actv_bc_tl", "num_actv_rev_tl", "num_bc_sats", "num_bc_tl", "num_il_tl",
"num_rev_tl_bal_gt_0", "pct_tl_nvr_dlq", "percent_bc_gt_75", "tot_hi_cred_lim",
"total_bal_ex_mort", "total_bc_limit", "total_il_high_credit_limit",
"total_rev_hi_lim", "all_util", "loan_to_income", "installment_pct_inc",
"il_util", "il_util_ex_mort", "total_bal_il", "total_cu_tl"
)
NUMERIC_VARS_QB_20 <- c(
"inq_last_6mths", "mo_sin_old_il_acct", "mo_sin_old_rev_tl_op",
"mo_sin_old_rev_tl_op", "mo_sin_rcnt_tl", "mort_acc", "num_op_rev_tl",
"num_rev_accts", "num_sats", "pub_rec", "pub_rec_bankruptcies", "tax_liens",
"all_util", "loan_to_income"
)
NUMERIC_VARS_QB_5 <- c("num_tl_120dpd_2m")
NUMERIC_VARS_QB_10 <- c(
"mths_since_last_delinq", "mths_since_last_major_derog",
"mths_since_last_record", "mths_since_rcnt_il", "mths_since_recent_bc",
"mths_since_recent_bc_dlq", "mths_since_recent_inq",
"mths_since_recent_revol_delinq", "num_tl_90g_dpd_24m", "num_tl_op_past_12m"
)
NUMERIC_VARS_QB_50 <- c(
"installment", "bc_open_to_buy", "loan_amnt", "total_bc_limit", "percent_bc_gt_75"
)
mean_impute_vals <- c(
"bc_util", "num_rev_accts", "bc_open_to_buy", "percent_bc_gt_75",
"total_bal_il", "total_il_high_credit_limit", "total_cu_tl"
)
all_vars <- c(
applicant_numeric, applicant_categorical, credit_numeric,
NUMERIC_VARS_QB_20, NUMERIC_VARS_QB_5, NUMERIC_VARS_QB_10, NUMERIC_VARS_QB_50
)
```
Next, we select only the relevant columns for our model and drop any rows with a missing `int_rate`.
```{r}
#| label: create-model-dataset
lendingclub_dat_cols <-
lendingclub_dat_clean |>
select(int_rate, all_of(all_vars)) |>
filter(!is.na(int_rate))
```
To see how many missing values remain in each column, we can run the following:
```{r}
#| label: check-nas
#| code-fold: false
colSums(is.na(lendingclub_dat_cols))
```
We can confirm that no factor columns have fewer than two levels by running the following:
```{r}
#| label: check-factors
#| code-fold: false
lendingclub_dat_cols |>
select(where(is.factor)) |>
select(where(~ nlevels(.) < 2))
```
## Model creation
With our dataset prepared, we can begin the modeling process. This involves creating train and test datasets, which we'll use to train and evaluate our model.
```{r}
#| label: create-test-train
set.seed(1234)
train_test_split <- initial_split(lendingclub_dat_cols)
lend_train <- training(train_test_split)
lend_test <- testing(train_test_split)
```
Using tidymodels, we can construct a "recipe" detailing the preprocessing steps we want to apply to our data:
* `step_normalize()`: This recipe step normalizes numeric data to have a standard deviation of one and a mean of zero; the formula after this list shows the exact transformation. Given that our dataset contains numeric values in various units (e.g., dollars, months), normalization ensures that variables with larger value ranges do not disproportionately influence the model.
* `step_impute_mean()`: This recipe step replaces missing values of numeric variables with the mean of those values in the training set. Performing this step now, rather than during the data cleaning phase, prevents information leakage from the testing set into the model. By calculating the mean solely using the training set, we avoid bias when applying it to the testing data during model evaluation.
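Concretely, `step_normalize()` rescales each selected numeric predictor using the mean and standard deviation estimated from the training set:
$$
z = \frac{x - \bar{x}_{\text{train}}}{s_{\text{train}}}
$$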
```{r}
#| label: create-recipe
rec_obj <- recipe(int_rate ~ ., data = lend_train) |>
step_normalize(applicant_numeric, credit_numeric) |>
step_impute_mean(mean_impute_vals)
```
To verify that the recipe functions as intended, we can run `prep(rec_obj, lend_train) |> bake(new_data = NULL)`. This displays the data that the workflow will provide to the model.
```{r}
#| label: display-workflow-data
#| warning: false
#| message: false
prep(rec_obj, lend_train) |>
bake(new_data = NULL)
```
Now, we proceed with a linear model. `linear_reg()` specifies a model capable of predicting numeric values from predictors using a linear function. We construct a workflow incorporating our recipe alongside the linear model.
```{r}
#| label: run-linear-workflow
lend_linear <- linear_reg()
lend_linear_wflow <-
workflow() |>
add_model(lend_linear) |>
add_recipe(rec_obj)
```
Now, we can fit our model using `fit()` along with the training dataset.
```{r}
#| label: fit-linear-workflow
lend_linear_fit <-
lend_linear_wflow |>
fit(data = lend_train)
```
Below are our predicted lending rates:
```{r}
#| label: predict-linear-workflow
#| warning: false
predict(lend_linear_fit, lend_test)
```
Now, let's evaluate our performance. We can calculate the coefficient of determination on the training set, which is about 0.6 in this case. While this indicates some explanatory power, it suggests that the linear model's fit is not especially strong.
```{r}
#| label: rsq-linear-workflow
#| warning: false
lend_linear_results <-
bind_cols(predict(lend_linear_fit, lend_train)) |>
bind_cols(lend_train |>
select(int_rate))
rsq(lend_linear_results, truth = int_rate, estimate = .pred)
```
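For reference, the coefficient of determination can be read as the share of the variance in observed lending rates that the model explains,
$$
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},
$$
so a value around 0.6 means the linear model accounts for roughly 60% of the variation in `int_rate`. (yardstick's `rsq()` reports the squared correlation between observed and predicted values, which coincides with this definition for an ordinary least-squares fit evaluated on its own training data.)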
However, one of the advantages of tidymodels is the ability to quickly pivot with the steps we've already executed. Instead of a linear model, we can now define a random forest model.
```{r}
#| label: set-ranger-workflow
lend_rand <- rand_forest(mode = "regression") |>
set_engine("ranger", importance = "permutation")
```
Then, we can pass that model into our workflow and proceed to fit the model.
<!---The random forest model takes several minutes to run. To expedite document rendering, I've saved the model in a pin and called it from the chunk below. -->
```{r}
#| label: pull-ranger-pin
#| include: false
board <-
board_connect(
auth = "manual",
server = Sys.getenv("CONNECT_SERVER"),
key = Sys.getenv("CONNECT_API_KEY")
)
lend_ranger_fit <-
pin_read(
board = board,
name = "isabella.velasquez@posit.co/lend_ranger_fit"
)
```
```{r}
#| label: run-fit-ranger-workflow
#| eval: false
lend_ranger_wflow <-
workflow() |>
add_model(lend_rand) |>
add_recipe(rec_obj)
lend_ranger_fit <-
lend_ranger_wflow |>
fit(data = lend_train)
```
Recalculating the coefficient of determination, we get:
```{r}
#| label: rsq-ranger-workflow
lend_ranger_results <-
bind_cols(predict(lend_ranger_fit, lend_train)) |>
bind_cols(lend_train |> select(int_rate))
rsq(lend_ranger_results, truth = int_rate, estimate = .pred)
```
Now, that's a significant improvement!
Given the number of variables in our model, it's natural to want to identify the most important ones. We can compute variable importance by running the `vi()` function from the vip package.
```{r}
#| label: vi-ranger-workflow
vip::vi(lend_ranger_fit) |>
arrange(desc(Importance))
```
According to this analysis, the most important variables are `term`, `installment_pct_inc`, `bc_open_to_buy`, `installment`, and `percent_bc_gt_75`.
## Model logging and artifact storage
If we aim to log our model for use in other contexts, like a Shiny app, we can use the [vetiver package](https://vetiver.posit.co/). This tool facilitates deploying and maintaining machine learning models in production and lets us store models as pins for convenient access and reference.
Let's rebuild our model using only the five most important variables.
```{r}
#| label: select-ranger-workflow
#| eval: false
imp_var <- c("term", "installment_pct_inc", "bc_open_to_buy", "installment", "percent_bc_gt_75")
lendingclub_dat_cols_select <-
lendingclub_dat_cols |>
select(int_rate, all_of(imp_var))
train_test_split_select <- initial_split(lendingclub_dat_cols_select)
lend_train_select <- training(train_test_split_select)
lend_test_select <- testing(train_test_split_select)
rec_obj_select <- recipe(int_rate ~ ., data = lend_train_select) |>
step_normalize(c("installment", "installment_pct_inc", "percent_bc_gt_75")) |>
step_impute_mean("bc_open_to_buy") |>
step_other()
lend_select_wflow <-
workflow() |>
add_model(lend_rand) |>
add_recipe(rec_obj_select)
lend_select_fit <-
lend_select_wflow |>
fit(data = lend_train_select)
```
We can create a vetiver object `v` to store the trained model.
```{r}
#| label: create-vetiver-obj
#| eval: false
v <- vetiver_model(lend_select_fit, "lend_fit")
```
Next, we connect to the board where we want to store our model artifact. In our case, this is Posit Connect. We then write the vetiver model to a pin so it can be retrieved later.
```{r}
#| label: write-vetiver-pin
#| eval: false
board <-
board_connect(
auth = "manual",
server = Sys.getenv("CONNECT_SERVER"),
key = Sys.getenv("CONNECT_API_KEY")
)
board |> vetiver_pin_write(v)
```
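As an aside, vetiver can also serve a pinned model locally by building a Plumber router; `vetiver_api()` adds a POST `/predict` endpoint for the model. The sketch below is illustrative only and is not run in this post; the port number is arbitrary.
```{r}
#| label: local-plumber-sketch
#| eval: false
#| code-fold: false
# Illustrative sketch: serve the vetiver model v locally as a Plumber API.
# vetiver_api() adds a POST /predict endpoint to the router.
library(plumber)
pr() |>
  vetiver_api(v) |>
  pr_run(port = 8080)  # arbitrary local port for testing
```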
Now we can deploy the pinned model to Posit Connect with `vetiver_deploy_rsconnect()`, which publishes it as a Plumber API so predictions can be generated on demand.
```{r}
#| label: deploy-vetiver-model
#| eval: false
vetiver_deploy_rsconnect(
board = board,
name = "isabella.velasquez@posit.co/lend_fit",
predict_args = list(debug = TRUE)
)
```
Now that our model is available through an API, we can use it in other places -- such as a Shiny app!
## Interactive Shiny app
Since our model is pinned on Posit Connect and served through an API, it can be used by other applications, such as a Shiny app. Within the `server` section of our Shiny app, we connect to our pin board:
```{r}
#| label: connect-shiny-app
#| eval: false
#| code-fold: false
board <-
board_connect(server = Sys.getenv("CONNECT_SERVER"),
key = Sys.getenv("CONNECT_API_KEY"))
```
Access the endpoint URL with `vetiver_endpoint()`:
```{r}
#| label: access-shiny-endpoint
#| code-fold: false
#| eval: false
url <- "https://pub.demo.posit.team/public/lend-fit-vetiver-api/"
endpoint <- vetiver_endpoint(url)
```
Also in the server portion of the Shiny app, establish a reactive variable (named `predictions_df` here) that generates a tibble from the user's inputs and passes it to the endpoint.
```{r}
#| label: make-shiny-predictions
#| eval: false
#| code-fold: false
predictions_df <- reactive({
req(
input$select_term,
input$input_installment_pct_inc,
input$input_bc_open_to_buy,
input$input_installment,
input$input_percent_bc_gt_75
)
pred_tibble <-
tibble(
term = input$select_term,
installment_pct_inc = input$input_installment_pct_inc,
bc_open_to_buy = input$input_bc_open_to_buy,
installment = input$input_installment,
percent_bc_gt_75 = input$input_percent_bc_gt_75
)
url <- "https://pub.demo.posit.team/public/lend-fit-vetiver-api/predict"
endpoint <- vetiver_endpoint(url)
api_key <- Sys.getenv("CONNECT_API_KEY")
predictions <- predict(
endpoint,
pred_tibble,
httr::add_headers(Authorization = paste("Key", api_key))
)
})
```
Below is our Shiny app for predicting lending rates. Each time you change an input, it recalculates the rate using our random forest model. The design and layout of the Shiny app are built with [bslib](https://rstudio.github.io/bslib/).
```{r}
#| label: predict-shiny-app
#| eval: false
library(shiny)
library(bslib)
library(bsicons)
library(dplyr)
library(pins)
library(vetiver)
tidymodels::tidymodels_prefer(quiet = TRUE)
cards <- list(
card(full_screen = TRUE,
card_header(HTML("<br><br><br>Term of loan:")),
card_body(
selectInput(
inputId = "select_term",
choices = c("36 months", "60 months"),
selected = "36 months",
label = HTML("<br>Select 36 or 60 months:")
)
)),
card(full_screen = TRUE,
card_header("Proportion of monthly income represented by the installment payment:"),
card_body(
numericInput(
inputId = "input_installment_pct_inc",
label = HTML("Input a number between<br>0 and 1:"),
value = 0.07,
min = 0,
max = 1,
step = 0.01
)
)),
card(full_screen = TRUE,
card_header(HTML("<br><br>Open to buy on revolving bankcards:")),
card_body(
numericInput(
inputId = "input_bc_open_to_buy",
label = HTML("Input a number between<br>0 and 500,000:"),
value = 14467,
min = 0,
max = 500000,
step = 1000
)
)),
card(full_screen = TRUE,
card_header(HTML("<br><br><br>Installment payment:")),
card_body(
numericInput(
inputId = "input_installment",
label = HTML("Input a number between<br>30 and 2000:"),
value = 463,
min = 30,
max = 2000,
step = 100
)
)),
card(full_screen = TRUE,
card_header(HTML("<br><br>Percentage of all bankcard accounts > 75")),
card_body(
numericInput(
inputId = "input_percent_bc_gt_75",
label = HTML("<br>Input a number<br>between 0 and 100:"),
value = 50,
min = 0,
max = 100,
step = 5
)
))
)
vbs <- list(
value_box(
title = "Predicted interest rate",
value = textOutput("pred_int"),
style = "background-color: #082D46!important; color: #FFFFFF!important",
showcase = bsicons::bs_icon("bank", size = "0.75em"),
showcase_layout = "top right",
full_screen = FALSE,
fill = TRUE,
height = NULL
)
)
foot <-
tags$div(
style = "background-color: #FFFFFF; padding: 0px; text-align: center; bottom: 0; width: 100%;",
HTML(
"Powered by <a href='https://posit.co'><img src='https://www.rstudio.com/assets/img/posit-logo-fullcolor-TM.svg' alt='Posit Logo' style='width:55px;'></a> | Integrated with <a href='https://www.databricks.com'><img src='https://cdn.cookielaw.org/logos/29b588c5-ce77-40e2-8f89-41c4fa03c155/bc546ffe-d1b7-43af-9c0b-9fcf4b9f6e58/1e538bec-8640-4ae9-a0ca-44240b0c1a20/databricks-logo.png' alt='Databricks Logo' style='width:85px;'></a>. For more details, see our <a href='https://posit.co/blog/databricks-and-posit-announce-new-integrations/' target='_blank'>blog post</a> announcing the partnership."
)
)
ui <- bslib::page(
title = "Interest rate prediction app",
layout_columns(width = 1/5,
height = 275,
cards[[2]], cards[[3]], cards[[4]], cards[[5]], cards[[1]]),
layout_columns(vbs[[1]]),
card_footer(foot)
)
server <- function(input, output, session) {
board <-
board_connect(server = Sys.getenv("CONNECT_SERVER"),
key = Sys.getenv("CONNECT_API_KEY"))
predictions_df <- reactive({
req(input$select_term, input$input_installment_pct_inc, input$input_bc_open_to_buy, input$input_installment, input$input_percent_bc_gt_75)
pred_tibble <-
tibble(
term = input$select_term,
installment_pct_inc = input$input_installment_pct_inc,
bc_open_to_buy = input$input_bc_open_to_buy,
installment = input$input_installment,
percent_bc_gt_75 = input$input_percent_bc_gt_75
)
url <-
"https://pub.demo.posit.team/public/lend-fit-vetiver-api/predict"
endpoint <- vetiver_endpoint(url)
apiKey <- Sys.getenv("CONNECT_API_KEY")
predictions <-
predict(endpoint,
pred_tibble,
httr::add_headers(Authorization = paste("Key", apiKey)))
})
output$pred_int <- renderText({
predictions_df()$.pred
})
}
shinyApp(ui, server)
```
:::{.column-page}
<iframe src="https://pub.demo.posit.team/public/lend-rate-prediction-app/" width="100%" height="500px"></iframe>
:::
In summary, we walked through predicting lending rates with machine learning in a financial analysis context. We accessed our data from Databricks, cleaned it, and used tidymodels for modeling and evaluation. We developed two predictive models, compared their performance, and identified the key variables driving lending rate predictions.
With vetiver, we deployed our model so it can be integrated into production environments, such as Shiny apps. This end-to-end approach can help improve decision-making in the financial domain.
## Learn more about the Databricks x Posit partnership
We believe our products are better together. Learn more about our partnership.
* Visit the [Databricks x Posit Solutions Page](https://posit.co/solutions/databricks/).
* View our [co-presented webinar](https://www.youtube.com/watch?v=iShpyDxzMeE) where we talked about improved productivity for your data teams.
* [Schedule a demo](https://posit.co/schedule-a-call/?booking_calendar__c=Databricks).