forked from stepthom/sandbox
-
Notifications
You must be signed in to change notification settings - Fork 0
/
kiva_answers.Rmd
122 lines (82 loc) · 3.65 KB
/
kiva_answers.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
---
title: 'Using Text Analytics to Predict Loan Default'
subtitle: "A case study of Kiva"
author: "Dr. Stephen W. Thomas, Smith School of Business, Queen's University"
date: "January 2018"
documentclass: article
output:
pdf_document:
highlight: pygments
number_sections: yes
toc: no
toc_depth: '2'
latex_engine: xelatex
word_document:
toc: no
toc_depth: '2'
---
# Teaching Note
```{r setup, include=FALSE}
library(tidyverse)
library(knitr)
library(kableExtra)
knitr::opts_chunk$set(echo = FALSE, warning=FALSE, message=FALSE, fig.align='center')
```
## Intro Questions
- Who has used Kiva?
- Why did you choose Kiva?
- How did you choose the borrower?
- What was your experience like?
- Did you continue to lend to Kiva, and why or why not?
## Case Discussion Questions
_1. What is Kiva's value proposition?_
Kiva provides a connection between loan seekers with lenders.
_4. What factors might go into a lender's decision to lend money to a borrower?_
Answers will vary. Here is a previous class's answer:
- Loan amount
- Country
- Purpose
- Repayment history
- Backed by partner or not
- Feeling good about life in general
_5. How does text data affect a prediction model's ability to predict a default?_
It increases the accuracy and all the other metrics.
_7. According to the decision tree models, which variables best predict a default?_
- `nonpayment`. If "partner", then the loan very likely to be paid.
- `Topic 8`. If present, then more likely to be paid. Topic is "children, esperanza, support, women".-
- `Topic 12`. If present, then more likely to be paid. Topic is "rice, farmers, baba, sector."
- `country`. Kenya is most likely to default.
- `loan_amount`. Large loan are more likely to be paid.
_7. What other text analytics/NLP techniques might be applied to the dataset to improve the prediction model?_
- More cleaning: stemming, fixing mispelling
- Translating all to English
- Using n-grams/phrases
- Using word2vec representations
- Extracting some structured data:
- How many kids
- The name
- Whether they have a job
- The name of the field partner
- Age
_8. What additional information might lead to a better predictive model?_
Ideas from previous classes:
- Loan history
- separate models per country
- translating all to english
- extracting age from text
- extract number of children from text.
- More data
_9. How should Kiva operationalize the prediction model? What technical challenges and risks do you envision? What procedural challenges and risks do you envision?_
One way to deploy the model is to publish the prediction next to each loan request.
Some technical challenges:
- The model must perform the predition in a timely manner.
- There must be sufficient integration between the model and the webpage.
- The model must be constantly updated to remain current.
Some procedural challenges:
- Borrowers might figure out the model (i.e., which words/phrases will predict a default), and change their stories accordingly, reducing the effectivenes of the model.
- Overall lending might decrease, since anyone predicted to be risky would stop getting loans.
- On the other hand, overall lending might increase, as lenders might be willing to lend more money to borrowers who have a positive prediction.
- Overall lending might be more concentrated than it ought to be.
Some other deployment ideas:
- Another option is to use the risk factor to not allow the borrower to post the request if the model predicts below a certain threshold.
- Another option still is to just show the predicted value to Kiva employees, who could (e.g.) use it to write a blog, report, or white paper that potential lenders could read.