forked from ndphillips/FFTrees
-
Notifications
You must be signed in to change notification settings - Fork 2
/
README.Rmd
318 lines (216 loc) · 15.1 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please only edit the .Rmd file! -->
```{r setup, include = FALSE}
# Chunk options:
knitr::opts_chunk$set(collapse = FALSE,
comment = "#>",
message = FALSE,
warning = FALSE,
# Default figure options:
fig.path = "man/figures/README-",
fig.align = 'center', # ignored
fig.width = 7.0,
fig.height = 6.0,
out.width = "650")
# URLs:
url_pkg_CRAN <- "https://CRAN.R-project.org/package=FFTrees"
url_pkg_GitHub <- "https://github.com/ndphillips/FFTrees"
url_pkg_issues <- "https://github.com/ndphillips/FFTrees/issues"
url_doc_GitHub <- "https://ndphillips.github.io/FFTrees/"
url_app_shiny <- "https://econpsychbasel.shinyapps.io/shinyfftrees/"
url_JDM_issue <- "https://journal.sjdm.org/vol12.4.html"
url_JDM_html <- "https://journal.sjdm.org/17/17217/jdm17217.html"
url_JDM_pdf <- "https://journal.sjdm.org/17/17217/jdm17217.pdf"
url_JDM_doi <- "https://doi.org/10.1017/S1930297500006239"
doi_JDM <- "10.1017/S1930297500006239"
```
<!-- Title, version and logo: -->
# FFTrees `r packageVersion("FFTrees")` <img src = "man/figures/logo.png" align = "right" alt = "FFTrees" width = "160" />
<!-- Devel badges start: -->
[![CRAN status](https://www.r-pkg.org/badges/version/FFTrees)](https://CRAN.R-project.org/package=FFTrees)
[![Downloads/month](https://cranlogs.r-pkg.org/badges/FFTrees?color='00a9e0')](https://www.r-pkg.org/pkg/FFTrees)
[![Total downloads](https://cranlogs.r-pkg.org/badges/grand-total/FFTrees?color='00a9e0')](https://www.r-pkg.org/pkg/FFTrees)
[![R-CMD-check](https://github.com/ndphillips/FFTrees/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/ndphillips/FFTrees/actions/workflows/R-CMD-check.yaml)
<!-- Devel badges end. -->
<!-- Release badges start: -->
<!-- [![CRAN status](https://www.r-pkg.org/badges/version/FFTrees)](https://CRAN.R-project.org/package=FFTrees) -->
<!-- [![Total downloads](https://cranlogs.r-pkg.org/badges/grand-total/FFTrees?color='00a9e0')](https://www.r-pkg.org/pkg/FFTrees) -->
<!-- Release badges end. -->
<!-- ALL badges start: -->
<!-- [![CRAN status](https://www.r-pkg.org/badges/version/FFTrees)](https://CRAN.R-project.org/package=FFTrees) -->
<!-- [![Build Status](https://travis-ci.org/ndphillips/FFTrees.svg?branch=master)](https://travis-ci.org/ndphillips/FFTrees) -->
<!-- [![R-CMD-check](https://github.com/ndphillips/FFTrees/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/ndphillips/FFTrees/actions/workflows/R-CMD-check.yaml) -->
<!-- [![Downloads/month](https://cranlogs.r-pkg.org/badges/FFTrees?color=brightgreen)](https://www.r-pkg.org/pkg/FFTrees) -->
<!-- [![Total downloads](https://cranlogs.r-pkg.org/badges/grand-total/FFTrees?color='b22222')](https://www.r-pkg.org/pkg/FFTrees) -->
<!-- ALL badges end. -->
<!-- Pkg goal: -->
The R package **FFTrees** creates, visualizes and evaluates _fast-and-frugal decision trees_ (FFTs) for solving binary classification tasks, using the algorithms and methods described in Phillips, Neth, Woike & Gaissmaier (2017, [`r doi_JDM`](`r url_JDM_doi`)).
## What are fast-and-frugal trees (FFTs)?
_Fast-and-frugal trees_\ (FFTs) are simple and transparent decision algorithms for solving binary classification problems.
The key feature making FFTs faster and more frugal than other decision trees is that every node allows making a decision.
When predicting novel cases, the performance of FFTs competes with more complex algorithms and machine learning techniques, such as logistic regression\ (LR), support-vector machines\ (SVM), and random forests\ (RF).
Apart from being faster and requiring less information, FFTs tend to be robust against overfitting, and are easy to interpret, use, and communicate.
<!-- Quote (cited in guide.Rmd): -->
<!-- In the words of @burton2020: -->
<!-- "human users could interpret, justify, control, and interact with a fast-and-frugal decision aid..." -->
<!-- [@burton2020. p.\ 229] -->
<!-- Full quote: -->
<!-- These fast-and-frugal trees (Hafenbrädl et al.,2016; Phillips, Neth, Woike, & Gaissmaier, 2017) are especially relevant to the algorithm aversion discussion not only because they allow the human decision maker to dictate the external measures upon which an augmented decision will be judged, but also because they are transparent. This in turn suggests that human users could interpret, justify, control, and interact with a fast-and-frugal decision aid, which touches on virtually all the drivers of algorithm aversion covered in this review. -->
<!-- Source: [@burton2020. p.\ 229] -->
## Installation
The latest release of **FFTrees** is available from [CRAN](https://CRAN.R-project.org/) at <`r url_pkg_CRAN`>:
```{r install-cran, eval = FALSE}
install.packages("FFTrees")
```
The current development version can be installed from its [GitHub](https://github.com) repository at <`r url_pkg_GitHub`>:
```{r install-gitub, eval = FALSE}
# install.packages("devtools")
devtools::install_github("ndphillips/FFTrees", build_vignettes = TRUE)
```
## Getting started
As an example, let's create a FFT predicting patients' heart disease status (_Healthy_ vs. _Disease_) based on the `heartdisease` dataset included in **FFTrees**:
```{r load-pkg, message = FALSE}
library(FFTrees) # load package
```
### Using data
The `heartdisease` data provides medical information for 303\ patients that were examined for heart disease.
The full data contains a binary criterion variable describing the true state of each patient and were split into two subsets:
A `heart.train` set for fitting decision trees, and `heart.test` set for a testing these trees.
Here are the first rows and columns of both subsets of the `heartdisease` data:
- `heart.train` (the training / fitting data) describes `r nrow(heart.train)` patients:
```{r data-train, echo = FALSE}
n_train <- nrow(heart.train)
c_train <- paste0("**Table 1**: Beginning of the `heart.train` subset (using the data of ", n_train, " patients for fitting/training FFTs).")
knitr::kable(head(heart.train), caption = c_train)
```
- `heart.test` (the testing / prediction data) describes `r nrow(heart.test)` different patients on the same variables:
```{r data-test, echo = FALSE}
n_test <- nrow(heart.test)
c_test <- paste0("**Table 2**: Beginning of the `heart.test` subset (used to predict `diagnosis` for ", n_test, " new patients).")
knitr::kable(head(heart.test), caption = c_test)
```
Our challenge is to predict each patient's `diagnosis` ---\ a column of logical values indicating the true state of each patient (i.e., `TRUE` or\ `FALSE`, based on the patient suffering or not suffering from heart disease)\ --- from the values of potential predictors.
### Questions answered by FFTs
To solve binary classification problems by FFTs, we must answer two key questions:
- Which of the variables should we use to predict the criterion?
- How should we use and combine predictor variables into FFTs?
Once we have created some FFTs, additional questions include:
- How accurate are the predictions of a specific FFT?
- How costly are the predictions of each algorithm?
The **FFTrees** package answers these questions by creating, evaluating, and visualizing FFTs.
### Creating fast-and-frugal trees (FFTs)
We use the main `FFTrees()` function to create FFTs for the `heart.train` data and evaluate their predictive performance on the `heart.test` data:
```{r set-seed, echo = FALSE}
set.seed(246) # for reproducible randomness
```
- The main `FFTrees()` function allows creating an `FFTrees` object for the `heartdisease` data:
```{r example-heart-create, collapse = FALSE, results = 'hide'}
# Create an FFTrees object from the heartdisease data:
heart_fft <- FFTrees(formula = diagnosis ~.,
data = heart.train,
data.test = heart.test,
decision.labels = c("Healthy", "Disease"))
```
```{r example-2-simple, echo = FALSE, eval = FALSE}
# Create FFTs for the heartdisease data:
x <- FFTrees(formula = diagnosis ~.,
data = heartdisease,
decision.labels = c("Healthy", "Disease"))
```
Evaluating `FFTrees()` analyzes the training data, creates several FFTs, and applies them to the test data.
The results are stored in an object `heart_fft`, which can be printed, plotted and summarized (with options for selecting specific data or trees).
<!-- - Printing an `FFTrees` object shows basic information and summary statistics (on the best training tree, FFT\ #1): -->
```{r example-heart-print, echo = FALSE, eval = FALSE}
# Print:
heart_fft
```
- Let's plot our `FFTrees` object to visualize a tree and its predictive performance (on the `test` data):
```{r example-heart-plot, eval = TRUE}
# Plot the best tree applied to the test data:
plot(heart_fft,
data = "test",
main = "Heart Disease")
```
**Figure\ 1**: A fast-and-frugal tree (FFT) predicting heart disease for `test` data and its performance characteristics.
- A summary of the trees in our `FFTrees` object and their key performance statistics can be obtained by `summary(heart_fft)`.
<!-- FFTs by verbal description: -->
### Building FFTs from verbal descriptions
FFTs are so simple that we even can create them 'from words' and then apply them to data.
For example, let's create a tree with the following three nodes and evaluate its performance on the `heart.test` data:
1. If `sex = 1`, predict _Disease_.
2. If `age < 45`, predict _Healthy_.
3. If `thal = {fd, normal}`, predict _Healthy_,
otherwise, predict _Disease_.
These conditions can directly be supplied to the `my.tree` argument of `FFTrees()`:
```{r example-heart-verbal, eval = TRUE}
# Create custom FFT 'in words' and apply it to test data:
# 1. Create my own FFT (from verbal description):
my_fft <- FFTrees(formula = diagnosis ~.,
data = heart.train,
data.test = heart.test,
decision.labels = c("Healthy", "Disease"),
my.tree = "If sex = 1, predict Disease.
If age < 45, predict Healthy.
If thal = {fd, normal}, predict Healthy,
Otherwise, predict Disease.")
# 2. Plot and evaluate my custom FFT (for test data):
plot(my_fft,
data = "test",
main = "My custom FFT")
```
**Figure\ 2**: An FFT predicting heart disease created from a verbal description.
The performance measures (in the bottom panel of **Figure\ 2**) show that this particular tree is somewhat biased:
It has nearly perfect _sensitivity_ (i.e., is good at identifying cases of _Disease_) but suffers from low _specificity_ (i.e., performs poorly in identifying _Healthy_ cases).
Expressed in terms of its errors, `my_fft` incurs few misses at the expense of many false alarms.
Although the _accuracy_ of our custom tree still exceeds the data's baseline by a fair amount, the FFTs in `heart_fft` (created above) strike a better balance.
<!-- A range of options, rather than 1 optimum: -->
Overall, what counts as the "best" tree for a particular problem depends on many factors (e.g., the goal of fitting vs. predicting data and the trade-offs between maximizing accuracy vs. incorporating the costs of cues or errors).
To explore this range of options, the **FFTrees** package enables us to design and evaluate a range of FFTs.
<!-- Available resources: -->
## Resources
The following versions of **FFTrees** and corresponding resources are available:
Type: | Version: | URL: |
:---------------------------|:----------------|:-------------------------------|
A. **FFTrees** (R package): | [Release version](`r url_pkg_CRAN`) | <`r url_pkg_CRAN`> |
| [Development version](`r url_pkg_GitHub`) | <`r url_pkg_GitHub`> |
B. Other resources: | [Online documentation](`r url_doc_GitHub`) | <`r url_doc_GitHub`> |
| [Online demo](`r url_app_shiny`) (running v1.3.3) | <`r url_app_shiny`> |
<!-- Citation and references: -->
## References
We had fun creating the **FFTrees** package and hope you like it too!
As a comprehensive, yet accessible introduction to FFTs, we recommend our article in the journal _Judgment and Decision Making_ ([2017](`r url_JDM_doi`)), entitled _FFTrees: A toolbox to create, visualize,and evaluate fast-and-frugal decision trees_ (available in [html](`r url_JDM_html`) | [PDF](`r url_JDM_pdf`)\ ).
**Citation** (in APA format):
- Phillips, N. D., Neth, H., Woike, J. K. & Gaissmaier, W. (2017).
FFTrees: A toolbox to create, visualize, and evaluate fast-and-frugal decision trees.
_Judgment and Decision Making_, _12_ (4), 344–368.
doi\ [`r doi_JDM`](`r url_JDM_doi`)
We encourage you to read the article to learn more about the history of FFTs and how the **FFTrees** package creates, visualizes, and evaluates them.
When using **FFTrees** in your own work, please cite us and share your experiences (e.g., [on GitHub](`r url_pkg_issues`)) so we can continue developing the package.
<!-- Examples uses/publications (with links): -->
By\ 2024, over 130\ scientific publications have used or cited **FFTrees** (see [Google Scholar](https://scholar.google.com/scholar?oi=bibs&hl=en&cites=205528310591558601) for the full list).
Examples include:
- Lötsch, J., Haehner, A., & Hummel, T. (2020). Machine-learning-derived rules set excludes risk of Parkinson’s disease in patients with olfactory or gustatory symptoms with high accuracy.
_Journal of Neurology_, _267_(2), 469--478.
doi\ [10.1007/s00415-019-09604-6](https://doi.org/10.1007/s00415-019-09604-6)
- Kagan, R., Parlee, L., Beckett, B., Hayden, J. B., Gundle, K. R., & Doung, Y. C. (2020).
Radiographic parameter-driven decision tree reliably predicts aseptic mechanical failure of compressive osseointegration fixation.
_Acta Orthopaedica_, _91_(2), 171--176.
doi\ [10.1080/17453674.2020.1716295](https://doi.org/10.1080/17453674.2020.1716295)
- Klement, R. J., Sonke, J. J., Allgäuer, M., Andratschke, N., Appold, S., Belderbos, J., ... & Mantel, F. (2020).
Correlating dose variables with local tumor control in stereotactic body radiotherapy for early stage non-small cell lung cancer: A modeling study on 1500 individual treatments.
_International Journal of Radiation Oncology * Biology * Physics_.
doi\ [10.1016/j.ijrobp.2020.03.005](https://doi.org/10.1016/j.ijrobp.2020.03.005)
- Nobre, G. G., Hunink, J. E., Baruth, B., Aerts, J. C., & Ward, P. J. (2019).
Translating large-scale climate variability into crop production forecast in Europe.
_Scientific Reports_, _9_(1), 1--13.
doi\ [10.1038/s41598-018-38091-4](https://doi.org/10.1038/s41598-018-38091-4)
- Buchinsky, F. J., Valentino, W. L., Ruszkay, N., Powell, E., Derkay, C. S., Seedat, R. Y., ... & Mortelliti, A. J. (2019).
Age at diagnosis, but not HPV type, is strongly associated with clinical course in recurrent respiratory papillomatosis.
_PloS One_, _14_(6).
doi\ [10.1371/journal.pone.0216697](https://doi.org/10.1371/journal.pone.0216697)
<!-- footer: -->
----
[File `README.Rmd` last updated on `r Sys.Date()`.]
<!-- eof. -->