Added `ROC (receiver operating characteristic curve) Curve and AUC (area under curve)` section
joshiayush committed Sep 9, 2023
1 parent 9da7d19 commit 9148452
Showing 1 changed file with 115 additions and 0 deletions.
115 changes: 115 additions & 0 deletions ai/ml/Machine_Learning.ipynb
@@ -3577,6 +3577,121 @@
"metadata": {
"id": "1rW7LFjjM39P"
}
},
{
"cell_type": "markdown",
"source": [
"## ROC Curve and AUC"
],
"metadata": {
"id": "2VdE-uz2RK6V"
}
},
{
"cell_type": "markdown",
"source": [
"### ROC Curve\n",
"\n",
"An __ROC curve (receiver operating characteristic curve)__ is a graph showing the performance of a classification model at all classification thresholds. This curve plots two parameters:\n",
"\n",
"* True positive rate\n",
"* False positive rate\n",
"\n",
"__True Positive Rate (TPR)__ is a synonym for recall and is therefore defined as follows:\n",
"\n",
"$\\mathrm{TPR} = \\dfrac{TP}{TP + FN}$\n",
"\n",
"__False Positive Rate (FPR)__ is defined as follows:\n",
"\n",
"$\\mathrm{FPR} = \\dfrac{FP}{FP + TN}$"
],
"metadata": {
"id": "MNvPzfLWRXP-"
}
},
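{
"cell_type": "markdown",
"source": [
"A minimal sketch (not part of the original text) of how TPR and FPR could be computed at a single classification threshold. The labels `y_true`, the scores `y_score`, and the 0.5 threshold below are made-up values for illustration, and NumPy is assumed to be available."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Hypothetical labels and model scores, used only to illustrate the TPR/FPR formulas above.\n",
"import numpy as np\n",
"\n",
"y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])\n",
"y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.65, 0.55])\n",
"\n",
"threshold = 0.5\n",
"y_pred = (y_score >= threshold).astype(int)\n",
"\n",
"tp = np.sum((y_pred == 1) & (y_true == 1))\n",
"fn = np.sum((y_pred == 0) & (y_true == 1))\n",
"fp = np.sum((y_pred == 1) & (y_true == 0))\n",
"tn = np.sum((y_pred == 0) & (y_true == 0))\n",
"\n",
"tpr = tp / (tp + fn)  # recall\n",
"fpr = fp / (fp + tn)\n",
"print(f\"TPR = {tpr:.2f}, FPR = {fpr:.2f}\")"
],
"metadata": {},
"execution_count": null,
"outputs": []
},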
{
"cell_type": "markdown",
"source": [
"An ROC curve plots TPR vs. FPR at different classification thresholds. Lowering the classification threshold classifies more items as positive, thus increasing both False Positives and True Positives. The following figure shows a typical ROC curve.\n",
"\n",
"<div align=\"center\">\n",
"\n",
"<img src=\"https://developers.google.com/static/machine-learning/crash-course/images/ROCCurve.svg\" width=\"400\" height=\"400\" />\n",
"\n",
"<strong>Figure 4. TP vs. FP rate at different classification thresholds.</strong>\n",
"\n",
"</div>"
],
"metadata": {
"id": "ap-mrLWdSI6z"
}
},
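{
"cell_type": "markdown",
"source": [
"A rough sketch (an illustrative assumption, not from the original) of tracing ROC points by sweeping the classification threshold, reusing the hypothetical `y_true` and `y_score` arrays from the previous sketch. As the threshold drops, both TPR and FPR can only stay the same or increase."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Sweep the threshold from high to low and print (FPR, TPR) at each step.\n",
"import numpy as np\n",
"\n",
"for t in np.linspace(1.0, 0.0, 11):\n",
"    y_pred = (y_score >= t).astype(int)\n",
"    tp = np.sum((y_pred == 1) & (y_true == 1))\n",
"    fn = np.sum((y_pred == 0) & (y_true == 1))\n",
"    fp = np.sum((y_pred == 1) & (y_true == 0))\n",
"    tn = np.sum((y_pred == 0) & (y_true == 0))\n",
"    print(f\"threshold = {t:.1f}  FPR = {fp / (fp + tn):.2f}  TPR = {tp / (tp + fn):.2f}\")"
],
"metadata": {},
"execution_count": null,
"outputs": []
},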
{
"cell_type": "markdown",
"source": [
"To compute the points in an ROC curve, we could evaluate a logistic regression model many times with different classification thresholds, but this would be inefficient. Fortunately, there's an efficient, sorting-based algorithm that can provide this information for us, called AUC."
],
"metadata": {
"id": "13PGBuFwSSxy"
}
},
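{
"cell_type": "markdown",
"source": [
"In practice the curve and its summary are usually obtained from a library routine rather than by re-evaluating the model at every threshold. Below is a sketch with scikit-learn (assumed to be available), again using the hypothetical `y_true` / `y_score` arrays from above."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# roc_curve() returns the (FPR, TPR) points in one pass over the sorted scores;\n",
"# roc_auc_score() returns the area under that curve.\n",
"from sklearn.metrics import roc_curve, roc_auc_score\n",
"\n",
"fpr, tpr, thresholds = roc_curve(y_true, y_score)\n",
"print(f\"AUC = {roc_auc_score(y_true, y_score):.3f}\")"
],
"metadata": {},
"execution_count": null,
"outputs": []
},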
{
"cell_type": "markdown",
"source": [
"## AUC: Area Under the ROC Curve\n",
"\n",
"__AUC__ stands for \"Area under the ROC Curve.\" That is, AUC measures the entire two-dimensional area underneath the entire ROC curve (think integral calculus) from (0,0) to (1,1).\n",
"\n",
"<div align=\"center\">\n",
"\n",
"<img src=\"https://developers.google.com/static/machine-learning/crash-course/images/AUC.svg\" width=\"400\" height=\"400\" />\n",
"\n",
"<strong>Figure 5. AUC (Area under the ROC Curve).</strong>\n",
"\n",
"</div>"
],
"metadata": {
"id": "QVUsqcpbSrag"
}
},
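{
"cell_type": "markdown",
"source": [
"A small sketch of the 'area' reading of AUC (an illustration, not part of the original text): numerically integrating the ROC points from the previous cell with the trapezoidal rule gives essentially the same number that `roc_auc_score` reports."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Integrate TPR over FPR (both returned by roc_curve above) with the trapezoidal rule.\n",
"import numpy as np\n",
"\n",
"area = np.trapz(tpr, fpr)\n",
"print(f\"Area under the ROC curve ~ {area:.3f}\")"
],
"metadata": {},
"execution_count": null,
"outputs": []
},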
{
"cell_type": "markdown",
"source": [
"AUC provides an aggregate measure of performance across all possible classification thresholds. One way of interpreting AUC is as the probability that the model ranks a random positive example more highly than a random negative example. For example, given the following examples, which are arranged from left to right in ascending order of logistic regression predictions:\n",
"\n",
"<div align=\"center\">\n",
"\n",
"<img src=\"https://developers.google.com/static/machine-learning/crash-course/images/AUCPredictionsRanked.svg\" />\n",
"\n",
"<strong>Figure 6. Predictions ranked in ascending order of logistic regression score.</strong>\n",
"\n",
"</div>"
],
"metadata": {
"id": "oAyf1QqgTJe9"
}
},
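{
"cell_type": "markdown",
"source": [
"A sketch of the ranking interpretation described above, reusing the hypothetical arrays from earlier cells: estimate AUC as the fraction of (positive, negative) pairs in which the positive example receives the higher score."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Compare every positive score against every negative score;\n",
"# ties conventionally count as half a correct ordering.\n",
"import numpy as np\n",
"\n",
"pos = y_score[y_true == 1]\n",
"neg = y_score[y_true == 0]\n",
"diff = pos[:, None] - neg[None, :]\n",
"pairwise_auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size\n",
"print(f\"Pairwise AUC = {pairwise_auc:.3f}\")"
],
"metadata": {},
"execution_count": null,
"outputs": []
},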
{
"cell_type": "markdown",
"source": [
"AUC represents the probability that a random positive (green) example is positioned to the right of a random negative (red) example.\n",
"\n",
"AUC ranges in value from 0 to 1. A model whose predictions are 100% wrong has an AUC of 0.0; one whose predictions are 100% correct has an AUC of 1.0.\n",
"\n",
"AUC is desirable for the following two reasons:\n",
"\n",
"* AUC is __scale-invariant__. It measures how well predictions are ranked, rather than their absolute values.\n",
"* AUC is __classification-threshold-invariant__. It measures the quality of the model's predictions irrespective of what classification threshold is chosen.\n",
"\n",
"However, both these reasons come with caveats, which may limit the usefulness of AUC in certain use cases:\n",
"\n",
"* __Scale invariance is not always desirable__. For example, sometimes we really do need well calibrated probability outputs, and AUC won’t tell us about that.\n",
"\n",
"* __Classification-threshold invariance is not always desirable__. In cases where there are wide disparities in the cost of false negatives vs. false positives, it may be critical to minimize one type of classification error. For example, when doing email spam detection, you likely want to prioritize minimizing false positives (even if that results in a significant increase of false negatives). AUC isn't a useful metric for this type of optimization."
],
"metadata": {
"id": "kRtN5DlETSCg"
}
}
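,
{
"cell_type": "markdown",
"source": [
"A quick check of the scale-invariance point above (an illustrative assumption, not part of the original text): applying a strictly monotonic transform to the scores changes their values, and hence their calibration, but leaves the ranking, and therefore the AUC, unchanged."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Squaring scores in [0, 1] is strictly monotonic, so the ranking is preserved\n",
"# and AUC stays the same even though the values are no longer calibrated.\n",
"from sklearn.metrics import roc_auc_score\n",
"\n",
"print(roc_auc_score(y_true, y_score))\n",
"print(roc_auc_score(y_true, y_score ** 2))"
],
"metadata": {},
"execution_count": null,
"outputs": []
}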
],
"metadata": {