diff --git a/notebooks/14_dimensionality_reduction.ipynb b/notebooks/14_dimensionality_reduction.ipynb index 24c5f4f..7607931 100644 --- a/notebooks/14_dimensionality_reduction.ipynb +++ b/notebooks/14_dimensionality_reduction.ipynb @@ -2095,6 +2095,24 @@ "- What made you intuitively decide which `n_neighbors` setting is best?" ] }, + { + "cell_type": "markdown", + "id": "9278b816", + "metadata": {}, + "source": [ + "## Limitations of dimensionality reduction\n", + "\n", + "Dimensionality reduction can be a very powerful tool in data science, both for understanding the data better and for communicating results or making them easier to explore. Displaying large numbers of datapoints in 2D, or more rarely in 3D, can make large and complex datasets more accessible to us. However, we have to keep in mind that reducing high-dimensional datapoints to 2D or 3D comes with a loss of information. This can be understood by looking at the example in {numref}`fig_dimensionality_reduction_limits`.\n", + "\n", + "As a consequence, no matter which technique we choose, and no matter how much we optimize the parameters of those methods, we will usually not end up with a perfect representation of the dataset. More specifically, we cannot expect that all similar datapoints will end up close to each other in the resulting plot, nor that all distant datapoints will be placed at an adequate distance in 2D. This is not because the methods are poorly designed! It is simply that 2D is *too small* to represent all relationships of high-dimensional data (and 3D is not much better).\n", + "\n", + "```{figure} ../images/fig_dimensionality_reduction_limits.png\n", + ":name: fig_dimensionality_reduction_limits\n", + "\n", + "In virtually all cases, dimensionality reduction comes with a loss of information. 
This can be understood intuitively by looking at the example displayed here: four datapoints that are all equally far from each other can easily be depicted in 3D (as the corners of a regular tetrahedron). In 2D, however, such an arrangement is not possible, because at most three points in a plane can be mutually equidistant. We could say that 2D is simply *too small* to display such a setting. In practice, the same happens when even higher-dimensional data is projected into 3D, and so forth.\n", + "```\n" + ] + }, { "cell_type": "markdown", "id": "85f69656-0fa7-4cf5-aec8-6b50d9d67590",