<h2data-number="11.4" class="anchored" data-anchor-id="comparing-loss-functions"><spanclass="header-section-number">11.4</span> Comparing Loss Functions</h2>
<p>We’ve now tried our hand at fitting a model under both MSE and MAE cost functions. How do the two results compare?</p>
<p>Let’s consider a dataset where each entry represents the number of drinks sold at a bubble tea store each day. We’ll fit a constant model to predict the number of drinks that will be sold tomorrow.</p>
<p>From our derivations above, we know that the optimal model parameter under MSE cost is the mean of the dataset. Under MAE cost, the optimal parameter is the median of the dataset.</p>
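<p>We can verify this directly. Below is a minimal sketch using a small, hypothetical array of daily drink counts (the values are illustrative, not the actual dataset):</p>
<pre><code>import numpy as np

# Hypothetical daily drink counts for the bubble tea store
drinks = np.array([20, 21, 22, 29, 33])

theta_mse = np.mean(drinks)    # optimal constant model under MSE cost
theta_mae = np.median(drinks)  # optimal constant model under MAE cost

print(theta_mse, theta_mae)    # 25.0 22.0
</code></pre>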
<p>Notice that the MSE above is a <strong>smooth</strong> function – it is differentiable at all points, making it easy to minimize using numerical methods. The MAE, in contrast, is not differentiable at each of its “kinks.” We’ll explore how the smoothness of the cost function can impact our ability to apply numerical optimization in a few weeks.</p>
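<p>One way to see the difference in smoothness is to evaluate both costs over a grid of candidate <span class="math inline">\(\theta\)</span> values; a rough sketch, reusing the hypothetical <code>drinks</code> array from above:</p>
<pre><code>import numpy as np
import matplotlib.pyplot as plt

drinks = np.array([20, 21, 22, 29, 33])
thetas = np.linspace(15, 40, 500)

# Average loss of the constant model theta at each candidate value
mse = [np.mean((drinks - t) ** 2) for t in thetas]
mae = [np.mean(np.abs(drinks - t)) for t in thetas]

plt.plot(thetas, mse, label="MSE")  # smooth parabola, minimized at the mean
plt.plot(thetas, mae, label="MAE")  # piecewise linear, kinks at the data points
plt.xlabel(r"$\theta$")
plt.ylabel("cost")
plt.legend()
plt.show()
</code></pre>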
<p>How do outliers affect each cost function? Imagine we replace the largest value in the dataset with 1000. The mean of the data increases substantially, while the median is nearly unaffected.</p>
<p>This means that under the MSE, the optimal model parameter <span class="math inline">\(\hat{\theta}\)</span> is strongly affected by the presence of outliers. Under the MAE, the optimal parameter is not as influenced by outlying data. We can generalize this by saying that the MSE is <strong>sensitive</strong> to outliers, while the MAE is <strong>robust</strong> to outliers.</p>
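<p>A quick numerical sketch of this experiment, again with the hypothetical drink counts:</p>
<pre><code>import numpy as np

drinks = np.array([20, 21, 22, 29, 33])
drinks_outlier = np.array([20, 21, 22, 29, 1000])  # largest value replaced

print(np.mean(drinks), np.mean(drinks_outlier))      # 25.0 becomes 218.4
print(np.median(drinks), np.median(drinks_outlier))  # 22.0 stays 22.0
</code></pre>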
<p>Let’s try another experiment. This time, we’ll add an additional, non-outlying datapoint to the data. With an even number of datapoints, the MAE cost no longer has a single minimizer: any value between the two middle datapoints achieves the same minimum MAE, so the optimal parameter under MAE is not unique. The MSE, in contrast, always has a unique minimizer, the mean.</p>
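<p>We can check the flat region of the MAE numerically. In this sketch, the added value of 35 is hypothetical; any non-outlying value on the high side of the data behaves the same way:</p>
<pre><code>import numpy as np

# Six datapoints after adding a new, non-outlying value (35)
drinks = np.array([20, 21, 22, 29, 33, 35])

def mae(theta):
    return np.mean(np.abs(drinks - theta))

# Any theta between the middle two points (22 and 29) gives the same MAE
print(mae(22), mae(25.5), mae(29))  # all three are equal
</code></pre>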
<p>Transformations can serve goals other than linearity as well, for example, making data appear more symmetric. We focus on linearity because it allows us to fit lines to the transformed data.</p>
<p>Let’s revisit our dugongs example. The lengths and ages are plotted below:</p>
<p>Looking at the plot on the left, we see a slight curvature in the data points. Fitting an SLR line, shown on the right, results in a poor fit.</p>
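<p>A sketch of such a fit, assuming the data has been loaded into a DataFrame named <code>dugongs</code> with the <code>"Length"</code> and <code>"Age"</code> columns referenced below (the file path is a placeholder):</p>
<pre><code>import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

dugongs = pd.read_csv("dugongs.csv")  # placeholder path
x, y = dugongs["Length"], dugongs["Age"]

# Simple linear regression: a degree-1 polynomial fit
slope, intercept = np.polyfit(x, y, 1)

plt.scatter(x, y)
plt.plot(x, slope * x + intercept)  # the poorly fitting SLR line
plt.xlabel("Length")
plt.ylabel("Age")
plt.show()
</code></pre>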
<p>For SLR to perform well, we’d like there to be a rough linear trend relating <code>"Age"</code> and <code>"Length"</code>. What is making the raw data deviate from a linear relationship? Notice that the data points with <code>"Length"</code> greater than 2.6 have disproportionately high values of <code>"Age"</code> relative to the rest of the data. If we could manipulate these data points to have lower <code>"Age"</code> values, we’d “shift” these points downwards and reduce the curvature in the data. Applying a logarithmic transformation to <span class="math inline">\(y_i\)</span> (that is, taking <span class="math inline">\(\log(\)</span><code>"Age"</code><span class="math inline">\()\)</span>) would achieve just that.</p>
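<p>The transformation itself is a single call to <code>np.log</code>; a sketch continuing from the assumed <code>dugongs</code> DataFrame above:</p>
<pre><code>import numpy as np

# Log-transform the response, then refit SLR on the transformed data
dugongs["Log Age"] = np.log(dugongs["Age"])
theta_1, theta_0 = np.polyfit(dugongs["Length"], dugongs["Log Age"], 1)
</code></pre>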
<p>An important word on <span class="math inline">\(\log\)</span>: in Data 100 (and most upper-division STEM courses), <span class="math inline">\(\log\)</span> denotes the natural logarithm with base <span class="math inline">\(e\)</span>. The base-10 logarithm, where relevant, is indicated by <span class="math inline">\(\log_{10}\)</span>.</p>
<p>When we fit SLR to the log-transformed data, the model takes the form <span class="math inline">\(\widehat{\log y} = \hat{\theta}_0 + \hat{\theta}_1 x\)</span>. Exponentiating both sides recovers a model for the untransformed variable:</p>
<p><span class="math display">\[\hat{y} = e^{\hat{\theta}_0 + \hat{\theta}_1 x} = e^{\hat{\theta}_0} e^{\hat{\theta}_1 x} = C e^{kx}\]</span></p>
<p>for some constants <span class="math inline">\(C\)</span> and <span class="math inline">\(k\)</span>.</p>
<p><spanclass="math inline">\(y\)</span> is an <em>exponential</em> function of <spanclass="math inline">\(x\)</span>. Applying an exponential fit to the untransformed variables corroborates this finding.</p>