diff --git a/curriculum/3_modeling_and_machine_learning/machine-learning/CV_methods_comparison.ipynb b/curriculum/3_modeling_and_machine_learning/machine-learning/CV_methods_comparison.ipynb
index 610bacfe..b25806ba 100644
--- a/curriculum/3_modeling_and_machine_learning/machine-learning/CV_methods_comparison.ipynb
+++ b/curriculum/3_modeling_and_machine_learning/machine-learning/CV_methods_comparison.ipynb
@@ -2,7 +2,10 @@
  "cells": [
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "deletable": true,
+    "editable": true
+   },
    "source": [
     "# Testing Cross Validation Strategies\n",
     "Many data science projects aim for generalizable models---models that perform well on new data---such as scoring new credit card applicants or predicting who will win a football game. A common approach to modeling is to split your data, using different splits for different tasks. If you have a lot of data, you might split it into three subsets:\n",
@@ -45,18 +48,13 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 1,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stderr",
-     "output_type": "stream",
-     "text": [
-      "/Users/joewalsh/Library/Python/3.6/lib/python/site-packages/statsmodels/compat/pandas.py:56: FutureWarning: The pandas.core.datetools module is deprecated and will be removed in a future version. Please use the pandas.tseries module instead.\n",
-      "  from pandas.core import datetools\n"
-     ]
-    }
-   ],
+   "execution_count": 3,
+   "metadata": {
+    "collapsed": false,
+    "deletable": true,
+    "editable": true
+   },
+   "outputs": [],
    "source": [
     "import csv\n",
     "from itertools import product\n",
@@ -64,6 +62,8 @@
     "import pickle\n",
     "import random\n",
     "import warnings\n",
+    "import timeit\n",
+    "import math\n",
     "warnings.filterwarnings(action=\"ignore\", module=\"scipy\", message=\"^internal gelsd\")\n",
     "\n",
     "import numpy as np\n",
@@ -74,7 +74,10 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "deletable": true,
+    "editable": true
+   },
    "source": [
     "## Regression Example: Predicting US Gross Domestic Product\n",
     "Let's say it's December 31, 2017, and we want to predict next year's nominal US Gross Domestic Product (GDP). We have two intermediate tasks:\n",
@@ -113,7 +116,11 @@
   {
    "cell_type": "code",
    "execution_count": 3,
-   "metadata": {},
+   "metadata": {
+    "collapsed": true,
+    "deletable": true,
+    "editable": true
+   },
    "outputs": [],
    "source": [
     "GDP_df = pd.read_csv('GDP.csv')"
@@ -122,7 +129,11 @@
   {
    "cell_type": "code",
    "execution_count": 4,
-   "metadata": {},
+   "metadata": {
+    "collapsed": false,
+    "deletable": true,
+    "editable": true
+   },
    "outputs": [
     {
      "data": {
@@ -156,7 +167,10 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "deletable": true,
+    "editable": true
+   },
    "source": [
     "