Merge pull request #168 from yeemey/07-pandas

Correct column name to `patient1` in examples; fix minor typos.
zaneveld · Feb 6, 2024 · 59a443f
2 parents 5c70637 + 96bad31
Showing 1 changed file with 18 additions and 18 deletions.
diff --git a/content/07_tabular_omics_data/analyzing_tabular_omics_data_in_pandas.ipynb b/content/07_tabular_omics_data/analyzing_tabular_omics_data_in_pandas.ipynb
@@ -49,7 +49,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "**Option 2: Download the data direclty using the `urlretreive` function in python**."
+    "**Option 2: Download the data directly using the `urlretrieve` function in python**."
    ]
   },
   {
@@ -89,7 +89,7 @@
    "source": [
     "### Checking that the raw data is present in our current directory\n",
     "\n",
-    "As a first step, let's check that we have 'scenario1_otus.txt' in our current working directory. The `listdir` function in the builtin `os` module returns a list of all contents of a specified directory on your computer.\n",
+    "As a first step, let's check that we have 'scenario_1_otus_pandas.tsv' in our current working directory. The `listdir` function in the builtin `os` module returns a list of all contents of a specified directory on your computer.\n",
     "\n",
     "It can be useful to run it prior to trying to open files if you need to remember filenames. Let's import the `listdir` function and quickly check that our data is there..."
    ]
@@ -169,7 +169,7 @@
     "from pandas import read_csv\n",
     "\n",
     "#Load the text version of the table (a csv file) into python using pandas\n",
-    "feature_table = read_csv('scenario_1_otus_pandas.txt',sep=\"\\t\")"
+    "feature_table = read_csv('scenario_1_otus_pandas.tsv',sep=\"\\t\")"
    ]
   },
   {
@@ -402,7 +402,7 @@
     "\n",
     "We might want to access the rows or columns of our pandas data directly so that we can do calculations. A nice tutorial on this can be found here: https://www.shanelynn.ie/select-pandas-dataframe-rows-and-columns-using-iloc-loc-and-ix/.\n",
     "\n",
-    "In brief, we can select a column by indexing into the pandas DataFrame object using a column name. I currently have the OTUs as columns, so we could access them with `feature_table[\"OTU1\"]`"
+    "In brief, we can select a column by indexing into the pandas DataFrame object using a column name. I currently have the patients as columns, so we could access them with `feature_table[\"patient1\"]`"
    ]
   },
   {
@@ -471,7 +471,7 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "Select all of the OTU1 column:\n",
+      "Select all of the patient1 column:\n",
       "OTU_ID\n",
       "OTU1    3\n",
       "OTU2    4\n",
@@ -484,7 +484,7 @@
    ],
    "source": [
     "#Recall that : means all, and we specify rows,columns when using .loc\n",
-    "print(\"Select all of the OTU1 column:\")\n",
+    "print(\"Select all of the patient1 column:\")\n",
     "selected_column = feature_table.loc[:,'patient1']\n",
     "print(selected_column)"
    ]
@@ -493,7 +493,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Cool! So we see that we now get the count of microbe 1 ('OTU1') in each patient's sample. However, this data type looks kinda funky. That's because it is a pandas Series object. It therefore doesn't print in the same way as either a pandas DataFrame or a python list. We'll talk more about that in a minute. For now, let's continue to explore how to select data using .loc.\n",
+    "Cool! So we see that we now get the count of all microbes in patient1's sample. However, this data type looks kinda funky. That's because it is a pandas Series object. It therefore doesn't print in the same way as either a pandas DataFrame or a python list. We'll talk more about that in a minute. For now, let's continue to explore how to select data using .loc.\n",
     "\n",
     "#### Select a row from a pandas DataFrame with `.loc`\n"
    ]
@@ -508,7 +508,7 @@
      "output_type": "stream",
      "text": [
       "\n",
-      "Select the patient1 row:\n",
+      "Select the OTU1 row:\n",
       "patient1     3\n",
       "patient2     4\n",
       "patient3     2\n",
@@ -526,7 +526,7 @@
     }
    ],
    "source": [
-    "print(\"\\nSelect the patient1 row:\")\n",
+    "print(\"\\nSelect the OTU1 row:\")\n",
     "selected_row = feature_table.loc['OTU1',:]\n",
     "print(selected_row)\n",
     "\n"
@@ -611,7 +611,7 @@
      "output_type": "stream",
      "text": [
       "\n",
-      "feature_table['OTU1'] is an object of type: <class 'pandas.core.series.Series'>\n"
+      "feature_table['patient1'] is an object of type: <class 'pandas.core.series.Series'>\n"
      ]
     }
    ],
@@ -620,7 +620,7 @@
     "selected_column_type = type(selected_column)\n",
     "\n",
     "#WHAT IS THIS? Print the answer to screen\n",
-    "print(\"\\nfeature_table['OTU1'] is an object of type:\", selected_column_type)"
+    "print(\"\\nfeature_table['patient1'] is an object of type:\", selected_column_type)"
    ]
   },
   {
@@ -655,7 +655,7 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "Log(OTU1 data): OTU_ID\n",
+      "Log(patient1 data): OTU_ID\n",
       "OTU1    1.386294\n",
       "OTU2    1.609438\n",
       "OTU3    0.693147\n",
@@ -674,9 +674,9 @@
     "#Select the patient1 column of our DataFrame\n",
     "OTU1_data = feature_table.loc[:,\"patient1\"]\n",
     "\n",
-    "log_OTU1_data = log(OTU1_data +1)\n",
+    "log_patient1_data = log(patient1_data +1)\n",
     "\n",
-    "print(\"Log(OTU1 data):\", log_OTU1_data)\n"
+    "print(\"Log(patient1 data):\", log_patient1_data)\n"
    ]
   },
   {
@@ -703,8 +703,8 @@
     }
    ],
    "source": [
-    "OTU1_data = list(feature_table.loc[:,\"patient1\"])\n",
-    "print(OTU1_data)"
+    "patient1_data = list(feature_table.loc[:,\"patient1\"])\n",
+    "print(patient1_data)"
    ]
   },
   {
@@ -732,8 +732,8 @@
     }
    ],
    "source": [
-    "OTU1_data_as_array = feature_table.loc[:,\"patient1\"].values\n",
-    "print(\"Data as a numpy array:\",OTU1_data_as_array)"
+    "patient1_data_as_array = feature_table.loc[:,\"patient1\"].values\n",
+    "print(\"Data as a numpy array:\",patient1_data_as_array)"
    ]
   },
   {