From d7a4ddbd4905f4b09eb57e372f5c177e7c3bb78a Mon Sep 17 00:00:00 2001
From: Yee Mey <4806863+yeemey@users.noreply.github.com>
Date: Thu, 1 Feb 2024 00:30:26 -0800
Subject: [PATCH 1/2] Fix typos in analyzing_tabular_omics_data_in_pandas.ipynb

---
 .../analyzing_tabular_omics_data_in_pandas.ipynb | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/content/07_tabular_omics_data/analyzing_tabular_omics_data_in_pandas.ipynb b/content/07_tabular_omics_data/analyzing_tabular_omics_data_in_pandas.ipynb
index 67d9718..525d099 100644
--- a/content/07_tabular_omics_data/analyzing_tabular_omics_data_in_pandas.ipynb
+++ b/content/07_tabular_omics_data/analyzing_tabular_omics_data_in_pandas.ipynb
@@ -49,7 +49,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "**Option 2: Download the data direclty using the `urlretreive` function in python**."
+ "**Option 2: Download the data directly using the `urlretrieve` function in python**."
  ]
  },
  {
@@ -89,7 +89,7 @@
  "source": [
  "### Checking that the raw data is present in our current directory\n",
  "\n",
- "As a first step, let's check that we have 'scenario1_otus.txt' in our current working directory. The `listdir` function in the builtin `os` module returns a list of all contents of a specified directory on your computer.\n",
+ "As a first step, let's check that we have 'scenario_1_otus_pandas.tsv' in our current working directory. The `listdir` function in the builtin `os` module returns a list of all contents of a specified directory on your computer.\n",
  "\n",
  "It can be useful to run it prior to trying to open files if you need to remember filenames. Let's import the `listdir` function and quickly check that our data is there..."
  ]
@@ -169,7 +169,7 @@
  "from pandas import read_csv\n",
  "\n",
  "#Load the text version of the table (a csv file) into python using pandas\n",
- "feature_table = read_csv('scenario_1_otus_pandas.txt',sep=\"\\t\")"
+ "feature_table = read_csv('scenario_1_otus_pandas.tsv',sep=\"\\t\")"
  ]
  },
  {

From 96bad3177b3f53e382b83fd7042907e4ce5da2fe Mon Sep 17 00:00:00 2001
From: Yee Mey <4806863+yeemey@users.noreply.github.com>
Date: Thu, 1 Feb 2024 00:45:04 -0800
Subject: [PATCH 2/2] Update example column names to patient1

---
 ...alyzing_tabular_omics_data_in_pandas.ipynb | 30 +++++++++----------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/content/07_tabular_omics_data/analyzing_tabular_omics_data_in_pandas.ipynb b/content/07_tabular_omics_data/analyzing_tabular_omics_data_in_pandas.ipynb
index 525d099..8929984 100644
--- a/content/07_tabular_omics_data/analyzing_tabular_omics_data_in_pandas.ipynb
+++ b/content/07_tabular_omics_data/analyzing_tabular_omics_data_in_pandas.ipynb
@@ -402,7 +402,7 @@
  "\n",
  "We might want to access the rows or columns of our pandas data directly so that we can do calculations. A nice tutorial on this can be found here: https://www.shanelynn.ie/select-pandas-dataframe-rows-and-columns-using-iloc-loc-and-ix/.\n",
  "\n",
- "In brief, we can select a column by indexing into the pandas DataFrame object using a column name. I currently have the OTUs as columns, so we could access them with `feature_table[\"OTU1\"]`"
+ "In brief, we can select a column by indexing into the pandas DataFrame object using a column name. I currently have the patients as columns, so we could access them with `feature_table[\"patient1\"]`"
  ]
  },
  {
@@ -471,7 +471,7 @@
  "name": "stdout",
  "output_type": "stream",
  "text": [
- "Select all of the OTU1 column:\n",
+ "Select all of the patient1 column:\n",
  "OTU_ID\n",
  "OTU1 3\n",
  "OTU2 4\n",
@@ -484,7 +484,7 @@
  ],
  "source": [
  "#Recall that : means all, and we specify rows,columns when using .loc\n",
- "print(\"Select all of the OTU1 column:\")\n",
+ "print(\"Select all of the patient1 column:\")\n",
  "selected_column = feature_table.loc[:,'patient1']\n",
  "print(selected_column)"
  ]
@@ -493,7 +493,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "Cool! So we see that we now get the count of microbe 1 ('OTU1') in each patient's sample. However, this data type looks kinda funky. That's because it is a pandas Series object. It therefore doesn't print in the same way as either a pandas DataFrame or a python list. We'll talk more about that in a minute. For now, let's continue to explore how to select data using .loc.\n",
+ "Cool! So we see that we now get the count of all microbes in patient1's sample. However, this data type looks kinda funky. That's because it is a pandas Series object. It therefore doesn't print in the same way as either a pandas DataFrame or a python list. We'll talk more about that in a minute. For now, let's continue to explore how to select data using .loc.\n",
  "\n",
  "#### Select a row from a pandas DataFrame with `.loc`\n"
  ]
@@ -508,7 +508,7 @@
  "output_type": "stream",
  "text": [
  "\n",
- "Select the patient1 row:\n",
+ "Select the OTU1 row:\n",
  "patient1 3\n",
  "patient2 4\n",
  "patient3 2\n",
@@ -526,7 +526,7 @@
  }
  ],
  "source": [
- "print(\"\\nSelect the patient1 row:\")\n",
+ "print(\"\\nSelect the OTU1 row:\")\n",
  "selected_row = feature_table.loc['OTU1',:]\n",
  "print(selected_row)\n",
  "\n"
@@ -611,7 +611,7 @@
  "output_type": "stream",
  "text": [
  "\n",
- "feature_table['OTU1'] is an object of type: \n"
+ "feature_table['patient1'] is an object of type: \n"
  ]
  }
  ],
@@ -620,7 +620,7 @@
  "selected_column_type = type(selected_column)\n",
  "\n",
  "#WHAT IS THIS? Print the answer to screen\n",
- "print(\"\\nfeature_table['OTU1'] is an object of type:\", selected_column_type)"
+ "print(\"\\nfeature_table['patient1'] is an object of type:\", selected_column_type)"
  ]
  },
  {
@@ -655,7 +655,7 @@
  "name": "stdout",
  "output_type": "stream",
  "text": [
- "Log(OTU1 data): OTU_ID\n",
+ "Log(patient1 data): OTU_ID\n",
  "OTU1 1.386294\n",
  "OTU2 1.609438\n",
  "OTU3 0.693147\n",
@@ -674,9 +674,9 @@
  "#Select the patient1 column of our DataFrame\n",
  "OTU1_data = feature_table.loc[:,\"patient1\"]\n",
  "\n",
- "log_OTU1_data = log(OTU1_data +1)\n",
+ "log_patient1_data = log(patient1_data +1)\n",
  "\n",
- "print(\"Log(OTU1 data):\", log_OTU1_data)\n"
+ "print(\"Log(patient1 data):\", log_patient1_data)\n"
  ]
  },
  {
@@ -703,8 +703,8 @@
  }
  ],
  "source": [
- "OTU1_data = list(feature_table.loc[:,\"patient1\"])\n",
- "print(OTU1_data)"
+ "patient1_data = list(feature_table.loc[:,\"patient1\"])\n",
+ "print(patient1_data)"
  ]
  },
  {
@@ -732,8 +732,8 @@
  }
  ],
  "source": [
- "OTU1_data_as_array = feature_table.loc[:,\"patient1\"].values\n",
- "print(\"Data as a numpy array:\",OTU1_data_as_array)"
+ "patient1_data_as_array = feature_table.loc[:,\"patient1\"].values\n",
+ "print(\"Data as a numpy array:\",patient1_data_as_array)"
  ]
  },
  {