Commit

Merge pull request #168 from yeemey/07-pandas
Correct column name to `patient1` in examples; fix minor typos.
zaneveld authored Feb 6, 2024
2 parents 5c70637 + 96bad31 commit 59a443f
Showing 1 changed file with 18 additions and 18 deletions.
@@ -49,7 +49,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"**Option 2: Download the data direclty using the `urlretreive` function in python**."
"**Option 2: Download the data directly using the `urlretrieve` function in python**."
]
},
{
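For reference, a minimal sketch of what "Option 2" does. The notebook's real download URL is not shown in this diff, so the example points `urlretrieve` at a local `file://` URL as a stand-in; the file name `remote_copy.tsv` and its contents are made up for illustration.

```python
from pathlib import Path
from tempfile import TemporaryDirectory
from urllib.request import urlretrieve

with TemporaryDirectory() as tmp:
    # Stand-in for the remote file (hypothetical name, made-up values)
    source = Path(tmp) / "remote_copy.tsv"
    source.write_text("OTU_ID\tpatient1\nOTU1\t3\n")

    # urlretrieve(url, filename) copies the resource at `url` to `filename`
    # and returns the local path together with the response headers
    destination = Path(tmp) / "scenario_1_otus_pandas.tsv"
    saved_path, headers = urlretrieve(source.as_uri(), str(destination))

    downloaded_text = destination.read_text()

print(downloaded_text)
```

With a real `https://` address in place of `source.as_uri()`, the same call would save the remote table into the working directory.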
@@ -89,7 +89,7 @@
"source": [
"### Checking that the raw data is present in our current directory\n",
"\n",
"As a first step, let's check that we have 'scenario1_otus.txt' in our current working directory. The `listdir` function in the builtin `os` module returns a list of all contents of a specified directory on your computer.\n",
"As a first step, let's check that we have 'scenario_1_otus_pandas.tsv' in our current working directory. The `listdir` function in the builtin `os` module returns a list of all contents of a specified directory on your computer.\n",
"\n",
"It can be useful to run it prior to trying to open files if you need to remember filenames. Let's import the `listdir` function and quickly check that our data is there..."
]
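A small sketch of the `listdir` check described in this cell. It runs against a temporary directory holding an empty placeholder file, so it does not depend on the real data being downloaded; that placeholder setup is an assumption for illustration only.

```python
from os import listdir
from pathlib import Path
from tempfile import TemporaryDirectory

expected_file = "scenario_1_otus_pandas.tsv"

with TemporaryDirectory() as tmp:
    # Create an empty placeholder so the check has something to find
    (Path(tmp) / expected_file).touch()

    # listdir returns the names (not full paths) of everything in the directory
    contents = listdir(tmp)
    file_is_present = expected_file in contents

print(contents)
print("Found the data file:", file_is_present)
```

In the notebook itself, `listdir('.')` would list the current working directory instead of `tmp`.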
@@ -169,7 +169,7 @@
"from pandas import read_csv\n",
"\n",
"#Load the text version of the table (a csv file) into python using pandas\n",
"feature_table = read_csv('scenario_1_otus_pandas.txt',sep=\"\\t\")"
"feature_table = read_csv('scenario_1_otus_pandas.tsv',sep=\"\\t\")"
]
},
{
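To make the `sep="\t"` argument concrete, here is a hedged sketch that parses a tiny tab-separated table from memory rather than from `scenario_1_otus_pandas.tsv`. The values are illustrative (only partly taken from the outputs shown in this diff), and passing `index_col="OTU_ID"` is an assumption about how the notebook ends up with OTU IDs as row labels.

```python
from io import StringIO

from pandas import read_csv

# A made-up table in the same layout: OTUs as rows, patients as columns
tsv_text = (
    "OTU_ID\tpatient1\tpatient2\tpatient3\n"
    "OTU1\t3\t4\t2\n"
    "OTU2\t4\t0\t1\n"
    "OTU3\t1\t5\t0\n"
)

# sep="\t" tells read_csv that fields are separated by tabs, not commas;
# index_col makes the OTU_ID column the row labels (assumed, see above)
feature_table = read_csv(StringIO(tsv_text), sep="\t", index_col="OTU_ID")
print(feature_table)
```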
@@ -402,7 +402,7 @@
"\n",
"We might want to access the rows or columns of our pandas data directly so that we can do calculations. A nice tutorial on this can be found here: https://www.shanelynn.ie/select-pandas-dataframe-rows-and-columns-using-iloc-loc-and-ix/.\n",
"\n",
"In brief, we can select a column by indexing into the pandas DataFrame object using a column name. I currently have the OTUs as columns, so we could access them with `feature_table[\"OTU1\"]`"
"In brief, we can select a column by indexing into the pandas DataFrame object using a column name. I currently have the patients as columns, so we could access them with `feature_table[\"patient1\"]`"
]
},
{
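A minimal sketch of column selection with patients as columns. The small `feature_table` below is built by hand with illustrative values, so it is a stand-in for the notebook's table, not a copy of it.

```python
from pandas import DataFrame

# Hand-built stand-in: OTUs as rows, patients as columns (illustrative values)
feature_table = DataFrame(
    {"patient1": [3, 4, 1], "patient2": [4, 0, 5], "patient3": [2, 1, 0]},
    index=["OTU1", "OTU2", "OTU3"],
)
feature_table.index.name = "OTU_ID"

# Indexing with a column name and .loc[:, name] select the same column
by_name = feature_table["patient1"]
by_loc = feature_table.loc[:, "patient1"]

print(by_name)
print("Same result from both:", by_name.equals(by_loc))
print("Type of the selection:", type(by_name))
```

Either form returns a pandas Series whose index is the OTU IDs.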
@@ -471,7 +471,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Select all of the OTU1 column:\n",
"Select all of the patient1 column:\n",
"OTU_ID\n",
"OTU1 3\n",
"OTU2 4\n",
@@ -484,7 +484,7 @@
],
"source": [
"#Recall that : means all, and we specify rows,columns when using .loc\n",
"print(\"Select all of the OTU1 column:\")\n",
"print(\"Select all of the patient1 column:\")\n",
"selected_column = feature_table.loc[:,'patient1']\n",
"print(selected_column)"
]
@@ -493,7 +493,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Cool! So we see that we now get the count of microbe 1 ('OTU1') in each patient's sample. However, this data type looks kinda funky. That's because it is a pandas Series object. It therefore doesn't print in the same way as either a pandas DataFrame or a python list. We'll talk more about that in a minute. For now, let's continue to explore how to select data using .loc.\n",
"Cool! So we see that we now get the count of all microbes in patient1's sample. However, this data type looks kinda funky. That's because it is a pandas Series object. It therefore doesn't print in the same way as either a pandas DataFrame or a python list. We'll talk more about that in a minute. For now, let's continue to explore how to select data using .loc.\n",
"\n",
"#### Select a row from a pandas DataFrame with `.loc`\n"
]
@@ -508,7 +508,7 @@
"output_type": "stream",
"text": [
"\n",
"Select the patient1 row:\n",
"Select the OTU1 row:\n",
"patient1 3\n",
"patient2 4\n",
"patient3 2\n",
@@ -526,7 +526,7 @@
}
],
"source": [
"print(\"\\nSelect the patient1 row:\")\n",
"print(\"\\nSelect the OTU1 row:\")\n",
"selected_row = feature_table.loc['OTU1',:]\n",
"print(selected_row)\n",
"\n"
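The row selection in this hunk can be sketched the same way. The table below is again a hand-built stand-in with illustrative values; only the OTU1 row matches the output shown above.

```python
from pandas import DataFrame

# Hand-built stand-in: OTUs as rows, patients as columns (illustrative values)
feature_table = DataFrame(
    {"patient1": [3, 4, 1], "patient2": [4, 0, 5], "patient3": [2, 1, 0]},
    index=["OTU1", "OTU2", "OTU3"],
)

# .loc takes rows first, then columns; ':' means "all"
selected_row = feature_table.loc["OTU1", :]
print("Select the OTU1 row:")
print(selected_row)

# Giving both labels picks out a single cell
single_count = feature_table.loc["OTU1", "patient1"]
print("OTU1 count in patient1:", single_count)
```

The row comes back as a Series indexed by patient name, which is why the printed labels are `patient1`, `patient2`, and so on.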
@@ -611,7 +611,7 @@
"output_type": "stream",
"text": [
"\n",
"feature_table['OTU1'] is an object of type: <class 'pandas.core.series.Series'>\n"
"feature_table['patient1'] is an object of type: <class 'pandas.core.series.Series'>\n"
]
}
],
@@ -620,7 +620,7 @@
"selected_column_type = type(selected_column)\n",
"\n",
"#WHAT IS THIS? Print the answer to screen\n",
"print(\"\\nfeature_table['OTU1'] is an object of type:\", selected_column_type)"
"print(\"\\nfeature_table['patient1'] is an object of type:\", selected_column_type)"
]
},
{
@@ -655,7 +655,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Log(OTU1 data): OTU_ID\n",
"Log(patient1 data): OTU_ID\n",
"OTU1 1.386294\n",
"OTU2 1.609438\n",
"OTU3 0.693147\n",
@@ -674,9 +674,9 @@
"#Select the patient1 column of our DataFrame\n",
"OTU1_data = feature_table.loc[:,\"patient1\"]\n",
"\n",
"log_OTU1_data = log(OTU1_data +1)\n",
"log_patient1_data = log(patient1_data +1)\n",
"\n",
"print(\"Log(OTU1 data):\", log_OTU1_data)\n"
"print(\"Log(patient1 data):\", log_patient1_data)\n"
]
},
{
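A self-contained sketch of the log transform in this cell, with the column bound to the same name that the `log` call uses (the name has to be assigned before `log(patient1_data + 1)` runs, or Python raises a `NameError`). The diff does not show where the notebook imports `log` from, so `numpy.log` is an assumption here, and the Series is built by hand from the three counts implied by the output above.

```python
from numpy import log
from pandas import Series

# Hand-built stand-in for feature_table.loc[:, "patient1"]
patient1_data = Series([3, 4, 1], index=["OTU1", "OTU2", "OTU3"], name="patient1")
patient1_data.index.name = "OTU_ID"

# Adding 1 before taking the log keeps zero counts defined: log(0 + 1) = 0
log_patient1_data = log(patient1_data + 1)

print("Log(patient1 data):", log_patient1_data)
```

Because `numpy.log` works elementwise, the result is still a Series with the OTU IDs as its index.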
@@ -703,8 +703,8 @@
}
],
"source": [
"OTU1_data = list(feature_table.loc[:,\"patient1\"])\n",
"print(OTU1_data)"
"patient1_data = list(feature_table.loc[:,\"patient1\"])\n",
"print(patient1_data)"
]
},
{
@@ -732,8 +732,8 @@
}
],
"source": [
"OTU1_data_as_array = feature_table.loc[:,\"patient1\"].values\n",
"print(\"Data as a numpy array:\",OTU1_data_as_array)"
"patient1_data_as_array = feature_table.loc[:,\"patient1\"].values\n",
"print(\"Data as a numpy array:\",patient1_data_as_array)"
]
},
{
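The last two hunks convert the selected column to a list and to a numpy array. A minimal sketch of both conversions, using a hand-built Series as a stand-in for `feature_table.loc[:, "patient1"]`:

```python
from pandas import Series

# Hand-built stand-in for the patient1 column (illustrative values)
patient1_column = Series([3, 4, 1], index=["OTU1", "OTU2", "OTU3"], name="patient1")

# list() keeps the values and drops the OTU_ID labels
patient1_data = list(patient1_column)
print(patient1_data)

# .values returns the underlying numpy array, also without labels
patient1_data_as_array = patient1_column.values
print("Data as a numpy array:", patient1_data_as_array)
```

The pandas documentation recommends `Series.to_numpy()` over `.values` in new code; for a plain integer column like this one, both return the same array.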