Commit 169b07d

Typo, spellcheck and grammar correction of submodule 2
1 parent a172ce0 commit 169b07d

8 files changed, with 161 additions and 173 deletions

Submodule_2/Submodule_2_Overview.ipynb

Lines changed: 3 additions & 3 deletions
@@ -9,7 +9,7 @@
99
"\n",
1010
"Module 2 takes us from introducing fundamental Python characteristics and functions towards data science. \n",
1111
"\n",
12-
"The goal is to be able to use Python programming language to perform data analysis, manipulation, and visualization. This will enable you to use Python to extract meaningful insights from large datasets using the Python libraries of NumPy, Pandas, and Matplotlib.\n",
12+
"The goal is to be able to use the Python programming language to perform data analysis, manipulation, and visualization. This will enable you to extract meaningful insights from large datasets using the Python libraries NumPy, Pandas, and Matplotlib.\n",
1313
"\n",
1414
"The **NumPy** library expands Python's functionality, focusing on arrays of one or more dimensions. These are the foundation for scientific computing and for processing data, especially numerical data. \n",
1515
"\n",
@@ -69,7 +69,7 @@
6969
"source": [
7070
"## Clean up\n",
7171
"\n",
72-
"Remember to shut down your Jupyter Notebook instance when you are done for the day to avoid unnecessary charges you can do this by stopping the notebook instance in the **Cloud console**"
72+
"Remember to shut down your Jupyter Notebook instance when you are done for the day to avoid unnecessary charges. You can do this by stopping the notebook instance in the **Cloud console**."
7373
]
7474
}
7575
],
@@ -89,7 +89,7 @@
8989
"name": "python",
9090
"nbconvert_exporter": "python",
9191
"pygments_lexer": "ipython3",
92-
"version": "3.12.4"
92+
"version": "3.12.9"
9393
}
9494
},
9595
"nbformat": 4,

Submodule_2/Submodule_2_Tutorial_1_NumPy.ipynb

Lines changed: 49 additions & 55 deletions
@@ -5,9 +5,7 @@
55
"id": "e7cccaee",
66
"metadata": {},
77
"source": [
8-
"# Tutorial 1: Numerical Python (NumPy)\n",
9-
"\n",
10-
"-----------------------------------------------------------"
8+
"# Tutorial 1: Numerical Python (NumPy)\n"
119
]
1210
},
1311
{
@@ -17,12 +15,12 @@
1715
"source": [
1816
"## Overview\n",
1917
"\n",
20-
"Are you ready to supercharge your research and data analysis? As a biologist, you're swimming in data—DNA sequences, protein structures, population stats, or experimental results. Processing all that manually or with spreadsheets? That’s the slow lane. Let me introduce you to NumPy, your new powerhouse for scientific computing!\n",
18+
"Are you ready to supercharge your research and data analysis? As a biologist, you're swimming in data: DNA sequences, protein structures, population stats, or experimental results. Processing all that manually or with spreadsheets? That’s the slow lane. Let me introduce you to NumPy, your new powerhouse for scientific computing!\n",
2119
"\n",
2220
"**Why Should a Biologist Care About NumPy?**\n",
2321
"1. Handle Biological Data Like a Pro\n",
2422
"\n",
25-
"Whether you’re analyzing gene expression data, creating protein matrices, or modeling population dynamics, NumPy makes it seamless to store, manipulate, and analyze huge datasets—faster and cleaner than Excel or vanilla Python.\n",
23+
"Whether you’re analyzing gene expression data, creating protein matrices, or modeling population dynamics, NumPy makes it seamless to store, manipulate, and analyze huge datasets, faster and cleaner than Excel or vanilla Python.\n",
2624
"\n",
2725
"2. Speed Up Your Workflow\n",
2826
"\n",
@@ -45,7 +43,7 @@
4543
"- Dummy-code (one-hot-encode) categorical data into numerical data with a protein sequence\n",
4644
"\n",
4745
"## Prerequisites\n",
48-
"- basic python\n",
46+
"- Basic Python knowledge\n",
4947
"\n",
5048
"## Getting Started\n",
5149
"- Run the next code box to prepare the quiz\n",
@@ -73,22 +71,14 @@
7371
"print(\"done installing required packages\")"
7472
]
7573
},
76-
{
77-
"cell_type": "markdown",
78-
"id": "379832e3",
79-
"metadata": {},
80-
"source": [
81-
"----------------------------------------------------------------------------------\n"
82-
]
83-
},
8474
{
8575
"cell_type": "markdown",
8676
"id": "8dcebcfb",
8777
"metadata": {},
8878
"source": [
8979
"## About Numpy\n",
9080
"Numerical Python, or \"numpy\", is one of the most popular modules used in Python. Numpy is considered a foundational module for high-end scientific computing in Python. \n",
91-
"<br> ​\n",
81+
"\n",
9282
"The common standard in Python is to import numpy with the alias **np**. You will see how often you need to use this alias!\n"
9383
]
9484
},
@@ -112,12 +102,10 @@
112102
"metadata": {},
113103
"source": [
114104
"## The Numpy Array \n",
115-
"Among the many components it provides is the **ndarray**.​\n",
116-
"\n",
117-
" - This is a high-performance array or vector and serves as one of the main classes for scientific computing.​\n",
118-
"\n",
119-
" - Only holds one type of element, much like a standard array.​\n",
105+
"Among the many components it provides is the **ndarray**.\n",
120106
"\n",
107+
" - This is a high-performance array or vector and serves as one of the main classes for scientific computing.\n",
108+
" - Only holds one type of element, much like a standard array.\n",
121109
" - It is created with the **array()** function.\n"
122110
]
123111
},
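As a reading aid for the hunk above, here is a minimal sketch of the two ndarray properties it describes: creation with `array()` and the single element type. The array contents and variable names are made up for illustration and do not come from the notebook.

```python
import numpy as np

# Hypothetical values, not taken from the notebook
counts = np.array([3, 7, 1, 9])
print(type(counts).__name__)  # the object is an ndarray
print(counts.dtype)           # an integer dtype (exact width depends on the platform)

# An ndarray holds one element type, so mixed input is converted to a common type
mixed = np.array([1, 2.5, 4])
print(mixed.dtype)            # float64: the integers were upcast to floats
```

The second array shows why the "one type of element" rule matters in practice: NumPy silently converts the input rather than raising an error.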
@@ -251,7 +239,7 @@
251239
"\n",
252240
"\\[row number, column number]\n",
253241
"\n",
254-
"You can also index a whole row or column using a colon, : , in place of the other dimension."
242+
"You can also index a whole row or column using a colon, \":\", in place of the other dimension."
255243
]
256244
},
257245
{
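The `[row number, column number]` indexing described in the hunk above can be sketched as follows. The 2x3 array is a made-up example, not the one used in the notebook.

```python
import numpy as np

# Hypothetical 2x3 array for illustration
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

single = grid[1, 2]      # row 1, column 2 (indices start at 0)
first_row = grid[0, :]   # the colon selects every column of row 0
last_col = grid[:, 2]    # the colon selects every row of column 2

print(single)
print(first_row)
print(last_col)
```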
@@ -310,9 +298,9 @@
310298
"# Test Your Knowledge\n",
311299
"(see the solution in the next markdown box)\n",
312300
"\n",
313-
"> 1. Create an np array (\"let_arr\") of 10 elements (letters a-j). Note: there is no shortcut for a range of letters\n",
314-
"> 2. Change the 2nd element to 'q' & print to the console\n",
315-
"> 3. Change the 4th element to the string 'cat'. (does it work, what type is it?)"
301+
"1. Create an np array (\"let_arr\") of 10 elements (letters a-j). Note: there is no shortcut for a range of letters.\n",
302+
"2. Change the 2nd element to 'q' & print to the console.\n",
303+
"3. Change the 4th element to the string 'cat'. Does it work? What type is it?"
316304
]
317305
},
318306
{
@@ -367,8 +355,7 @@
367355
"\n",
368356
"By reshaping your NumPy arrays, you adapt your data to fit specific tasks—statistical analysis, visualization, or preparing input for machine learning models. Instead of wasting time manually reorganizing data, you let NumPy do the heavy lifting so you can focus on your biology!\n",
369357
"\n",
370-
"Numpy has built-in tools, as we've said, to work with arrays. A single array can be rearranged \n",
371-
"We can reshape an array with **array.reshape()**"
358+
"Numpy has built-in tools, as we've said, to work with arrays. A single array can be rearranged into a new shape. We can reshape an array with **array.reshape()**."
372359
]
373360
},
374361
{
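A short sketch of what reshaping does, for readers of this hunk. The values and shapes are chosen only for illustration. The one rule to keep in mind is that the new shape must contain the same number of elements as the original.

```python
import numpy as np

# Twelve hypothetical values: 1 through 12
values = np.arange(1, 13)

table = values.reshape(3, 4)    # 3 rows x 4 columns
cube = values.reshape(2, 2, 3)  # 2 blocks of 2 rows x 3 columns
flipped = table.T               # transpose: rows become columns

print(table.shape, cube.shape, flipped.shape)
```

Asking for a shape whose element count differs from 12, for example `values.reshape(5, 2)`, raises a `ValueError`.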
@@ -394,10 +381,10 @@
394381
"metadata": {},
395382
"source": [
396383
"## Test Your Knowledge\r",
397-
" 1. Create a 2x5 array of the numbers 1-0\n",
398-
"2. Transpose the array, change rows to columns, columns to rws\n",
399-
"3. . Add the numbers 11 1 \n",
400-
"4. 4. Reshape the array, make it a 2x2x3 array"
384+
"1. Create a 2x5 array of the numbers 1-10.\n",
385+
"2. Transpose the array, changing rows to columns and columns to rows.\n",
386+
"3. Add the numbers 11 and 12.\n",
387+
"4. Reshape the array, make it a 2x2x3 array."
401388
]
402389
},
403390
{
@@ -528,7 +515,8 @@
528515
"metadata": {},
529516
"source": [
530517
"## Test your knowledge\n",
531-
"> -Although you could easily look through the list yourself, use the gene_names and expression_levels arrays again to create a new clean_gene_name array that has removed all genes with zero expression. *This kind of filtering is commonly needed in biological data sets.*"
518+
"\n",
519+
"Although you could easily look through the list yourself, use the gene_names and expression_levels arrays again to create a new clean_gene_name array that has removed all genes with zero expression. *This kind of filtering is commonly needed in biological data sets.*"
532520
]
533521
},
534522
{
@@ -638,7 +626,7 @@
638626
"\n",
639627
"NumPy provides many functions you would expect in highly statistical applications such as data science. We have already been using some of them, such as enumerate and reshape\n",
640628
"\n",
641-
"Operations that would take several lines of nested loops in Python can be done in one line with NumPy. Whether you're doing matrix multiplication, statistical analysis, or random sampling, NumPy's tools simplify and accelerate your workflow. We will focus here on ones you might need for bioinformatics. Once you see how these work, you will easily be able to edit and use other functions. The full list is in the [numpy documentation](https://numpy.org/doc/2.1/reference/routines.math.html)\n",
629+
"Operations that would take several lines of nested loops in Python can be done in one line with NumPy. Whether you're doing matrix multiplication, statistical analysis, or random sampling, NumPy's tools simplify and accelerate your workflow. We will focus here on ones you might need for bioinformatics. Once you see how these work, you will easily be able to edit and use other functions. The full list is in the [numpy documentation](https://numpy.org/doc/2.1/reference/routines.math.html).\n",
642630
"\n",
643631
"- Mathematical functions\n",
644632
" * np.sum: Calculates the sum of elements along a specific axis\n",
@@ -647,12 +635,11 @@
647635
" * np.min / np.max: Find the minimum and maximum values in an array\n",
648636
" * np.std: Calculates the standard deviation\n",
649637
" * np.log10: Calculates the base-10 logarithm of each value *commonly needed in bioinformatics*\n",
650-
" * \n",
651638
"- Logical and Comparison Functions\n",
652639
" * np.where: Returns the indices of elements meeting a condition, or replaces elements based on a condition\n",
653640
" * np.any / np.all: Checks if any or all elements of an array meet a condition\n",
654-
"- Tools to generate arrays\r\n",
655-
" - np.random.normal\r\n",
641+
"- Tools to generate arrays\r\n",
642+
" * np.random.normal\r\n",
656643
"- For multidimensional arrays\r\n",
657644
" * column_means = np.mean(arr, axis=0)\r\n",
658645
" * row_means = np.mean(arr, axis=1)\r\n",
@@ -729,9 +716,9 @@
729716
"source": [
730717
"<div style=\"font-size:18px\">More on Arrays</div> \n",
731718
"\n",
732-
"Arrays can also be split, i.e. taking a single array and breaking it up into multiple sub-arrays\n",
719+
"Arrays can also be split, i.e. taking a single array and breaking it up into multiple sub-arrays.\n",
733720
"\n",
734-
"The following code takes one array and splits it into 3 (equal parts)"
721+
"The following code takes one array and splits it into 3 equal parts."
735722
]
736723
},
737724
{
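The notebook's own code cell is not shown in this diff, so the following is only a sketch of how splitting might look. It uses `np.split` for equal parts and `np.array_split` for the case where the length does not divide evenly. The data is hypothetical.

```python
import numpy as np

samples = np.arange(9)          # hypothetical data: 0..8

# np.split needs the array length to divide evenly by the number of parts
parts = np.split(samples, 3)
for part in parts:
    print(part)

# np.array_split tolerates uneven lengths; earlier parts get the extra elements
uneven = np.array_split(np.arange(10), 3)
print([len(p) for p in uneven])
```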
@@ -753,9 +740,9 @@
753740
"id": "e22c5629",
754741
"metadata": {},
755742
"source": [
756-
"NumPy allows you to conduct array searches using a **where()** method\n",
743+
"NumPy allows you to conduct array searches using the **where()** function.\n",
757744
"\n",
758-
"The return value is an array of indexes where the search condition was satisfied"
745+
"The return value is a tuple of index arrays (one per dimension) marking where the search condition was satisfied."
759746
]
760747
},
761748
{
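A small sketch of the two common ways to call `np.where`. The expression values are the seven numbers that appear in the exercise array later in this diff. The threshold of 2.0 and the "high"/"low" labels are assumptions made for the example.

```python
import numpy as np

expression = np.array([5.1, 0.3, 8.7, 1.2, 6.5, 0.0, 2.3])

# One argument: returns a tuple holding one index array per dimension
hits = np.where(expression > 2.0)
indices = hits[0]
print(indices)

# Three arguments: picks a value element by element based on the condition
labels = np.where(expression > 2.0, "high", "low")
print(labels)
```

Because the one-argument form returns a tuple, a 1-D search is usually followed by `[0]` to get the index array itself.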
@@ -777,11 +764,9 @@
777764
"id": "97c395d4",
778765
"metadata": {},
779766
"source": [
780-
"Finally you can sort arrays using the sort() method​\n",
781-
"\n",
782-
" - The return value is a copy of the array sorted​\n",
767+
"Finally, you can sort arrays using the **np.sort()** function. The return value is a sorted copy of the array.\n",
783768
"\n",
784-
"Note the sort is only ascending. To do a descending sort you need to reverse the array using slicing"
769+
"Note the sort is only ascending. To do a descending sort you need to reverse the array using slicing."
785770
]
786771
},
787772
{
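The sorting behaviour described in the hunk above can be sketched like this. The sketch assumes the notebook uses `np.sort`, which returns a sorted copy. The array values are hypothetical.

```python
import numpy as np

levels = np.array([5.1, 0.3, 8.7, 1.2])  # hypothetical values

ascending = np.sort(levels)    # sorted copy; `levels` itself is unchanged
descending = ascending[::-1]   # reverse with slicing for a descending order

print(ascending)
print(descending)
print(levels)
```

By contrast, calling `levels.sort()` on the array itself sorts in place and returns `None`.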
@@ -803,9 +788,10 @@
803788
"metadata": {},
804789
"source": [
805790
"## Test Your Knowledge\r",
806-
"> - \n",
807-
"Use a version of our previous gene expression array (arr = np.array([\"GeneA\", \"GeneB\", \"GeneC\", \"GeneD\", \"GeneE\", \"GeneF\", \"GeneG\", 5.1, 0.3, 8.7, 1.2, 6.5, 0.0, 2.3]) to perform math function\n",
808-
"> - after you rearrange this into a 5x2 numpy array where the gene names are row 1. (you'll need to cope with the way that numpy interprets this single type array)"
791+
" \n",
792+
"Use a version of our previous gene expression array `arr = np.array([\"GeneA\", \"GeneB\", \"GeneC\", \"GeneD\", \"GeneE\", \"GeneF\", \"GeneG\", 5.1, 0.3, 8.7, 1.2, 6.5, 0.0, 2.3])` to perform math functions.\n",
793+
"\n",
794+
"First rearrange it into a 2x7 numpy array where the gene names are row 1. You'll need to cope with the way that numpy interprets this single-type array."
809795
]
810796
},
811797
{
@@ -921,13 +907,13 @@
921907
"## Test your knowledge\n",
922908
"\n",
923909
"Can you make a similar tool for dummy coding a protein sequence? You'll use this to answer the quiz questions.\n",
924-
"> - Import a protein sequence from NCBI as your sequence (the quiz questions are based on the FASTA sequence human leptin from the **protein** database, id XP_005250397.1.... To fetch them, you can use the project solution from module 1.)\n",
925-
"> - Create a mapping matrix for the amino acids A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y\n",
926-
"> - Create a one-hot array for the protein sequence.\n",
927-
"> - Print the head or the tail of the one-hot matrix (the first 10 rows) for practice & to confirm\n",
928-
"> - Get the index of lysine (K) from the array you make of the amino acids.\n",
929-
"> - Count number of lysines for the whole sequence & display the value\n",
930-
"> - Check that answer by using the string.count(character) method from module 1"
910+
"* Import a protein sequence from NCBI as your sequence (the quiz questions are based on the FASTA sequence human leptin from the **protein** database, id XP_005250397.1.... To fetch them, you can use the project solution from module 1.)\n",
911+
"* Create a mapping matrix for the amino acids A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y\n",
912+
"* Create a one-hot array for the protein sequence.\n",
913+
"* Print the head or the tail of the one-hot matrix (the first 10 rows) for practice & to confirm\n",
914+
"* Get the index of lysine (K) from the array you make of the amino acids.\n",
915+
"* Count number of lysines for the whole sequence & display the value\n",
916+
"* Check that answer by using the string.count(character) method from module 1"
931917
]
932918
},
933919
{
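The exercise above targets a protein sequence. To avoid giving that solution away, the sketch below shows the same one-hot idea on a short, made-up DNA string. The names `alphabet`, `sequence`, and `one_hot` are hypothetical, and the notebook's own approach may differ.

```python
import numpy as np

alphabet = np.array(list("ACGT"))      # one column per nucleotide
sequence = np.array(list("GATTACA"))   # made-up sequence, one row per position

# Broadcasting compares each position (7, 1) against every letter (4,),
# giving a (7, 4) boolean matrix that is then converted to 0/1
one_hot = (sequence[:, None] == alphabet).astype(int)
print(one_hot)

# Look up the column for "A" and count how often it occurs
a_index = np.where(alphabet == "A")[0][0]
a_count = one_hot[:, a_index].sum()
print(a_index, a_count)

# Cross-check with the plain string method
print("GATTACA".count("A"))
```

Each row contains exactly one 1, so summing a column counts the occurrences of that letter. The same pattern extends to the 20-letter amino acid alphabet.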
@@ -1024,7 +1010,7 @@
10241010
"source": [
10251011
"# Conclusion\n",
10261012
"\n",
1027-
"In this tutorial, you have learned the many functions associated with the NumPy array. \n",
1013+
"In this tutorial, you have learned the many functions associated with the NumPy array.\n",
10281014
"\n",
10291015
"The next module introduces you to [Pandas](./Submodule_2_Tutorial_2_Pandas.ipynb)"
10301016
]
@@ -1037,6 +1023,14 @@
10371023
"## Clean up\r\n",
10381024
"Remember to shut down your Jupyter Notebook instance when you are done for the day to avoid unnecessary charges. You can do this by stopping the notebook **compute** instance from the Cloud console."
10391025
]
1026+
},
1027+
{
1028+
"cell_type": "code",
1029+
"execution_count": null,
1030+
"id": "0983f205-7499-43b7-b051-c4e996785872",
1031+
"metadata": {},
1032+
"outputs": [],
1033+
"source": []
10401034
}
10411035
],
10421036
"metadata": {
@@ -1055,7 +1049,7 @@
10551049
"name": "python",
10561050
"nbconvert_exporter": "python",
10571051
"pygments_lexer": "ipython3",
1058-
"version": "3.12.4"
1052+
"version": "3.12.9"
10591053
}
10601054
},
10611055
"nbformat": 4,
