A hands-on introductory course on the fundamentals of Python programming for data science, intended for students with minimal or no programming experience. Students will learn while working on scientific problems and leveraging scientific datasets. The data science Python ecosystem includes easy-to-use packages for working with data and is the foundation for most deep learning frameworks, which will be used in subsequent courses. Students will develop skills in object-oriented programming in Python3; using packages for working efficiently with scientific data; customizing their environment; working with Anaconda; developing electronic notebooks for reusing and sharing code; reading data specific to the sciences (Biology, Chemistry, Math, or Physics); improving the efficiency of Python code; and visualizing data. At the end of the course, students will have the skills to design and deploy a Python-based data science solution for a small scientific challenge.
The main topics will include:
- Brief introduction to Python3 and the development ecosystem
- Introduction to object-oriented and procedural programming models and basic software architecture principles
- Professional programming techniques for modern software development: version control and team development (Git and GitHub), coding standards, unit and regression testing (pytest), and continuous integration (Travis CI)
- Introduction to R and RStudio
- Developing reusable, sharable, and interactive electronic notebooks with Jupyter
- Python environment management: Virtualenv and Anaconda
- Fundamentals of data structures and their implementation in Python
- Python packages for science and data science: NumPy, SciPy, Pandas, StatsModels
- Data processing techniques for small, medium and large datasets
- Manual and programmatic metadata standards
- Data Analytics with Python: Optimization, Linear and Non-Linear Regression, Mathematical Modeling, Monte Carlo Sampling, Distributions, and Clustering
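As a rough illustration of how two of these topics fit together (Monte Carlo sampling and unit testing), the following is a minimal sketch and is not taken from the course materials. The function names `estimate_pi` and `test_estimate_pi_is_close`, the seed, the sample size, and the tolerance are all assumptions chosen for the example; it uses only the Python standard library.

```python
import math
import random


def estimate_pi(n_samples: int, seed: int = 0) -> float:
    """Estimate pi by sampling points uniformly in the unit square.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # seeded generator so results are reproducible
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        # Count points that fall inside the quarter circle of radius 1.
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * inside / n_samples


def test_estimate_pi_is_close():
    """pytest-style check: fixed seed, deliberately loose tolerance."""
    assert abs(estimate_pi(100_000, seed=0) - math.pi) < 0.05
```

If this were saved in a file whose name starts with `test_`, running `pytest` in that directory would be expected to collect and run the check; the fixed seed keeps the result stable between runs.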
By the end of the course, a student should be able to develop reusable, well-performing Python-based data analytics software using modern best practices in software development. The following are the main learning objectives:
- Given a task in data analytics, be able to pick the right algorithm and apply Python tools to solve the task.
- Gain insight into the process of how Python and Python-based tools work and can be applied properly for data analytics.
- Be able to apply the techniques learned in the class to manipulate, process, and manage data sets ranging from small and simple to large and complex.
- Be able to present their software and develop effective software packages in Python within a team development environment.
Success in this course should be defined as the student becoming a better scientific software developer who can apply various techniques to data analysis.
Prerequisites: Prior basic exposure to the Python programming language.
Reference Texts: None. Online resources will be provided, and a reading list of various articles will be developed.
Evaluation: The course will include 3 programming projects (20% each) and a final project (40%). The programming projects will be submitted as Jupyter Notebooks and deployed through GitHub. Late homework will not be accepted.
Syllabus Change Policy: This syllabus is a guide for the course and is subject to change with appropriate advance notice.
Assume a reply will take about 24 hours during the week. If you email during the weekend, expect a reply on Monday. For grading questions, please contact me through your Andrew account or set up a time to meet. The University forbids faculty to reply to non-Andrew accounts with information concerning grades.
Devices (iPads, smartphones, etc.) tend to hinder classroom participation and discussions. Hence, unless explicitly stated otherwise or unless you are using them to take notes, please close or turn off all such devices when in class.
Student recording of class: Not allowed.
All CMU academic integrity policies apply to this class. Please look through https://www.cmu.edu/academic-integrity/.
Academic integrity refers to the implicit commitment that every member makes to all others in the community to practice those principles that underlie the mission of the university and define academic integrity. These are: honesty and good faith; clarity in the communication of core values; professional conduct of work; mutual trust and respect; and fairness and exemplary behavior. In this course, cheating will not be tolerated and could lead to expulsion from the university. Please remind yourself of the policy here: http://www.cmu.edu/academic-integrity/
There is a zero-tolerance policy on cheating. If you are found to be cheating on an exam, you will automatically receive a failing grade for the exam (which cannot be dropped) and you will be reported to the Dean of Students for further disciplinary action. This usually means an academic review board meeting and possible suspension/expulsion from the university.
If you wish to request an accommodation due to a documented disability, please inform the instructors and contact: Disability Resources, 102 Whitfield Hall, 412-268-2013, (access@andrew.cmu.edu) as soon as possible. For ongoing documented classroom accommodations, one week's notice is required prior to each exam.
As a student, you may experience a range of challenges that can interfere with learning, such as strained relationships, increased anxiety, substance use, feeling down, difficulty concentrating and/or lack of motivation. These mental health concerns or stressful events may diminish your academic performance and/or reduce your ability to participate in daily activities. CMU services are available, and treatment does work. You can learn more about confidential mental health services available on campus at: http://www.cmu.edu/counseling/. Support is always available (24/7) from Counseling and Psychological Services: 412-268-2922.