Teaching is in the form of "interactive lectures", where students follow instructions to complete specific learning tasks introduced by the lecturers.
Mon 11 Sep, Tue 12 Sep and Fri 15 Sep 2017
Each day runs from 9 am to 4 pm, with the following breaks:
- 10:15 - 10:30
- 12:00 - 12:30 (Lunch)
- 14:15 - 14:30
Please install VirtualBox 5.1 on your laptop. Choose the download depending on your operating system:
NB: On laptops set up by HEALTH-IT, VirtualBox is available through the Managed Software application.
Select the 64-bit version of the Python 3.6 installer that matches your operating system. More details will be provided in class.
You are welcome to try to install the package yourself. Just use the default 'advanced' options when prompted.
Optionally, browse through some of these links for inspiration/context.
- Mission statement
- Why Every Health Care Organization Needs a Data Science Strategy
  > Of particular importance are “boundary spanners” who can establish links among data science staff, the organization’s management, and its clinicians. They can identify data query priorities that are both organizationally and clinically relevant and can help users of data understand the full range of analysis that is available to them (such as near real-time queries regarding particular patient populations, medications, or treatment outcomes).
- Data scientists in healthcare
  > the health care industry does not currently appreciate the inherent value of these data, which can only be fully harnessed through better data analytics.
- Transforming Insights into Action
- What is "data science"?
- what is "data"!?
- Data-driven science?
- the OSEMN model
- What is a computer?
- How are storage and computation achieved in computing devices? Some definitions.
- Get to know your PC!
- Installing a scientific data-analysis "environment"
- Linux on VirtualBox
- Anaconda Framework
- Files and file system(s)
- The Command Line: telling the computer what to do
- Terminal & Bash
- directory navigation
- basic file manipulations
- the power of the command line
- Computer programs
- Introduction to variables and data types
- Interactive development vs. scripts
- More on variable types and manipulations
- Basic control flow
- Functions & arithmetic
- (computing) resources
- Introduction to programming assignment
- Data 'munging/scrubbing'
- Efficient manipulation of large textual data blocks
- Examples and suggestions for further study
(Brief description:) We have generated a synthetic dataset based on a published study (Luck et al., 2009) that compared reaction times (RT) of schizophrenic patients to those of normal controls while the subjects performed a particular task (the details are not relevant here). The dataset consists of 40 log files (20 per group), each a couple of thousand lines long. Write a program that parses the 40 log files, extracts the median RT and accuracy values, and writes out a single CSV file laid out like this:
| Subjid | Group | Cond | Median | Accuracy |
|---|---|---|---|---|
| {str} | Patient/Control | Freq/Rare | {float} | {float} |
| ... | ... | ... | ... | ... |
Also write out summary stats for median and accuracy values, separately for each group and condition. Compare these to the results in the paper (Table 3).
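As a starting point, the overall shape of such a program might look like the sketch below. It is not a solution to the assignment: the real log format is not described here, so the sketch assumes a hypothetical format in which each trial is one whitespace-separated line of the form `<cond> <rt> <correct>` (for example `Freq 412.5 1`). The function names `parse_log`, `write_results` and `summarise` are illustrative, the median is taken over all trials (whether it should be restricted to correct trials is something to check against the paper), and the summary statistic shown is simply the group mean.

```python
import csv
import statistics
from collections import defaultdict


def parse_log(lines):
    """Return {cond: (median_rt, accuracy)} for one subject's log lines.

    ASSUMPTION: each trial line looks like "<cond> <rt> <correct>",
    e.g. "Freq 412.5 1". The real log files will differ, so this
    parsing step has to be adapted. Non-matching lines are skipped.
    """
    rts = defaultdict(list)
    hits = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 3 or fields[0] not in ("Freq", "Rare"):
            continue  # header, comment, blank or unrelated line
        cond, rt, correct = fields
        rts[cond].append(float(rt))
        hits[cond].append(int(correct))
    return {
        cond: (statistics.median(rts[cond]), sum(hits[cond]) / len(hits[cond]))
        for cond in rts
    }


def write_results(rows, path):
    """Write (subjid, group, cond, median, accuracy) rows to a CSV file."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["Subjid", "Group", "Cond", "Median", "Accuracy"])
        writer.writerows(rows)


def summarise(rows):
    """Mean of per-subject medians and accuracies for each (group, cond)."""
    grouped = defaultdict(list)
    for _subjid, group, cond, median, accuracy in rows:
        grouped[(group, cond)].append((median, accuracy))
    return {
        key: (
            statistics.mean(m for m, _ in values),
            statistics.mean(a for _, a in values),
        )
        for key, values in grouped.items()
    }
```

In use, one would loop over the 40 files, pass each open file object to `parse_log` (a file object yields its lines one at a time, so a log never has to be held in memory as a whole), collect one row per subject and condition, and hand the rows to `write_results` and `summarise`. How the subject ID and group are obtained (from the file name or from the file contents) depends on the actual dataset and is deliberately left open here.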