Introduction to Biomedical Data Science and Health Informatics
June 8th, 2020 - June 12th, 2020
Overview
Welcome!
Join us for an introduction to the fundamentals of biomedical data science and health informatics. This course is aimed at beginners in informatics, and no previous experience is required. However, if programming is completely new to you, we encourage you to watch the introductory lecture of Harvard's CS50 course: Computational Thinking.
Most lectures are available in advance, but Thursday's and Friday's lectures will only be available as live streams on the respective mornings. Morning office hours (9am to 12pm) will be held via Zoom to answer questions about the lectures. Afternoons (1pm to 5pm) will begin with a brief Q&A, followed by hands-on exercises via Zoom.
Communication
We will be using Piazza for all class-related discussion. The system is designed to get you help quickly and efficiently from classmates, the TAs, and ourselves. Rather than emailing questions to the teaching staff, please post them on Piazza. The sooner you begin asking questions there, the sooner you'll benefit from the collective knowledge of your classmates and instructors. We encourage you to ask questions whenever you're struggling to understand a concept, and you can even do so anonymously. The class signup link is: https://piazza.com/yale/summer2020/ycmi_cbdssummercourse.
If you have problems signing up for Piazza (e.g. if you only have a va.gov email), please write to the course instructors, and we can add you to the course manually.
Communication in Piazza is organized around notes, which are grouped into folders, with each folder corresponding to a content area of the course. Before posting, please browse the questions that have already been asked, as someone else may have already asked yours!
Posts should be phrased as questions to the class about a specific issue. If you know the answer to a classmate's question, feel free to answer it and collaborate with your fellow students. Instructors will also monitor the Q&A and assist throughout the course.
Class Structure
What is a "5x5" course?
This course is structured as a 5x5: it runs over five days, with approximately five hours of instruction per day. The American Medical Informatics Association offers continuing education in the form of 10x10 courses, which are regarded as equivalent to a 3-credit-hour, semester-long course. This course therefore covers approximately half as much in both breadth and depth.
- Feldman, Sue S., and William Hersh. "Evaluating the AMIA-OHSU 10x10 program to train healthcare professionals in medical informatics." In AMIA Annual Symposium Proceedings, vol. 2008, p. 182. American Medical Informatics Association, 2008.
Online Delivery
We are using what is known as an inverted classroom structure, in which class time is focused on applying the material. In an inverted classroom, class time is spent on the tasks that would traditionally take place outside the classroom, and vice versa. Class time is therefore spent reviewing problem sets and expanding on concepts with input from students, rather than on the didactic relaying of information. That information is instead delivered during students' non-instructional time, through pre-recorded lectures, exercises, or reading material. This shift in structure requires a different investment of effort from the student:
- Reviewing the material provided for each lecture, be it readings, exercises, or pre-recorded lectures, is crucial for functioning in an inverted classroom.
- Classroom time is predominantly an exercise in information integration: following the thread of discussion and relating it to your own skill level. Active learning is a must.
- Reflecting on your progress, conversing with instructors, and making sure your skill development keeps pace with the content delivered are therefore key to successful learning.
Course Materials
The data for the course can be found in this Google Drive folder. To add it to your own Google Drive, click on the course folder at the top next to the "Shared with me" header, select "Add shortcut to drive" from the dropdown and then create the shortcut by selecting "My Drive" (or your subfolder of choice).
Solutions for each set of exercises will be posted in the evenings after each class.
Monday: An Introduction to Python for Data Science
Basic calculations, variables, data types | Lecture (20m 2s) | Colab notebook | Exercises | Solutions |
Functions, Methods, f-strings | Lecture (24m 12s) | Colab notebook | Exercises | Solutions |
Looping (for loops) and making choices (if statements) | Lecture (30m 23s) | Colab notebook | Exercises | Solutions |
Loading and using libraries (modules) | Lecture (9m 34s) | Slides | Exercises | Solutions |
Loading and manipulating data with pandas | Lecture (36m 37s) | Slides | Exercises | Solutions |
Visualizing data with ggplot | Lecture (15m 3s) | Slides | Exercises | Solutions |
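For a sense of what Monday's first three topics (variables, functions with f-strings, and loops with if statements) look like in practice, here is a minimal sketch using only the Python standard library. The function names `bmi` and `describe`, the threshold of 25, and the two example records are made up for illustration and are not taken from the course notebooks or exercises.

```python
def bmi(weight_kg, height_m):
    """Return body mass index in kg / m^2."""
    return weight_kg / height_m ** 2


def describe(name, weight_kg, height_m):
    """Build a one-line summary using an if statement and an f-string."""
    value = bmi(weight_kg, height_m)
    # Illustrative cutoff only; chosen to demonstrate branching.
    if value >= 25:
        label = "above 25"
    else:
        label = "below 25"
    # {value:.1f} formats the number with one decimal place.
    return f"{name}: BMI {value:.1f} ({label})"


# Hypothetical records: (name, weight in kg, height in m)
patients = [("A", 70.0, 1.75), ("B", 90.0, 1.70)]

# Loop over the records and collect one summary line per record.
summaries = []
for name, weight_kg, height_m in patients:
    summaries.append(describe(name, weight_kg, height_m))

for line in summaries:
    print(line)
```

The same pattern (a small function, a loop over records, and a formatted string for output) carries over to the pandas material later in the day, where the list of tuples would typically be replaced by a DataFrame.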
Tuesday: Data Management and Databases
Data Management and Databases | Lecture (27m 44s) | Slides | Colab notebook | Exercises | Solutions |
You can download a transcript of the recorded presentation by clicking on this link.
Wednesday: Data Cleaning and Data Visualization
Lectures and Slides
Exploratory Data Analysis | Lecture (41m 42s) | Slides |
Applied Visualization | Lecture (15m 4s) | Slides |
The Grammar of Graphics | Lecture (6m 41s) | Slides |
Data Cleaning I: Overview | Lecture (6m 4s) | Slides |
Data Cleaning II: Common challenges | Lecture (28m 17s) | Slides |
Data Cleaning III: Missing variables I: What does it mean | Lecture (7m 27s) | Slides |
Data Cleaning IV: Missing variables II: Why is it missing and what can be done | Lecture (16m 10s) | Slides |
Exercises
Thursday: Machine Learning and Bioinformatics
Note: lectures on Thursday are live via Zoom at 9am EDT and not pre-recorded. (recording)
Slides
Machine Learning | Slides |
Bioinformatics | Slides |