-
Notifications
You must be signed in to change notification settings - Fork 11
3b. Data Design
Six lessons of communicating with data:
-
Understand the context - this means knowing your audience and conveying a clear message about what you want your audience to know or do with the information you are providing.
-
Choose an appropriate visual display
-
Eliminate clutter - you should only provide information to the user that helps convey your message.
-
Focus attention where you want it - build visualizations that pull attention to the message you want to highlight.
-
Think like a designer - ask the right questions and find solutions.
-
Tell a story - your visualizations should give the audience a story. The most powerful data visualizations move people to take action.
There are two main reasons for creating visuals using data:
Exploratory Analysis is done when you are searching for insights. These visualizations don't need to be perfect. You are using plots to find insights, but they don't need to be aesthetically appealing. You are the consumer, and you need to be able to find the answer to your question from these plots.
Explanatory Analysis is done when you are providing your results for others. These visualizations need to provide you the emphasis you need to convey your message. They should be accurate, insightful, and visually appealing.
The five steps of the data analysis process:
- Extract - Obtain the data from a spreadsheet, SQL, the web, etc.
- Clean - Here we could use exploratory visuals.
- Explore - Here we use exploratory visuals.
- Analyze - Here we might use either exploratory or explanatory visuals.
- Share - Here is where explanatory visuals live.
Experts and researchers have determined the visual patterns that allow humans to best identify certain patterns. In general, humans are able to best understand data encoded with positional changes (as we see with scatterplots) and length changes (as we see with bar charts).
Alternatively, humans struggle with understanding data encoded with color hue changes (as is commonly used as an additional variable encoding in scatter plots) and area changes (as we see in pie charts).
The data-ink ratio, credited to Edward Tufte, is directly related to the idea of chart junk. The more of the ink in your visual that is related to conveying the message in the data, the better.
Limiting chart junk increases the data-ink ratio.
It is key that when you build plots you maintain integrity for the underlying data. One of the main ways discussed here for looking at data integrity was with the lie factor. See: How to Spot Visualization Lies
Colour can both help and hurt a data visualization. Choose colour for a reason. Three tips for using color effectively.
-
Before adding colour to a visualization, start with black and white.
-
When using colour, use less intense colours - not all the colours of the rainbow, which is the default in many software applications.
-
Colour for communication. Use colour to highlight your message and separate groups of interest. Don't add colour just to have color in your visualization.
To be sensitive to those with colorblindness, you should use color pallets that do not move from red to green without using another element to distinguish this change like shape, position, or lightness. Both of these colors appear in a yellow tint to individuals with the most common types of colorblindness. Instead, use colors on a blue to orange pallet.
Only use these additional encodings when absolutely necessary. Often these additional encodings suggest you are providing too much information in a single plot. Instead, it might be better to break the information into multiple individual messages, so the audience can understand every aspect of your message.
In general, color and shape are best for categorical variables, while the size of marker can assist in adding additional quantitative data.
Telling stories with data follows these steps:
- Start with a Question
- Repetition is a Good Thing
- Highlight the Answer
- Call Your Audience To Action
- Images from Udacity: Data Foundations Nanodegree program