While graphics guru Edward Tufte recently claimed that "R coders and users just can't do words on graphics and typography" and need additional tools to make graphics that aren't "clunky", data journalists at major publications beg to differ. The BBC has been creating graphics "purely in R" for some time, with a typography style matching that of the BBC website. Senior BBC Data Journalist Christine Jeavans offers several examples, including this chart of life expectancy differences between men and women:
... and this chart on gender pay gaps at large British banks:
Meanwhile, the chart below was made for the Financial Times using just R and the ggplot2 package, "down to the custom FT font and the white bar in the top left", according to data journalist John Burn-Murdoch.
There are also entire collections devoted to recreating Tufte's own visualizations in R, presumably meeting his typography standards. Tufte later clarified, saying "Problem is not code, problem is published practice". That is true of any programming environment, which is why it was strange that he'd call out R in particular.
And the L.A. Times uses Python. Their recent use of Altair is pretty impressive. You should check it out.
Posted by: Nicholas McCarty | June 27, 2018 at 21:02
You're preaching to the choir on your blog and R-bloggers. What readers will want to know is if this is all ggplot2, or if there's more beyond that.
Posted by: Chuck Burks | June 28, 2018 at 04:47
Interesting..
Posted by: James Walmlsey | June 28, 2018 at 21:10
Does anyone know, or can anyone share, how to create plots like those? It would be very interesting to see how to go from a basic plot in R to that kind of plot.
Posted by: Charlie Rock | June 29, 2018 at 14:09
As to Charlie Rock's question, a plot like the gender difference plot turned out to be simple in ggplot2.
To start off, have data in tidy form, i.e. country-gender-age so that there are two entries for each country. Then use the point geom for the data points. Use the line geom with aes(group = country) for the connectors. That's the part that took me a little spadework to find. The rest is done with the usual techniques of appearance modification.
Anything that can be done in ggplot2 can, I think, be done in base R, perhaps with a few more keystrokes.
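The steps described above might be sketched as follows. This is only a minimal sketch, not the BBC's code: the data frame `life`, its column names, and its values are made-up placeholders chosen for illustration.

```r
library(ggplot2)

# Tidy form: one row per country-gender pair, so two rows per country.
# Illustrative placeholder values, not real statistics.
life <- data.frame(
  country = rep(c("A", "B", "C"), each = 2),
  gender  = rep(c("Male", "Female"), times = 3),
  age     = c(65, 76, 70, 81, 68, 79)
)

p <- ggplot(life, aes(x = age, y = country)) +
  # Connector: group by country so each line joins that country's two points.
  # Drawn first so the points sit on top of it.
  geom_line(aes(group = country), colour = "grey60") +
  # Data points, colored by gender
  geom_point(aes(colour = gender), size = 3) +
  labs(x = "Life expectancy (years)", y = NULL, colour = NULL) +
  theme_minimal()

print(p)
```

Without `aes(group = country)`, the line geom has no way of knowing that the connectors should run within each country, which is why that grouping is the key step.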
Posted by: Charlotte Mack | July 02, 2018 at 12:33
Hello James,
I tried to replicate the graph on the gap in life expectancy between men and women in Eastern European countries. I made a data frame, estimating the age values:
> life_expectancy
# A tibble: 20 x 3
   Country    Gender   Age
 1 Russia     Male    65.0
 2 Russia     Female  76.0
 3 Lithuania  Male    70.0
 4 Lithuania  Female  81.0
 5 Belarus    Male    68.0
 6 Belarus    Female  79.0
 7 Syria      Male    62.0
 8 Syria      Female  72.0
 9 Ukraine    Male    67.0
10 Ukraine    Female  77.0
11 Latvia     Male    70.0
12 Latvia     Female  80.0
13 Georgia    Male    69.0
14 Georgia    Female  79.0
15 Cape Verde Male    68.0
16 Cape Verde Female  78.0
17 Mongolia   Male    63.0
18 Mongolia   Female  72.0
19 Kazakhstan Male    67.0
20 Kazakhstan Female  76.0
Here's the code:
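library(tidyverse)  # loads ggplot2, dplyr and tidyr; must be attached before the first %>%

le <- life_expectancy

# One row per country, with the female-minus-male gap in years
diff_le <- le %>%
  group_by(Country) %>%
  spread(Gender, Age) %>%
  mutate(diff = Female - Male)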
# Base plot: a grey connector per country with one colored dot per gender
p <- le %>%
  ggplot(aes(y = Country, x = Age, color = Gender)) +
  # `linewidth` needs ggplot2 >= 3.4; use `size` on older versions
  geom_line(aes(group = Country), linewidth = 1.5, color = "grey") +
  geom_point(size = 3) +  # use a larger dot
  labs(x = "", y = "") +
  scale_x_continuous(breaks = c(65, 70, 75, 80),
                     labels = c("65 years", "70", "75", "80 years")) +
  guides(color = "none") +  # hide the legend
  theme_bw() +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank())
# Label the male and female dots on two rows only (Ukraine and Belarus)
UBM <- le %>% filter(Country %in% c("Ukraine", "Belarus"), Gender == "Male")
UBF <- le %>% filter(Country %in% c("Ukraine", "Belarus"), Gender == "Female")

p <- p +
  geom_text(data = UBM, aes(Age, Country, label = Gender), hjust = 1.2, vjust = 0.3) +
  geom_text(data = UBF, aes(Age, Country, label = Gender), hjust = -0.1, vjust = 0.3)
# Put each country's gap at the midpoint between its two dots. Only the
# Ukraine and Belarus rows carry the " Years" suffix. Building the labels
# from diff_le avoids hard-coded row indices and one data frame per country.
gap_labels <- diff_le %>%
  ungroup() %>%
  mutate(mid = (Male + Female) / 2,
         label = if_else(Country %in% c("Ukraine", "Belarus"),
                         paste0(diff, " Years"), as.character(diff)))

# inherit.aes = FALSE keeps the labels from picking up color = Gender
p +
  geom_text(data = gap_labels, aes(mid, Country, label = label),
            inherit.aes = FALSE, size = 4, vjust = -1)
Posted by: narcilili | July 05, 2018 at 00:17