
June 27, 2018

Comments


And the L.A. Times uses Python. Their recent use of Altair is pretty impressive. You should check it out.

You're preaching to the choir on your blog and R-bloggers. What readers will want to know is whether this is all ggplot2, or whether there's more beyond that.

Interesting.

Does anyone know, or can anyone share, how to create plots like those? It would be very interesting to see how to go from a basic plot in R to that kind of plot.

As to Charlie Rock's question, a plot like the gender difference plot turned out to be simple in ggplot2.

To start off, have data in tidy form, i.e. country-gender-age so that there are two entries for each country. Then use the point geom for the data points. Use the line geom with aes(group = country) for the connectors. That's the part that took me a little spadework to find. The rest is done with the usual techniques of appearance modification.
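A minimal sketch of that recipe, assuming ggplot2 is installed. The data frame `gaps` and its values are made-up placeholders, not real figures:

```r
library(ggplot2)

# Hypothetical tidy data: one row per country-gender pair (placeholder values)
gaps <- data.frame(
  country = rep(c("A", "B"), each = 2),
  gender  = rep(c("Male", "Female"), times = 2),
  age     = c(65, 76, 70, 81)
)

p <- ggplot(gaps, aes(x = age, y = country)) +
  # group = country draws one connector per country
  geom_line(aes(group = country), colour = "grey") +
  # the two data points per country, coloured by gender
  geom_point(aes(colour = gender), size = 3)

print(p)
```

Adding the line layer before the point layer keeps the dots drawn on top of the connectors.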

Anything that can be done in ggplot2 can also be done in base R, I think, perhaps with a few more keystrokes.
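For what it's worth, here is a rough base R sketch of the same kind of plot. The `wide` data frame, its values, and the colours are all made up for illustration:

```r
# Hypothetical wide data: one row per country (placeholder values)
wide <- data.frame(
  country = c("A", "B"),
  male    = c(65, 70),
  female  = c(76, 81)
)
y <- seq_len(nrow(wide))

# Empty canvas spanning both columns, with country names on the y axis
plot(NA, xlim = range(wide$male, wide$female),
     ylim = c(0.5, nrow(wide) + 0.5),
     xlab = "", ylab = "", yaxt = "n")
axis(2, at = y, labels = wide$country, las = 1)

# Connectors first, then the points on top
segments(wide$male, y, wide$female, y, col = "grey", lwd = 3)
points(wide$male,   y, pch = 19, col = "steelblue")
points(wide$female, y, pch = 19, col = "tomato")
```

The base version wants the data in wide form (one row per country), whereas the ggplot2 approach wants it tidy.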

Hello James,
I tried to replicate the graph "Gap in life expectancy between men and women in Eastern European countries". I made a data frame estimating the age values:

> life_expectancy
# A tibble: 20 x 3
   Country    Gender   Age
 1 Russia     Male    65.0
 2 Russia     Female  76.0
 3 Lithuania  Male    70.0
 4 Lithuania  Female  81.0
 5 Belarus    Male    68.0
 6 Belarus    Female  79.0
 7 Syria      Male    62.0
 8 Syria      Female  72.0
 9 Ukraine    Male    67.0
10 Ukraine    Female  77.0
11 Latvia     Male    70.0
12 Latvia     Female  80.0
13 Georgia    Male    69.0
14 Georgia    Female  79.0
15 Cape Verde Male    68.0
16 Cape Verde Female  78.0
17 Mongolia   Male    63.0
18 Mongolia   Female  72.0
19 Kazakhstan Male    67.0
20 Kazakhstan Female  76.0


Here's the code:

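# Load tidyverse first: it provides %>%, group_by(), spread() and ggplot2
library(tidyverse)

le <- life_expectancy

# One row per country, with the female-minus-male gap in `diff`
diff_le <- le %>%
  group_by(Country) %>%
  spread(Gender, Age) %>%
  mutate(diff = Female - Male)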

p <- le %>%
  ggplot(aes(y = Country, x = Age, color = Gender)) +
  geom_line(aes(group = Country), size = 1.5, color = "grey") +  # one connector per country
  geom_point(size = 3) +  # use a larger dot
  labs(x = "", y = "") +
  scale_x_continuous(breaks = c(65, 70, 75, 80),
                     labels = c("65 years", "70", "75", "80 years")) +
  guides(color = "none") +  # hide the legend
  theme_bw() +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank())

# Label the Male and Female points for Ukraine and Belarus only
UBM <- le %>% filter(Country %in% c("Ukraine", "Belarus"), Gender == "Male")
UBF <- le %>% filter(Country %in% c("Ukraine", "Belarus"), Gender == "Female")

p <- p +
  geom_text(data = UBM, aes(Age, Country, label = Gender), hjust = 1.2, vjust = 0.3) +
  geom_text(data = UBF, aes(Age, Country, label = Gender), hjust = -0.1, vjust = 0.3)

# Label each country's gap midway between its two points;
# only Ukraine and Belarus get the " Years" suffix
gap_labels <- diff_le %>%
  ungroup() %>%
  mutate(Age = (Female + Male) / 2,
         label = ifelse(Country %in% c("Ukraine", "Belarus"),
                        paste0(diff, " Years"), as.character(diff)))

p +
  geom_text(data = gap_labels, aes(Age, Country, label = label),
            size = 4, vjust = -1, inherit.aes = FALSE)

