The author of the Statistical Algorithms blog attempted to recreate, in ggplot2, a graph depicting the growing colour selection of Crayola crayons (original graph below via FlowingData).
He also asked the following questions: Is there an easier way to do this? How can I make the axes more like the original? What about the white lines between boxes and the gradual change between years? He also noted that the sort order is different.
I will present my version in this post, trying to address some of these questions.
data:image/s3,"s3://crabby-images/57f02/57f02f182e0748f9069d6ffc590460c06d72d31b" alt="crayons_small.png"
Data Import
The list of Crayola crayon colours is available on Wikipedia. It contains one duplicate colour (#FF1DCE), which I excluded to make further processing easier.
> library(XML)
> library(plyr)
> library(ggplot2)
> theurl <- "http://en.wikipedia.org/wiki/List_of_Crayola_crayon_colors"
> html <- htmlParse(theurl)
> crayola <- readHTMLTable(html, stringsAsFactors = FALSE)[[2]]
> crayola <- crayola[, c("Hex Code", "Issued", "Retired")]
> names(crayola) <- c("colour", "issued", "retired")
> crayola <- crayola[!duplicated(crayola$colour), ]
> crayola$retired[crayola$retired == ""] <- 2010
Plotting
Instead of geom_rect(), I will show two ways of plotting the same data, using geom_bar() and geom_area(). Both need one entry per colour for every year it was (or still is) in production.
> colours <- ddply(crayola, .(colour), transform,
+     year = issued:retired)
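To make the reshaping step concrete, here is a minimal base R sketch of the same idea: each colour is repeated once for every year between its issue and retirement. The data frame `crayola_toy` and the helper `expand_years()` are hypothetical names with made-up values, used only for illustration. Unlike the ddply() call above, this version keeps only the colour and year columns.

```r
# Toy stand-in for the scraped table (hypothetical values, not real Crayola data)
crayola_toy <- data.frame(
  colour  = c("#FF0000", "#00FF00"),
  issued  = c(1903, 1949),
  retired = c(1905, 1950),
  stringsAsFactors = FALSE
)

# For each row, emit one (colour, year) record per year in production
expand_years <- function(df) {
  rows <- lapply(seq_len(nrow(df)), function(i) {
    data.frame(
      colour = df$colour[i],
      year   = seq(df$issued[i], df$retired[i]),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

colours_toy <- expand_years(crayola_toy)
```

The first colour spans three years and the second two, so `colours_toy` ends up with five rows, one per colour per year.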
The plot colours are manually mapped to the original colours using scale_fill_identity().
> p <- ggplot(colours, aes(year, 1, fill = colour)) +
+     geom_bar(width = 1, position = "fill", binwidth = 1) +
+     theme_bw() + scale_fill_identity()
data:image/s3,"s3://crabby-images/cd7c2/cd7c2b6d60c5bbc70a9172f924f71203c6549770" alt="crayola_colours-006.png"
And now the geom_area() version:
> p1 <- ggplot(colours, aes(year, 1, fill = colour)) +
+     geom_area(position = "fill", colour = "white") +
+     theme_bw() + scale_fill_identity()
data:image/s3,"s3://crabby-images/9cd54/9cd54b59be1aeb9f2770f2c6c9cea54643b0e438" alt="crayola_colours-008.png"
Final Formatting
Next, the x-axis labels suggested by ggplot2 will be manually overridden. I also use a little trick to make sure that the labels are properly aligned: the breaks are placed one year before the labels they carry.
> labels <- c(1903, 1949, 1958, 1972, 1990, 1998,
+     2010)
> breaks <- labels - 1
> x <- scale_x_continuous("", breaks = breaks, labels = labels,
+     expand = c(0, 0))
> y <- scale_y_continuous("", expand = c(0, 0))
> ops <- opts(axis.text.y = theme_blank(), axis.ticks = theme_blank())
> p + x + y + ops
data:image/s3,"s3://crabby-images/5cb4f/5cb4f76072c13b29fc96626fabed0ad729daa5ca" alt="crayola_colours-011.png"
> p1 + x + y + ops
data:image/s3,"s3://crabby-images/2e03b/2e03badaaa69f364aef2643f857c74b33160f59d" alt="crayola_colours-013.png"
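The opts() and theme_blank() calls above come from an older ggplot2 API. As a rough sketch only, assuming a recent ggplot2 release, the bar version might be written with geom_col(), theme() and element_blank() instead. The data frame `colours_toy` and its values are hypothetical, and this sketch does not reproduce the label alignment trick.

```r
library(ggplot2)

# Toy data: one row per colour per year (hypothetical values)
colours_toy <- data.frame(
  year   = c(1903, 1903, 1904, 1904, 1904),
  colour = c("#FF0000", "#0000FF", "#FF0000", "#0000FF", "#00FF00"),
  stringsAsFactors = FALSE
)

labels <- c(1903, 1904)

p_toy <- ggplot(colours_toy, aes(year, 1, fill = colour)) +
  # geom_col() draws bars from supplied y values; "fill" rescales each year to 1
  geom_col(width = 1, position = "fill") +
  # use the hex codes in the data directly as fill colours
  scale_fill_identity() +
  scale_x_continuous("", breaks = labels, labels = labels, expand = c(0, 0)) +
  scale_y_continuous("", expand = c(0, 0)) +
  theme_bw() +
  # modern replacement for opts(... = theme_blank())
  theme(axis.text.y = element_blank(), axis.ticks = element_blank())
```

Printing `p_toy` draws the plot. The real `colours` data frame could be substituted for the toy data, with the original breaks and labels restored.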
The order of the colours could be changed by sorting them by some common feature; unfortunately, I did not find an automated way of doing this.
Sorting by Colour
Thanks to Baptiste, who showed a way to sort the colours, the final version of the area plot resembles the original even more closely.
> library(colorspace)
> sort.colours <- function(col) {
+     c.rgb = col2rgb(col)
+     c.RGB = RGB(t(c.rgb) %*% diag(rep(1/255, 3)))
+     c.HSV = as(c.RGB, "HSV")@coords
+     order(c.HSV[, 1], c.HSV[, 2], c.HSV[, 3])
+ }
> colours = ddply(colours, .(year), function(d) d[rev(sort.colours(d$colour)),
+     ])
> last_plot() %+% colours
data:image/s3,"s3://crabby-images/35e81/35e819361b95624a3156b0829c5b67dd3c1925da" alt="crayola_colours-017.png"
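For readers without the colorspace package, the same ordering idea (hue first, then saturation, then value) can be sketched with base R's rgb2hsv() from grDevices. This is a swapped-in alternative rather than Baptiste's code, and `sort_colours_hsv()` is a hypothetical name. The hue values may differ in scale from those returned by colorspace, but the relative order should be comparable.

```r
library(grDevices)

# Return the ordering of hex colours by hue, then saturation, then value
sort_colours_hsv <- function(col) {
  hsv_mat <- rgb2hsv(col2rgb(col))  # 3 x n matrix with rows "h", "s", "v"
  order(hsv_mat["h", ], hsv_mat["s", ], hsv_mat["v", ])
}

cols <- c("#0000FF", "#FF0000", "#00FF00")
sorted_cols <- cols[sort_colours_hsv(cols)]
# sorted_cols is c("#FF0000", "#00FF00", "#0000FF"): red, green, blue by hue
```

Within each year, the rows of `colours` could then be reordered with this function in the same way as with sort.colours() above.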