tidygapminder



Gapminder data, minus the mess.

Gapminder is a goldmine of global development data — life expectancy, income, CO₂ emissions, literacy rates, and hundreds more indicators spanning centuries. The catch? Every sheet looks like this:

life expectancy years | 1800 | 1801 | 1802 | ...
----------------------|------|------|------|----
Afghanistan           | 28.2 | 28.2 | 28.2 | ...
Albania               | 35.4 | 35.4 | 35.4 | ...
...

Countries as rows, years as columns, the indicator name hiding in cell A1. Great for a spreadsheet. Terrible for R.

tidygapminder fixes that in one function call.
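For intuition, the reshaping involved can be sketched by hand with tidyr. This is only an illustration of the wide-to-long idea on a toy copy of the sheet shown above, not the package's actual implementation; the variable names `wide`, `indicator`, and `long` are made up for the example.

```r
library(tibble)
library(tidyr)

# Toy data mimicking a Gapminder sheet: the indicator name sits in the
# first column header (cell A1) and every other column is a year.
wide <- tribble(
  ~`life expectancy years`, ~`1800`, ~`1801`, ~`1802`,
  "Afghanistan",            28.2,    28.2,    28.2,
  "Albania",                35.4,    35.4,    35.4
)

# Remember the indicator name, then give the first column a proper name.
indicator <- names(wide)[1]
names(wide)[1] <- "country"

# Stack the year columns into (year, value) pairs, one row per
# country-year, and name the value column after the indicator.
long <- pivot_longer(
  wide,
  cols = -country,
  names_to = "year",
  names_transform = list(year = as.integer),
  values_to = indicator
)

print(long)
```

The result has one row per country and year, which is the shape most dplyr and ggplot2 workflows expect.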

Installation

# From CRAN
install.packages("tidygapminder")

# Development version
pak::pak("ebedthan/tidygapminder")

Two functions. That’s it.

tidy_index(): one file at a time

Point it at a Gapminder .csv, .xlsx, or .xls file and get back a clean tibble:

library(tidygapminder)

csv_path <- system.file("extdata/life_expectancy_years.csv", package = "tidygapminder")

tidy_index(csv_path)

Three columns: country, year, and the indicator, ready to filter, plot, or model.
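As a rough sketch of what "ready to filter" means, the block below builds a small stand-in tibble with the three-column shape described above and filters it with dplyr. It does not call tidy_index(); the indicator column name `life_expectancy` is a hypothetical placeholder, and the name the package actually produces may differ.

```r
library(tibble)

# Stand-in for tidy_index() output, using the values from the sample sheet.
# The indicator column name is an assumption made for this example.
tidy <- tibble(
  country = rep(c("Afghanistan", "Albania"), each = 3),
  year = rep(1800:1802, times = 2),
  life_expectancy = rep(c(28.2, 35.4), each = 3)
)

# Keep one country from 1801 onward.
albania_recent <- dplyr::filter(tidy, country == "Albania", year >= 1801)

print(albania_recent)
```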

tidy_bunch(): a whole folder at once

Downloaded ten indicators? No problem. Point tidy_bunch() at the folder:

dir_path <- system.file("extdata", package = "tidygapminder")

# Returns a named list of tibbles, one per file
result <- tidy_bunch(dir_path)
names(result)

Want everything in one data frame joined by country and year?

tidy_bunch(dir_path, combine = TRUE)
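To illustrate what joining by country and year involves, here is a minimal sketch that merges a list of tidy tibbles by hand. It assumes a full join, which keeps country-year pairs that are missing from some indicators; whether tidy_bunch() behaves this way is not stated here. The tibbles `indicator_a` and `indicator_b` and the `indicator_b` values are made-up placeholders.

```r
library(tibble)
library(dplyr)

# Hypothetical stand-ins for two tidied indicators.
indicator_a <- tibble(
  country = c("Afghanistan", "Albania"),
  year = c(1800L, 1800L),
  indicator_a = c(28.2, 35.4)
)
indicator_b <- tibble(
  country = c("Albania", "Algeria"),
  year = c(1800L, 1800L),
  indicator_b = c(1, 2)  # placeholder values, not real data
)

tidy_list <- list(indicator_a, indicator_b)

# Fold the list into one data frame, joining on the shared keys.
# full_join() is an assumption: unmatched rows are kept and filled with NA.
combined <- Reduce(
  function(x, y) full_join(x, y, by = c("country", "year")),
  tidy_list
)

print(combined)
```

With a full join, a country that appears in only one file still shows up in the result, with NA in the columns it lacks.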

Why tidygapminder?

Getting help