ma_salmon
maelle
😷 ❤️ In December 2015 I was a statistician & data manager with the CHAI project (Cardiovascular Health Effects of Air Pollution in Telangana, India). ➡️ interest in open air quality data
📊 📉 Work with R, an open-source language and environment for statistical computing. ➡️ interest in wrangling such data with R
Hear of OpenAQ... 💥
👨🏭 Decision to create an R client for OpenAQ API!
That client now exists: ropenaq, docs.ropensci.org/ropenaq
Actively maintained
Distributed on CRAN: install.packages("ropenaq")
🙋 No longer with the CHAI project but still into R and air quality data. 😉
Open-source, free language with many capabilities:
Data wrangling, modeling, visualization;
Reproducible reports and analyses;
Interactive plots and apps.
The OpenAQ R client is an R "package":
An add-in to R;
A collection of code and documentation.
Gets data into R
From OpenAQ's API endpoints (measurements, latest and countries, cities, locations);
as tables called "data.frames", common format in R.
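If data.frames are new to you, here is a minimal sketch of what one looks like. The rows are made up for illustration (not real OpenAQ measurements); only the column names mirror what ropenaq returns.

```r
# Toy data.frame: illustrative values only, not real measurements.
aq <- data.frame(
  city = c("Pune", "Pune", "Delhi"),
  parameter = c("pm10", "pm25", "pm25"),
  value = c(165, 66, 120),
  stringsAsFactors = FALSE
)

nrow(aq)    # number of rows (one row per measurement)
aq$value    # one column, as a vector

# Keep only the PM2.5 rows
pm25_only <- aq[aq$parameter == "pm25", ]
pm25_only
```

Each column is a vector of a single type, so filtering and summarising work the same way whatever the data source.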
library("ropenaq")
aq_latest(country = "IN")
## # A tibble: 5,527 x 15
##    location city  country distance parameter  value lastUpdated         unit
##    <chr>    <chr> <chr>      <dbl> <chr>      <dbl> <dttm>              <chr>
##  1 AAQMS K… Pune  IN        1.41e7 co        1220   2018-02-22 03:00:00 µg/m³
##  2 AAQMS K… Pune  IN        1.41e7 pm10       165   2018-02-22 03:00:00 µg/m³
##  3 AAQMS K… Pune  IN        1.41e7 pm25        66   2018-02-22 03:00:00 µg/m³
##  4 AAQMS K… Pune  IN        1.41e7 o3          15.8 2018-02-22 03:00:00 µg/m³
##  5 AAQMS K… Pune  IN        1.41e7 so2         22.8 2018-02-22 03:00:00 µg/m³
##  6 AP Tiru… Chit… IN       NA      no2         49.9 2016-06-30 13:00:00 µg/m³
##  7 AP Tiru… Chit… IN       NA      o3          32.1 2016-06-30 13:00:00 µg/m³
##  8 AP Tiru… Chit… IN       NA      so2          5.7 2016-06-30 13:00:00 µg/m³
##  9 AP Tiru… Chit… IN       NA      so2          6.9 2016-07-04 06:15:00 µg/m³
## 10 AP Tiru… Chit… IN       NA      pm10        12   2016-07-04 06:15:00 µg/m³
## # … with 5,517 more rows, and 7 more variables: sourceName <chr>,
## #   averagingPeriod_value <dbl>, averagingPeriod_unit <chr>, latitude <dbl>,
## #   longitude <dbl>, cityURL <chr>, locationURL <chr>
delhi_locations <- aq_locations(
  city = "Delhi",
  country = "IN",
  parameter = "pm25"
)
delhi_locations
## # A tibble: 71 x 25
##    id    country city  cities location locations sourceName sourceNames
##    <chr> <chr>   <chr> <list> <chr>    <list>    <chr>      <list>
##  1 IN-2… IN      Delhi <chr … Alipur,… <chr [1]> caaqm      <chr [1]>
##  2 IN-1… IN      Delhi <chr … Anand V… <chr [1]> caaqm      <chr [1]>
##  3 IN-60 IN      Delhi <chr … Anand V… <chr [2]> caaqm      <chr [2]>
##  4 IN-6  IN      Delhi <chr … Anand V… <chr [2]> CPCB       <chr [2]>
##  5 IN-2… IN      Delhi <chr … Ashok V… <chr [1]> caaqm      <chr [1]>
##  6 IN-1… IN      Delhi <chr … Ashok V… <chr [1]> caaqm      <chr [1]>
##  7 IN-71 IN      Delhi <chr … Aya Nag… <chr [2]> caaqm      <chr [2]>
##  8 IN-1… IN      Delhi <chr … Bawana,… <chr [1]> caaqm      <chr [1]>
##  9 IN-1… IN      Delhi <chr … Burari … <chr [2]> caaqm      <chr [2]>
## 10 IN-8  IN      Delhi <chr … Civil L… <chr [1]> CPCB       <chr [1]>
## # … with 61 more rows, and 17 more variables: sourceType <chr>,
## #   sourceTypes <list>, firstUpdated <dttm>, lastUpdated <dttm>,
## #   countsByMeasurement <list>, count <int>, longitude <dbl>, latitude <dbl>,
## #   pm25 <lgl>, pm10 <lgl>, no2 <lgl>, so2 <lgl>, o3 <lgl>, co <lgl>, bc <lgl>,
## #   cityURL <chr>, locationURL <chr>
For each location, useful metadata to select the optimal location(s) for your needs!
head(delhi_locations$sourceName)
## [1] "caaqm" "caaqm" "caaqm" "CPCB" "caaqm" "caaqm"
head(delhi_locations$pm25)
## [1] TRUE TRUE TRUE TRUE TRUE TRUE
min(delhi_locations$firstUpdated)
## [1] "2015-04-09 06:00:00 UTC"
head(delhi_locations$location)
## [1] "Alipur, Delhi - DPCC"      "Anand Vihar, Delhi - DPCC"
## [3] "Anand Vihar, Delhi - DPCC" "Anand Vihar, Delhi - DPCC"
## [5] "Ashok Vihar, Delhi - DPCC" "Ashok Vihar, Delhi - DPCC"
head(delhi_locations$locationURL)
## [1] "Alipur%2C+Delhi+-+DPCC"      "Anand+Vihar%2C+Delhi+-+DPCC"
## [3] "Anand+Vihar%2C+Delhi+-+DPCC" "Anand+Vihar%2C+Delhi+-+DPCC"
## [5] "Ashok+Vihar%2C+Delhi+-+DPCC" "Ashok+Vihar%2C+Delhi+-+DPCC"
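One way to use that metadata, as a sketch only: filter the locations table on the columns you care about. The table here is a toy stand-in with made-up rows; the column names (sourceName, pm25, firstUpdated, count) are the ones shown in the aq_locations() output, and the selection criteria are just an example.

```r
# Toy stand-in for aq_locations() output: made-up rows, real column names.
locs <- data.frame(
  location = c("Station A", "Station B", "Station C"),
  sourceName = c("caaqm", "CPCB", "caaqm"),
  pm25 = c(TRUE, TRUE, FALSE),
  firstUpdated = as.POSIXct(
    c("2015-04-09 06:00:00", "2017-11-01 00:00:00", "2016-02-15 12:00:00"),
    tz = "UTC"
  ),
  count = c(52000L, 8000L, 31000L),
  stringsAsFactors = FALSE
)

# Example criteria: measures PM2.5 and has data going back before 2016
cutoff <- as.POSIXct("2016-01-01", tz = "UTC")
keep <- locs[locs$pm25 & locs$firstUpdated < cutoff, ]

# Most measurements first
keep <- keep[order(-keep$count), ]
keep$location
```

With the real table you would swap `locs` for `delhi_locations` and adapt the criteria to your question.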
💡 In aq_latest() and aq_measurements() use the URL-encoded city / location.
res <- aq_latest(location = "Alipur, Delhi - DPCC")
## Error: This location/city/country combination is not available within the platform. See ?aq_locations
res <- aq_latest(location = "Alipur%2C+Delhi+-+DPCC")
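The simplest route is to copy the value from the locationURL (or cityURL) column of aq_locations(). If you only have the plain name, the sketch below reproduces the same encoding with base R. `encode_location()` is a hypothetical helper written for this slide, not a ropenaq function, and it assumes the encoding is percent-encoding with spaces turned into `+`, which is what the locationURL values shown earlier look like.

```r
# Hypothetical helper (not part of ropenaq): percent-encode a name,
# then use "+" for spaces, as in the locationURL column.
encode_location <- function(x) {
  encoded <- utils::URLencode(x, reserved = TRUE)
  gsub("%20", "+", encoded, fixed = TRUE)
}

encode_location("Alipur, Delhi - DPCC")
encode_location("US Diplomatic Post: New Delhi")
```

The two results should match the strings used in the calls on these slides ("Alipur%2C+Delhi+-+DPCC" and "US+Diplomatic+Post%3A+New+Delhi").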
data <- aq_measurements(location = "US+Diplomatic+Post%3A+New+Delhi",
parameter = "pm25")
# dplyr by Hadley Wickham et al
data <- dplyr::filter(data, value > 0)
data
## # A tibble: 2,158 x 12
##    location parameter value unit  country city  latitude longitude
##    <chr>    <chr>     <int> <chr> <chr>   <chr>    <dbl>     <dbl>
##  1 US Dipl… pm25         12 µg/m³ IN      Delhi     28.6      77.2
##  2 US Dipl… pm25         18 µg/m³ IN      Delhi     28.6      77.2
##  3 US Dipl… pm25         16 µg/m³ IN      Delhi     28.6      77.2
##  4 US Dipl… pm25         19 µg/m³ IN      Delhi     28.6      77.2
##  5 US Dipl… pm25         44 µg/m³ IN      Delhi     28.6      77.2
##  6 US Dipl… pm25         38 µg/m³ IN      Delhi     28.6      77.2
##  7 US Dipl… pm25         17 µg/m³ IN      Delhi     28.6      77.2
##  8 US Dipl… pm25         15 µg/m³ IN      Delhi     28.6      77.2
##  9 US Dipl… pm25         13 µg/m³ IN      Delhi     28.6      77.2
## 10 US Dipl… pm25         15 µg/m³ IN      Delhi     28.6      77.2
## # … with 2,148 more rows, and 4 more variables: dateUTC <dttm>,
## #   dateLocal <dttm>, cityURL <chr>, locationURL <chr>
library("ggplot2") # Hadley Wickham et al
ggplot(data) +
geom_point(aes(x = dateLocal, y = value),
col = "CornflowerBlue") +
geom_hline(yintercept = 25,
size = 1.2,
col = "darkred") +
ylab(expression(paste("PM2.5 concentration (", mu, "g/",m^3,")")))+
xlab("Time (days)") +
ggtitle(data$location[1]) +
hrbrthemes::theme_ipsum(
base_size = 16,
axis_title_size = 16)
data <- aq_measurements(location = "London+Westminster",
parameter = "pm25")
data <- dplyr::filter(data, value > 0)
ggplot(data) +
  geom_point(aes(x = dateLocal, y = value),
             col = "CornflowerBlue") +
  geom_hline(yintercept = 25,
             size = 1.2,
             col = "darkred") +
  ylab(expression(paste("PM2.5 concentration (", mu, "g/",m^3,")")))+
  xlab("Time (days)") +
  ggtitle(data$location[1]) +
  hrbrthemes::theme_ipsum(
    base_size = 16,
    axis_title_size = 16)
Maximum number of results for any query is 100 (pages) times 10000 (results per page).
You don't need to use the page and limit arguments, ropenaq handles all of this for you. Just wait a bit. ⌚
res <- aq_measurements(country = "IN")
nrow(res)
## [1] 1000000
attr(res, "meta")$found
## [1] 29175743
More data than 1,000,000 results? Split your query, e.g. using date_from and date_to. Or see following item.
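A rough sketch of how such a split could look. `date_windows()` is a hypothetical helper, not part of ropenaq, and the one-week window is an arbitrary choice; the right size depends on how many measurements fall in each window. Only date_from and date_to are arguments taken from the slides.

```r
# The cap: 100 pages times 10000 results per page
max_results <- 100 * 10000

# With the `found` count from the India query, a single call is not enough;
# this is the minimum number of separate queries needed.
found <- 29175743
min_queries <- ceiling(found / max_results)

# Hypothetical helper: cut a date range into consecutive windows of `days` days.
date_windows <- function(from, to, days = 7) {
  from <- as.Date(from)
  to <- as.Date(to)
  starts <- seq(from, to, by = days)
  ends <- pmin(starts + days - 1, to)
  data.frame(
    date_from = format(starts),
    date_to = format(ends),
    stringsAsFactors = FALSE
  )
}

windows <- date_windows("2018-01-01", "2018-01-31")
windows

# Not run here (needs network access): one query per window, then stack.
# pieces <- lapply(seq_len(nrow(windows)), function(i) {
#   ropenaq::aq_measurements(
#     country = "IN",
#     date_from = windows$date_from[i],
#     date_to = windows$date_to[i]
#   )
# })
# res <- dplyr::bind_rows(pieces)
```

Check `nrow()` of each piece: if one still hits the cap, that window needs to be split further.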
More data than the last 90 days? OpenAQ data older than 90 days is on Amazon Web Services (AWS) Athena. More setup for you, but worth it, and the skills are transferable 💪
Set up an AWS account (root and IAM)
Create an S3 bucket. To avoid S3 fees, set up a lifecycle policy.
💰 Will I lose all my money? 💰 For OpenAQ on AWS you can't avoid Athena fees. They're super small. If you're nervous, do like me, check the billing interface often until you're confident.
Go to the Athena console. Joe Flasher's gist
Check your location
Create the OpenAQ table
Run a query like in the gist to see if it works
Get your credentials (go into your "My security credentials" settings from the drop-down menu at the top right of the console where your username is.)
Now, install AWR.Athena. It depends on rJava. 🍀 Another, newer R package without rJava: noctua 👀
Set up credentials. I used AWS CLI and ran aws configure, YMMV. 🤷
In a script, try setting up a connection.
library("AWR.Athena") # Neal Fultz and Gergely Daróczi
library("DBI") # Kirill Müller et al
con <- dbConnect(AWR.Athena::Athena(),
                 region = 'us-east-1',
                 S3OutputLocation = 's3://ropenaq',
                 Schema = 'default')
query <- "SELECT *
FROM openaq
WHERE location='US Diplomatic Post: Hanoi'
LIMIT 10"
meas <- dbGetQuery(con, query)
Now on to cooler stuff!
Selecting locations in Wuhan with ropenaq.
locations <- ropenaq::aq_locations(
  country = "CN",
  longitude = 114.3055,
  latitude = 30.5928,
  radius = 5000
)
unique(locations$city)
## [1] "武汉市"
Prepare your query with glue.
query <- glue::glue(
  "SELECT *
   FROM openaq
   WHERE city='{locations$city[1]}'
   AND parameter='pm25'"
)
query
## SELECT *
## FROM openaq
## WHERE city='武汉市'
## AND parameter='pm25'
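If the location search returns several cities, the same idea extends to an IN (...) list. A minimal sketch in base R, so it runs without glue installed; the city names are placeholders, and the naive quoting assumes the names contain no single quotes.

```r
# Placeholder city names; in practice use unique(locations$city).
cities <- c("Delhi", "Pune")

# Build the quoted, comma-separated list: 'Delhi', 'Pune'
in_list <- paste0("'", cities, "'", collapse = ", ")

query_many <- sprintf(
  "SELECT *\nFROM openaq\nWHERE city IN (%s)\nAND parameter='pm25'",
  in_list
)
cat(query_many)
```

For values that might contain quotes, a parameterised or properly escaped query is the safer option.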
Make the query with AWR.Athena
library("AWR.Athena")
library("DBI")
con <- dbConnect(AWR.Athena::Athena(),
                 region = 'us-east-1',
                 S3OutputLocation = 's3://ropenaq',
                 Schema = 'default')
meas <- dbGetQuery(con, query)
Now save meas and munge it, plot it, etc.
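What that could look like, as a sketch under assumptions: the data below is made up, and the column names dateLocal and value are borrowed from the ropenaq output, so the table you get back from Athena may name things differently. The file name is arbitrary.

```r
# Made-up stand-in for `meas`: hourly PM2.5 values over two days.
meas <- data.frame(
  dateLocal = as.POSIXct("2018-02-21 00:00:00", tz = "UTC") + 3600 * 0:47,
  value = c(rep(20, 24), rep(40, 24))
)

# Save the raw result first, so you don't have to pay for the query again
path <- file.path(tempdir(), "meas.rds")
saveRDS(meas, path)

# Munge: drop non-positive values, then average per day
meas <- meas[meas$value > 0, ]
meas$day <- as.Date(meas$dateLocal, tz = "UTC")
daily <- aggregate(value ~ day, data = meas, FUN = mean)
daily
```

`readRDS(path)` brings the saved table back in a later session.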
In short,
ropenaq alone: only OpenAQ API data i.e. last 90 days, no setup, no credit card info.
AWR.Athena (or noctua?): All OpenAQ data via AWS, more setup, pennies.
Take your time, document steps.
In both cases, you can do a lot from R once you've got the data! 🎉
Need OpenAQ API data in R?
Or want an engaging dataset to learn how to use R for further projects?
install.packages("ropenaq")
browseURL("https://docs.ropensci.org/ropenaq")
AWR.Athena package (or noctua?). Use ropenaq to select locations.
Some resources
R for Data Science book by Hadley Wickham and Garrett Grolemund, free online version;
Data Science specialization by Johns Hopkins University on Coursera;
Where/how to get help: my post, Sam Tyner's post;
ropenaq R package, R client for the OpenAQ API
Use cases: report them on rOpenSci forum
Docs? Code? Let's talk at github.com/ropensci/ropenaq
Code cool stuff: HTTP requests with crul, asynchronous queries, tests with mocking.
Bugs? GitHub too! 🐛