Our package reviews in review!
Introducing & analyzing rOpenSci onboarding

Maëlle Salmon, editor at rOpenSci onboarding
ma_salmon | maelle | masalmon.eu

2018/06/23

1 / 68

My place in the R community

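library(magrittr)  # provides the %>% pipe used below

# First pair of logos, appended side by side
row1 <- c(magick::image_read("assets/locke.png"),
          magick::image_read("assets/rweekly.png")) %>%
  magick::image_append()
# Second pair of logos, appended side by side
row2 <- c(magick::image_read("assets/ropensci.png"),
          magick::image_read("assets/rladies.png")) %>%
  magick::image_append()
# Overlay both pairs on a white canvas at the given offsets
magick::image_blank(225, 200, color = "white") %>%
  magick::image_composite(row1, offset = "+0+100") %>%
  magick::image_composite(row2, offset = "+50+25")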

2 / 68

What's rOpenSci?

  • Community of researchers and software developers

  • R packages for open and reproducible science

  • Community and staff contributions

3 / 68

rOpenSci packages suite

https://ropensci.org/packages/

4 / 68

rOpenSci packages suite: UK-specific packages!

  • PostcodesioR by Eryk Walczak: API wrapper around postcodes.io - free UK postcode lookup and geocoder.

  • nomisr by Evan Odell: Access UK official statistics from the Nomis database through R.

  • refimpact by Perry Stephenson: API wrapper for UK Research Excellence Framework 2014 Impact Case Studies Database.

  • rdefra by Claudia Vitolo: Interact with the UK AIR Pollution Database from DEFRA.

7 / 68

rOpenSci onboarding!

How to ensure quality in the whole suite?

Open software reviews.

  • drive adoption of best practices and standards

  • build a community of practice

  • partnerships with the Journal of Open Source Software and Methods in Ecology and Evolution

8 / 68

What to review for?

Jeff Leek https://github.com/jtleek/rpackages

9 / 68

Review criteria

  • Open Source Initiative (OSI)-compatible license

  • Complete docs

  • High test coverage

  • Readable code

  • Usability

https://ropensci.github.io/dev_guide/building.html

A whole bookdown! https://ropensci.github.io/dev_guide

10 / 68

How to review?

  • Open & non-adversarial

  • No rejections

  • Makes the process constructive for everyone involved

  • Technically, using GitHub infrastructure

11 / 68

rOpenSci onboarding

12 / 68

The issue tracker

13 / 68

Submitting a package

14 / 68

Rejection

No rejections... but out-of-scope packages are not onboarded.

https://ropensci.github.io/dev_guide/policies.html

  • fit in our categories: data retrieval, data extraction, database access, data munging, data deposition, reproducibility, geospatial data, text analysis.

  • application in science

  • better than similar packages

When in doubt, pre-submission enquiry!

17 / 68

Pre-submission enquiry?

18 / 68

The review process

20 / 68

The review process: editor checks

21 / 68

The review process: reviews

22 / 68

The review process

Ongoing discussion until acceptance and transfer

Often followed by a blog post: https://ropensci.org/tags/review/

24 / 68

A data-driven overview?

How to perform a data analysis of onboarding?

Let's rectangle onboarding! Rectangling as coined by Jenny Bryan

Meme image by Allie Brosh

25 / 68

Onboarding data in the issue tracker

26 / 68

Weaving GitHub issue threads

GitHub GraphQL API v4. Better than v3? You get only the data you need.

My experience

27 / 68

GitHub API V4 explorer

28 / 68

ghql and what?

  • Use the query defined previously via ghql

  • Get raw JSON

  • writeClipboard

29 / 68

Learn how to transform JSON with jq PLAY!

30 / 68

A minimal V4 example

get_contents function of my ghrecipes package

31 / 68

A minimal V4 example: client creation

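# Read the personal access token from an environment variable
token <- Sys.getenv("GITHUB_GRAPHQL_TOKEN")
# Create a GraphQL client for the GitHub API with a bearer-token header
cli <- ghql::GraphqlClient$new(
  url = "https://api.github.com/graphql",
  headers = httr::add_headers(Authorization = paste0("Bearer ", token))
)
# Load the API schema
cli$load_schema()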
32 / 68

A minimal V4 example: query writing

# `owner` and `repo` are character strings defined beforehand
query <- paste0('query{
  repository(owner: "', owner, '", name:"', repo,'"){
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          id
          history(first: 1) {
            edges {
              node {
                tree{
                  entries {
                    name
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
')
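
The query above splices `owner` and `repo` into the GraphQL text with `paste0()`. A minimal sketch of wrapping that step in a function, using `sprintf()` instead. The helper name `build_contents_query()` and its `branch` argument are hypothetical (nothing in this deck says ghrecipes has them), the query is slightly shortened, and the values "ropensci" and "onboarding" are only illustrative:

```r
# Hypothetical helper: returns the GraphQL query text for one repository.
# `branch` is an illustrative extra argument, not taken from the slides.
build_contents_query <- function(owner, repo, branch = "master") {
  sprintf('query{
  repository(owner: "%s", name: "%s"){
    ref(qualifiedName: "%s") {
      target {
        ... on Commit {
          history(first: 1) {
            edges { node { tree { entries { name } } } }
          }
        }
      }
    }
  }
}', owner, repo, branch)
}

# Illustrative values only
query <- build_contents_query("ropensci", "onboarding")
cat(query)
```

Keeping the query text in one function makes it easier to reuse the same request for several repositories.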
33 / 68

A minimal V4 example: request & wrangling

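library(magrittr)  # provides the %>% pipe used below

# Register the query under a name, then execute it with the client
qry <- ghql::Query$new()
qry$query('foobar', query)
cli$exec(qry$queries$foobar) %>%
  # Keep the name of every entry found anywhere in the JSON response
  jqr::jq("..|.entries?|select(.!=null)|.[].name") %>%
  as.character() %>%
  # Strip the double quotes around each name
  stringr::str_replace_all('\\\"', '')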
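
The jq filter above walks the whole response and keeps the `name` of every element of any `entries` field. To make that logic explicit, here is a base R sketch of the same idea on an already-parsed response. The nested list is a hand-written stand-in for the API response, not real output, and `collect_entry_names()` is a hypothetical helper:

```r
# Hand-written stand-in for a parsed response (not real API output)
response <- list(data = list(repository = list(ref = list(target = list(
  id = "abc",
  history = list(edges = list(list(node = list(tree = list(entries = list(
    list(name = "README.md"),
    list(name = "policies.md")
  ))))))
)))))

# Hypothetical helper: recursively visit every sub-list and collect the
# `name` of each element of any field called `entries`.
collect_entry_names <- function(x) {
  if (!is.list(x)) return(character(0))
  own <- character(0)
  if (!is.null(x[["entries"]])) {
    own <- vapply(x[["entries"]], function(e) e[["name"]], character(1))
  }
  c(own, unlist(lapply(x, collect_entry_names), use.names = FALSE))
}

entry_names <- collect_entry_names(response)
entry_names
```

In practice jq does this in one line on the raw JSON, which is why the slide uses it.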
34 / 68

A minimal V4 example: in action

## [1] "CNAME" "README.Rmd"
## [3] "README.md" "editor_template.md"
## [5] "editors_guide.md" "icon_lettering_color.png"
## [7] "issue_template.md" "news_template.md"
## [9] "packaging.png" "packaging_guide.md"
## [11] "policies.md" "review_request_template.md"
## [13] "reviewer_template.md" "reviewing_guide.md"
35 / 68

Result: woven GitHub threads

GitHub V4 magic + some wrangling later...

## # A tibble: 2,521 x 10
##    title created_at          closed_at           body  user  issue package
##    <chr> <dttm>              <dttm>              <chr> <chr> <int> <chr>
##  1 rrli~ 2015-03-31 00:25:14 2015-04-13 23:26:38 "- 1~ rich~     6 rrlite
##  2 rrli~ 2015-04-01 17:30:51 2015-04-13 23:26:38 "hey~ scko~     6 rrlite
##  3 rrli~ 2015-04-01 17:36:03 2015-04-13 23:26:38 "@sc~ kart~     6 rrlite
##  4 rrli~ 2015-04-02 03:36:09 2015-04-13 23:26:38 "Sur~ jero~     6 rrlite
##  5 rrli~ 2015-04-02 03:50:43 2015-04-13 23:26:38 "IMO~ gabo~     6 rrlite
##  6 rrli~ 2015-04-02 03:53:57 2015-04-13 23:26:38 "Ide~ rich~     6 rrlite
##  7 rrli~ 2015-04-02 18:58:53 2015-04-13 23:26:38 "> H~ kart~     6 rrlite
##  8 stpl~ 2015-04-08 23:56:17 2015-10-29 14:14:35 "- 1~ Robi~    10 stplanr
##  9 rrli~ 2015-04-10 21:52:39 2015-04-13 23:26:38 "@ri~ stew~     6 rrlite
## 10 rrli~ 2015-04-10 22:10:48 2015-04-13 23:26:38 "Tha~ rich~     6 rrlite
## # ... with 2,511 more rows, and 3 more variables: is_review <lgl>,
## #   commenter <chr>, role <chr>
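
With one row per comment and the issue's `created_at` / `closed_at` alongside, simple durations fall out of the table. A minimal sketch, assuming the first comment of each issue is the submission and that timestamps are UTC (both assumptions). The three rows are copied from the printout above; the time from submission to closing is only a rough proxy, not necessarily the reviewing time used later in the talk:

```r
# Three rows copied from the printed table (only the columns needed here)
threads <- data.frame(
  package = c("rrlite", "rrlite", "stplanr"),
  issue = c(6L, 6L, 10L),
  created_at = as.POSIXct(c("2015-03-31 00:25:14", "2015-04-01 17:30:51",
                            "2015-04-08 23:56:17"), tz = "UTC"),
  closed_at = as.POSIXct(c("2015-04-13 23:26:38", "2015-04-13 23:26:38",
                           "2015-10-29 14:14:35"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Keep the earliest comment of each issue (assumed to be the submission)
threads <- threads[order(threads$created_at), ]
submissions <- threads[!duplicated(threads$issue), ]

# Days between submission and closing of the issue
submissions$days_open <- as.numeric(
  difftime(submissions$closed_at, submissions$created_at, units = "days")
)
submissions[, c("package", "issue", "days_open")]
```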
36 / 68

Result: woven GitHub threads

## [1] 70

37 / 68

Onboarding data in onboarded repos

38 / 68

Let's examine data

40 / 68

How much work is onboarding?

41 / 68

Work done in repositories

42 / 68

Apparent age at submission

45 / 68

Reviewing time

46 / 68

How big are packages?

Work by authors

Work for reviewers

47 / 68

How big are packages? Reviewing time vs. no. of exports

48 / 68

Last notes on work quantification

  • Hard to define metrics

  • Very hard-working volunteers!

  • Decreasing time by automation

  • What about editors? In Tim Trice's words, "guiding angels from start to finish during the entire onboarding and review process". Each typically handles one package every one or two months.

49 / 68

A high-quality and... friendly process?

50 / 68

Social weather of onboarding

Anne Gentle's essay in Open Advice: http://open-advice.org/

Impressions, examples... what about a more general approach for onboarding?

Tidy text analysis for the win! A reminder of Nujcharee (เป็ด)'s talk earlier.

52 / 68

Most common words
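
A rough idea of how such a word count can be computed, as a base R sketch rather than the tidy text workflow mentioned earlier. The three lines are taken from the review samples shown later in this deck; the tiny stop-word list is made up for the example:

```r
# Three lines taken from the review samples in this deck
review_lines <- c(
  "overall, really easy to use and really nicely done.",
  "thank you very much for the constructive thoughts.",
  "very much appreciated!"
)

# Split into lower-case words, dropping punctuation and empty strings
words <- unlist(strsplit(tolower(review_lines), "[^a-z']+"))
words <- words[nzchar(words)]

# Made-up, deliberately tiny stop-word list
stop_words <- c("to", "and", "the", "for", "you", "use")

# Count the remaining words, most frequent first
counts <- sort(table(words[!words %in% stop_words]), decreasing = TRUE)
head(counts)
```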

53 / 68

Most common bigrams
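
Bigrams are pairs of consecutive words. A small base R sketch of extracting them from one line; `line_bigrams()` is a hypothetical helper, and the example line comes from the review samples shown later in this deck:

```r
# Hypothetical helper: all pairs of consecutive words in one line
line_bigrams <- function(line) {
  words <- unlist(strsplit(tolower(line), "[^a-z']+"))
  words <- words[nzchar(words)]
  if (length(words) < 2) return(character(0))
  paste(head(words, -1), tail(words, -1))
}

bigrams <- line_bigrams("very much appreciated!")
bigrams
```

Counting the bigrams over all lines then works like any other frequency table, for instance with `table()`.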

54 / 68

Pairwise correlations

55 / 68

Sentiment

56 / 68

Words in negative lines

57 / 68

Negative sample

## # A tibble: 15 x 2
##    line                                                          sentiment
##    <chr>                                                         <dbl>
##  1 @ultinomics no more things, although do make sure to add mor~ -1.63
##  2 not sure what you mean, but i'll use different object names ~ -1.20
##  3 error in .local(.object, ...) :                               -1
##  4 error:                                                        -1
##  5 #### miscellaneous                                            -1
##  6 error: command failed (1)                                     -0.866
##  7 - get_plate_size_from_number_of_columns: maybe throwing an e~ -0.786
##  8 "this code returns an error, which is good, but it would be ~ -0.744
##  9 0 errors | 0 warnings | 0 notes                               -0.722
## 10 once i get to use this package more, i'm sure i'll have more~ -0.721
## 11 - i now realize i've pasted the spelling mistakes without th~ -0.707
## 12 minor issues:                                                 -0.707
## 13 ## minor issues                                               -0.707
## 14 replicates issue                                              -0.707
## 15 visualization issue                                           -0.707
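
Several of these "negative" lines are error messages or headings about issues, which is what a word-lexicon score would produce. A simplified sketch of one such scheme, assuming a score equal to the summed word polarity divided by the square root of the word count. This is an illustration, not necessarily the method used for the table above, and the lexicon is made up:

```r
# Made-up polarity lexicon, for illustration only
lexicon <- c(error = -1, issue = -1, issues = -1, great = 1, thanks = 1)

# Simplified score: summed polarity of known words / sqrt(number of words)
score_line <- function(line, lexicon) {
  words <- unlist(strsplit(tolower(line), "[^a-z']+"))
  words <- words[nzchar(words)]
  if (length(words) == 0) return(0)
  polarity <- lexicon[words]  # NA for words missing from the lexicon
  sum(polarity, na.rm = TRUE) / sqrt(length(words))
}

scores <- vapply(
  c("minor issues:", "error:", "very much appreciated!"),
  score_line, numeric(1), lexicon = lexicon
)
round(scores, 3)
```

Under this scheme a neutral technical word like "error" drags a line down, so a negative score here says little about how friendly the conversation was.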
58 / 68

Most positive lines!

## # A tibble: 15 x 2
##    line                                                          sentiment
##    <chr>                                                         <dbl>
##  1 absolutely - it's really important to ensure it really has b~ 1.84
##  2 overall, really easy to use and really nicely done.           1.73
##  3 this package is a great and lightweight addition to working ~ 1.46
##  4 i am very grateful for your approval and i very much look fo~ 1.26
##  5 thank you very much for the constructive thoughts.            1.24
##  6 thanks for the approval, all in all a very helpful and educa~ 1.22
##  7 - really good use of helper functions                         1.14
##  8 - i believe the utf note is handled correctly and this is ju~ 1.13
##  9 seem more unified and consistent.                             1.13
## 10 very much appreciated!                                        1.13
## 11 - well organized, readable code                               1.1
## 12 - wow very extensive testing! well done, very thorough        1.1
## 13 - i'm delighted that you find my work interesting and i'm ve~ 1.08
## 14 thank you very much for your thorough and thoughtful review,~ 1.08
## 15 great, thank you very much for accepting this package. i am ~ 1.07
59 / 68

Automate all the things

Let humans focus on what humans are best at!

60 / 68

Current automation

R CMD check / BiocCheck     # repository standards
testthat::test_package()    # functionality
covr::package_coverage()    # testing completeness
devtools::spell_check()     # documentation
lintr::lint_package()       # code style
goodpractice::gp()          # antipatterns/complexity
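
Before running these checks on a package, it can help to see which of the tools are available locally. A small sketch, assuming the tools are the R packages named above (BiocCheck is left out here):

```r
# R packages named on the slide
tools_needed <- c("testthat", "covr", "devtools", "lintr", "goodpractice")

# TRUE for each tool whose namespace can be loaded, FALSE otherwise
installed <- vapply(tools_needed, requireNamespace, logical(1), quietly = TRUE)
installed

# Names of the tools that still need to be installed
names(installed)[!installed]
```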

https://masalmon.eu/2017/06/17/automatictools/

61 / 68

More automation

  • Submission from R? Like devtools::release()

  • Setting up the review project: https://github.com/ropensci/pkgreviewr by editor Anna Krystalli

  • Matching reviewers and packages via better volunteering data

  • Running goodpractice::gp() automatically rather than locally

And even more ideas!

62 / 68

Get involved!

63 / 68

Thank you!

Thank you and thanks to...

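library(magrittr)  # provides the %>% pipe used below

# Annotate each image with its matching text, then join them into one stack
images_annotate <- function(images, texts, ...){
  purrr::map2(images, texts, magick::image_annotate, ...) %>%
    magick::image_join()
}
set.seed(42)  # make the random compliments reproducible
# Show the GitHub avatars of `names` side by side, each with a random compliment
thank <- function(names){
  load("adjectives.RData")  # provides the `adjectives` object
  compliments <- sample(adjectives, size = length(names))
  glue::glue("https://github.com/{names}.png") %>%
    magick::image_read() %>%
    magick::image_resize("200x200") %>%
    as.list() %>%
    images_annotate(glue::glue("So {compliments}!"),
                    boxcolor = "white",
                    size = 15) %>%
    magick::image_join() %>%
    magick::image_append()
}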
64 / 68

Thank you! satRday organizers

thank(c("DaveParr", "brennanpincardiff", "satrdays"))

65 / 68

Thank you! Onboarding editors

thank(c("karthik", "noamross", "sckott"))

thank(c("lmullen", "annakrystalli", "ropensci"))

66 / 68

Thank you! The faces of onboarding

bit.do/cardiff18 | ma_salmon | maelle | masalmon.eu

67 / 68