Best tips & takeaways from RStudio Conference


Here are the news, tips & tricks I learned from the 2017 RStudio Conference in Kissimmee, Florida. I am updating this blog throughout the conference. I hope you'll come back to see the latest!


Jan 14, 3:31 PM: Productivity tips for using R Notebooks, from RStudio software engineer Jonathan McPherson's presentation:

  • There's a new outline view for Notebooks in RStudio: Look for an icon at the top right or use ctrl-shift-O.
  • A notebook's progress bar doubles as a navigation bar to take you to the chunk that's currently running.
  • To make a notebook more reproducible, restart R and run all chunks.
  • You can customize keyboard shortcuts in newer versions of RStudio. Go to Tools > Modify Keyboard Shortcuts. McPherson suggests making keyboard shortcuts for collapsing and expanding all notebook chunks.

Jan 14, 2:54 PM: You can create websites with R using R Markdown and the relatively new blogdown package. It uses a static site generator called Hugo, which can be installed from within R with the command blogdown::install_hugo().

blogdown helper functions include new_site() to create a new site, new_post() to start a new post (it fills out some YAML metadata in a new R Markdown file) and install_theme("Hugo theme URL") to add a theme. serve_site() rebuilds your site and lets you preview it locally. (Note: Some external Hugo themes may need to be tweaked to work with blogdown; this isn't documented yet.) There's also a Live Preview add-in for RStudio that, as the name implies, lets you rebuild and preview your site locally.

Also brand new to blogdown today: a "new post" RStudio add-in for adding a new post with one click. Yihui Xie wrote it this afternoon at the conference.


Jan 14, 1:44 PM: A couple of RStudio IDE tips from Kevin Ushey:

If you type part of a file name inside quotation marks, the auto-complete options will include the path to that file.

There are more diagnostics within RStudio than syntax problems. It can check for missing function arguments, typos in variable names and more. Look in Tools > Code > Diagnostics for options and to enable them.


Jan 14, 11:40 AM: Yihui Xie's slides from his Advanced R Markdown session are at http://bit.ly/2017-rc-rmd. A few of the tips I rounded up from the session:


Jan 14, 11:22 AM: You can add your own CSS file and HTML snippets into an R Markdown document that you are converting to HTML. Here's what part of the document YAML header would look like:

    output:
      html_document:
        css: "mycssfile.css"
        includes:
          in_header: "header.html"
          before_body: "before.html"
          after_body: "after.html"

Jan 14, 11:05 AM: When rendering an R Markdown document to HTML, you can set the theme option to NULL to reduce the HTML file size significantly. (The default uses a Bootstrap theme, which is somewhat large.)


Jan 14, 10:40 AM: Another useful tip if you write documents that display R code:

```r
[some code here]
```

without curly brackets displays R code without evaluating it. Faster to type than

```{r eval = FALSE}
```


Jan 14, 9:10 AM: New on CRAN this week: an R package with data, code and stories from projects at FiveThirtyEight.com, created by academics who were looking for engaging data sets to help teach undergraduate statistics. FiveThirtyEight staff collaborated with the authors. Package authors rewrote some FiveThirtyEight code to make it more readable and to update it with newer packages in R's "tidyverse." You can find the fivethirtyeight package on CRAN.

There are 6 types of data stories, said former FiveThirtyEight data editor Andrew Flowers in his Saturday opening keynote address:

  • Novelty - A story that presents new data, or data that a reader/viewer hasn't understood before in a specific context. Danger: triviality. It's easy to report something that's not really meaningful. Tactics: Simple summaries - best to do simple analysis on new data and take a conservative approach. Also, ask yourself whether your finding is actually interesting to anyone besides you.
  • Outlier - This is the most common and the most effective data story, Flowers said. We're naturally drawn to stories about "best", "worst", etc. "They're the bread and butter of data journalists. It's like shooting fish in a barrel," Flowers said. Sports are one example: Why are Stephen Curry and Lionel Messi so great? Ask yourself: Is this really so different? Good tactic: Profile someone/something as an outlier for interest, and don't just present data.
  • Archetype - Telling a story about something that's not necessarily different or new, but interesting. Example: Looking at Ferguson, Missouri -- yes, it's poor and unequal, but it's not so different from many other American communities. Danger: Oversimplification. Tactic: Modeling. Make sure to ask yourself: What variables am I leaving out? You want to keep it simple, but not oversimplify, he said.
  • Trend - What's changed? What's new? Can be effective around a breaking-news event. Danger: Regression to the mean. You can look silly when things come back to equilibrium. Be conservative when telling data stories involving time series and trends. Ask yourself whether something is signal or noise, and make sure you have strong evidence before drawing conclusions.
  • Debunking - Attacking an alleged misperception. Example: The Dollar-And-Cents Case Against Excluding Women in Hollywood. The misperception was that Hollywood movies focused on women don't do well financially. FiveThirtyEight combined analysis of which movies did and didn't accurately portray women with the movies' budgets and return on investment. Data and code are available in the fivethirtyeight package. Danger of this type of story: your own biases. Flowers advises asking yourself: How much do I want to debunk this? (I'd suggest checking out the fun Spurious Correlations site for more examples.)
  • Forecast - These are typically done with probability models, simulations and scenarios. The danger with any forecast model is "overfitting," assuming that data fits a model when perhaps it doesn't. Ask yourself: Am I properly conveying the uncertainty in my model?

Jan 13, 5:43 PM: The bsplus package is designed so you can get "more stuff in your Shiny app," says creator Ian Lyttle. It wraps Bootstrap components, including accordion sidebar, carousel, tooltips, popover, help links and more. It was inspired by the shinyBS package, he said. Nothing in bsplus depends on the server part of Shiny; it's all on the UI side, which means it will work in R Markdown documents as well.


Jan 13, 5:34 PM: Karl Broman, professor at University of Wisconsin-Madison, has started a GitHub repo to collect links to conference presentation slides.


Jan 13, 5:21 PM: The ggedit package gives an interactive GUI for editing a ggplot2 graphic or theme -- and then lets you see the code behind the change.


Jan 13, 5:12 PM: From a Friday afternoon lightning talk, rOpenSci packages to consider for your arsenal:

  • magick - R access to the ImageMagick image editing capability
  • hunspell - spell check in R
  • tesseract - gives R access to an optical character recognition engine
  • travis and tic - tools to make it easy to work with Travis

See slide presentation


Jan 13, 5:27 PM: The corrr package makes it easy to explore correlations in R -- get correlation data analysis into a data frame for more analysis. Can be piped, "pretty printed," visualized and more.
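To show why correlations in a data frame are handy, here is a rough base R sketch of the idea. It does not use corrr's own functions; the column names x, y and r are my own choices, and mtcars is just a convenient built-in data set.

```r
# Base R sketch of the idea only; this is NOT corrr's API.
cor_matrix <- cor(mtcars[, c("mpg", "hp", "wt")])

# as.table() plus as.data.frame() reshapes the matrix into one row per pair.
cor_long <- as.data.frame(as.table(cor_matrix), stringsAsFactors = FALSE)
names(cor_long) <- c("x", "y", "r")  # hypothetical column names

# Drop self-correlations, then sort by strength of correlation.
cor_long <- cor_long[cor_long$x != cor_long$y, ]
cor_long <- cor_long[order(-abs(cor_long$r)), ]

print(cor_long)
```

Once the correlations are rows in a data frame, the usual filtering, sorting and plotting tools apply to them.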


Jan 13, 5:05 PM: The easymake package creates make files in R, so you don't keep running code on data that hasn't updated. It includes an RStudio add-in.


Jan 13, 4:07 PM: Julia Silge is presenting on tidy text mining. If you're interested in text analysis in R and aren't in the session, take a look at Tidy Text Mining in R.


Jan 13, 4:03 PM: Vectors don't have to be atomic, notes Jenny Bryan in her presentation on list-cols. Vectors can be lists, too. So you can add a list to a data frame as a data frame column. Four skills to cultivate if you are adding such complex columns:

  • inspect
  • index
  • compute
  • simplify

You'll be happier if it's a tibble, she noted, but a data frame with a list-column is a valid data frame. Aside: The listviewer package has a nice html widget for viewing complex data. In general, though, you're going to want to learn the purrr package if you want to deal with this, she said. She's got a tutorial posted at https://jennybc.github.io/purrr-tutorial/.
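Here is a minimal base R sketch of those four skills on a plain data frame with a list-column. The data is made up for illustration, and purrr would offer tidier equivalents of lapply() and vapply().

```r
# Made-up example data; base R only.
df <- data.frame(name = c("a", "b", "c"), stringsAsFactors = FALSE)

# A list is a vector too, so it can be stored as a column.
# Each element may have a different length.
df$scores <- list(c(90, 85), c(70, 75, 80), 100)

# Inspect: look at the structure of the list-column.
str(df$scores)

# Index: [[ ]] pulls out the contents of a single element.
second_scores <- df$scores[[2]]

# Compute: apply a function to every element (result is still a list).
score_means <- lapply(df$scores, mean)

# Simplify: collapse results back into ordinary atomic columns.
df$mean_score <- vapply(df$scores, mean, numeric(1))
df$n_scores <- lengths(df$scores)
```

The simplify step is what gets you back to regular columns that the rest of your analysis code can use.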


Jan 13, 3:26 PM: If you do nothing else, when you're coding, think data first with your function arguments, advises IT security pro and R package author Bob Rudis. That makes your code pipe-friendly (as in %>%).

And, a pipe group should be designed to do one thing.

New to me: The httr package has a stop_for_status() function that converts HTTP errors to R errors or warnings. It's a useful concept for other coding, Rudis said.
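To illustrate the concept outside of httr, here is a small base R sketch. check_status() is a hypothetical helper of my own, it works on a bare status code rather than a real httr response, and treating 400 and above as failure is an assumption made for this example.

```r
# Hypothetical helper, loosely modeled on the idea behind httr::stop_for_status().
# Assumption for this sketch: any status code of 400 or above counts as a failure.
check_status <- function(status_code) {
  if (status_code >= 400) {
    stop(sprintf("HTTP request failed with status %s", status_code), call. = FALSE)
  }
  # Return the input invisibly so the function can sit in the middle of a pipe.
  invisible(status_code)
}

# Because failures are now ordinary R errors, tryCatch() can handle them.
result <- tryCatch(
  {
    check_status(404)
    "ok"
  },
  error = function(e) conditionMessage(e)
)

print(result)
```

The same pattern works for any code that gets back a status value: turn bad statuses into errors early, and let R's normal error handling do the rest.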


Jan 13, 2:39 PM: Do you want to pull data from APIs into R? RStudio's Amanda Gadrow posted several useful (and commented) example scripts at https://github.com/ajmcoqui/webAPIsR.


Jan 13, 2:03 PM: New to me: Commoncrawl.org, a project to crawl the Web and build an open repository of data "that can be accessed and analyzed by anyone." After loading that project's files into Spark, you can use the sparkwarc package to read them into R. A conference demo showed things like finding most-used keywords and JavaScript libraries in a file with more than 100 million records. Interesting way to analyze Web content. Presentation slides are at bit.ly/2ilaQmi.


Jan 13, 2:00 PM: sparklyr version 0.5 is now on CRAN, useful for those who work with R and Apache Spark data. There are several new functions and improved compatibility, according to a presentation Friday afternoon.


Jan 13, 1:53 PM: Do you work with databases in R? Some news from the RStudio conference this afternoon: The company plans for RStudio version 1.1 to include a tab with information about database connections, as well as a dialog box to easily re-establish previously used connections. You'll also be able to view database drivers, tables and schemas currently available on your system.


Jan 13, 1:45 PM: An R database package called odbc is in the works for connecting with databases using a DBI-compliant interface with ODBC drivers. It's not yet on CRAN, but you can install it with devtools::install_github("rstats-db/odbc"). There already is an RODBC package for R, but odbc aims to be faster and provide things like native support for dates. A conference demo showed features like parameterized queries and adding SQL queries to R Markdown documents and interactive Shiny apps. If you pull data from databases with R, it's something you'll likely want to investigate.


Jan 13, 12:20 PM: RStudio has two different packages for creating dashboards. flexdashboard is for people who already know (or are willing to learn) R Markdown. shinydashboard is for people who know or are willing to learn the Shiny Web framework for R, which has a somewhat steeper learning curve.


Jan 13, 11:18 AM: What if you want to do something in Shiny that's slightly outside of what reactivity does, such as a function that also returns a previous value? Shiny creator Joe Cheng said he's working on a package currently called rxtools that "tries to wrap up some of those idioms" for those of us who don't have a deep, under-the-hood knowledge of Shiny. This package is still under development, he warned, so don't use it for any production work; and it will likely be renamed so as not to be confused with Microsoft's reactivity tools. But meanwhile you should be able to find it on GitHub.


Jan 13, 11:02 AM: If you find yourself copying and pasting code in Shiny, stop and ask yourself if you should be using a reactive expression, Joe Cheng advises. If you're not familiar with reactive expressions in Shiny, search for talks on the topic from the Shiny developer conference. Warning: Don't just search for shiny videos. Those won't get you what you want (and in fact may give you pages of porn results, he said).


Jan 13, 10:18 AM: Tutorial files for the Building Shiny Dashboards session can be installed with devtools::install_github("jcheng5/dashtutorial"). Then run dashtutorial::summon() to get exercise files.


Jan 13, 9:57 AM: tidyverse creator Hadley Wickham: "Importing data is either boring or horrifying. Exporting data is boring." (On why he writes packages for data import but not export.)


Jan 13, 9:56 AM: Hadley was asked about concerns in the R community of potentially causing a rift between tidyverse lovers and tidyverse skeptics. "That is honestly not something I spend much time worrying about," he said. "I worry about it a little bit," he admitted, but he said he's motivated by helping people get as far as they can in data analysis. He wants to create what he calls a "pit of success" - something people can easily fall into.

The tidyverse is a great place for people to start, he said, but he knows that "in order to do real work you need to go out of the tidyverse."


Jan 13, 9:52 AM: It's currently rather cumbersome to look at R lists and JSON data. Hadley said this is a problem RStudio wants to solve.


Jan 13, 9:40 AM: Hadley Wickham: A function should either compute something or do something. It should never do both.
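A quick sketch of how I read that advice, using two hypothetical functions of my own: one only computes and returns a value, the other exists only for its side effect (printing).

```r
# Computes something: returns a value and has no side effects.
summarize_scores <- function(scores) {
  c(mean = mean(scores), max = max(scores))
}

# Does something: called for its side effect (printing).
# It returns its input invisibly so it doesn't get in the way of later steps.
report_scores <- function(score_summary) {
  cat("Mean:", score_summary[["mean"]], "Max:", score_summary[["max"]], "\n")
  invisible(score_summary)
}

scores <- c(70, 80, 90)
score_summary <- summarize_scores(scores)
report_scores(score_summary)
```

Keeping the two apart means the computing function can be tested and reused without anything being printed, saved or plotted along the way.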


Jan 13, 9:26 AM: Do you like using %>% pipes in R? Wickham says R functions fit best into a pipe when:

  • The first argument is the "data"
  • The data is the same type across a family of functions
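Here is a small sketch of both points, with two hypothetical helper functions that take a data frame first and return a data frame. To avoid extra packages I'm using the native |> pipe, which assumes R 4.1 or later; magrittr's %>% would work the same way here.

```r
# Hypothetical helpers: data frame in as the first argument, data frame out.
keep_rows_above <- function(data, column, threshold) {
  data[data[[column]] > threshold, , drop = FALSE]
}

add_ratio <- function(data, numerator, denominator, name) {
  data[[name]] <- data[[numerator]] / data[[denominator]]
  data
}

# mtcars ships with R. Each step receives the previous step's data frame.
result <- mtcars |>
  keep_rows_above("mpg", 25) |>
  add_ratio("hp", "wt", "hp_per_wt")

print(result[, c("mpg", "hp", "wt", "hp_per_wt")])
```

Because every function returns the same type it receives, the steps can be reordered or extended without rewriting the ones around them.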