
Currently I'm developing a Shiny app for a customer with a large dataset (a 1.5 GB CSV, which I compressed to a 150 MB RDS). The trouble is that each time a user changes an input, the slowest step, the data import, appears to be executed again. Here's a minimal example (the real app is more complex, but the problem is the same).

UI.R (the basic example from RStudio; nothing relevant here, just a select input and a ggplot):

library(shiny)

# Define UI for application that draws a histogram
shinyUI(fluidPage(

  # Application title
  titlePanel("Old Faithful Geyser Data"),

  # Sidebar with a select input for the Z value
  sidebarLayout(
    sidebarPanel(
      selectInput("select_z", "Z Value", choices = seq(293.5, 443.5, 1), selected = 387.5)
    ),

    # Show a plot of the generated distribution
    mainPanel(
       plotOutput("distPlot")
    )
  )
))

Server.R (a readRDS call outside the server function, and a simple dplyr filter):

library(shiny)
library(dplyr)
library(magrittr)
library(ggplot2)

data <- readRDS('./data.rds')

# Define server logic required to draw a histogram
shinyServer(function(input, output) {

  output$distPlot <- renderPlot({

    # filter the data based on the selected Z value
    filtered_data <- data %>% filter(Z == input$select_z)

    # draw the histogram of X for the filtered subset
    ggplot(filtered_data) +
      geom_histogram(aes(X))

  })

})

The initial load takes approximately 10 seconds (which is expected), but the same wait happens every time the user changes an input.

I tested the same operations in a non-reactive environment and the times are much faster, which shows that the only bottleneck is the data import; the rest of the operations take less than a second.

> system.time(readRDS('./data.rds'))
   user  system elapsed 
  3.121   0.396   3.524 
> system.time(filtered_data <- data %>% filter(Z == 384.5))
   user  system elapsed 
  0.048   0.011   0.059 
> system.time(ggplot(filtered_data)+geom_histogram(aes(X)))
   user  system elapsed 
  0.001   0.000   0.001 

I think the problem is that the data import statement is executed every time an input changes, but I haven't found a way to stop this from happening.
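
One way to check that assumption is to log a timestamped message next to the readRDS call. This is purely a diagnostic sketch (the message text is arbitrary): if the message prints only once per R process rather than on every input change, the global load is not what is being re-executed.

library(shiny)

# Diagnostic only: this should print once per R process if the global
# load really runs only at startup, not on every input change.
message(Sys.time(), " - loading data.rds")
data <- readRDS('./data.rds')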

Thanks

    Since you've run the `readRDS` function outside the `shinyServer` function it should only run once. Filtering and plotting a large dataset can take time. Try [profiling your shiny app](http://rstudio.github.io/profvis/examples.html#example-3---profiling-a-shiny-application) to see exactly where time is being spent. – MrFlick Oct 02 '18 at 14:50
  • How much of the data is left after filtering ? Maybe the guilty is ggplot ? Try with a simpler output. Data is also embedded in ggplot, you are using only X, maybe keep only X column when you filter data – Billy34 Jun 09 '20 at 11:26
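
Following up on the profiling suggestion in the first comment, here is a minimal profvis sketch (assuming UI.R and Server.R sit in the current working directory): interact with the inputs a few times, then stop the app, and profvis opens a flame graph showing where the elapsed time actually goes.

library(profvis)
library(shiny)

# Run the app under the profiler; stopping the app opens the flame graph
profvis({
  runApp()
})

Billy34's point can be tested the same way: keeping only the needed column, e.g. `filter(Z == input$select_z) %>% select(X)`, shrinks the data that ggplot has to embed.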

1 Answer

  • Ideally, you shouldn't load files this big into memory at all; use a database instead. Have a look at the storage options described on the RStudio website (a minimal sketch follows the code below).
  • What might improve the user interaction is the use of debounce, where reacting to the selectInput is delayed by some amount before firing:

shinyServer(function(input, output, session) {

  selection <- reactive({
    input$select_z
  })

  # add a delay of 1 sec
  selected_z <- selection %>% debounce(1000)

  output$distPlot <- renderPlot({
    # filter the data based on the debounced Z value
    filtered_data <- data %>% filter(Z == selected_z())

    # draw the histogram of X for the filtered subset
    ggplot(filtered_data) +
      geom_histogram(aes(X))
  })
})
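
On the first point, here is a minimal sketch of the database route, assuming the CSV has been imported once into a local SQLite file (the data.sqlite file name and the measurements table name are illustrative):

library(shiny)
library(DBI)
library(RSQLite)
library(ggplot2)

# Connect once per process; only the filtered rows ever reach R's memory
con <- dbConnect(RSQLite::SQLite(), "data.sqlite")

# Close the connection when the app stops
onStop(function() dbDisconnect(con))

shinyServer(function(input, output, session) {

  output$distPlot <- renderPlot({
    # let the database do the filtering instead of holding 1.5 GB in R
    filtered_data <- dbGetQuery(
      con,
      "SELECT X FROM measurements WHERE Z = ?",
      params = list(as.numeric(input$select_z))
    )

    ggplot(filtered_data) +
      geom_histogram(aes(X))
  })
})

With an index on the Z column, each query touches only the matching rows, so the 10-second startup load disappears entirely.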