
Currently I'm developing a Shiny app for a customer with a large dataset (a 1.5 GB CSV, which I compressed to a 150 MB RDS). The trouble is that each time a user changes an input, the slowest step, the data import, appears to be executed again. Here's a minimal example (the real app is more complex, but the problem is the same).

UI.R (the basic example from RStudio; nothing relevant here, just a select input and a ggplot):

library(shiny)

# Define UI for application that draws a histogram
shinyUI(fluidPage(

  # Application title
  titlePanel("Old Faithful Geyser Data"),

  # Sidebar with a select input for the Z value
  sidebarLayout(
    sidebarPanel(
      selectInput("select_z", "Z Value", choices = seq(293.5, 443.5, 1), selected = 387.5)
    ),

    # Show a plot of the generated distribution
    mainPanel(
       plotOutput("distPlot")
    )
  )
))

Server.R (a readRDS call outside the server function, and a simple dplyr filter):

library(shiny)
library(dplyr)
library(magrittr)
library(ggplot2)

data <- readRDS('./data.rds')

# Define server logic required to draw a histogram
shinyServer(function(input, output) {

  output$distPlot <- renderPlot({

    # filter the data based on the selected Z value
    filtered_data <- data %>% filter(Z == input$select_z)

    # draw the histogram of X for the filtered subset
    ggplot(filtered_data) +
      geom_histogram(aes(X))

  })

})

The initial load takes approximately 10 seconds (which is expected), but the same wait happens every time the user changes an input.

I tested the same operations in a non-reactive environment and the times are much faster, which shows that the only bottleneck is the data import; the rest of the operations take less than a second.

> system.time(readRDS('./data.rds'))
   user  system elapsed 
  3.121   0.396   3.524 
> system.time(filtered_data <- data %>% filter(Z == 384.5))
   user  system elapsed 
  0.048   0.011   0.059 
> system.time(ggplot(filtered_data)+geom_histogram(aes(X)))
   user  system elapsed 
  0.001   0.000   0.001 

I think the problem is that the data import statement is executed every time an input changes, but I haven't found a way to stop this from happening.
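
One way to check that assumption is to log a timestamped message next to the readRDS call. This is purely a diagnostic sketch (the message text is arbitrary): if the message prints only once per R process rather than on every input change, the global load is not what is being re-executed.

library(shiny)

# Diagnostic only: this should print once per R process if the global
# load really runs only at startup, not on every input change.
message(Sys.time(), " - loading data.rds")
data <- readRDS('./data.rds')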

Thanks

    Since you've run the `readRDS` function outside the `shinyServer` function it should only run once. Filtering and plotting a large dataset can take time. Try [profiling your shiny app](http://rstudio.github.io/profvis/examples.html#example-3---profiling-a-shiny-application) to see exactly where time is being spent. – MrFlick Oct 02 '18 at 14:50
  • How much of the data is left after filtering ? Maybe the guilty is ggplot ? Try with a simpler output. Data is also embedded in ggplot, you are using only X, maybe keep only X column when you filter data – Billy34 Jun 09 '20 at 11:26
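
Following up on the profiling suggestion in the first comment, here is a minimal profvis sketch (assuming UI.R and Server.R sit in the current working directory): interact with the inputs a few times, then stop the app, and profvis opens a flame graph showing where the elapsed time actually goes.

library(profvis)
library(shiny)

# Run the app under the profiler; stopping the app opens the flame graph
profvis({
  runApp()
})

Billy34's point can be tested the same way: keeping only the needed column, e.g. `filter(Z == input$select_z) %>% select(X)`, shrinks the data that ggplot has to embed.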

1 Answer

  • Ideally, you shouldn't load files this big into memory at all; use a database instead. Have a look at the storage options described on the RStudio website (a minimal sketch follows the code below).
  • What might improve the user interaction is the use of debounce, where reacting to the selectInput is delayed by some amount before firing:

shinyServer(function(input, output, session) {

  selection <- reactive({
    input$select_z
  })

  # add a delay of 1 sec
  selected_z <- selection %>% debounce(1000)

  output$distPlot <- renderPlot({
    # filter the data based on the debounced Z value
    filtered_data <- data %>% filter(Z == selected_z())

    # draw the histogram of X for the filtered subset
    ggplot(filtered_data) +
      geom_histogram(aes(X))
  })
})
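
On the first point, here is a minimal sketch of the database route, assuming the CSV has been imported once into a local SQLite file (the data.sqlite file name and the measurements table name are illustrative):

library(shiny)
library(DBI)
library(RSQLite)
library(ggplot2)

# Connect once per process; only the filtered rows ever reach R's memory
con <- dbConnect(RSQLite::SQLite(), "data.sqlite")

# Close the connection when the app stops
onStop(function() dbDisconnect(con))

shinyServer(function(input, output, session) {

  output$distPlot <- renderPlot({
    # let the database do the filtering instead of holding 1.5 GB in R
    filtered_data <- dbGetQuery(
      con,
      "SELECT X FROM measurements WHERE Z = ?",
      params = list(as.numeric(input$select_z))
    )

    ggplot(filtered_data) +
      geom_histogram(aes(X))
  })
})

With an index on the Z column, each query touches only the matching rows, so the 10-second startup load disappears entirely.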