
Given this data frame:

row | time       | name
-----------------------
1   | 2 min      | bob
2   | 7 min      | john
3   | 1 hr 5 min | jess

I want to convert the time column into a numeric column that holds the number of minutes. I have a function, `parse_str`, that turns a time string into a number, but when I try to apply it to transform the original data frame with `data.frame(apply(dataframe, 2, parse_str))`, it either crashes or just doesn't work. Once the transformation works, I plan to convert the character column to numeric with `df = as.numeric(as.character(dataframe$time))`, but I haven't tested that yet.

Any ideas on how I can get my preprocessing function to correctly transform the data frame (or create a new one)?
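A small sketch of why the planned `as.character()` step matters: before R 4.0.0, `data.frame()` turned strings into factors by default, and `as.numeric()` on a factor returns the internal level codes rather than the numbers in the labels. The sketch also assigns the result back into a column, because `df = as.numeric(...)` would replace the whole object with a bare vector. The values below are made up for illustration.

```r
# A factor stores integer codes plus text labels
f <- factor(c("10", "5", "20"))

codes  <- as.numeric(f)                # the level codes, not the values
values <- as.numeric(as.character(f))  # convert via the labels instead
print(codes)   # 1 3 2
print(values)  # 10 5 20

# Assign into the column so the data frame is kept; `df = as.numeric(...)`
# would overwrite the whole data frame with a plain vector
df <- data.frame(time = factor(c("2", "7", "65")),
                 name = c("bob", "john", "jess"))
df$time <- as.numeric(as.character(df$time))
print(df)
```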

James L.
  • `dataframe$time2 <- parse_str(dataframe$time)`? `apply` is for doing the same thing to _all_ columns, and typically you wouldn't use it on a data frame at _all_, only on a matrix. – joran Feb 13 '18 at 21:07
  • It's unlikely that you would use the `apply` function. When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Show the code you tried and we can help you fix it. – MrFlick Feb 13 '18 at 21:07
  • An alternative could be to prepend `0 hr ` to every entry that doesn't have hours (i.e. detect the ones that do with a regex such as `^\d+ hr`, using stringi), and then use lubridate's `hm()` to convert them once they are all in the same format, i.e. `lubridate::hm(c('0 hr 2 min', '0 hr 7 min', '1 hr 5 min'))`. – Eric Fail Feb 13 '18 at 21:37
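A quick way to see the problem joran and MrFlick point to: `apply()` first converts the data frame to a matrix (so every cell becomes text when any column is character or factor) and, with `MARGIN = 2`, hands the function one whole column per call. The sketch uses the sample data from the question.

```r
df <- data.frame(time = c("2 min", "7 min", "1 hr 5 min"),
                 name = c("bob", "john", "jess"))

# apply() converts a data frame to a matrix first; with text columns
# every cell becomes a character string
print(typeof(as.matrix(df)))

# With MARGIN = 2 the function is called once per column and receives
# the whole column, so a function written for one string sees length 3
print(apply(df, 2, length))

# The result is a matrix, not a data frame, hence the extra data.frame() call
print(class(apply(df, 2, toupper)))
```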

2 Answers


As the comments say, this is best done without `apply`:

> df <- data.frame(time=c('2 min', '7 min', '1 hr 5 min'), name = c('bob', 'john', 'jess'))
> df
        time name
1      2 min  bob
2      7 min john
3 1 hr 5 min jess
> df$time <- as.numeric(parse_str(df$time))
> df
        time name
1          2  bob
2          7 john
3         65 jess

If your `parse_str` function already returns a numeric vector, as you say, you don't even need the `as.numeric` call.
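Since `parse_str` itself isn't shown, here is a hypothetical vectorized stand-in, `parse_minutes()`, so the one-line column assignment above can be run end to end. It assumes every entry looks like `<h> hr <m> min`, `<m> min` or `<h> hr`; anything else becomes `NA`.

```r
# Hypothetical vectorized stand-in for the asker's parse_str (the real
# function is not shown). Assumes entries like "1 hr 5 min", "7 min", "2 hr".
parse_minutes <- function(x) {
  x <- as.character(x)  # also accepts a factor column

  # Number in front of "hr" / "min"; "0" when that unit is absent
  hrs  <- ifelse(grepl("[0-9]+ hr", x),
                 sub("^.*?([0-9]+) hr.*$", "\\1", x, perl = TRUE), "0")
  mins <- ifelse(grepl("[0-9]+ min", x),
                 sub("^.*?([0-9]+) min.*$", "\\1", x, perl = TRUE), "0")

  out <- 60 * as.numeric(hrs) + as.numeric(mins)
  out[!grepl("[0-9]+ (hr|min)", x)] <- NA  # unrecognised strings become NA
  out
}

df <- data.frame(time = c("2 min", "7 min", "1 hr 5 min"),
                 name = c("bob", "john", "jess"))

# One call on the whole column: no apply() and no as.numeric() needed
df$time <- parse_minutes(df$time)
print(df)
```

Because `grepl()`, `sub()` and `ifelse()` all work element-wise, the function never loops explicitly; that is what makes the plain `df$time <- ...` assignment work.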

C. Braun
  • Awesome, thank you! Spent hours on this not knowing what to do lol, so simple. – James L. Feb 13 '18 at 21:32
  • `parse_str` acts on a single string, not a column. When I try this code it sets every row to the result of the first row's `parse_str`. Any idea how I can apply it to every row? – James L. Feb 13 '18 at 22:22
  • I would suggest making `parse_str` work on a character vector of length greater than 1 (it's good practice to vectorize functions that can be). If you don't want to do that, then how about `df$time <- sapply(df$time, parse_str)`? – C. Braun Feb 13 '18 at 22:30
  • Thanks! Any tips on how to vectorize functions? Should I just add a loop inside the function? That seems like a slow, inelegant way to do it. – James L. Feb 13 '18 at 22:31
  • [This link](http://www.noamross.net/blog/2014/4/16/vectorization-in-r--why.html) is a good place to start and has lots more references at the bottom. Basically, you vectorize by building your function only from R's built-in vectorized functions, but if your logic is too complicated it's not really worth the struggle. – C. Braun Feb 13 '18 at 22:40
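A sketch of the `sapply` suggestion, using `vapply()` instead because it checks that each call returns exactly one number. `parse_one()` is a hypothetical scalar parser standing in for the asker's `parse_str`; `vapply()` is effectively the loop asked about, written once outside the function.

```r
# Hypothetical scalar parser standing in for the asker's parse_str:
# it handles exactly one string such as "1 hr 5 min".
parse_one <- function(s) {
  stopifnot(length(s) == 1)
  # regmatches() returns character(0) when the pattern does not match
  h <- regmatches(s, regexpr("[0-9]+(?= hr)", s, perl = TRUE))
  m <- regmatches(s, regexpr("[0-9]+(?= min)", s, perl = TRUE))
  hours <- if (length(h)) as.numeric(h) else 0
  mins  <- if (length(m)) as.numeric(m) else 0
  60 * hours + mins
}

df <- data.frame(time = c("2 min", "7 min", "1 hr 5 min"),
                 name = c("bob", "john", "jess"))

# vapply() calls parse_one() once per element and insists that each call
# returns a single number, so a wrong return type fails loudly
df$time <- vapply(as.character(df$time), parse_one, numeric(1),
                  USE.NAMES = FALSE)
print(df)
```

The `as.character()` call guards against a factor column (the default before R 4.0.0), which would otherwise hand `parse_one()` factor elements instead of strings.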

Here's another option using tidyverse and lubridate, for anyone who wants to reproduce your results but doesn't have your function. Using the data from C. Braun's answer:

# install.packages(c("tidyverse", "lubridate"), dependencies = TRUE)
library(tidyverse)  # mutate(), str_replace(), as_tibble()
library(lubridate)  # hm(), as.duration()

df %>% mutate(
            # prefix "0 hr " to minute-only entries so hm() can parse every row
            `t formatted` = str_replace(time, "^([0-9]+ min)$", "0 hr \\1"),
            `t hours minutes` = hm(`t formatted`),
            `t duration` = as.duration(`t hours minutes`),
            `t numeric` = as.numeric(`t duration`, "minutes")
            ) %>% as_tibble()
#> # A tibble: 3 x 6
#>         time   name `t formatted` `t hours minutes`        `t duration` `t numeric`
#>       <fctr> <fctr>         <chr>      <S4: Period>      <S4: Duration>       <dbl>
#> 1      2 min    bob    0 hr 2 min             2M 0S   120s (~2 minutes)           2
#> 2      7 min   john    0 hr 7 min             7M 0S   420s (~7 minutes)           7
#> 3 1 hr 5 min   jess    1 hr 5 min          1H 5M 0S 3900s (~1.08 hours)          65
Eric Fail