I have a dataframe with the following columns:
> colnames(my.dataframe)
[1] "id" "firstName" "lastName"
[4] "position" "jerseyNumber" "currentTeamId"
[7] "currentTeamAbbreviation" "currentRosterStatus" "height"
[10] "weight" "birthDate" "age"
[13] "birthCity" "birthCountry" "rookie"
[16] "handednessShoots" "college" "twitter"
[19] "currentInjuryDescription" "currentInjuryPlayingProbability" "teamId"
[22] "teamAbbreviation" "fg2PtAtt" "fg3PtAtt"
[25] "fg2PtMade" "fg3PtMade" "ftMade"
[28] "fg2PtPct" "fg3PtPct" "ftPct"
[31] "ast" "tov" "offReb"
[34] "foulsDrawn" "blkAgainst" "plusMinus"
[37] "minSeconds"
And here is my code that isn't working:
my.dataframe %>%
  dplyr::group_by(id) %>%
  dplyr::summarise_at(vars(firstName:currentInjuryPlayingProbability), funs(min), na.rm = TRUE) %>%
  dplyr::summarise_at(vars(fg2PtAtt:minSeconds), funs(sum), na.rm = TRUE) %>%
  dplyr::summarise(teamId = paste(teamId), teamAbbreviation = paste(teamAbbreviation))
First I group by id (which is not a unique column in my dataframe, despite its name). The next 19 columns, up to and including currentInjuryPlayingProbability, always hold the same value within each id group, so I use the min function to grab that single value.
Next, I want to summarise all columns from fg2PtAtt to the end with their sum (these columns are all numeric / integer).
Lastly, for the columns teamId and teamAbbreviation (which can differ within an id group), I want to paste the values into a single string per column with summarise.
My approach doesn't work because, as far as I can tell, I can't call summarise_at, followed by another summarise_at, followed by a summarise. By the time the second summarise_at is called, the columns it is supposed to summarise have already been dropped by the first summarise_at.
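To illustrate the suspected cause, here is a minimal sketch on a made-up three-row tibble (the names df, name, pts and step1 are hypothetical and not from my real data). It only shows that the result of the first summarise_at keeps the grouping column plus the summarised columns, so nothing is left for a later step to work on.

```r
library(dplyr)

# Hypothetical toy data: id 2 appears twice, like a player with two teams.
df <- tibble::tibble(
  id   = c(1L, 2L, 2L),
  name = c("a", "b", "b"),
  pts  = c(10L, 20L, 5L)
)

# First summarise step: only `name` is summarised.
step1 <- df %>%
  group_by(id) %>%
  summarise_at(vars(name), min, na.rm = TRUE)

# Only the grouping column and the summarised column survive;
# `pts` is no longer available to a second summarise_at().
print(names(step1))
```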
Any help with this is appreciated! I will update shortly with a subset of my dataframe that code can be tested on.
EDIT:
dput(my.dataframe)
structure(list(id = c(10138L, 9466L, 9360L, 9360L), firstName = c("Alex",
"Quincy", "Luke", "Luke"), lastName = c("Abrines", "Acy", "Babbitt",
"Babbitt"), currentInjuryPlayingProbability = c(NA_character_,
NA_character_, NA_character_, NA_character_), teamId = c(96L,
84L, 91L, 92L), teamAbbreviation = c("OKL", "BRO", "ATL", "MIA"
), fg2PtAtt = c(70L, 73L, 57L, 2L), fg3PtAtt = c(221L, 292L,
111L, 45L), minSeconds = c(67637L, 81555L, 34210L, 8676L)), row.names = c(NA,
-4L), class = c("tbl_df", "tbl", "data.frame"))
my.dataframe
     id firstName lastName currentInjuryPlayingProbability teamId teamAbbreviation fg2PtAtt fg3PtAtt minSeconds
  <int> <chr>     <chr>    <chr>                            <int> <chr>               <int>    <int>      <int>
1 10138 Alex      Abrines  <NA>                                96 OKL                    70      221      67637
2  9466 Quincy    Acy      <NA>                                84 BRO                    73      292      81555
3  9360 Luke      Babbitt  <NA>                                91 ATL                    57      111      34210
4  9360 Luke      Babbitt  <NA>                                92 MIA                     2       45       8676
The above is a shortened example with only 9 columns, but with enough data to highlight the problem. The resulting dataframe should look like this:
     id firstName lastName currentInjuryPlayingProbability teamId teamAbbreviation fg2PtAtt fg3PtAtt minSeconds
  <int> <chr>     <chr>    <chr>                           <chr>  <chr>               <int>    <int>      <int>
1 10138 Alex      Abrines  <NA>                            96     OKL                    70      221      67637
2  9466 Quincy    Acy      <NA>                            84     BRO                    73      292      81555
3  9360 Luke      Babbitt  <NA>                            91, 92 ATL, MIA               59      156      42886
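For reference, a rough sketch of how the three summaries might be combined into one summarise call instead of three chained ones. This assumes dplyr >= 1.0.0 (for across()), and it is not something I have verified on the full 37-column data. Note two deliberate differences from my code above: first() is swapped in for min, since the columns are constant within each id anyway, and paste() is given collapse = ", " so each group yields a single string. The name result is just for illustration.

```r
library(dplyr)

# The sample data from the dput() output above.
my.dataframe <- tibble::tibble(
  id = c(10138L, 9466L, 9360L, 9360L),
  firstName = c("Alex", "Quincy", "Luke", "Luke"),
  lastName = c("Abrines", "Acy", "Babbitt", "Babbitt"),
  currentInjuryPlayingProbability = NA_character_,
  teamId = c(96L, 84L, 91L, 92L),
  teamAbbreviation = c("OKL", "BRO", "ATL", "MIA"),
  fg2PtAtt = c(70L, 73L, 57L, 2L),
  fg3PtAtt = c(221L, 292L, 111L, 45L),
  minSeconds = c(67637L, 81555L, 34210L, 8676L)
)

result <- my.dataframe %>%
  group_by(id) %>%
  summarise(
    # Columns that are constant within an id: take the first value
    # (assumption: swapped in for min()).
    across(firstName:currentInjuryPlayingProbability, first),
    # Columns that differ within an id: collapse into one string per group.
    across(c(teamId, teamAbbreviation), ~ paste(.x, collapse = ", ")),
    # Numeric stat columns: add them up within each id.
    across(fg2PtAtt:minSeconds, ~ sum(.x, na.rm = TRUE)),
    .groups = "drop"
  )

print(result)
```

Because all three groups of columns are handled inside the same summarise, no column is dropped before it is needed. The rows come back sorted by id, so the order differs from the table above.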