I have music data in R, and I need to determine the most popular song by number of streams for one specific artist. I have to create a new data.frame that contains only this artist's songs, save it, and sort it by number of streams.

The data is a list of songs with columns such as the number of streams, the song name, the artist name, etc. I started like this; is there a simpler way to do it?

filter(music_data, artistName == "Billie Eilish")   
billie_songs <- data.frame(filter(music_data, artistName == "Billie Eilish"))   
billie_songs_ordered <- billie_songs[order(billie_songs$streams, decreasing = TRUE),] 
print(paste("Most Popular Song: ", head(billie_songs_ordered$trackName, 1)))

Thank you!

Ana
  • You will need to provide a sample of the data using `dput()`. Your description does not provide enough information. Is the initial data frame just a list of songs or is it a list of songs and the number of times the song was streamed? – dcarlson Oct 13 '22 at 14:45
  • What have you tried and where did you get stuck? Have you looked at the FAQ [on sorting data](https://stackoverflow.com/q/1296646/903061)? – Gregor Thomas Oct 13 '22 at 14:48
  • @Ana Please put question-relevant information into the question via `edit`. Makes it far easier for everyone to understand the problem. Also, comments can change/vanish at any time. – Andre Wildberg Oct 13 '22 at 14:58
  • Greetings! Usually it is helpful to provide a minimally reproducible dataset for questions here so people can troubleshoot your problems. One way of doing this is by using the `dput` function. You can find out how to use it here: https://youtu.be/3EID3P1oisg – Shawn Hemelstrand Oct 13 '22 at 23:46

1 Answer

The code you've added looks pretty good. Here are some comments:

filter(music_data, artistName == "Billie Eilish") 
# this prints its result when you run it, but the result is not
# assigned with `<-` or `=`, so it is not saved.
# it's good to run code like this in your console, but you don't need it
# in the script file.


billie_songs <- data.frame(filter(music_data, artistName == "Billie Eilish"))   
# here you repeat the code above, assigning it. This emphasizes that the 
# first line could be deleted. Also `data.frame()` is unnecessary. 
# Change to `billie_songs <- filter(music_data, artistName == "Billie Eilish")`

billie_songs_ordered <- billie_songs[order(billie_songs$streams, decreasing = TRUE),] 
# this is fine. This is a great way to order rows of data using base R.
# You used `dplyr` above with `filter`, the dplyr way would have you use
# `arrange(billie_songs, desc(streams))` instead

print(paste("Most Popular Song: ", head(billie_songs_ordered$trackName, 1)))
# The `print()` is unnecessary at the console, where the value prints
# automatically, but this is good
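
For comparison, here is a minimal base R sketch that skips the sort when only the top song is needed. The column names follow the question, but the toy rows and the `top_song` variable are invented for illustration:

```r
# Toy data; column names follow the question, the values are made up.
music_data <- data.frame(
  trackName  = c("Song A", "Song B", "Song C", "Song D"),
  artistName = c("Billie Eilish", "Billie Eilish", "Other Artist", "Billie Eilish"),
  streams    = c(120, 450, 900, 300)
)

# Keep only the rows for one artist using logical indexing.
billie_songs <- music_data[music_data$artistName == "Billie Eilish", ]

# which.max() returns the index of the first maximum, so no sort is needed.
top_song <- billie_songs$trackName[which.max(billie_songs$streams)]

paste("Most Popular Song:", top_song)
```

This only finds the single top row; if the sorted data frame itself has to be saved, the `order()` approach above is still needed.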

If I were writing it, I would use all dplyr functions and not save the result at each step, instead using the `%>%` pipe to chain the commands together, like this:

library(dplyr)  # provides filter(), arrange(), pull(), and the %>% pipe

music_data %>%
  filter(artistName == "Billie Eilish") %>%
  arrange(desc(streams)) %>%
  head(1) %>%
  pull(trackName) %>%
  paste("Most Popular Song:", .)

Or I might use the dplyr convenience function `slice_max`, which pulls the row with the maximum value of a particular column:

music_data %>%
  filter(artistName == "Billie Eilish") %>% 
  slice_max(order_by = streams, n = 1) %>%
  pull(trackName) %>%
  paste("Most Popular Song:", .)
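
Since the question also asks for the filtered, sorted data frame to be saved, here is a small self-contained sketch that keeps it and shows how `slice_max` treats ties. The toy rows and the names `top_with_ties` and `top_single` are assumptions made up for illustration:

```r
library(dplyr)

# Toy data; column names follow the question, the values are made up.
music_data <- tibble(
  trackName  = c("Song A", "Song B", "Song C", "Song D"),
  artistName = c("Billie Eilish", "Billie Eilish", "Other Artist", "Billie Eilish"),
  streams    = c(450, 450, 900, 300)
)

# Save the filtered, sorted data frame, as the question requires.
billie_songs <- music_data %>%
  filter(artistName == "Billie Eilish") %>%
  arrange(desc(streams))

# slice_max() keeps ties by default, so two songs come back here.
top_with_ties <- billie_songs %>%
  slice_max(order_by = streams, n = 1) %>%
  pull(trackName)

# with_ties = FALSE returns exactly one row even when streams are tied.
top_single <- billie_songs %>%
  slice_max(order_by = streams, n = 1, with_ties = FALSE) %>%
  pull(trackName)
```

If two songs can share the top stream count, it is worth deciding up front whether you want all of them or just one.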
Gregor Thomas
  • It makes sense, thank you! I have one more question. How can I include songs that feature both Billie Eilish and other artists, for example "Billie Eilish feat. ..."? – Ana Oct 14 '22 at 21:11
  • With `library(stringr)`, you can `filter(str_detect(artistName, pattern = "Billie Eilish"))` instead of the strict `== "Billie Eilish"` comparison we used above. – Gregor Thomas Oct 15 '22 at 01:14
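
A minimal sketch of that suggestion on toy data. The rows and the `billie_all` name are invented for illustration. `str_detect()` treats its pattern as a regular expression by default, so wrapping the name in `fixed()` is one way to make sure it is matched literally:

```r
library(dplyr)
library(stringr)

# Toy data; column names follow the question, the values are made up.
music_data <- tibble(
  trackName  = c("Song A", "Song B", "Song C"),
  artistName = c("Billie Eilish", "Other Artist feat. Billie Eilish", "Other Artist"),
  streams    = c(450, 700, 900)
)

# Keep every row whose artistName contains the artist, including features.
billie_all <- music_data %>%
  filter(str_detect(artistName, fixed("Billie Eilish"))) %>%
  arrange(desc(streams))

billie_all$trackName
```

Substring matching also picks up any other artist whose name happens to contain the same text, so it is worth checking the matched `artistName` values on real data.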