I have previously received help on this issue, and below is the code I was given. As this is my first time coding and using R, I find it hard to understand and adapt to my specific data set.

Initially I was trying to make a scatter plot comparing rainfall (y) and humidity (x), but because the data is daily rainfall it contains a lot of zeroes, which makes the scatter plot useless. So now I am trying to create a scatter plot of the average humidity per month (x) against the total rainfall in that month (y). The dataset is extensive, so to make it easier I limited myself to the first 5 locations in the dataset: Albury, Badgerys Creek, Cobar, Coffs Harbour and Moree, which is around 3000 rows. At the bottom is an example of the first couple of rows of the data set.

Is this possible to achieve, and if it is, how would I go about adding a regression line to it in order to assess the relationship? Thanks for any help.
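library(data.table)  # fread() for fast CSV reading
library(tidyverse)   # tibble, dplyr and ggplot2
# Read the daily observations
df <- as_tibble(fread('realdata.csv'))
# Label each row with its month and year, then total the rainfall and average the humidity per month
df <- add_column(df, Month = format(as.Date(df$Date), '%B %Y'), .after = 'Date') %>%
  group_by(Month) %>%
  summarize(sum(`Rainfall`), mean(`Humidity3pm`))
# Give the two summary columns readable names
colnames(df)[2:3] <- c('Total Rainfall (mm)', 'Average 3 PM Relative Humidity (%)')
# Scatter plot of the monthly values (as written: rainfall on x, humidity on y)
ggplot(df, aes(x = `Total Rainfall (mm)`, y = `Average 3 PM Relative Humidity (%)`)) + geom_point()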
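
For reference, here is a minimal sketch of what I think the goal looks like: average humidity per month on x, total rainfall per month on y, with a straight-line regression on top. It is not the code I was given. It runs on made-up data rather than realdata.csv, and it assumes the file has a `Location` column alongside `Date`, `Rainfall` and `Humidity3pm`. The names `daily`, `monthly`, `total_rain`, `mean_humidity` and `fit` are only placeholders for illustration.

```r
library(dplyr)
library(ggplot2)

# Made-up stand-in for realdata.csv (one year, two locations).
# Column names are assumed to match the real file.
set.seed(1)
dates <- seq(as.Date("2015-01-01"), as.Date("2015-12-31"), by = "day")
daily <- data.frame(
  Date = rep(dates, times = 2),
  Location = rep(c("Albury", "Cobar"), each = length(dates))
)
daily$Humidity3pm <- runif(nrow(daily), 20, 90)
# Rain on roughly 30% of days, zero otherwise
daily$Rainfall <- ifelse(runif(nrow(daily)) < 0.3, rexp(nrow(daily), rate = 1 / 5), 0)

# One row per location and month: total rainfall and average humidity
monthly <- daily %>%
  mutate(Month = format(Date, "%Y-%m")) %>%
  group_by(Location, Month) %>%
  summarise(
    total_rain = sum(Rainfall, na.rm = TRUE),
    mean_humidity = mean(Humidity3pm, na.rm = TRUE),
    .groups = "drop"
  )

# Humidity on x, rainfall on y, with a fitted straight line and its confidence band
p <- ggplot(monthly, aes(x = mean_humidity, y = total_rain)) +
  geom_point(aes(colour = Location)) +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(x = "Average 3 PM relative humidity (%)", y = "Total rainfall (mm)")
print(p)

# The same linear regression as a model object, to inspect slope, p-value and R-squared
fit <- lm(total_rain ~ mean_humidity, data = monthly)
summary(fit)
```

Grouping by both `Location` and `Month` keeps the five stations separate instead of pooling them into one value per month, and `na.rm = TRUE` stops a single missing reading from turning a whole month into `NA`. Whether a straight line is a reasonable model for the real data is something the plot itself would have to show.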