Trying to make a scatterplot in R, but struggling to find the sum of monthly rainfall and average monthly humidity to plot

Question

I previously received help on this issue, and below is the code I was given. As a first-time coder and R user, I find it hard to understand and adapt to my specific data set. Initially I was trying to make a scatter plot comparing rainfall (y) and humidity (x), but because the data is daily rainfall it contains a lot of zeroes, which makes the scatter plot useless. So now I am trying to create a scatter plot of the average humidity per month (x) against the sum of rainfall in that month (y). The dataset is extensive, so to make it easier I limited myself to the first 5 locations in the dataset: Albury, Badgery Creek, Cobar, Coffs Harbour and Moree, which is around 3000 rows. At the bottom is an example of the first couple of rows of the data set. Is this possible to achieve, and if so, how would I go about adding a regression to it in order to assess the relationship? Thanks for any help.

library(data.table)
library(tidyverse)
df <- as_tibble(fread('realdata.csv'))
df <- add_column(df, Month = format(as.Date(df$Date), '%B %Y'), .after = 'Date') %>%
  group_by(Month) %>%
  summarize(sum(`Rainfall`), mean(`Humidity3pm`))
colnames(df)[2:3] <- c('Total Rainfall (mm)', 'Average 3 PM Relative Humidity (%)')
ggplot(df, aes(x = `Total Rainfall (mm)`, y = `Average 3 PM Relative Humidity (%)`)) + geom_point()

This was the rainfall australia data I took from Kaggle: https://www.kaggle.com/jsphyg/weather-dataset-rattle-package

This is what I'm currently up to

This is the vim attempt

How does your data (realdata.csv) look? Paste the first few lines with the header. – kashiff007, Sep 10 '19 at 11:02
Date,Location,MinTemp,MaxTemp,Rainfall,Humidity3pm,Pressure3pm,Temp3pm,RainToday,RainTomorrow
1/12/08,Albury,13.4,22.9,0.6,22,1007.1,21.8,No,No
2/12/08,Albury,7.4,25.1,0,25,1007.8,24.3,No,No
3/12/08,Albury,12.9,25.7,0,30,1008.7,23.2,No,No
4/12/08,Albury,9.2,28,0,16,1012.8,26.5,No,No
5/12/08,Albury,17.5,32.3,1,33,1006,29.7,No,No
6/12/08,Albury,14.6,29.7,0.2,23,1005.4,28.9,No,No
– vi noob vi5, Sep 10 '19 at 11:15

kashiff007 · Answer 1 · 2019-09-10T13:35:15.533


First, change the date format in your file. Your file looks like this:

Date,Location,MinTemp,MaxTemp,Rainfall,Humidity3pm,Pressure3pm,Temp3pm,RainToday,RainTomorrow
1/12/08,Albury,13.4,22.9,0.6,22,1007.1,21.8,No,No
2/12/08,Albury,7.4,25.1,0,25,1007.8,24.3,No,No
3/12/08,Albury,12.9,25.7,0,30,1008.7,23.2,No,No
4/12/08,Albury,9.2,28,0,16,1012.8,26.5,No,No
5/12/08,Albury,17.5,32.3,1,33,1006,29.7,No,No
6/12/08,Albury,14.6,29.7,0.2,23,1005.4,28.9,No,No

Change it to:

Day,Month-Year,Location,MinTemp,MaxTemp,Rainfall,Humidity3pm,Pressure3pm,Temp3pm,RainToday,RainTomorrow
1,12-08,Albury,13.4,22.9,0.6,22,1007.1,21.8,No,No
2,12-08,Albury,7.4,25.1,0,25,1007.8,24.3,No,No
3,12-08,Albury,12.9,25.7,0,30,1008.7,23.2,No,No
4,12-08,Albury,9.2,28,0,16,1012.8,26.5,No,No
5,12-08,Albury,17.5,32.3,1,33,1006,29.7,No,No
6,12-08,Albury,14.6,29.7,0.2,23,1005.4,28.9,No,No

You can achieve this format using vim or any simple text editor. In the "Date" column I replaced the first "/" with "," and the second "/" with "-". Open your file in vim from a terminal (not from the R console) with the command

vim realdata.csv

Then type each of the following commands, pressing Enter after each one

:%s!/!,!g
:%s!,!-!
:%s!,!-!
:%s!-!,!

Then edit your header to match the converted file shown above, and save and close the file with:

:x!
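
As an alternative to editing the file, the same month key can probably be built inside R. This is only a minimal sketch in base R, assuming the dates are day/month/two-digit-year as in the sample rows from the question; the inline `csv_text` stands in for `read.csv("realdata.csv")`, and the names `csv_text`, `rain`, `hum` and `monthly` are made up for illustration.

```r
# Sample rows copied from the question; with the real file use read.csv("realdata.csv")
csv_text <- "Date,Location,MinTemp,MaxTemp,Rainfall,Humidity3pm,Pressure3pm,Temp3pm,RainToday,RainTomorrow
1/12/08,Albury,13.4,22.9,0.6,22,1007.1,21.8,No,No
2/12/08,Albury,7.4,25.1,0,25,1007.8,24.3,No,No
3/12/08,Albury,12.9,25.7,0,30,1008.7,23.2,No,No
4/12/08,Albury,9.2,28,0,16,1012.8,26.5,No,No
5/12/08,Albury,17.5,32.3,1,33,1006,29.7,No,No
6/12/08,Albury,14.6,29.7,0.2,23,1005.4,28.9,No,No"
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Assumption: dates are day/month/two-digit year, e.g. "1/12/08" is 1 December 2008
df$Date <- as.Date(df$Date, format = "%d/%m/%y")

# Month key in the same "MM-YY" form as the hand-edited file, e.g. "12-08"
df$Month.Year <- format(df$Date, "%m-%y")

# Total rainfall and mean 3 pm humidity per month (rows with NA are dropped)
rain <- aggregate(Rainfall ~ Month.Year, data = df, FUN = sum)
hum  <- aggregate(Humidity3pm ~ Month.Year, data = df, FUN = mean)
monthly <- merge(rain, hum, by = "Month.Year")
print(monthly)
```

With this approach the CSV file itself stays unchanged, so the vim substitutions are not needed.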

Then use R with the following script

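install.packages(c('dplyr', 'ggplot2'))  # only needed once
library(dplyr)
library(ggplot2)  # ggplot() comes from ggplot2

# The header "Month-Year" is read in as the column name Month.Year
df <- read.csv("realdata.csv", header = TRUE)
# One row per month: total rainfall and mean 3 pm humidity
df_f <- df %>%
  group_by(Month.Year) %>%
  summarize(Rainfall_sum = sum(Rainfall, na.rm = TRUE),
            Humidity3pm_mean = mean(Humidity3pm, na.rm = TRUE))
# Humidity on x and rainfall on y, as asked in the question
ggplot(df_f, aes(x = Humidity3pm_mean, y = Rainfall_sum)) + geom_point()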
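
The question also asks about adding a regression. One possible way is a simple linear fit with `lm()`. The sketch below assumes a monthly summary with the columns `Rainfall_sum` and `Humidity3pm_mean`; the numbers in `monthly` are made-up placeholders for illustration, not results from the Kaggle data.

```r
# Hypothetical monthly summary: placeholder values, NOT taken from the real data
monthly <- data.frame(
  Humidity3pm_mean = c(30, 40, 50, 60, 70),
  Rainfall_sum     = c(12, 25, 41, 48, 69)
)

# Simple linear regression of monthly rainfall on mean 3 pm humidity
fit <- lm(Rainfall_sum ~ Humidity3pm_mean, data = monthly)
print(summary(fit))  # slope, p-value and R-squared

# Base R scatter plot with the fitted line drawn on top
plot(monthly$Humidity3pm_mean, monthly$Rainfall_sum,
     xlab = "Average 3 PM Relative Humidity (%)",
     ylab = "Total Rainfall (mm)")
abline(fit)
```

With ggplot2, adding `+ geom_smooth(method = "lm")` to a scatter plot of the same two columns should draw the equivalent fitted line.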

edited Sep 10 '19 at 13:35

answered Sep 10 '19 at 11:35


Struggling to import the dplyr library; apparently it doesn't exist. I'll keep trying – vi noob vi5 Sep 10 '19 at 11:48
install.packages('dplyr') – kashiff007 Sep 10 '19 at 11:51
I've done that a couple of times now, and tried removing it re-installing, restarting with no success, currently looking at a different thread with the same issue still no luck https://stackoverflow.com/questions/54621706/error-in-librarydplyr-there-is-no-package-called-dplyr – vi noob vi5 Sep 10 '19 at 11:56
now stuck in this cycle in which R keeps telling me one or more of the packages will be updated by the current installation. Tells me to restart R, so I clicked yes. Presented with the same prompt 2 seconds later – vi noob vi5 Sep 10 '19 at 12:05
I am now trying to reinstall R studio – vi noob vi5 Sep 10 '19 at 12:10
Finally got dplyr package to work, having an issue with floor_date saying function doesn't exist – vi noob vi5 Sep 10 '19 at 12:22
Hey, I have updated the solution. But you need to make a small change to your file, because the dates in your data are separated by slashes "/". – kashiff007 Sep 10 '19 at 13:07
sounds good man, i'll try it now once i figure out the / thing – vi noob vi5 Sep 10 '19 at 13:19
If it doesn't work let me know, or if it works please upvote ;) – kashiff007 Sep 10 '19 at 13:21
What's an easy way to change the file so that the date is separated by a /. Oh change so separated by dash? – vi noob vi5 Sep 10 '19 at 13:26
Yes, so there are quite a few rows in the "Date" column, so I can't manually change the dates to how you said. So how do I change the first / to a , and the second / to a - – vi noob vi5 Sep 10 '19 at 13:32
I have added the solution in the answer check it out – kashiff007 Sep 10 '19 at 13:36
The vim realdata.csv returned unexpected symbol – vi noob vi5 Sep 10 '19 at 13:40
Use vim directly from terminal not from R console – kashiff007 Sep 10 '19 at 13:42
Ah i will try that now – vi noob vi5 Sep 10 '19 at 13:43
I've just tried to do the vim and i posted a photo of the result, but the pattern after typing the first line did not work – vi noob vi5 Sep 10 '19 at 13:47
My dear brother, please open the existing file named "realdata.csv", not a new file. It seems you are in a different folder. – kashiff007 Sep 10 '19 at 13:51
Yeah, I thought that might have been the problem. I'm sorry. What should the extension on the Excel file be? .csv? – vi noob vi5 Sep 10 '19 at 13:54
The file is called realdata.csv, but on the excel file it still looks like an excel file and not a csv file. Not sure – vi noob vi5 Sep 10 '19 at 14:00
Ok got the csv uploaded properly this time – vi noob vi5 Sep 10 '19 at 14:15
Great what's the status now – kashiff007 Sep 10 '19 at 17:54
sorry man i fell asleep, i keep trying to upload the proper csv file but i don't think it should be in the table format – vi noob vi5 Sep 10 '19 at 23:13
Take your time to learn R from here: http://swcarpentry.github.io/r-novice-gapminder/ It will definitely help you in the future. – kashiff007 Sep 11 '19 at 08:37
