
I have previously received help on this issue, and below is the code I was given. As this is my first time coding and using R, I find it hard to understand and adapt to my specific data set. Initially I was trying to make a scatter plot comparing rainfall (y) and humidity (x), but because the data is daily rainfall it contains a lot of zeroes, which makes the scatter plot useless. So now I am trying to create a scatter plot of the average humidity per month (x) against the total rainfall in that month (y). The data set is extensive, so to make it easier I limited myself to its first 5 locations: Albury, Badgery Creek, Cobar, Coffs Harbour and Moree, which is around 3000 rows. At the bottom is an example of the first few rows of the data set. Is this possible to achieve, and if so, how would I go about adding a regression to it in order to assess the relationship? Thanks for any help.

library(data.table)
library(tidyverse)
df <- as_tibble(fread('realdata.csv'))
df <- add_column(df, Month = format(as.Date(df$Date), '%B %Y'), .after = 'Date') %>%
  group_by(Month) %>%
  summarize(sum(`Rainfall`), mean(`Humidity3pm`))
colnames(df)[2:3] <- c('Total Rainfall (mm)', 'Average 3 PM Relative Humidity (%)')
ggplot(df, aes(x = `Total Rainfall (mm)`, y = `Average 3 PM Relative Humidity (%)`)) + geom_point()

This is the Australian rainfall data I took from Kaggle: https://www.kaggle.com/jsphyg/weather-dataset-rattle-package

This is what I'm currently up to

This is the vim attempt

  • What does your data look like (realdata.csv)? Please paste the first few lines, including the header. – kashiff007 Sep 10 '19 at 11:02
  • ok sure thing, i will try – vi noob vi5 Sep 10 '19 at 11:03
  • Date,Location,MinTemp,MaxTemp,Rainfall,Humidity3pm,Pressure3pm,Temp3pm,RainToday,RainTomorrow 1/12/08,Albury,13.4,22.9,0.6,22,1007.1,21.8,No,No 2/12/08,Albury,7.4,25.1,0,25,1007.8,24.3,No,No 3/12/08,Albury,12.9,25.7,0,30,1008.7,23.2,No,No 4/12/08,Albury,9.2,28,0,16,1012.8,26.5,No,No 5/12/08,Albury,17.5,32.3,1,33,1006,29.7,No,No 6/12/08,Albury,14.6,29.7,0.2,23,1005.4,28.9,No,No – vi noob vi5 Sep 10 '19 at 11:15

1 Answer


First, change the date format in your file. Your file currently looks like this:

Date,Location,MinTemp,MaxTemp,Rainfall,Humidity3pm,Pressure3pm,Temp3pm,RainToday,RainTomorrow
1/12/08,Albury,13.4,22.9,0.6,22,1007.1,21.8,No,No
2/12/08,Albury,7.4,25.1,0,25,1007.8,24.3,No,No
3/12/08,Albury,12.9,25.7,0,30,1008.7,23.2,No,No
4/12/08,Albury,9.2,28,0,16,1012.8,26.5,No,No
5/12/08,Albury,17.5,32.3,1,33,1006,29.7,No,No
6/12/08,Albury,14.6,29.7,0.2,23,1005.4,28.9,No,No

Change it to:

Day,Month-Year,Location,MinTemp,MaxTemp,Rainfall,Humidity3pm,Pressure3pm,Temp3pm,RainToday,RainTomorrow
1,12-08,Albury,13.4,22.9,0.6,22,1007.1,21.8,No,No
2,12-08,Albury,7.4,25.1,0,25,1007.8,24.3,No,No
3,12-08,Albury,12.9,25.7,0,30,1008.7,23.2,No,No
4,12-08,Albury,9.2,28,0,16,1012.8,26.5,No,No
5,12-08,Albury,17.5,32.3,1,33,1006,29.7,No,No
6,12-08,Albury,14.6,29.7,0.2,23,1005.4,28.9,No,No

You can produce this format with vim or any simple text editor. In the "Date" column, I replaced the first "/" with "," and the second "/" with "-". Open your file in vim with the command:

vim realdata.csv

Then type each of the following commands, pressing Enter after each one:

:2,$s!/!,!
:2,$s!/!-!

Then edit the header line so that it matches the converted file shown above (Day,Month-Year,Location,... instead of Date,Location,...), and save and close the file with:

:x!
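
As an alternative to editing the file in vim, the same split could probably be done inside R, leaving realdata.csv untouched. The sketch below assumes the dates are day/month/two-digit year, as in the sample rows. The inlined sample lines and the names raw, parsed, Day and Month.Year are only for illustration.

```r
# A few rows from the question, inlined so the sketch runs without realdata.csv
raw <- read.csv(text = "Date,Location,Rainfall,Humidity3pm
1/12/08,Albury,0.6,22
2/12/08,Albury,0,25
5/12/08,Albury,1,33", stringsAsFactors = FALSE)

# Assumption: dates are day/month/two-digit year
parsed <- as.Date(raw$Date, format = "%d/%m/%y")

# Rebuild the two columns that the manual edit creates
raw$Day <- as.integer(format(parsed, "%d"))
raw$Month.Year <- format(parsed, "%m-%y")   # same month-year form as the edited file
print(raw)
```

With the real file, read.csv("realdata.csv") would replace the inlined text, and the rest of the steps would stay the same.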

Then run the following R script:

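# install.packages(c('dplyr', 'ggplot2'))  # run once if the packages are missing
library(dplyr)
library(ggplot2)  # needed for ggplot() and geom_point()

# read.csv() turns the "Month-Year" header into the column name Month.Year
df <- read.csv("realdata.csv")
# One row per month: total rainfall and mean 3 pm humidity
df_f <- df %>%
  group_by(Month.Year) %>%
  summarize(Rainfall_sum = sum(Rainfall, na.rm = TRUE),
            Humidity3pm_mean = mean(Humidity3pm, na.rm = TRUE))
# Humidity on x and rainfall on y, as asked in the question
ggplot(df_f, aes(x = Humidity3pm_mean, y = Rainfall_sum)) + geom_point()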
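
The question also asks about adding a regression. One possible approach, sketched here as a suggestion and not taken from the code above, is to fit a simple linear model with lm() and overlay the fitted line with geom_smooth(method = "lm"). The monthly values below are made up purely so the snippet runs on its own. With the real data, the monthly summary (one row per month, holding the total rainfall and the mean humidity) would take the place of the hypothetical data frame monthly.

```r
library(ggplot2)

# Made-up monthly summary, for illustration only (not values from the data set)
monthly <- data.frame(
  Humidity3pm_mean = c(35, 42, 48, 55, 61, 66),
  Rainfall_sum     = c(12, 30, 41, 58, 80, 95)
)

# Simple linear regression: monthly rainfall explained by mean 3 pm humidity
fit <- lm(Rainfall_sum ~ Humidity3pm_mean, data = monthly)
print(summary(fit))   # reports the slope, its p-value and R-squared

# Scatter plot with the fitted line and its confidence band
p <- ggplot(monthly, aes(x = Humidity3pm_mean, y = Rainfall_sum)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)
print(p)
```

Note that grouping by month alone pools all five locations together. If a separate point per location and month is wanted, adding Location to group_by() would be one way to get it.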
kashiff007