I have a data frame in R with four columns. I want to calculate the percentage of citizens under 44 by state and by year.
How do I accomplish this in R while preserving the year and state columns?
I was already able to use aggregate to get the sums and preserve the year/state/age values from a bigger dataset. I just couldn't get the sum of the whole column, and now I'm not sure where to go from here to calculate the percentage.
| Year | State | Age   | Pop |
|------|-------|-------|-----|
| 2000 | VA    | <44   | 150 |
| 2000 | VA    | 44+   | 350 |
| 2000 | VA    | Total | 500 |
Ideal Output:
| Year | State | Age | Pop |
|------|-------|-----|-----|
| 2000 | VA    | <44 | 0.3 |
| 2004 | VA    | <44 | 0.2 |
| 2008 | VA    | <44 | 0.4 |
This is the last bit of code I used to get the data frame to look how it does.
# Sum the population within each year/state/age group
demos_sub <- aggregate(demos_sub$total_citizen_pop,
                       by = list(Year = demos_sub$year, State = demos_sub$state, Age = demos_sub$age),
                       FUN = sum)
names(demos_sub) <- c("year", "state", "age", "total_citizen_pop")
# Sort by state, then by year within each state
demos_sub <- demos_sub[order(demos_sub$state, demos_sub$year), ]
I'm just not sure where to go from here to shrink it down further and calculate percentages.
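One possible way forward, as a minimal base R sketch: pull the "<44" rows and the "Total" rows apart, join them back together on year and state with merge, and divide. The sample data below is hypothetical (the 2004 rows are invented so the output has more than one row), and the column names state_total and share_under_44 are just illustrative. It also assumes every year/state combination has exactly one "Total" row, as in the table above.

```r
# Hypothetical sample data in the same shape as the aggregated demos_sub.
# The 2004 rows are made up for illustration only.
demos_sub <- data.frame(
  year  = c(2000, 2000, 2000, 2004, 2004, 2004),
  state = "VA",
  age   = rep(c("<44", "44+", "Total"), times = 2),
  total_citizen_pop = c(150, 350, 500, 100, 400, 500),
  stringsAsFactors = FALSE
)

# Separate the under-44 rows from the Total rows.
under44 <- demos_sub[demos_sub$age == "<44", ]
totals  <- demos_sub[demos_sub$age == "Total",
                     c("year", "state", "total_citizen_pop")]
names(totals)[3] <- "state_total"  # illustrative column name

# Join each under-44 row to the total for the same year and state,
# then divide to get the share (a proportion between 0 and 1).
pct <- merge(under44, totals, by = c("year", "state"))
pct$share_under_44 <- pct$total_citizen_pop / pct$state_total

# Keep the identifying columns plus the share, sorted by state then year.
pct <- pct[order(pct$state, pct$year),
           c("year", "state", "age", "share_under_44")]
print(pct)
```

If the data had no "Total" rows, the denominator could instead be computed per group, for example with ave(demos_sub$total_citizen_pop, demos_sub$year, demos_sub$state, FUN = sum) on the non-total rows. Multiply share_under_44 by 100 if a percentage rather than a proportion is wanted.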