How to Calculate Percentage Based On Other Row

Question

I have a dataframe in R with four columns. I want to calculate the total percentage of citizens below 44 by state and by year.

How do I accomplish this in R, preserving the year and state column?

I was already able to use aggregate to get the sum and preserve the year/state/age values from a bigger dataset. I just just couldn't get the sum of the whole column, but now I'm not sure where to go from here to calculate percent.

|------------||------------||------------||------------|
|    Year    ||    State   ||    Age     ||    Pop     |
|------------||------------||------------||------------|
|    2000    ||     VA     ||    <44     ||    150     |
|------------||------------||------------||------------|
|    2000    ||     VA     ||    44+     ||    350     |
|------------||------------||------------||------------|
|    2000    ||     VA     ||    Total   ||    500     |

Ideal Output:

|------------||------------||------------||------------|
|    Year    ||    State   ||    Age     ||    Pop     |
|------------||------------||------------||------------|
|    2000    ||     VA     ||    <44     ||    0.3     |
|------------||------------||------------||------------|
|    2004    ||     VA     ||    <44     ||    0.2     |
|------------||------------||------------||------------|
|    2008    ||     VA     ||    <44     ||    0.4     |

This is the last bit of code I used to get the data frame to look how it does.

demos_sub <- aggregate(demos_sub$total_citizen_pop, by=list(Year=demos_sub$year, State=demos_sub$state, Age=demos_sub$age), FUN=sum)
names(demos_sub) <- c("year","state", "age", "total_citizen_pop")
demos_sub <- demos_sub[with(demos_sub, order(year)),]
demos_sub <- demos_sub[with(demos_sub, order(state)),]

I'm just not sure where to go from here to shrink it down further and calculate percentages.

Can you show us what you have tried so far and what problem you faced please ? — godot, Apr 21 '19 at 22:04
Sorry should have been more precise but I meant can you add your code? It is much easier to explain if we can see what you're stuck on... (You can use a simpler dataset to get to your point.) — godot, Apr 21 '19 at 22:12
I added the code I last did to get to this point, but I don't even know where to go from here to move forward. I don't think aggregate will do the trick again... — carousallie, Apr 21 '19 at 22:20
Its difficult to say if we don't have any reproducible example, maybe you are looking for `?prop.table`. — P. Paccioretti, Apr 21 '19 at 22:25
Once you have filtered your data of all that must be removed, you should have a look at dplyr group_by & summarize function. — godot, Apr 21 '19 at 23:09
can you add a reproducible example along with expected output. Update your question with `dput(demos_sub)`. — Ronak Shah, Apr 21 '19 at 23:21
Possible duplicate of [Getting both column counts and proportions in the same table in R](https://stackoverflow.com/questions/9438193/getting-both-column-counts-and-proportions-in-the-same-table-in-r) — NelsonGon, Apr 22 '19 at 02:56

godot · Accepted Answer · 2019-04-22T07:34:43.393

0

This is the beginning of a solution:

library(dplyr)

Year <- rep(2000, 6)
State <- c(rep("VA", 4), rep("MA", 2))
Age <- c("<44", "44+", "44+", "<44", "<44", "44+")
Pop <- c(150, 350, 500, 200, 100, 100)

df <- data.frame(State = State, Age = Age, Pop = Pop, Year= Year)

df %>% filter(Age != "Total") %>% group_by(Year, State)  %>% summarize(Pop44 = sum(Pop[Age=="<44"]) / sum(Pop))

You don't have to filter the "Total" category but it's usually not a good idea to have a "total" category (better have a column for that)

edited Apr 22 '19 at 07:34

answered Apr 21 '19 at 23:36

godot

1,550
16
33

This is a great starting point, thank you so much! And this isn't my data, but I definitely wouldn't have put a put total category in the same column. – carousallie Apr 22 '19 at 18:13

How to Calculate Percentage Based On Other Row

1 Answers1