0

I have a large dataframe with 31181 observations and 9 variables. In this dataframe, the academic performance of students is registered.

I also have a second dataframe, in which each student is represented in 1 row. In this row I would like to store his/her results from the academic performance dataframe (i.e. grades).

Dataframe 1 (let's call it Academic) looks as follows:

      Programme Resits Student_ID Course_code Academic_year Course_Grade_Binned Graduated Master_Student Course.rating_M Rating.tutor_M Selfstudy_M
1            IB      0    9000006     ABC1198          2013                   B      TRUE              1             7.5            8.2        14.1
2            IB      0    9000006     ABC1192          2014                   B      TRUE              1             8.4            8.8        13.0
3            IB      0    9000006     ABC1277          2014                   A      TRUE              1             6.0            6.4        10.6
4            IB      0    9000006     ABC1448          2013                   B      TRUE              1             5.7            7.8        14.4
5            IB      0    9000006     ABC1120          2014                   B      TRUE              1             7.1            7.4        11.2
6            IB      0    9000006     ABC1362          2013                   B      TRUE              1             6.7            7.5        15.8
7            IB      0    9000006     ABC1213          2013                   C      TRUE              1             7.7            8.1        11.4
8            IB      0    9000006     ABC1382          2013                   B      TRUE              1             6.6            7.1        16.3
9            IB      0    9000006     ABC1108          2013                   C      TRUE              1             7.1            7.6        15.7
10           IB      1    9000006     ABC1329          2014                   B      TRUE              1             7.5            7.9        10.7
11           IB      0    9000006     ABC1126          2013                   B      TRUE              1             6.7            7.5        15.3
12           IB      0    9000006     ABC1003          2013                   B      TRUE              1             7.3            8.5        12.6
13           IB      0    9000014     ABC1309          2014                   B      TRUE              0             6.9            6.1        12.4
14           IB      0    9000014     ABC1198          2013                   A      TRUE              0             7.5            8.2        14.1
15           IB      0    9000014     ABC1277          2014                   A      TRUE              0             6.0            6.4        10.6
16           IB      0    9000014     ABC1448          2013                   A      TRUE              0             5.7            7.8        14.4
17           IB      0    9000014     ABC1362          2013                   B      TRUE              0             6.7            7.5        15.8
18           IB      0    9000014     ABC1213          2013                   B      TRUE              0             7.7            8.1        11.4
19           IB      0    9000014     ABC1152          2014                   A      TRUE              0             7.0            7.6        12.3
20           IB      0    9000014     ABC1382          2013                   A      TRUE              0             6.6            7.1        16.3
21           IB      0    9000014     ABC1108          2013                   B      TRUE              0             7.1            7.6        15.7
22           IB      0    9000014     ABC1455          2014                   A      TRUE              0             6.7            7.3        11.2
23           IB      0    9000014     ABC1126          2013                   B      TRUE              0             6.7            7.5        15.3
24           IB      0    9000014     ABC1003          2013                   A      TRUE              0             7.3            8.5        12.6
25           IB      1    9000028     ABC1213          2014                   C      TRUE              0             7.8            8.6        10.7
26           IB      0    9000028     ABC1198          2014                   B      TRUE              0             7.1            8.0        15.5

Now, I would like to extract the grade a student got for a course(ranging from A to F). As one can see from the Academic dataframe, student 9000006 got a B for ABC1198. I would like to copy this value for every course of each student to the second dataframe (let's call it NewData). The columns for these courses are already there. In total there are 162 courses so for each of them there is a course_name_Grade column. Since not all students have done all courses, some cells might be left with an NA.

The end result would look similar to this:

 Student_ID    Master    Resits     Programme    ABC1198_Grade    ABC1192_Grade ABC1277_Grade 
1    9000006      1        1            IB              B              B          A       
2    9000014      1        0            IB              A              NA          A       
jtsbattle
  • 185
  • 1
  • 8
  • 1
    what's wrong in doing `Academic %>% select(Student_ID, Course_code, Programme, Resits, Course_Grade_Binned) %>% spread(Course_code, Course_Grade_Binned) ` using `tidyverse` ? seems to give the output the way you want. – Ronak Shah Jun 13 '19 at 07:51
  • It gives me an error unfortunately: `Error: Duplicate identifiers for rows` – jtsbattle Jun 13 '19 at 08:31
  • Are you sure you are selecting the columns correctly? because when I use the data you have provided for `Academic` it works for me without any error. – Ronak Shah Jun 13 '19 at 09:01
  • 1
    I may have found where the errors come from. In some cases a course appears twice for a student (e.g. he failed the first sit and then did a resit). This should have been solved in an earlier step so I will check where this went wrong and then run your code again. – jtsbattle Jun 13 '19 at 09:22
  • @RonakShah your solution works very well now! One question however, rather than having the course_codes as column names. I'd like to have it as in the example. Hence: ABC1198_Grade, ABC1192_Grade etc. How would I do that? – jtsbattle Jun 13 '19 at 10:34
  • 1
    use `rename_at` for that. `Academic %>% select(Student_ID, Course_code, Programme, Resits, Course_Grade_Binned) %>% spread(Course_code, Course_Grade_Binned) %>% rename_at(vars(starts_with("ABC")), ~paste0(., "_Grades"))` – Ronak Shah Jun 13 '19 at 10:44

0 Answers0