
My data: information about two families, "Guillou" and "Cleach": the person's name (person), the father's name (father) and the rank in the family (level). In reality the names are not important; I work with numeric ids to avoid problems with homonyms.

            person           father         level
            Guillou Arthur     NA           1       
            Cleach  Marc       NA           1       
            Guillou Eric    Guillou Arthur  2       
            Guillou Jacques Guillou Arthur  2       
            Cleach Franck   Cleach Marc     2       
            Cleach Leo      Cleach Marc     2       
            Cleach Herbet   Cleach Leo      3       
            Cleach Adele    Cleach Herbet   4       
            Guillou Jean    Guillou Eric    3       
            Guillou Alan    Guillou Eric    3

This data frame is based on @Moody_Mudskipper's answer to a previous question about levels in a family tree (this post).

Here are the instructions that return the table:

data:

person <- c("Guillou Arthur",
          "Cleach Marc",
          "Guillou Eric",
          "Guillou Jacques", 
          "Cleach Franck",
          "Cleach Leo",
          "Cleach Herbet",
          "Cleach Adele",
          "Guillou Jean",
          "Guillou Alan" )
father <- c(NA, NA, "Guillou Arthur" , "Guillou Arthur", "Cleach Marc", "Cleach Marc", "Cleach Leo", "Cleach Herbet", "Guillou Eric", "Guillou Eric")


 family <- data.frame(person, father, stringsAsFactors = FALSE)

recursive function :

 father_line <- function(x){
 dad <- subset(family,person==x)$father
 if(is.na(dad)) return(x)
 c(x,father_line(dad))
 }

Example of function's output:

  father_line ("Guillou Alan")
 "Guillou Alan"   "Guillou Eric"   "Guillou Arthur"

table :

 library(tidyverse)
 family %>%
 mutate(family_line = map(person,father_line),
     level = lengths(family_line),
     patriarch = map(family_line,last)) %>%
  select(person,father,level)

My question: how can I differentiate the two families based on the person/father relationships? I can't use the family names: in my reproducible example the families have different names, but in reality they do not.

Expected output :

            person           father         level   family
            Guillou Arthur     NA           1       1
            Cleach  Marc       NA           1       2
            Guillou Eric    Guillou Arthur  2       1
            Guillou Jacques Guillou Arthur  2       1
            Cleach Franck   Cleach Marc     2       2
            Cleach Leo      Cleach Marc     2       2
            Cleach Herbet   Cleach Leo      3       2
            Cleach Adele    Cleach Herbet   4       2
            Guillou Jean    Guillou Eric    3       1
            Guillou Alan    Guillou Eric    3       1

With ids

    # data
    person <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
    father <- c(NA, NA, 1 , 1, 2, 2, 6, 7, 3, 3)


     family <- data.frame(person, father)



     # function
     father_line <- function(x){
     dad <- subset(family,person==x)$father
     if(is.na(dad)) return(x)
     c(x,father_line(dad))
     }




     library(tidyverse)
     family %>%
     mutate(family_line = map(person,father_line),
         level = lengths(family_line),
         patriarch = map(family_line,last)) %>%
      select(person,father,level)
Wilcar

1 Answer


You might want to take a look at package igraph.

Before using it, you need to replace the NAs. I'm assuming that two persons from the same family cannot both have NA as father, i.e. each family has exactly one root.
So:

roots <- family[is.na(family)] <- seq(sum(is.na(family)))
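One caveat with the numeric-id version of the data: the placeholders 1 and 2 would collide with persons 1 and 2. A minimal sketch of one way around this, assuming the ids can be stored as character; the `root1`, `root2` labels are hypothetical names chosen for illustration, not part of the original answer:

```r
# Numeric-id version of the question's data, stored as character
person <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
father <- c(NA, NA, 1, 1, 2, 2, 6, 7, 3, 3)
family <- data.frame(person = as.character(person),
                     father = as.character(father),
                     stringsAsFactors = FALSE)

# Replace each missing father with a distinct placeholder label
# ("root1", "root2", ... are hypothetical) that cannot clash with a real id
is_root <- is.na(family$father)
roots <- paste0("root", seq_len(sum(is_root)))
family$father[is_root] <- roots
```

With character labels, the later steps can refer to persons and roots by name rather than by position.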

Then you create a graph of the father/child connections; the first column needs to be the fathers:

library(igraph)
family_tree <- graph_from_data_frame(family[, 2:1])

You can visualise it:

plot(family_tree)


Then you can compute the levels and family with distances to root:

tab_roots <- sapply(roots, function(root) distances(family_tree, family$person, root))
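To see what such a distance table looks like, here is a small sketch on a toy graph (the vertex names `root1`, `root2`, `A` to `D` are made up for illustration). It passes all roots to `distances()` in a single call instead of looping with `sapply`, which should give the same kind of matrix: one row per person, one column per root, and `Inf` where a person is not connected to that root.

```r
library(igraph)

# Toy edge list: two small families hanging off two placeholder roots
edges <- data.frame(from = c("root1", "root2", "A", "B"),
                    to   = c("A",     "B",     "C", "D"),
                    stringsAsFactors = FALSE)
g <- graph_from_data_frame(edges)

# Distance from every person to every root (edge direction is ignored by default)
d <- distances(g, v = c("A", "B", "C", "D"), to = c("root1", "root2"))

# Level = distance to the only reachable root; family = index of that root
level      <- apply(d, 1, min)
family_idx <- apply(d, 1, function(x) which(is.finite(x)))
```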

You add them to the family data.frame:

family$level <- apply(tab_roots, 1, min)
family$family <- apply(tab_roots, 1, function(d) which(d!=Inf))
family
#            person         father level family
#1   Guillou Arthur              1     1      1
#2      Cleach Marc              2     1      2
#3     Guillou Eric Guillou Arthur     2      1
#4  Guillou Jacques Guillou Arthur     2      1
#5    Cleach Franck    Cleach Marc     2      2
#6       Cleach Leo    Cleach Marc     2      2
#7    Cleach Herbet     Cleach Leo     3      2
#8     Cleach Adele  Cleach Herbet     4      2
#9     Guillou Jean   Guillou Eric     3      1
#10    Guillou Alan   Guillou Eric     3      1
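If only the family label is needed, a possible variation is to use connected components, which avoids replacing the NAs. This is a sketch under the assumption that each family forms exactly one connected tree; it uses igraph's `components()` and is not the method of the answer above. The name `edges` is introduced here for illustration.

```r
library(igraph)

person <- c("Guillou Arthur", "Cleach Marc", "Guillou Eric", "Guillou Jacques",
            "Cleach Franck", "Cleach Leo", "Cleach Herbet", "Cleach Adele",
            "Guillou Jean", "Guillou Alan")
father <- c(NA, NA, "Guillou Arthur", "Guillou Arthur", "Cleach Marc",
            "Cleach Marc", "Cleach Leo", "Cleach Herbet", "Guillou Eric",
            "Guillou Eric")
family <- data.frame(person, father, stringsAsFactors = FALSE)

# Keep only real father -> child edges; persons with an NA father
# are still included as vertices through the `vertices` argument
edges <- family[!is.na(family$father), c("father", "person")]
family_tree <- graph_from_data_frame(edges,
                                     vertices = data.frame(name = family$person))

# One weakly connected component per family; vertices are in the same
# order as family$person, so membership can be assigned by position
comp <- components(family_tree, mode = "weak")
family$family <- as.integer(comp$membership)
```

The component numbers are arbitrary labels, so they distinguish the families but should not be read as a ranking.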
Jaap
Cath