
I am trying to measure political ideology on Twitter (using rtweet). I now have a data frame with the user_ids of more than 100 politicians, along with two ideal point scores, 'factor 1' and 'factor 2' (both range from 1 to 4). The data frame is called kandidat and looks like this:

| Navne | Faktor 1 | Faktor 2 |
|---|---|---|
| "Politician1" | 3.5 | 1.0 |
| "Politician2" | 2.0 | 4.0 |
| Etc... | X | X |

I would then like to detect whether random Twitter users follow one or more of the politicians in my dataset. If they follow, e.g., two of them, "Politician1" and "Politician2", I will assign the user the mean of the two politicians' ideal point scores on each factor. A Twitter user following these two politicians would then get factor 1 = (3.5+2.0)/2 = 2.75 and factor 2 = (1.0+4.0)/2 = 2.50.
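The averaging rule can be checked on the two example rows. This is only a minimal sketch: the toy `kandidat` below holds just the two rows from the table, and the column names `Faktor1` and `Faktor2` are assumptions about how the real data frame is named.

```r
# Toy version of kandidat with the two example rows (column names are assumed).
kandidat <- data.frame(
  Navne   = c("Politician1", "Politician2"),
  Faktor1 = c(3.5, 2.0),
  Faktor2 = c(1.0, 4.0)
)

# A user who follows both politicians gets the column means as scores.
followed   <- c("Politician1", "Politician2")
user_score <- colMeans(kandidat[kandidat$Navne %in% followed, c("Faktor1", "Faktor2")])
print(user_score)  # Faktor1 = 2.75, Faktor2 = 2.50
```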

So I've tried to write a simplified loop over only two journalists from Twitter (a data frame called 'testusers'), who both follow a large share of the politicians in my dataset. The loop should check whether each journalist follows one or more of the politicians. If so, it should assign the mean of the scores as described above. If not, the journalist should be removed from the dataset automatically. The loop below runs, but unfortunately gives the wrong output (see the table below the code):

### loop ###

for(i in 1:ncol(testusers)){
  
  #pick politician1 of dataset
  politician1_friends <- get_friends(testusers$Navne[1])
  
  #intersect with candidate data
  ids_intersect = intersect(politician1_friends$user_id, kandidat$user_id)
  if(length(ids_intersect == 0)){
    testusers[i, "anyFriends"] <- FALSE #user has no friends in the politicians df
  } else {
    #assign values to user based on intersect
    politicians_friends = kandidat[kandidat$user_id %in% ids_intersect,]
    s1_mean <- mean(politicians_friends$faktor1, na.rm=TRUE)
    s2_mean <- mean(politicians_friends$faktor2, na.rm=TRUE)
    testusers[i, "faktor1"] <- s1_mean
    testusers[i, "faktor2"] <- s2_mean
    testusers[i, "anyFriends"] <- TRUE #user has friends in the politicians dataset
  }
  # etc.
}

The code above gives me this output:

| Navne | anyFriends |
|---|---|
| "Politician1" | FALSE |
| "Politician2" | NA |

The structure of testusers is: `structure(list(Navne = c("Politician1", "Politician2"), anyFriends = c(FALSE, NA)), row.names = 1:2, class = "data.frame")`. I can't post the whole structure of kandidat since it's too big, but it's a data frame of politicians with all the information returned by the function `lookup_users()` (user_id, screen_name, text etc.).

So I guess the code needs some minor changes, but I haven't figured them out yet. Ideally the output (df) should consist of only three columns: 1) UserID/Name 2) Faktor1 3) Faktor2.

StatGuy25

1 Answer


I think what you want is another data.frame containing your users and their 'scores'. R prefers to work with data frames like this rather than with lists.

I am assuming that you have a data.frame containing your politicians and their scores along the two dimensions, as well as a data.frame with the users you're interested in, such as

kandidat <- data.frame(user_id = 1:2, name = c("Politician1", "Politician2"), Faktor1 = c(3.5, 2), Faktor2 = c(1,4))
my_users <- data.frame(name = c("Max", "Mara"))

Now if you want to work with a for-loop, you can do something like


library(rtweet)

find_f <- function(df){
  F1_mean <- c()
  F2_mean <- c()
  anyFriends <- c()
  
  for(i in seq_len(nrow(df))){
    # get the accounts that user i follows
    user_friends <- get_friends(df$name[i])
    # intersect with our candidate data
    ids_intersect <- intersect(user_friends$user_id, kandidat$user_id)
    if(length(ids_intersect)==0){
      # user follows none of the politicians: append NA so all vectors
      # keep one entry per user and still fit into df below
      F1_mean <- c(F1_mean, NA)
      F2_mean <- c(F2_mean, NA)
      anyFriends <- c(anyFriends, FALSE)
    } else {
      # average the scores of the politicians the user follows
      kandidat_friends <- kandidat[kandidat$user_id %in% ids_intersect,]
      F1_mean <- c(F1_mean, mean(kandidat_friends$Faktor1, na.rm=TRUE))
      F2_mean <- c(F2_mean, mean(kandidat_friends$Faktor2, na.rm=TRUE))
      anyFriends <- c(anyFriends, TRUE) # user follows politicians in the dataset
    }
  }
  df$Faktor1 <- F1_mean
  df$Faktor2 <- F2_mean
  df$anyFriends <- anyFriends
  # keep only users who follow at least one politician
  return(df[df$anyFriends,])
}

my_users2 <- find_f(my_users)

This is far from the shortest solution, but I think it is easy to understand. The most important thing is that you work with data.frames rather than lists, which is much easier in R. In each iteration, we get the friends of the user and check whether they intersect with the politicians. If they don't, we set anyFriends to FALSE for that user, so we can easily filter them out at the end. If they do, we take the mean of each of the two scores over the matched politicians and assign them to the respective user entry.
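The same steps can also be tried without touching the Twitter API by passing in the function that fetches the friends. The sketch below is only an illustration: `score_users()` and `fake_friends()` are hypothetical helpers (not part of rtweet), the ids are made up, and it assumes that whatever fetch function you pass returns a data frame with a `user_id` column.

```r
# score_users() and fake_friends() are hypothetical helpers, not rtweet functions.
# fetch_friends: any function that takes a user name and returns a data frame
# with a user_id column, e.g. function(u) rtweet::get_friends(u).
score_users <- function(users, kandidat, fetch_friends) {
  rows <- lapply(users, function(u) {
    friend_ids <- fetch_friends(u)$user_id
    followed   <- kandidat[kandidat$user_id %in% friend_ids, ]
    has_any    <- nrow(followed) > 0
    data.frame(
      name       = u,
      Faktor1    = if (has_any) mean(followed$Faktor1, na.rm = TRUE) else NA_real_,
      Faktor2    = if (has_any) mean(followed$Faktor2, na.rm = TRUE) else NA_real_,
      n_followed = nrow(followed)
    )
  })
  scored <- do.call(rbind, rows)
  # Drop users who follow none of the politicians.
  scored[scored$n_followed > 0, ]
}

kandidat <- data.frame(
  user_id = c("1", "2"),
  name    = c("Politician1", "Politician2"),
  Faktor1 = c(3.5, 2.0),
  Faktor2 = c(1.0, 4.0)
)

# Offline stub standing in for the Twitter API (made-up ids).
fake_friends <- function(u) {
  ids <- list(Max = c("1", "2", "99"), Mara = "42")[[u]]
  data.frame(user_id = ids)
}

scored <- score_users(c("Max", "Mara"), kandidat, fake_friends)
print(scored)
```

Here "Max" follows both politicians and is kept with the mean scores, while "Mara" follows neither and is dropped. With real data you would pass something like `function(u) get_friends(u)` instead of the stub.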

In my opinion there is no need for the IDEOLOGISCORE list. Also, please be aware that I didn't test the code above, so there may be typos. Just check whether it works for you :)

Ben
  • Can you please provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Ben Dec 14 '20 at 10:11
  • Anyone commenting on this would need to know the structure of `testusers` and `kandidat`. For example, use `dput(testusers[1:5,])` to produce an output that anyone could work with. – Ben Dec 14 '20 at 14:58
  • The structures of `testusers` and `kandidat` are now added to the original question along with simplification. Unfortunately the loop still isn't assigning values the right way. – StatGuy25 Dec 16 '20 at 10:27
  • I have edited my post with a solution that should work. I have tested it with an example `user_friends` dataframe, but since I don't know exactly what `user_friends` looks like for you, I can't say for sure. – Ben Dec 16 '20 at 10:52
  • The loop is now working as intended when using only two people! However, when I draw a random sample of e.g. 250 people and try to run the loop, I immediately exceed the get_friends limit of 15 friend networks ("Warning: Rate limit exceeded - 88"). So I think the last step is to include some kind of _retryonratelimit = TRUE_ or _Sys.sleep()_ call? I think this should go either with the `user_friends <- get_friends(df$name[i])` command or with `my_users2 <- find_f(my_users)`, but I can't get it to work at all. The code for my loop is identical to the one @Ben posted. – StatGuy25 Dec 17 '20 at 17:11
  • NOTE: The ("Warning: Rate limit exceeded - 88") occurs when running the `my_users2 <- find_f(my_users)` command at the bottom. – StatGuy25 Dec 17 '20 at 17:29
  • You should consider reading up on `try` / `tryCatch` in this regard to catch the error thrown by the Twitter API regarding the rate limit. In case of an error you could then call a 15 minute sleep or something like that. But this is another topic. – Ben Dec 17 '20 at 20:50
  • A `Sys.sleep(60*15)` at the bottom of the loop? Or when running `my_users2 <- find_f(my_users)` afterwards? – StatGuy25 Dec 18 '20 at 11:56
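
The `tryCatch` plus sleep idea from the comments could look roughly like the sketch below. It is an untested-against-Twitter illustration under a few assumptions: `with_retry()` is a hypothetical helper (not an rtweet function), it treats any warning or error as a reason to wait and retry (a simplification, since not every warning is a rate limit), and `flaky_fetch()` is an offline stub that imitates the "Rate limit exceeded - 88" warning.

```r
# with_retry() is a hypothetical helper, not an rtweet function.
# It calls f(); if f() signals a warning or an error, it waits and tries again.
with_retry <- function(f, wait_seconds = 15 * 60, max_tries = 3) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(
      list(ok = TRUE, value = f()),
      warning = function(w) list(ok = FALSE, message = conditionMessage(w)),
      error   = function(e) list(ok = FALSE, message = conditionMessage(e))
    )
    if (result$ok) return(result$value)
    message("Attempt ", attempt, " failed: ", result$message)
    # Only sleep if another attempt is coming.
    if (attempt < max_tries) Sys.sleep(wait_seconds)
  }
  stop("Giving up after ", max_tries, " attempts")
}

# Offline stub: signals a rate-limit style warning twice, then succeeds.
calls <- 0
flaky_fetch <- function() {
  calls <<- calls + 1
  if (calls < 3) warning("Rate limit exceeded - 88")
  data.frame(user_id = c("1", "2"))
}

# wait_seconds = 0 keeps the demo fast; the default waits 15 minutes.
friends <- with_retry(flaky_fetch, wait_seconds = 0)
print(friends)
```

In the loop, the wait would sit around the individual API call rather than around `find_f(my_users)`, e.g. `user_friends <- with_retry(function() get_friends(df$name[i]))`, so that only the request that hit the limit is repeated.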