
I am quite new to R and have to work out an example of an operation that starts a new row each time a certain string occurs in a single vector.

The vector is illustrated here:

address_list <- c("Road","Number","City","Zipcode","Telephone","House","Road","Number","City","Zipcode","House","Road","Number","City","Zipcode","Telephone","House")

The operation should start a new row after every occurrence of "House", turning the vector into a matrix that looks like this:

Road,Number,City,Zipcode,Telephone,House
Road,Number,City,Zipcode,,House
Road,Number,City,Zipcode,Telephone,House

I do not know anything about Excel or VBA, but I imagine the following question describes much the same operation as the one I am trying to construct in R:

VBA example

I came up with some pseudo-code that might give a more intuitive idea of how I should think in order to solve this:

gsub(list, \s, ",")
for every "House" in list as i
  rbind(list, \n, i)
MichaelR

1 Answer


We get the unique elements of the vector ('address_list'), loop over them, and extract the matching elements of 'address_list' (or use split, i.e. lst <- split(address_list, address_list), which orders the groups alphabetically rather than by first appearance). We then pad NA at the end of any list element shorter than the maximum length, cbind the elements into a matrix ('m1'), and paste each value with a sequence number created using ave.

 lst <- lapply(unique(address_list), function(x) address_list[address_list==x])
 m1 <- do.call(cbind, lapply(lst, `length<-`, max(lengths(lst))))
 m1[] <- ifelse(is.na(m1), NA, paste0(m1, ave(m1, m1, FUN = seq_along)))
 m1
 #     [,1]    [,2]      [,3]    [,4]       [,5]         [,6]    
 #[1,] "Road1" "Number1" "City1" "Zipcode1" "Telephone1" "House1"
 #[2,] "Road2" "Number2" "City2" "Zipcode2" "Telephone2" "House2"
 #[3,] "Road3" "Number3" "City3" "Zipcode3" NA           "House3"
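
The matrix above groups the values by name, so the missing "Telephone" shows up as NA in the last row. If the goal is instead the layout shown in the question (one row per record, closed by "House"), a different approach is needed. What follows is only a minimal sketch under two assumptions that the question does not state: every record ends with "House", and the field order is the order in which the values first appear. The names `fields`, `record`, and `m2` are made up for illustration.

```r
address_list <- c("Road", "Number", "City", "Zipcode", "Telephone", "House",
                  "Road", "Number", "City", "Zipcode", "House",
                  "Road", "Number", "City", "Zipcode", "Telephone", "House")

# Assumed column order: first appearance of each value
fields <- unique(address_list)

# Record id for each element: increases by one right after each "House"
is_end <- address_list == "House"
record <- cumsum(c(0, head(is_end, -1))) + 1

# One row per record; a field that is absent from a record becomes ""
rows <- lapply(split(address_list, record), function(rec) {
  ifelse(fields %in% rec, fields, "")
})
m2 <- do.call(rbind, rows)
colnames(m2) <- fields
m2
```

Calling `apply(m2, 1, paste, collapse = ",")` on the result should give the comma-separated lines from the question, with an empty slot where "Telephone" is missing in the second record.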
akrun
  • I think I made a mistake naming the vector "list"; I am not sure whether "list" is data or a function. – MichaelR Sep 30 '16 at 09:12
  • @Michael `list` is a function – akrun Sep 30 '16 at 09:16
  • I have made some changes to the question. I realized I was making some communication mistakes that were understood very literally. I also named the vector address_list for easier recognition. Sorry in advance! – MichaelR Sep 30 '16 at 09:29
  • @Michael It's okay.. Please check the output as I updated with your new name – akrun Sep 30 '16 at 09:29
  • Could you elaborate on the `function(x) var[var==x]` a bit? I can't find any documentation that explains the function you created. The data I am processing comes from a txt file that I import with `as.data.frame(read.table())`; I think this may be why I cannot get the preferred output in the end. – MichaelR Oct 01 '16 at 07:53
  • @Michael The `address_list==x` returns a logical vector of TRUE/FALSE values; `address_list[..` then extracts the elements of 'address_list' that correspond to the TRUE values – akrun Oct 01 '16 at 07:58
  • I found that the column has a different data structure, so if I created the address_list as `data.frame(list(c("Road","Number","City","Zipcode","Telephone","House","Road","Number","City","Zipcode","House","Road","Number","City","Zipcode","Telephone","House")))`, what would have to be done differently? – MichaelR Oct 01 '16 at 09:27
  • @Michael In that case `address_list[[1]] <- as.character(address_list[[1]]) ; lst <- lapply(unique(address_list[[1]]), function(x) address_list[[1]][address_list[[1]]==x])` – akrun Oct 01 '16 at 09:44
  • It duplicates some of the strings, so I get a table that is enormous and mostly NAs. Maybe the fact that I can't replicate the table I work with when creating a column is the center of the issue. I don't see what the difference is when the two tables are alike and the case is the same. The table I am working with is just a *.txt that contains the words we work with. I import it with `as.data.frame(read.csv(file, encoding="CP1254", header=FALSE, dec=",", quote=""))` .... wouldn't that be the same as `data.frame(list(c()))`? – MichaelR Oct 01 '16 at 10:15
  • @Michael `read.csv` creates a data.frame; you don't need `as.data.frame` there, as it is already a data.frame – akrun Oct 01 '16 at 11:43
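
Two points from the comments above can be tried out on a small scale: how `address_list[address_list == x]` works, and what changes when the values sit in a data.frame column. The sketch below uses a shortened vector and a hypothetical data.frame `df` with a column named `V1` (the default name `read.csv(header = FALSE)` gives its first column). `stringsAsFactors = TRUE` is set explicitly to mimic the default of R versions before 4.0, which is why the `as.character()` step was suggested.

```r
address_list <- c("Road", "Number", "House", "Road", "House")

# The comparison yields one TRUE/FALSE per element
is_road <- address_list == "Road"
is_road

# Indexing with that logical vector keeps only the TRUE positions
address_list[is_road]

# Hypothetical data.frame, similar to what read.csv(header = FALSE) returns
df <- data.frame(V1 = address_list, stringsAsFactors = TRUE)
class(df)  # already a data.frame, so as.data.frame() adds nothing

# Turn the factor column back into a plain character vector
vec <- as.character(df[[1]])
identical(vec, address_list)
```

Once the column has been extracted as a character vector like `vec`, the code from the answer can be applied to it unchanged.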