head(covreage)

chr     Pos             Val
X       129271111       10
X       129271112       10
X       129271113       10
X       129271114       10
X       129271115       10
X       129271116       11
X       129271117       11
X       129271118       11
X       129271119       11
X       129271120       11
X       129271121       11
X       129271122       11
X       129271123       11
X       129271124       11
X       129271125       11
X       129271126       11
X       129271127       11
X       129271128       11
X       129271129       11
X       129271130       11
X       129271131       11
X       129271132       11
X       129271133       11

head(annotation)

chr Region  start       end         Gene    status
X   Exon    129271053   129271110   AIFM1   NO
X   Exon    129270618   129270706   AIFM1   NO
X   Exon    129270020   129270160   AIFM1   NO
X   Exon    129267288   129267430   AIFM1   NO
X   Exon    129265650   129265774   AIFM1   NO
X   Exon    129263945   129264141   AIFM1   NO
X   Exon    129263532   129263603   AIFM1   NO
3   Exon    15643358    15643401    BTD NO
3   Exon    15676931    15677195    BTD NO
3   Exon    15683415    15683564    BTD NO

I am trying to add a Gene column to the first data frame: for each row, it should hold the gene name from the second data frame whose chr matches and whose start to end range contains Pos.

covreage$Gene <- ifelse(covreage$chr == annotation$chr & covreage$pos >= annotation$start & covreage$pos <= annotation$end,annotation$Gene,"NA")

The problem is that a row of the first file should only be matched when its chr equals the chr in the second file and its Pos falls inside that row's start and end range. chr can take 23 different values, and the same Pos values occur on several chromosomes, so only chr and Pos together identify a row uniquely.

The code above produces these warnings:

Warning messages:
1: In is.na(e1) | is.na(e2) :
  longer object length is not a multiple of shorter object length
2: In `==.default`(covreage$chr, annotation$chr) :
  longer object length is not a multiple of shorter object length
3: In covreage$pos >= annotation$start :
  longer object length is not a multiple of shorter object length
4: In covreage$pos <= annotation$end :
  longer object length is not a multiple of shorter object length
shams
  • Where is `annotation$Chromosome` in your data frame? – Tung Mar 16 '18 at 23:05
  • why tag this with `awk` if you want a solution in `r`? – Ed Morton Mar 17 '18 at 18:36
  • Since you tagged with data.table, maybe you'll want to look into its non-equi and update joins, which might be able to solve this cleanly, something like `DT1[, gene := DT2[.SD, on=.(chr, start <= Pos, end >= Pos), x.Gene]]` – Frank Mar 17 '18 at 19:41

1 Answer

By evaluating something like covreage$pos >= annotation$start, you're comparing both data.frames row by row, which is not what you want. You want to compare several rows from the first against one row from the second, using some grouping rule R does not know about.
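One way to express that grouping rule is to loop over the rows of the annotation and, for each one, mark the coverage rows that fall inside it. This is only a minimal sketch, not the only way to do it. The column names (chr, Pos, start, end, Gene) are taken from the head() output in the question, and the toy rows are made up for illustration. R is case-sensitive, so the column is written Pos here, as printed, rather than pos.

```r
# Toy data modeled on the question; the values are illustrative assumptions.
covreage <- data.frame(
  chr = c("X", "X", "X", "3"),
  Pos = c(129271053, 129271110, 129271111, 15643360),
  Val = c(10, 10, 10, 11),
  stringsAsFactors = FALSE
)
annotation <- data.frame(
  chr   = c("X", "3"),
  start = c(129271053, 15643358),
  end   = c(129271110, 15643401),
  Gene  = c("AIFM1", "BTD"),
  stringsAsFactors = FALSE
)

# Start with a real missing value, not the string "NA".
covreage$Gene <- NA_character_

# Compare every coverage row against ONE annotation row at a time.
for (i in seq_len(nrow(annotation))) {
  hit <- covreage$chr == annotation$chr[i] &
    covreage$Pos >= annotation$start[i] &
    covreage$Pos <= annotation$end[i]
  covreage$Gene[hit] <- annotation$Gene[i]
}

print(covreage)
```

Inside the loop, each comparison has a full column on one side and a single value on the other, so the lengths are always compatible. If exons overlap, a later annotation row overwrites an earlier one. For a large annotation, a non-equi join as suggested in the comments would likely be faster.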

Your original expression still returns a result because R, in general, recycles the elements of the shorter vector as needed:

> 1:6<c(2,6,6)
[1]  TRUE  TRUE  TRUE FALSE  TRUE FALSE

> 1:5<c(2,6,6)
[1]  TRUE  TRUE  TRUE FALSE  TRUE
Warning message:
In 1:5 < c(2, 6, 6) :
  longer object length is not a multiple of shorter object length

In the first case, no warning is printed because elements are evenly reused; in the second case, that is not possible (because as R says, longer object length is not a multiple of shorter object length), so a warning shows up.
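To make the recycling visible, the short vector can be stretched by hand with rep_len(), which should give the same values the comparison uses internally. This is a small illustration, and the variable name short is chosen here only for the example.

```r
short <- c(2, 6, 6)

# Length 6: the short vector is reused exactly twice, so nothing is cut off.
even <- rep_len(short, 6)
print(even)      # 2 6 6 2 6 6

# Length 5: the second pass stops after two elements; this is the case
# the warning reports.
uneven <- rep_len(short, 5)
print(uneven)    # 2 6 6 2 6

# The warning condition is a non-zero remainder of the two lengths.
print(6 %% length(short))   # 0
print(5 %% length(short))   # 2
```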

Even though recycling is to be considered an error in the context you presented, R allows it because it may be useful in some situations.
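Two common cases where recycling is what you want, shown as a small illustrative sketch: a single value applied to every element, and a short pattern repeated along a longer vector.

```r
x <- 1:6

# A length-1 vector is recycled across all six elements.
doubled <- x * 2
above3  <- x > 3

# A length-2 pattern is reused three times to flip every second sign.
alternating <- x * c(1, -1)

print(doubled)
print(above3)
print(alternating)
```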

vich