I am a bit puzzled by the following. I have two formulas and would like to check whether they are the same. Here I expect to get FALSE returned.

fm1 <- formula(schades ~ termijn + zipcode + provincie + regionvormgemeente + energielabel + trede)
fm2 <- formula(schades ~ termijn + zipcode + provincie + regionvormgemeente + energielabel)
fm1 == fm2
#> [1] TRUE

identical(fm1, fm2)
#> [1] FALSE

What is the reason that fm1 == fm2 returns TRUE?

Created on 2021-12-17 by the reprex package (v2.0.1)

mharinga
  • Does this answer your question? [What's the difference between identical(x, y) and isTRUE(all.equal(x, y))?](https://stackoverflow.com/questions/3395696/whats-the-difference-between-identicalx-y-and-istrueall-equalx-y) – Felix Phl Dec 17 '21 at 12:14

2 Answers

== is designed to compare values in atomic vectors, not formulas.

Furthermore, see the following example from the help page for == (?"=="):

x1 <- 0.5 - 0.3
x2 <- 0.3 - 0.1
x1 == x2                   # FALSE on most machines
isTRUE(all.equal(x1, x2))  # TRUE everywhere

Applied to your example you can find:

> fm1 <- formula(schades ~ termijn + zipcode + provincie + regionvormgemeente + energielabel + trede)
> fm2 <- formula(schades ~ termijn + zipcode + provincie + regionvormgemeente + energielabel)
> fm1 == fm2
[1] TRUE
> 
> all.equal(fm1, fm2)
[1] "formulas differ in contents"
> isTRUE(all.equal(fm1,fm2))
[1] FALSE

However, reducing the number of predictors gives the expected result. This illustrates that == should not be used for this kind of comparison, because its behaviour is inconsistent:

> fm1 <- formula(schades ~ termijn + zipcode + provincie)
> fm2 <- formula(schades ~ termijn + zipcode)
> fm1 == fm2
[1] FALSE
> isTRUE(all.equal(fm1,fm2))
[1] FALSE
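
If a plain TRUE/FALSE answer is needed, one option is to compare the complete deparsed text of both formulas. The sketch below uses a hypothetical helper, same_formula() (not a base R function), and assumes that the formulas' environments do not matter for the comparison; identical() is shown next to it because it also compares attributes.

```r
# Formulas from the question
fm1 <- schades ~ termijn + zipcode + provincie + regionvormgemeente + energielabel + trede
fm2 <- schades ~ termijn + zipcode + provincie + regionvormgemeente + energielabel

# Hypothetical helper (not part of base R): compare the complete deparsed
# text, so that no line of a long formula is dropped. Environments are ignored.
same_formula <- function(a, b) {
  identical(paste(deparse(a), collapse = " "),
            paste(deparse(b), collapse = " "))
}

same_formula(fm1, fm2)  # FALSE: fm1 has the extra term "trede"
same_formula(fm1, fm1)  # TRUE
identical(fm1, fm2)     # FALSE, as in the question
```

Collapsing the deparsed lines into one string keeps every term in the comparison, however long the formula is.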
Felix Phl
  • It seems to be an issue that `==` gives unexpected (incorrect) answers. Perhaps it would be better if R returned an error for this comparison – user20650 Dec 17 '21 at 12:35

From ?Comparison/Details: "...Language objects such as symbols and calls are deparsed to character strings before comparison..."

So, what happens prior to comparing is:

deparsefm1 = deparse(fm1)
deparsefm2 = deparse(fm2)

and, then, the deparsed language objects are compared.

The interesting thing here, though, is that R internally (i) uses only the first element of the deparsed object in the comparison and (ii) does not, by default, use deparse options that limit the number of elements produced (deparse does offer that flexibility; e.g. see how deparse(fm1, 100) == deparse(fm2, 100) behaves). So, while we would expect

deparse(fm1) == deparse(fm2)
#[1]  TRUE FALSE

we, actually, get

deparse(fm1)[[1]] == deparse(fm2)[[1]]
#[1] TRUE
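
The splitting can be checked directly. This is a small sketch that only uses deparse() and its width.cutoff argument (the second argument in the deparse(fm1, 100) call mentioned above); the variable names d1, d2, one1 and one2 are made up for illustration.

```r
# Formulas from the question
fm1 <- schades ~ termijn + zipcode + provincie + regionvormgemeente + energielabel + trede
fm2 <- schades ~ termijn + zipcode + provincie + regionvormgemeente + energielabel

# With the default width.cutoff, the long formulas are split into
# several character strings.
d1 <- deparse(fm1)
d2 <- deparse(fm2)
length(d1)          # more than one element
d1[[1]] == d2[[1]]  # TRUE: the first pieces are the same text

# With a wider cutoff, each formula fits into a single string,
# so the difference ("+ trede") becomes visible.
one1 <- deparse(fm1, width.cutoff = 100)
one2 <- deparse(fm2, width.cutoff = 100)
length(one1)        # 1
one1 == one2        # FALSE
```

This also fits the shorter formulas in the other answer: they deparse to a single string each, so the first element already contains the whole formula.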

I assume the "why that happens" is a good question.

alexis_laz