
In R, I perform Dunn's test. The function I use has no option to group the input variables by their statistically significant differences. However, this is what I am actually interested in, so I tried to write my own function. Unfortunately, I cannot wrap my head around it. Perhaps someone can help.

I use the airquality dataset that comes with R as an example. My starting point is a summary of the mean ozone per month:

> library (tidyverse)
> ozone_summary <- airquality %>% group_by(Month) %>% dplyr::summarize(Mean = mean(Ozone, na.rm=TRUE))

# A tibble: 5 x 2
  Month  Mean
  <int> <dbl>
1     5  23.6
2     6  29.4
3     7  59.1
4     8  60.0
5     9  31.4

When I run the dunn.test, I get the following:

> dunn.test::dunn.test (airquality$Ozone, airquality$Month, method = "bh", altp = T)


Kruskal-Wallis rank sum test

data: x and group
Kruskal-Wallis chi-squared = 29.2666, df = 4, p-value = 0


                           Comparison of x by group                            
                             (Benjamini-Hochberg)                              
Col Mean-|
Row Mean |          5          6          7          8
---------+--------------------------------------------
       6 |  -0.925158
         |     0.4436
         |
       7 |  -4.419470  -2.244208
         |    0.0001*    0.0496*
         |
       8 |  -4.132813  -2.038635   0.286657
         |    0.0002*     0.0691     0.8604
         |
       9 |  -1.321202   0.002538   3.217199   2.922827
         |     0.2663     0.9980    0.0043*    0.0087*

alpha = 0.05
Reject Ho if p <= alpha

From this result, I deduce that May differs from July and August, June differs from July (but not from August) and so on. So I'd like to append significantly differing groups to my results table:

# A tibble: 5 x 3
  Month  Mean Group
  <int> <dbl> <chr>
1     5  23.6 a    
2     6  29.4 ac   
3     7  59.1 b    
4     8  60.0 bc   
5     9  31.4 a  

While I did this by hand, I suppose it must be possible to automate this process. However, I can't find a good starting point. I created a data frame containing all comparisons:

> ozone_differences <- dunn.test::dunn.test (airquality$Ozone, airquality$Month, method = "bh", altp = T)
> ozone_differences <- data.frame ("P" = ozone_differences$altP.adjusted, "Compare" = ozone_differences$comparisons)

              P Compare
1  4.436043e-01   5 - 6
2  9.894296e-05   5 - 7
3  4.963804e-02   6 - 7
4  1.791748e-04   5 - 8
5  6.914403e-02   6 - 8
6  8.604164e-01   7 - 8
7  2.663342e-01   5 - 9
8  9.979745e-01   6 - 9
9  4.314957e-03   7 - 9
10 8.671708e-03   8 - 9

I thought that a function iterating through this data frame and using a selection variable to choose the right letter from the built-in letters vector might work. However, I cannot even think of a starting point, because a changing number of rows has to be considered at the same time...

Perhaps someone has a good idea?

yenats

2 Answers


Perhaps you could look into the cldList() function from the rcompanion package. You can pass the res element of the dunnTest() output to it to create a table that gives the compact letter display for each group.


Following the advice of @TylerRuddenfort, the following code will work. The first cld (compact letter display) is created with rcompanion::cldList, and the second directly uses multcompView::multcompLetters. Note that to use multcompLetters, the spaces have to be removed from the names of the comparisons.

Here, I have used FSA::dunnTest for the Dunn test (1964).

In general, if you plan on using a cld, I recommend ordering the groups (e.g. by median or mean) before running the test (e.g. dunnTest), so that the letters come out in a sensible order.
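
As a rough sketch of that ordering step, the grouping variable can be turned into a factor whose levels are sorted by median before the test is run. The data frame name aq and the choice of base R's reorder() are my own assumptions for illustration; ordering by mean would work the same way with FUN = mean.

```r
# Drop rows with missing Ozone so the per-month medians are well defined
aq <- airquality[!is.na(airquality$Ozone), ]

# Turn Month into a factor whose levels are sorted by median Ozone (ascending)
aq$Month <- reorder(factor(aq$Month), aq$Ozone, FUN = median)

# Inspect the new level order and the medians it is based on
levels(aq$Month)
month_medians <- tapply(aq$Ozone, aq$Month, median)
month_medians

# The reordered factor can then be passed to the test, e.g.
# FSA::dunnTest(Ozone ~ Month, data = aq, method = "bh")
```

Using -aq$Ozone as the second argument of reorder() should give a descending order instead, if the largest group is meant to come first.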

library (tidyverse)
ozone_summary <- airquality %>% group_by(Month) %>% dplyr::summarize(Mean = mean(Ozone, na.rm=TRUE))

library(FSA)

Result = dunnTest(airquality$Ozone, airquality$Month, method = "bh")$res


### Use cldList()

library(rcompanion)

cldList(P.adj ~ Comparison, data=Result)

### Use multcompView

library(multcompView)

X = Result$P.adj <= 0.05

names(X) = gsub(" ",  "",  Result$Comparison)

multcompLetters(X)
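
To get from the letters to the table asked for in the question, one possible sketch is to look up each month's letters by name and add them as a new column. The adjusted p-values below are copied from the dunn.test output in the question, with the spaces already removed from the comparison names; the object names p_adj and cld are only illustrative, and base R's aggregate() stands in for the tidyverse summary so the block has fewer dependencies.

```r
library(multcompView)

# Adjusted p-values from the question, named by comparison (no spaces)
p_adj <- c("5-6" = 4.436043e-01, "5-7" = 9.894296e-05, "6-7" = 4.963804e-02,
           "5-8" = 1.791748e-04, "6-8" = 6.914403e-02, "7-8" = 8.604164e-01,
           "5-9" = 2.663342e-01, "6-9" = 9.979745e-01, "7-9" = 4.314957e-03,
           "8-9" = 8.671708e-03)

# Named character vector of letters, one entry per month
cld <- multcompLetters(p_adj, threshold = 0.05)$Letters

# Mean ozone per month (the formula interface drops NA rows by default)
ozone_summary <- aggregate(Ozone ~ Month, data = airquality, FUN = mean)
names(ozone_summary)[2] <- "Mean"

# Attach the letters by matching on the month label
ozone_summary$Group <- unname(cld[as.character(ozone_summary$Month)])
ozone_summary
```

The exact letters may differ from the hand-made table in the question, but months that share a letter should be the ones that are not significantly different at the 0.05 level.
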
Sal Mangiafico