Analyzing proportions with the Arrington et al. 2002 example

Arrington, Winemiller, Loftus, & Akin (2002) published a data set that is available from the web. It lists species of fish and the proportion of them that had an empty stomach when caught. The data set contains more than 36,000 catches, which were classified by their Location (Africa, North America, Central/South America), by their Trophism (their diet: Detritivore, Invertivore, Omnivore, Piscivore), and by the time of feeding (Diel: Diurnal or Nocturnal).

The compiled counts (s, the number of empty-stomached fish, out of n catches in each cell) can be displayed with

library(ANOPA)
ArringtonEtAl2002
##                 Location    Trophism      Diel    s    n
## 1                 Africa Detritivore   Diurnal   16  217
## 2                 Africa Invertivore   Diurnal   76  498
## 3                 Africa Invertivore Nocturnal   55  430
## 4                 Africa    Omnivore   Diurnal    2   87
## 5                 Africa   Piscivore   Diurnal  673  989
## 6                 Africa   Piscivore Nocturnal  221  525
## 7  Central/South America Detritivore   Diurnal   68 1589
## 8  Central/South America Detritivore Nocturnal    9  318
## 9  Central/South America Invertivore   Diurnal  706 7452
## 10 Central/South America Invertivore Nocturnal  486 2101
## 11 Central/South America    Omnivore   Diurnal  293 6496
## 12 Central/South America    Omnivore Nocturnal   82  203
## 13 Central/South America   Piscivore   Diurnal 1275 5226
## 14 Central/South America   Piscivore Nocturnal  109  824
## 15         North America Detritivore   Diurnal  142 1741
## 16         North America Invertivore   Diurnal  525 3368
## 17         North America Invertivore Nocturnal  231 1539
## 18         North America    Omnivore   Diurnal  210 1843
## 19         North America    Omnivore Nocturnal    7   38
## 20         North America   Piscivore   Diurnal  536 1289
## 21         North America   Piscivore Nocturnal   19  102
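As a quick check of the figures quoted above, the observed proportions and the total number of catches can be recomputed from the two count columns. This is only a sketch: the vectors `s` and `n` below are typed in by hand from the printed table, so that the example runs without the package; with ANOPA loaded, `ArringtonEtAl2002$s` and `ArringtonEtAl2002$n` would presumably serve the same purpose.

```r
# Counts copied by hand from the table above (rows 1 to 21, in order)
s <- c(16, 76, 55, 2, 673, 221,
       68, 9, 706, 486, 293, 82, 1275, 109,
       142, 525, 231, 210, 7, 536, 19)
n <- c(217, 498, 430, 87, 989, 525,
       1589, 318, 7452, 2101, 6496, 203, 5226, 824,
       1741, 3368, 1539, 1843, 38, 1289, 102)

# Total number of catches across all observed cells
total_catches <- sum(n)

# Observed proportion of empty-stomached fish in each cell
prop_empty <- s / n

print(total_catches)
print(round(prop_empty, 3))
```

The proportions vary widely from cell to cell, which is what the analysis below sets out to test.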

A first difficulty with this data set is that some of the cells are missing (e.g., African fish that are Detritivore and Nocturnal). As is the case for other sorts of analyses (e.g., ANOVAs), data with missing cells cannot be analyzed because the error terms cannot be computed.
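One way to list the missing cells is to cross-tabulate the three factors and look for combinations with a frequency of zero. The sketch below retypes the factor columns from the printed table so that it runs on its own; the object names (`observed`, `cell_counts`, `missing_cells`) are illustrative only.

```r
# Factor combinations present in the data, copied from the printed table
observed <- data.frame(
  Location = rep(c("Africa", "Central/South America", "North America"),
                 times = c(6, 8, 7)),
  Trophism = c("Detritivore", "Invertivore", "Invertivore",
               "Omnivore", "Piscivore", "Piscivore",
               rep(c("Detritivore", "Invertivore", "Omnivore", "Piscivore"),
                   each = 2),
               "Detritivore", "Invertivore", "Invertivore",
               "Omnivore", "Omnivore", "Piscivore", "Piscivore"),
  Diel = c("Diurnal", "Diurnal", "Nocturnal", "Diurnal", "Diurnal", "Nocturnal",
           rep(c("Diurnal", "Nocturnal"), times = 4),
           "Diurnal", "Diurnal", "Nocturnal", "Diurnal", "Nocturnal",
           "Diurnal", "Nocturnal")
)

# Cross-tabulate the full 3 x 4 x 2 design; empty cells have a frequency of 0
cell_counts <- as.data.frame(table(observed))
missing_cells <- cell_counts[cell_counts$Freq == 0,
                             c("Location", "Trophism", "Diel")]
print(missing_cells)
```

Of the 24 cells in the full design, three are empty, and all three are Nocturnal cells.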

One solution, adopted by Warton & Hui (2011), is to impute the missing values. We do not know whether this is an adequate solution and, if so, which imputation would be acceptable. Interpret what follows with due care.

Warton & Hui imputed the missing cells with a very small proportion. In ANOPA, both the proportions and the group sizes are required. We implemented a procedure that imputes a count of 0.05 (fractional counts cannot arise from observations, but are not forbidden in ANOPA) obtained from a single observation.

Consult the default option with

getOption("ANOPA.zeros")
## [1] 0.05 1.00
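To make the imputation concrete, the sketch below builds the rows that would stand in for the empty cells under the default shown above: 0.05 empty-stomached fish out of a single catch. It illustrates the description given in the text and is not ANOPA's internal code; the data frames `missing_cells` and `imputed` are hypothetical names, and the three empty cells are listed by hand.

```r
# Default value shown above: imputed count first, imputed group size second
zeros <- c(0.05, 1)

# The three design cells with no catches, listed by hand
missing_cells <- data.frame(
  Location = c("Africa", "Africa", "North America"),
  Trophism = c("Detritivore", "Omnivore", "Detritivore"),
  Diel     = "Nocturnal"
)

# Illustrative imputed rows: s = 0.05 out of n = 1 (an assumption about
# how the two values of the option are used, based on the text)
imputed <- transform(missing_cells, s = zeros[1], n = zeros[2])
imputed$proportion <- imputed$s / imputed$n
print(imputed)
```

Each imputed cell therefore has a proportion of 0.05 and a group size of 1, far smaller than any observed cell.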

The analysis is obtained with

w <- anopa( {s; n} ~  Location * Diel * Trophism, ArringtonEtAl2002)

The fyi message lets you know that cells are missing; the Warning message lets you know that these cells were imputed (you can suppress messages with options("ANOPA.feedback"="none")).
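If the messages should be silenced only temporarily, one possible pattern is to keep the previous setting and restore it afterwards. The sketch relies only on base R's `options()`, which returns the old values invisibly; `old_opts` and `current` are illustrative names.

```r
# Silence the messages, keeping the previous setting so it can be restored
old_opts <- options("ANOPA.feedback" = "none")
current  <- getOption("ANOPA.feedback")
print(current)

# ... run the analysis here ...

# Put the previous setting back
options(old_opts)
```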

To see the results, use summary(w) (which shows the corrected and uncorrected statistics) or uncorrected(w) (as the sample is quite large, the correction will be immaterial):

uncorrected(w)
##                              MS  df        F   pvalue
## Location               0.027449   2 0.961802 0.382203
## Diel                   0.029715   1 1.041227 0.307536
## Trophism               0.095656   3 3.351781 0.018102
## Location:Diel          0.005277   2 0.184900 0.831187
## Location:Trophism      0.029485   6 1.033146 0.401285
## Diel:Trophism          0.073769   3 2.584868 0.051365
## Location:Diel:Trophism 0.011297   6 0.395837 0.882184
## Error(between)         0.028539 Inf

These results suggest a Diel : Trophism interaction that is close to significant (p = .051), in addition to a main effect of Trophism (p = .018).
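The table can be checked by hand: each F is the effect's mean square divided by the error mean square, and because the error term has infinite degrees of freedom, F multiplied by its degrees of freedom follows a chi-square distribution. The sketch below recomputes F and p from the printed (rounded) mean squares using base R only, so small discrepancies in the last decimals are expected.

```r
# Mean squares and degrees of freedom copied from the uncorrected(w) output
ms <- c("Location"               = 0.027449,
        "Diel"                   = 0.029715,
        "Trophism"               = 0.095656,
        "Location:Diel"          = 0.005277,
        "Location:Trophism"      = 0.029485,
        "Diel:Trophism"          = 0.073769,
        "Location:Diel:Trophism" = 0.011297)
df <- c(2, 1, 3, 2, 6, 3, 6)
ms_error <- 0.028539   # Error(between), df = Inf

# F ratio of each effect against the error term
f_ratio <- ms / ms_error

# With infinite denominator df, df * F is chi-square distributed with df
p_value <- pchisq(f_ratio * df, df = df, lower.tail = FALSE)
names(p_value) <- names(ms)

print(round(cbind(df = df, F = f_ratio, pvalue = p_value), 4))
```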

You can easily make a plot with all the factors using

anopaPlot(w)
**Figure 1**. The proportions in the Arrington et al. 2002 data. Error bars show difference-adjusted 95% confidence intervals.


The missing cells are absent from the plot. To highlight the interaction, restrict the plot to

anopaPlot(w, ~ Trophism * Location)
**Figure 2**. The proportions as a function of Trophism and Location. Error bars show difference-adjusted 95% confidence intervals.

which clearly shows massive differences between the levels of Trophism, and small differences between Omnivorous and Piscivorous fishes with regard to Location.

This can be confirmed by examining simple effects (a.k.a. expected marginal analyses):

e <- emProportions( w, ~ Location * Trophism | Diel  ) 

(but it will have to wait for the next version of ANOPA ;-)

References

Arrington, D. A., Winemiller, K. O., Loftus, W. F., & Akin, S. (2002). How often do fishes “run on empty”? Ecology, 83(8), 2145–2151. https://doi.org/10.1890/0012-9658(2002)083[2145:HODFRO]2.0.CO;2
Warton, D. I., & Hui, F. K. C. (2011). The arcsine is asinine: The analysis of proportions in ecology. Ecology, 92(1), 3–10. https://doi.org/10.1890/10-0340.1