diff options
Diffstat (limited to 'vignettes/unmarked.Rmd')
-rw-r--r-- | vignettes/unmarked.Rmd | 254 |
1 files changed, 254 insertions, 0 deletions
diff --git a/vignettes/unmarked.Rmd b/vignettes/unmarked.Rmd new file mode 100644 index 0000000..8c02c5a --- /dev/null +++ b/vignettes/unmarked.Rmd @@ -0,0 +1,254 @@ +--- +title: "Overview of unmarked: an R Package for the Analysis of Data from Unmarked Animals" +author: +- name: Ian Fiske +- name: Richard Chandler +date: February 5, 2019 +bibliography: unmarked.bib +csl: ecology.csl +output: + rmarkdown::html_vignette: + fig_width: 5 + fig_height: 3.5 + number_sections: true + toc: true +vignette: > + %\VignetteIndexEntry{Overview of unmarked} + %\VignetteEngine{knitr::rmarkdown} + \usepackage[utf8]{inputenc} +--- + +```{r,echo=FALSE} +options(rmarkdown.html_vignette.check_title = FALSE) +``` + +# Abstract + +`unmarked` aims to be a complete environment for the +statistical analysis of data from surveys of unmarked +animals. Currently, the focus is on hierarchical models that +separately model a latent state (or states) and an observation +process. This vignette provides a brief overview of the package - +for a more thorough treatment see @fiskeChandler_2011. + +# Overview of unmarked + +Unmarked provides methods to estimate site occupancy, abundance, and +density of animals (or possibly other organisms/objects) that cannot be +detected with certainty. Numerous models are available that correspond +to specialized survey methods such as temporally replicated surveys, +distance sampling, removal sampling, and double observer +sampling. These data are often associated with metadata related to the +design of the study. For example, in distance sampling, the study +design (line- or point-transect), distance class break points, +transect lengths, and units of measurement need to be accounted for in +the analysis. Unmarked uses S4 classes to store data and metadata in a +way that allows for easy data manipulation, summarization, and model +specification. Table 1 lists the currently implemented models and +their associated fitting functions and data classes. + +```{r, echo=FALSE} +tab1 <- data.frame( + Model=c("Occupancy", "Royle-Nichols", "Point Count", "Distance-sampling", + "Generalized distance-sampling", "Arbitrary multinomial-Poisson", + "Colonization-extinction", "Generalized multinomial-mixture"), + `Fitting Function`=c("occu","occuRN","pcount","distsamp","gdistsamp", + "multinomPois","colext","gmultmix"), + Data=c("unmarkedFrameOccu","unmarkedFrameOccu","unmarkedFramePCount", + "unmarkedFrameDS","unmarkedFrameGDS","unmarkedFrameMPois", + "unmarkedMultFrame","unmarkedFrameGMM"), + Citation=c("@mackenzie_estimating_2002","@royle_estimating_2003", + "@royle_n-mixture_2004","@royle_modeling_2004", + "@chandlerEA_2011","@royle_generalized_2004", + "@mackenzie_estimating_2003","@royle_generalized_2004"), + check.names=FALSE) + +knitr::kable(tab1, format='markdown', align="lccc", + caption="Table 1. Models handled by unmarked.") +``` + +Each data class can be created with a call to the constructor function +of the same name as described in the examples below. + +# Typical unmarked session + +The first step is to import the data into R, which we do below using +the `read.csv` function. Next, the data need to be formatted for +use with a specific model fitting function. This can be accomplished +with a call to the appropriate type of `unmarkedFrame`. For +example, to prepare the data for a single-season site-occupancy +analysis, the function `unmarkedFrameOccu` is used. + + +## Importing and formatting data + +```{r} +library(unmarked) +wt <- read.csv(system.file("csv","widewt.csv", package="unmarked")) +y <- wt[,2:4] +siteCovs <- wt[,c("elev", "forest", "length")] +obsCovs <- list(date=wt[,c("date.1", "date.2", "date.3")], + ivel=wt[,c("ivel.1", "ivel.2", "ivel.3")]) +wt <- unmarkedFrameOccu(y = y, siteCovs = siteCovs, obsCovs = obsCovs) +summary(wt) +``` + +Alternatively, the convenience function `csvToUMF` can be used + +```{r} +wt <- csvToUMF(system.file("csv","widewt.csv", package="unmarked"), + long = FALSE, type = "unmarkedFrameOccu") +``` + +If not all sites have the same numbers of observations, then manual +importation of data in long format can be tricky. `csvToUMF` +seamlessly handles this situation. + +```{r} +pcru <- csvToUMF(system.file("csv","frog2001pcru.csv", package="unmarked"), + long = TRUE, type = "unmarkedFrameOccu") +``` + +To help stabilize the numerical optimization algorithm, we recommend +standardizing the covariates. + +```{r} +obsCovs(pcru) <- scale(obsCovs(pcru)) +``` + +## Fitting models + +Occupancy models can then be fit with the occu() function: + +```{r} +fm1 <- occu(~1 ~1, pcru) +fm2 <- occu(~ MinAfterSunset + Temperature ~ 1, pcru) +fm2 +``` + +Here, we have specified that the detection process is modeled with the +`MinAfterSunset` and `Temperature` covariates. No covariates are +specified for occupancy here. See `?occu` for more details. + + +## Back-transforming parameter estimates + +`unmarked` fitting functions return `unmarkedFit` objects which can be +queried to investigate the model fit. Variables can be +back-transformed to the unconstrained scale using `backTransform`. +Standard errors are computed using the delta method. + +```{r} +backTransform(fm2, 'state') +``` + +The expected probability that a site was +occupied is 0.823. This estimate applies to the hypothetical +population of all possible sites, not the sites found in our sample. +For a good discussion of population-level vs finite-sample inference, +see @royle_dorazio:2008 page 117. Note also that finite-sample +quantities can be computed in `unmarked` using empirical Bayes +methods as demonstrated at the end of this document. + +Back-transforming the estimate of $\psi$ was easy because there were +no covariates. Because the detection component was modeled with +covariates, $p$ is a function, not just a scalar quantity, and so we +need to be provide values of our covariates to obtain an +estimate of $p$. Here, we request +the probability of detection given a site is occupied and all +covariates are set to 0. + +```{r} +backTransform(linearComb(fm2, coefficients = c(1,0,0), type = 'det')) +``` + +Thus, we can say that the expected probability of detection was 0.552 +when time of day and temperature are fixed at their mean value. A +`predict` method also exists, which can be used to obtain estimates of +parameters at specific covariate values. + +```{r} +newData <- data.frame(MinAfterSunset = 0, Temperature = -2:2) +round(predict(fm2, type = 'det', newdata = newData, appendData=TRUE), 2) +``` + +Confidence intervals are requested with `confint`, using either the +asymptotic normal approximation or profiling. + +```{r, eval=FALSE} +confint(fm2, type='det') +confint(fm2, type='det', method = "profile") +``` + +```{r, echo=FALSE} +confint(fm2, type='det') +nul <- capture.output(ci <- confint(fm2, type='det', method = "profile")) +ci +``` + +## Model selection and model fit + +Model selection and multi-model inference can be implemented after +organizing models using the `fitList` function. + +```{r} +fms <- fitList('psi(.)p(.)' = fm1, 'psi(.)p(Time+Temp)' = fm2) +modSel(fms) +predict(fms, type='det', newdata = newData) +``` + + +The parametric bootstrap can be used to check the adequacy of model fit. +Here we use a $\chi^2$ statistic appropriate for binary data. + +```{r, warning=FALSE} +chisq <- function(fm) { + umf <- fm@data + y <- umf@y + y[y>1] <- 1 + sr <- fm@sitesRemoved + if(length(sr)>0) + y <- y[-sr,,drop=FALSE] + fv <- fitted(fm, na.rm=TRUE) + y[is.na(fv)] <- NA + sum((y-fv)^2/(fv*(1-fv)), na.rm=TRUE) + } + +(pb <- parboot(fm2, statistic=chisq, nsim=100, parallel=FALSE)) +``` + +We fail to reject the null hypothesis, and conclude that the model fit +is adequate. + +## Derived parameters and empirical Bayes methods + +The `parboot` function can be also be used to compute confidence +intervals for estimates of derived parameters, such as the proportion +of $N$ sites occupied $\mbox{PAO} = \frac{\sum_i z_i}{N}$ where $z_i$ is the true +occurrence state at site $i$, which is unknown at sites where no individuals +were detected. The `colext` vignette shows examples of using +`parboot` to obtain confidence intervals for such derived +quantities. An alternative way achieving this goal is to use empirical Bayes +methods, which were introduced in `unmarked` version 0.9-5. These methods estimate +the posterior distribution of the latent variable given the data and +the estimates of the fixed effects (the MLEs). The mean or the mode of +the estimated posterior distibution is referred to as the empirical +best unbiased predictor (EBUP), which in `unmarked` can be +obtained by applying the `bup` function to the estimates of the +posterior distributions returned by the `ranef` function. The +following code returns an estimate of PAO using EBUP. + +```{r} +re <- ranef(fm2) +EBUP <- bup(re, stat="mode") +sum(EBUP) / numSites(pcru) +``` + +Note that this is similar, but slightly lower than the +population-level estimate of $\psi$ obtained above. + +A plot method also exists for objects returned by `ranef`, but +distributions of binary variables are not so pretty. Try it out on a +fitted abundance model instead. + +# References |