aboutsummaryrefslogtreecommitdiff
path: root/vignettes/unmarked.Rmd
diff options
context:
space:
mode:
Diffstat (limited to 'vignettes/unmarked.Rmd')
-rw-r--r--vignettes/unmarked.Rmd254
1 files changed, 254 insertions, 0 deletions
diff --git a/vignettes/unmarked.Rmd b/vignettes/unmarked.Rmd
new file mode 100644
index 0000000..8c02c5a
--- /dev/null
+++ b/vignettes/unmarked.Rmd
@@ -0,0 +1,254 @@
+---
+title: "Overview of unmarked: an R Package for the Analysis of Data from Unmarked Animals"
+author:
+- name: Ian Fiske
+- name: Richard Chandler
+date: February 5, 2019
+bibliography: unmarked.bib
+csl: ecology.csl
+output:
+ rmarkdown::html_vignette:
+ fig_width: 5
+ fig_height: 3.5
+ number_sections: true
+ toc: true
+vignette: >
+ %\VignetteIndexEntry{Overview of unmarked}
+ %\VignetteEngine{knitr::rmarkdown}
+ \usepackage[utf8]{inputenc}
+---
+
+```{r,echo=FALSE}
+options(rmarkdown.html_vignette.check_title = FALSE)
+```
+
+# Abstract
+
+`unmarked` aims to be a complete environment for the
+statistical analysis of data from surveys of unmarked
+animals. Currently, the focus is on hierarchical models that
+separately model a latent state (or states) and an observation
+process. This vignette provides a brief overview of the package -
+for a more thorough treatment see @fiskeChandler_2011.
+
+# Overview of unmarked
+
+Unmarked provides methods to estimate site occupancy, abundance, and
+density of animals (or possibly other organisms/objects) that cannot be
+detected with certainty. Numerous models are available that correspond
+to specialized survey methods such as temporally replicated surveys,
+distance sampling, removal sampling, and double observer
+sampling. These data are often associated with metadata related to the
+design of the study. For example, in distance sampling, the study
+design (line- or point-transect), distance class break points,
+transect lengths, and units of measurement need to be accounted for in
+the analysis. Unmarked uses S4 classes to store data and metadata in a
+way that allows for easy data manipulation, summarization, and model
+specification. Table 1 lists the currently implemented models and
+their associated fitting functions and data classes.
+
+```{r, echo=FALSE}
+tab1 <- data.frame(
+ Model=c("Occupancy", "Royle-Nichols", "Point Count", "Distance-sampling",
+ "Generalized distance-sampling", "Arbitrary multinomial-Poisson",
+ "Colonization-extinction", "Generalized multinomial-mixture"),
+ `Fitting Function`=c("occu","occuRN","pcount","distsamp","gdistsamp",
+ "multinomPois","colext","gmultmix"),
+ Data=c("unmarkedFrameOccu","unmarkedFrameOccu","unmarkedFramePCount",
+ "unmarkedFrameDS","unmarkedFrameGDS","unmarkedFrameMPois",
+ "unmarkedMultFrame","unmarkedFrameGMM"),
+ Citation=c("@mackenzie_estimating_2002","@royle_estimating_2003",
+ "@royle_n-mixture_2004","@royle_modeling_2004",
+ "@chandlerEA_2011","@royle_generalized_2004",
+ "@mackenzie_estimating_2003","@royle_generalized_2004"),
+ check.names=FALSE)
+
+knitr::kable(tab1, format='markdown', align="lccc",
+ caption="Table 1. Models handled by unmarked.")
+```
+
+Each data class can be created with a call to the constructor function
+of the same name as described in the examples below.
+
+# Typical unmarked session
+
+The first step is to import the data into R, which we do below using
+the `read.csv` function. Next, the data need to be formatted for
+use with a specific model fitting function. This can be accomplished
+with a call to the appropriate type of `unmarkedFrame`. For
+example, to prepare the data for a single-season site-occupancy
+analysis, the function `unmarkedFrameOccu` is used.
+
+
+## Importing and formatting data
+
+```{r}
+library(unmarked)
+wt <- read.csv(system.file("csv","widewt.csv", package="unmarked"))
+y <- wt[,2:4]
+siteCovs <- wt[,c("elev", "forest", "length")]
+obsCovs <- list(date=wt[,c("date.1", "date.2", "date.3")],
+ ivel=wt[,c("ivel.1", "ivel.2", "ivel.3")])
+wt <- unmarkedFrameOccu(y = y, siteCovs = siteCovs, obsCovs = obsCovs)
+summary(wt)
+```
+
+Alternatively, the convenience function `csvToUMF` can be used
+
+```{r}
+wt <- csvToUMF(system.file("csv","widewt.csv", package="unmarked"),
+ long = FALSE, type = "unmarkedFrameOccu")
+```
+
+If not all sites have the same numbers of observations, then manual
+importation of data in long format can be tricky. `csvToUMF`
+seamlessly handles this situation.
+
+```{r}
+pcru <- csvToUMF(system.file("csv","frog2001pcru.csv", package="unmarked"),
+ long = TRUE, type = "unmarkedFrameOccu")
+```
+
+To help stabilize the numerical optimization algorithm, we recommend
+standardizing the covariates.
+
+```{r}
+obsCovs(pcru) <- scale(obsCovs(pcru))
+```
+
+## Fitting models
+
+Occupancy models can then be fit with the occu() function:
+
+```{r}
+fm1 <- occu(~1 ~1, pcru)
+fm2 <- occu(~ MinAfterSunset + Temperature ~ 1, pcru)
+fm2
+```
+
+Here, we have specified that the detection process is modeled with the
+`MinAfterSunset` and `Temperature` covariates. No covariates are
+specified for occupancy here. See `?occu` for more details.
+
+
+## Back-transforming parameter estimates
+
+`unmarked` fitting functions return `unmarkedFit` objects which can be
+queried to investigate the model fit. Variables can be
+back-transformed to the unconstrained scale using `backTransform`.
+Standard errors are computed using the delta method.
+
+```{r}
+backTransform(fm2, 'state')
+```
+
+The expected probability that a site was
+occupied is 0.823. This estimate applies to the hypothetical
+population of all possible sites, not the sites found in our sample.
+For a good discussion of population-level vs finite-sample inference,
+see @royle_dorazio:2008 page 117. Note also that finite-sample
+quantities can be computed in `unmarked` using empirical Bayes
+methods as demonstrated at the end of this document.
+
+Back-transforming the estimate of $\psi$ was easy because there were
+no covariates. Because the detection component was modeled with
+covariates, $p$ is a function, not just a scalar quantity, and so we
+need to be provide values of our covariates to obtain an
+estimate of $p$. Here, we request
+the probability of detection given a site is occupied and all
+covariates are set to 0.
+
+```{r}
+backTransform(linearComb(fm2, coefficients = c(1,0,0), type = 'det'))
+```
+
+Thus, we can say that the expected probability of detection was 0.552
+when time of day and temperature are fixed at their mean value. A
+`predict` method also exists, which can be used to obtain estimates of
+parameters at specific covariate values.
+
+```{r}
+newData <- data.frame(MinAfterSunset = 0, Temperature = -2:2)
+round(predict(fm2, type = 'det', newdata = newData, appendData=TRUE), 2)
+```
+
+Confidence intervals are requested with `confint`, using either the
+asymptotic normal approximation or profiling.
+
+```{r, eval=FALSE}
+confint(fm2, type='det')
+confint(fm2, type='det', method = "profile")
+```
+
+```{r, echo=FALSE}
+confint(fm2, type='det')
+nul <- capture.output(ci <- confint(fm2, type='det', method = "profile"))
+ci
+```
+
+## Model selection and model fit
+
+Model selection and multi-model inference can be implemented after
+organizing models using the `fitList` function.
+
+```{r}
+fms <- fitList('psi(.)p(.)' = fm1, 'psi(.)p(Time+Temp)' = fm2)
+modSel(fms)
+predict(fms, type='det', newdata = newData)
+```
+
+
+The parametric bootstrap can be used to check the adequacy of model fit.
+Here we use a $\chi^2$ statistic appropriate for binary data.
+
+```{r, warning=FALSE}
+chisq <- function(fm) {
+ umf <- fm@data
+ y <- umf@y
+ y[y>1] <- 1
+ sr <- fm@sitesRemoved
+ if(length(sr)>0)
+ y <- y[-sr,,drop=FALSE]
+ fv <- fitted(fm, na.rm=TRUE)
+ y[is.na(fv)] <- NA
+ sum((y-fv)^2/(fv*(1-fv)), na.rm=TRUE)
+ }
+
+(pb <- parboot(fm2, statistic=chisq, nsim=100, parallel=FALSE))
+```
+
+We fail to reject the null hypothesis, and conclude that the model fit
+is adequate.
+
+## Derived parameters and empirical Bayes methods
+
+The `parboot` function can be also be used to compute confidence
+intervals for estimates of derived parameters, such as the proportion
+of $N$ sites occupied $\mbox{PAO} = \frac{\sum_i z_i}{N}$ where $z_i$ is the true
+occurrence state at site $i$, which is unknown at sites where no individuals
+were detected. The `colext` vignette shows examples of using
+`parboot` to obtain confidence intervals for such derived
+quantities. An alternative way achieving this goal is to use empirical Bayes
+methods, which were introduced in `unmarked` version 0.9-5. These methods estimate
+the posterior distribution of the latent variable given the data and
+the estimates of the fixed effects (the MLEs). The mean or the mode of
+the estimated posterior distibution is referred to as the empirical
+best unbiased predictor (EBUP), which in `unmarked` can be
+obtained by applying the `bup` function to the estimates of the
+posterior distributions returned by the `ranef` function. The
+following code returns an estimate of PAO using EBUP.
+
+```{r}
+re <- ranef(fm2)
+EBUP <- bup(re, stat="mode")
+sum(EBUP) / numSites(pcru)
+```
+
+Note that this is similar, but slightly lower than the
+population-level estimate of $\psi$ obtained above.
+
+A plot method also exists for objects returned by `ranef`, but
+distributions of binary variables are not so pretty. Try it out on a
+fitted abundance model instead.
+
+# References