Add random effects vignette

author: Ken Kellner <ken@kenkellner.com> 2021-12-07 13:27:55 -0500
committer: Ken Kellner <ken@kenkellner.com> 2021-12-07 13:27:55 -0500
commit: a60e653b9988b86a3cc464813a3f3c9c75dc515b (patch)
tree: fcea681adb12191fe2127a2a995868089c2538fe
parent: f7c19b0e84e50253a02bf580968357aa0299344c (diff)
2 files changed, 235 insertions, 2 deletions
diff --git a/DESCRIPTION b/DESCRIPTION
index 7d540f3..2a9073b 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: unmarked
-Version: 1.1.1.9015
-Date: 2021-12-06
+Version: 1.1.1.9016
+Date: 2021-12-07
 Type: Package
 Title: Models for Data from Unmarked Animals
 Authors@R: c(
diff --git a/vignettes/random-effects.Rnw b/vignettes/random-effects.Rnw
new file mode 100644
index 0000000..ec75a3a
--- /dev/null
+++ b/vignettes/random-effects.Rnw
@@ -0,0 +1,233 @@
+<<echo=false>>=
+options(width=70)
+options(continue=" ")
+@
+
+\documentclass[a4paper]{article}
+\usepackage[OT1]{fontenc}
+\usepackage{Sweave}
+\usepackage{natbib}
+%\usepackage{fullpage}
+\usepackage[vmargin=1in,hmargin=1in]{geometry}
+\bibliographystyle{ecology}
+
+\usepackage{hyperref}
+\hypersetup{
+    colorlinks=true,
+    linkcolor=blue,     
+    urlcolor=cyan,
+    citecolor=black
+}
+
+\DefineVerbatimEnvironment{Sinput}{Verbatim} {xleftmargin=2em}
+\DefineVerbatimEnvironment{Soutput}{Verbatim}{xleftmargin=2em}
+\DefineVerbatimEnvironment{Scode}{Verbatim}{xleftmargin=2em}
+\fvset{listparameters={\setlength{\topsep}{0pt}}}
+\renewenvironment{Schunk}{\vspace{\topsep}}{\vspace{\topsep}}
+
+%%\VignetteIndexEntry{Random effects in unmarked}
+
+\title{Experimental Support for Random Effects in unmarked}
+\author{Ken Kellner}
+\date{December 1, 2021}
+
+
+\begin{document}
+
+\newcommand{\code}[1]{\texttt{\small{#1}}}
+\newcommand{\package}[1]{\textsf{\small{#1}}}
+
+\maketitle
+
+\section{Introduction}
+
+Random effects are often useful when fitting hierarchical ecological models.
+For example, if we have many different observers collecting count or presence data, it may be appropriate to include observer as a random effect on the detection process.
+If we have multiple years of data at our sites, and are fitting a so-called "stacked" model where site-years are the response data instead of sites, it may be appropriate to include site as a random effect on the state process.
+Until recently, including random effects as part of the linear predictor for the state or detection parameter in a model was possible only by fitting the model in a Bayesian framework with e.g. WinBUGS or JAGS.
+
+The \code{unmarked} package now includes experimental support for fitting models with random effects, via the use of \href{https://kaskr.github.io/adcomp/Introduction.html}{Template Model Builder} (TMB).
+TMB uses Laplace approximation to estimate the random effects.
+Currently, only a few model types including single-season occupancy (\code{occu}) and $N$-mixture models (\code{pcount}) have random effects support, but more models may be supported in the future.
+
+\subsection{Caveats}
+
+The first caveat is that \code{unmarked} can only fit normally-distributed random effects with mean 0 on the link scale, and nested and correlated random effects are not supported.
+Second, we have found that estimation of random effects in \code{unmarked} via TMB works well in many cases, but for datasets with small sample sizes or sparse (many 0) observations, estimates are often poor and can result in misleading inference.
+We urge \code{unmarked} users to incorporate random effects with caution, and compare results to similar models without random effects.
+Some datasets may simply not be appropriate for models with random effects in \code{unmarked}.
+In these cases Bayesian methods may be more appropriate.
+
+\section{Example model with random effects}
+
+Below, we demonstrate fitting an $N$-mixture model via \code{pcount} to a dataset, including different combinations of random effects.
+
+\subsection{Simulating a dataset}
+
+We begin by simulating a count dataset, in which 100 sites have been visited 3 times per year in each of 3 years.
+Abundance will be a function of a single fixed-effect covariate, and we will consider site a random effect.
+We will simulate detection with a random observer effect.
+
+First, define the simulation parameters, covariate data, and the random effect values:
+
+<<>>=
+set.seed(35)
+nsites <- 100
+nyears <- 3
+nvisits <- 3
+
+# Abundance parameters
+beta0 <- 0 # Intercept
+beta1 <- 1 # fixed covariate slope
+sd_site <- 0.5 # SD of site-level random effect
+re_site <- rnorm(nsites, 0, sd_site) # simulate random effect
+
+# Detection parameters
+alpha0 <- 0 # Intercept
+sd_obs <- 0.3 # SD of observer-level random effect (20 unique observers)
+re_obs <- rnorm(20, 0, sd_obs) # simulate random effect
+
+# Covariates
+x <- rnorm(nsites*nyears) # a covariate on abundance
+site_id <- rep(1:100, each=3)
+obs_id <- sample(1:20, nsites*nyears*nvisits, replace=TRUE)
+@
+
+Next, calculate $\lambda$ for each site and simulate the actual abundance \code{N}:
+
+<<fig=TRUE>>=
+lambda <- exp(beta0 + beta1*x + # fixed part of linear predictor
+              re_site[site_id])  # site random effect
+
+# Generate latent abundance N
+N <- rpois(nsites*nyears, lambda) 
+hist(N)
+@
+
+Simulate \code{p} by observer, and using \code{N}, simulate the observed counts \code{y}:
+
+<<>>=
+p <- plogis(alpha0 + re_obs[obs_id])
+p <- matrix(p, nrow=nsites*nyears, ncol=nvisits, byrow=TRUE)
+
+y <- matrix(NA, nsites*nyears, nvisits)
+
+for (i in 1:(nsites*nyears)){
+  for (j in 1:nvisits){
+    y[i,j] <- rbinom(1, N[i], p[i,j])
+  }
+}
+@
+
+Finally, organize the data into an \code{unmarkedFrame}.
+Note that we are specifying our random effect parameters as R factors.
+
+<<>>=
+library(unmarked)
+site_covs <- data.frame(x=x,
+                        site_id=factor(as.character(site_id)))
+obs_covs <- data.frame(obs_id=factor(as.character(obs_id)))
+umf <- unmarkedFramePCount(y=y, siteCovs=site_covs, obsCovs=obs_covs)
+@
+
+\subsection{Fitting models}
+
+First we will fit a model without random effects, including only the fixed effect covariate \code{x}.
+
+<<>>=
+mod_x <- pcount(~1~x, umf, K=40)
+summary(mod_x)
+@
+
+This model overestimates the abundance intercept (truth = 0) and underestimates the effect of \code{x} (truth = 1).
+Next, we will fit a model with random intercepts by site.
+This is specified in the abundance formula, using syntax similar to that found in package \code{lme4}.
+To specify random intercepts based on our site covariate \code{site\_id}, add \code{(1|site\_id)} to the abundance formula (the parentheses are required).
+You can read this as "the intercepts should be random based on \code{site\_id}".
+Note that the fixed effect \code{x} is outside the parentheses in the abundance formula.
+
+<<>>=
+mod_site <- pcount(~1~x+(1|site_id), umf, K=40)
+mod_site
+@
+
+In the summary output \code{unmarked} provides an estimate of the random effect SD, which is similar to the true value (0.5).
+The other abundance parameters are also now closer to their true values.
+Now we'll fit the model with random detection intercepts by observer, in addition to the site random effect.
+As before, we add \code{(1|obs\_id)} to the detection formula.
+
+<<>>=
+mod_obs <- pcount(~1 + (1|obs_id) ~ x + (1|site_id), umf, K=40)
+mod_obs
+@
+
+Both estimated random effect SDs are similar to the "true" values.
+
+\subsection{Model inference}
+
+To get more details, including 95\% CIs, on the random effects SDs, use the \code{sigma} function:
+
+<<>>=
+sigma(mod_obs)
+@
+
+We can also extract the actual random effect values using the \code{randomTerms} function.
+We'll extract the values for the abundance model:
+
+<<>>=
+head(randomTerms(mod_obs, "state"))
+@
+
+Note the they are sorted incorrectly because \code{site\_id} in this example, while numeric, is a factor so R sorts it like a character.
+Also note that these values are just the random part of the intercept - to get the complete intercept for each grouping level, we must add the mean intercept value (in this case 0.186).
+We can compare these estimates to the true values of the random intercepts.
+
+<<fig=TRUE>>=
+ran <- randomTerms(mod_obs, "state")
+ran <- ran[order(as.numeric(ran$Level)),] # sort them correctly
+ints <- coef(mod_obs)[1] + ran$Estimate # Calculate the total intercept for each level
+
+plot(re_site, ints, xlab="Truth", ylab="Estimate", pch=19)
+abline(a=0, b=1)
+@
+
+We can also use \code{predict} on models with random effects.
+A new argument is available, \code{re.form}, which specifies if the random effect(s) should be included when calculating the predicted estimate.
+By default, \code{re.form=NULL} meaning they are included; to exclude them set \code{re.form=NA}.
+These values are a bit confusing but they match the way it is done in \code{lme4}.
+
+<<>>=
+# with random effect
+head(predict(mod_obs, "det"))
+
+# without random effect
+head(predict(mod_obs, "det", re.form=NA))
+@
+
+In the latter case, all the estimates of $p$ are identical because there are no fixed covariates on $p$ in this model.
+
+\subsection{More complicated random effects structures}
+
+It is possible to include multiple random effects on a single parameter; for example to include both observer and site as random effects on $p$, the formula for $p$ would look like this:
+
+<<eval=FALSE>>=
+~1 + (1|obs_id) + (1|site_id)
+@
+
+Additionally it is possible to have random slopes as well as intercepts.
+For example, to also have random slopes for the covariate \code{x} by site, the formula for abundance would be
+
+<<eval=FALSE>>=
+~x + (1 + x || site_id)
+@
+
+The \code{||} (indicating no correlation estimated between the two random effects) is necessary instead of \code{|}, as \code{unmarked} does not support correlated random effects.
+
+\section{A note on model selection}
+
+As you can see above, \code{unmarked} returns an AIC value for models with random effects.
+This AIC value is calculated in the normal manner, with the number of parameters equal to the number of fixed parameters plus the number of random effects standard deviations.
+There isn't a consensus on how to calculate AIC for models with fixed and random effects.
+Thus, it is probably not a good idea to use AIC to compare between models with and without random effects, even though \code{unmarked} will allow you to do so with \code{fitList} and \code{modSel}.
+
+\end{document}
author	Ken Kellner <ken@kenkellner.com>	2021-12-07 13:27:55 -0500
committer	Ken Kellner <ken@kenkellner.com>	2021-12-07 13:27:55 -0500
commit	a60e653b9988b86a3cc464813a3f3c9c75dc515b (patch)
tree	fcea681adb12191fe2127a2a995868089c2538fe
parent	f7c19b0e84e50253a02bf580968357aa0299344c (diff)