diff options
author | Ken Kellner <ken@kenkellner.com> | 2021-12-07 13:27:55 -0500 |
---|---|---|
committer | Ken Kellner <ken@kenkellner.com> | 2021-12-07 13:27:55 -0500 |
commit | a60e653b9988b86a3cc464813a3f3c9c75dc515b (patch) | |
tree | fcea681adb12191fe2127a2a995868089c2538fe | |
parent | f7c19b0e84e50253a02bf580968357aa0299344c (diff) |
Add random effects vignette
-rw-r--r-- | DESCRIPTION | 4 | ||||
-rw-r--r-- | vignettes/random-effects.Rnw | 233 |
2 files changed, 235 insertions, 2 deletions
diff --git a/DESCRIPTION b/DESCRIPTION index 7d540f3..2a9073b 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,6 +1,6 @@ Package: unmarked -Version: 1.1.1.9015 -Date: 2021-12-06 +Version: 1.1.1.9016 +Date: 2021-12-07 Type: Package Title: Models for Data from Unmarked Animals Authors@R: c( diff --git a/vignettes/random-effects.Rnw b/vignettes/random-effects.Rnw new file mode 100644 index 0000000..ec75a3a --- /dev/null +++ b/vignettes/random-effects.Rnw @@ -0,0 +1,233 @@ +<<echo=false>>= +options(width=70) +options(continue=" ") +@ + +\documentclass[a4paper]{article} +\usepackage[OT1]{fontenc} +\usepackage{Sweave} +\usepackage{natbib} +%\usepackage{fullpage} +\usepackage[vmargin=1in,hmargin=1in]{geometry} +\bibliographystyle{ecology} + +\usepackage{hyperref} +\hypersetup{ + colorlinks=true, + linkcolor=blue, + urlcolor=cyan, + citecolor=black +} + +\DefineVerbatimEnvironment{Sinput}{Verbatim} {xleftmargin=2em} +\DefineVerbatimEnvironment{Soutput}{Verbatim}{xleftmargin=2em} +\DefineVerbatimEnvironment{Scode}{Verbatim}{xleftmargin=2em} +\fvset{listparameters={\setlength{\topsep}{0pt}}} +\renewenvironment{Schunk}{\vspace{\topsep}}{\vspace{\topsep}} + +%%\VignetteIndexEntry{Random effects in unmarked} + +\title{Experimental Support for Random Effects in unmarked} +\author{Ken Kellner} +\date{December 1, 2021} + + +\begin{document} + +\newcommand{\code}[1]{\texttt{\small{#1}}} +\newcommand{\package}[1]{\textsf{\small{#1}}} + +\maketitle + +\section{Introduction} + +Random effects are often useful when fitting hierarchical ecological models. +For example, if we have many different observers collecting count or presence data, it may be appropriate to include observer as a random effect on the detection process. +If we have multiple years of data at our sites, and are fitting a so-called "stacked" model where site-years are the response data instead of sites, it may be appropriate to include site as a random effect on the state process. +Until recently, including random effects as part of the linear predictor for the state or detection parameter in a model was possible only by fitting the model in a Bayesian framework with e.g. WinBUGS or JAGS. + +The \code{unmarked} package now includes experimental support for fitting models with random effects, via the use of \href{https://kaskr.github.io/adcomp/Introduction.html}{Template Model Builder} (TMB). +TMB uses Laplace approximation to estimate the random effects. +Currently, only a few model types including single-season occupancy (\code{occu}) and $N$-mixture models (\code{pcount}) have random effects support, but more models may be supported in the future. + +\subsection{Caveats} + +The first caveat is that \code{unmarked} can only fit normally-distributed random effects with mean 0 on the link scale, and nested and correlated random effects are not supported. +Second, we have found that estimation of random effects in \code{unmarked} via TMB works well in many cases, but for datasets with small sample sizes or sparse (many 0) observations, estimates are often poor and can result in misleading inference. +We urge \code{unmarked} users to incorporate random effects with caution, and compare results to similar models without random effects. +Some datasets may simply not be appropriate for models with random effects in \code{unmarked}. +In these cases Bayesian methods may be more appropriate. + +\section{Example model with random effects} + +Below, we demonstrate fitting an $N$-mixture model via \code{pcount} to a dataset, including different combinations of random effects. + +\subsection{Simulating a dataset} + +We begin by simulating a count dataset, in which 100 sites have been visited 3 times per year in each of 3 years. +Abundance will be a function of a single fixed-effect covariate, and we will consider site a random effect. +We will simulate detection with a random observer effect. + +First, define the simulation parameters, covariate data, and the random effect values: + +<<>>= +set.seed(35) +nsites <- 100 +nyears <- 3 +nvisits <- 3 + +# Abundance parameters +beta0 <- 0 # Intercept +beta1 <- 1 # fixed covariate slope +sd_site <- 0.5 # SD of site-level random effect +re_site <- rnorm(nsites, 0, sd_site) # simulate random effect + +# Detection parameters +alpha0 <- 0 # Intercept +sd_obs <- 0.3 # SD of observer-level random effect (20 unique observers) +re_obs <- rnorm(20, 0, sd_obs) # simulate random effect + +# Covariates +x <- rnorm(nsites*nyears) # a covariate on abundance +site_id <- rep(1:100, each=3) +obs_id <- sample(1:20, nsites*nyears*nvisits, replace=TRUE) +@ + +Next, calculate $\lambda$ for each site and simulate the actual abundance \code{N}: + +<<fig=TRUE>>= +lambda <- exp(beta0 + beta1*x + # fixed part of linear predictor + re_site[site_id]) # site random effect + +# Generate latent abundance N +N <- rpois(nsites*nyears, lambda) +hist(N) +@ + +Simulate \code{p} by observer, and using \code{N}, simulate the observed counts \code{y}: + +<<>>= +p <- plogis(alpha0 + re_obs[obs_id]) +p <- matrix(p, nrow=nsites*nyears, ncol=nvisits, byrow=TRUE) + +y <- matrix(NA, nsites*nyears, nvisits) + +for (i in 1:(nsites*nyears)){ + for (j in 1:nvisits){ + y[i,j] <- rbinom(1, N[i], p[i,j]) + } +} +@ + +Finally, organize the data into an \code{unmarkedFrame}. +Note that we are specifying our random effect parameters as R factors. + +<<>>= +library(unmarked) +site_covs <- data.frame(x=x, + site_id=factor(as.character(site_id))) +obs_covs <- data.frame(obs_id=factor(as.character(obs_id))) +umf <- unmarkedFramePCount(y=y, siteCovs=site_covs, obsCovs=obs_covs) +@ + +\subsection{Fitting models} + +First we will fit a model without random effects, including only the fixed effect covariate \code{x}. + +<<>>= +mod_x <- pcount(~1~x, umf, K=40) +summary(mod_x) +@ + +This model overestimates the abundance intercept (truth = 0) and underestimates the effect of \code{x} (truth = 1). +Next, we will fit a model with random intercepts by site. +This is specified in the abundance formula, using syntax similar to that found in package \code{lme4}. +To specify random intercepts based on our site covariate \code{site\_id}, add \code{(1|site\_id)} to the abundance formula (the parentheses are required). +You can read this as "the intercepts should be random based on \code{site\_id}". +Note that the fixed effect \code{x} is outside the parentheses in the abundance formula. + +<<>>= +mod_site <- pcount(~1~x+(1|site_id), umf, K=40) +mod_site +@ + +In the summary output \code{unmarked} provides an estimate of the random effect SD, which is similar to the true value (0.5). +The other abundance parameters are also now closer to their true values. +Now we'll fit the model with random detection intercepts by observer, in addition to the site random effect. +As before, we add \code{(1|obs\_id)} to the detection formula. + +<<>>= +mod_obs <- pcount(~1 + (1|obs_id) ~ x + (1|site_id), umf, K=40) +mod_obs +@ + +Both estimated random effect SDs are similar to the "true" values. + +\subsection{Model inference} + +To get more details, including 95\% CIs, on the random effects SDs, use the \code{sigma} function: + +<<>>= +sigma(mod_obs) +@ + +We can also extract the actual random effect values using the \code{randomTerms} function. +We'll extract the values for the abundance model: + +<<>>= +head(randomTerms(mod_obs, "state")) +@ + +Note the they are sorted incorrectly because \code{site\_id} in this example, while numeric, is a factor so R sorts it like a character. +Also note that these values are just the random part of the intercept - to get the complete intercept for each grouping level, we must add the mean intercept value (in this case 0.186). +We can compare these estimates to the true values of the random intercepts. + +<<fig=TRUE>>= +ran <- randomTerms(mod_obs, "state") +ran <- ran[order(as.numeric(ran$Level)),] # sort them correctly +ints <- coef(mod_obs)[1] + ran$Estimate # Calculate the total intercept for each level + +plot(re_site, ints, xlab="Truth", ylab="Estimate", pch=19) +abline(a=0, b=1) +@ + +We can also use \code{predict} on models with random effects. +A new argument is available, \code{re.form}, which specifies if the random effect(s) should be included when calculating the predicted estimate. +By default, \code{re.form=NULL} meaning they are included; to exclude them set \code{re.form=NA}. +These values are a bit confusing but they match the way it is done in \code{lme4}. + +<<>>= +# with random effect +head(predict(mod_obs, "det")) + +# without random effect +head(predict(mod_obs, "det", re.form=NA)) +@ + +In the latter case, all the estimates of $p$ are identical because there are no fixed covariates on $p$ in this model. + +\subsection{More complicated random effects structures} + +It is possible to include multiple random effects on a single parameter; for example to include both observer and site as random effects on $p$, the formula for $p$ would look like this: + +<<eval=FALSE>>= +~1 + (1|obs_id) + (1|site_id) +@ + +Additionally it is possible to have random slopes as well as intercepts. +For example, to also have random slopes for the covariate \code{x} by site, the formula for abundance would be + +<<eval=FALSE>>= +~x + (1 + x || site_id) +@ + +The \code{||} (indicating no correlation estimated between the two random effects) is necessary instead of \code{|}, as \code{unmarked} does not support correlated random effects. + +\section{A note on model selection} + +As you can see above, \code{unmarked} returns an AIC value for models with random effects. +This AIC value is calculated in the normal manner, with the number of parameters equal to the number of fixed parameters plus the number of random effects standard deviations. +There isn't a consensus on how to calculate AIC for models with fixed and random effects. +Thus, it is probably not a good idea to use AIC to compare between models with and without random effects, even though \code{unmarked} will allow you to do so with \code{fitList} and \code{modSel}. + +\end{document} |