---
title: Comparing ubms and JAGS
date: 2020-09-11
output: 
  html_document:
      css: style.css
      highlight: pygments
---

```{r, echo=FALSE}
knitr::opts_chunk$set(message=FALSE, warning=FALSE)
set.seed(456)
```

# Introduction

[ubms](https://kenkellner.com/blog/ubms-vignette.html) is an R package for fitting models of wildlife occurrence and abundance to with datasets where animals are not individually marked.
It provides a nearly identical interface to the popular [unmarked](https://cran.r-project.org/web/packages/unmarked/index.html) package.

One of the key features of `ubms` is the ability to include random effects in model formulas.
This is not possible in `unmarked`, but it is possible with custom models using [JAGS](http://mcmc-jags.sourceforge.net/).
I was curious how estimates of parameters and uncertainty intervals would compare between `ubms` and `JAGS`.
To explore this I simulated a basic occupancy dataset with known parameter values and fit models with both.
In this simulation, individual sites were nested within groups, and I estimated group-level random intercepts.
This is a common scenario for so-called [stacked](https://groups.google.com/forum/#!topic/unmarked/OHkk98y09Zo) data from multiple years, where the "groups" are unique sites and "sites" represent unique site-year combinations.

# Simulating the data

First I set the true parameter values for the vector of fixed effects $\boldsymbol{\beta}$ and the random effects standard deviation $\sigma$.

```{r}
beta <- c(0.4, -0.5, 0.3, 0.5)
sigma <- 1.2
```

Then, I simulated covariate data for the occupancy and detection submodels.
This included a random group assignment (26 possible groups) for each site.

```{r}
dat_occ <- data.frame(x1=rnorm(500), group=factor(sample(letters[1:26], 500, replace=T)))
dat_p <- data.frame(x2=rnorm(500*5))
```

Next I simulated the random effect value for each group (with an effects parameterization centered on 0).

```{r}
re <- rnorm(26, 0, sigma)
```

Finally, I simulated an occupancy dataset `y` using the data and parameter values obtained above.

```{r}
y <- matrix(NA, 500, 5)
z <- rep(NA, 500)
group_idx <- as.numeric(dat_occ$group) #convert group assignment to numeric index
idx <- 1
for (i in 1:500){
  #True latent state of site i in group group_idx[i]
  z[i] <- rbinom(1,1, plogis(beta[1] + beta[2]*dat_occ$x1[i] + re[group_idx[i]]))
  
  #Observation process
  for (j in 1:5){
    #Observed data at site i for observation j
    y[i,j] <- z[i]*rbinom(1,1, plogis(beta[3] + beta[4]*dat_p$x2[idx]))
    idx <- idx + 1
  }
}
```

# Fitting the model in ubms

I combined the covariates and simulated data into an `unmarkedFrame`.

```{r}
library(ubms)
umf <- unmarkedFrameOccu(y=y, siteCovs=dat_occ, obsCovs=dat_p)
```

Using `stan_occu`, I fit a model with a random effect of group on occupancy (`(1|group)`).
I probably did not use enough iterations but it's fine for this example.

```{r, eval=FALSE}
options(mc.cores=3)
ubms_fit <- stan_occu(~x2~x1+(1|group), umf, chains=3, iter=300)
```

```{r, echo=FALSE}
options(mc.cores=3)
ubms_fit <- stan_occu(~x2~x1+(1|group), umf, chains=3, iter=300, refresh=0)
```


# Fitting the model in JAGS

A bit more work is required to fit this model in JAGS.
First I reorganized the data into a list so it could be sent to JAGS.

```{r}
nsites <- nrow(umf@y)
jags_data <- list(y=umf@y, x1=dat_occ$x1, group=as.numeric(dat_occ$group),
                  ngroups=nlevels(dat_occ$group),
                  x2=matrix(dat_p$x2, nrow=nsites, byrow=TRUE),
                  nsites=nsites, nobs=ncol(umf@y))
```

Next I wrote a simple occupancy model with group-level intercepts `gint` in the BUGS language, and saved it to a temporary file.
Explaining the model code is beyond the scope of this example; for more information see [this excellent book](https://www.amazon.com/Bayesian-Population-Analysis-using-WinBUGS/dp/0123870208).

```{r}
modfile <- tempfile()
writeLines("
model{

#Likelihood
for (i in 1:ngroups){
  gint[i] ~ dnorm(beta[1], tau.group)
}

for (i in 1:nsites){
  z[i] ~ dbern(psi[i])
  logit(psi[i]) <- gint[group[i]] + beta[2]*x1[i]

  for (j in 1:nobs){
    y[i,j] ~ dbern(p[i,j]*z[i])
    logit(p[i,j]) <- beta[3] + beta[4]*x2[i,j]
  }
}

#Priors
for (i in 1:4){
  beta[i] ~ dnorm(0,0.01)
}

sd.group ~ dunif(0,10)
tau.group <- pow(sd.group, -2)

}
", con=modfile)
```

JAGS also requires a list of parameters you want to save, and a function to generate initial values.
Generally you need to provide reasonable initial values for the latent state $z$.
Note that I'm saving both the parameter values (`"beta"` and `"sd.group"`), along with the actual random effects estimates (`"gint"`).
```{r}
params <- c("beta", "sd.group", "gint")
inits <- function(){list(z=apply(y, 1, max, na.rm=TRUE))}
```

Finally, I fit the model using the R package [jagsUI](https://cran.r-project.org/web/packages/jagsUI/index.html).
I'm using the development version which you can find [here](https://github.com/kenkellner/jagsUI).

```{r}
library(jagsUI)
jags_fit <- jags(jags_data, inits, params, modfile, n.chains=3, n.adapt=100, n.iter=2000,
                 n.burnin=1000, n.thin=2, parallel=TRUE, quiet=TRUE)
```

Running the model in JAGS is significantly slower, although you could speed up by marginalizing the likelihood.

# Compare fixed effect estimates 

With the model fit in both `ubms` and `JAGS`, I compared the fixed effect parameter point estimates and uncertainty intervals visually.

```{r}
#Data frame of JAGS parameter estimates and UIs
jags_sum <- summary(jags_fit)
beta_jags <- as.data.frame(jags_sum[1:5,c(1,3,7)])
names(beta_jags) <- c("mean","lower","upper")
beta_jags$source <- "JAGS"

#Data frame of ubms parameter estimates and UIs
ubms_sum <- summary(ubms_fit, "state")
beta_ubms <- rbind(summary(ubms_fit, "state")[,c(1,4,8)],
                   summary(ubms_fit, "det")[,c(1,4,8)])
beta_ubms <- beta_ubms[c(1:2,4:5,3),]
names(beta_ubms) <- c("mean","lower","upper")
beta_ubms$source <- "ubms"

#Data frame of true parameter values
beta_truth <- data.frame(mean=c(beta, sigma), lower=NA, upper=NA, source="Truth")

#Bind together
beta_plot <- rbind(beta_jags, beta_ubms, beta_truth)
beta_plot$Parameter <- rep(rownames(beta_jags), 3)

#Plot
library(ggplot2)
dodge <- position_dodge(0.4)
ggplot(data=beta_plot, aes(x=Parameter, y=mean, col=source)) +
  geom_errorbar(aes(ymin=lower, ymax=upper), position=dodge) +
  geom_point(position=dodge, size=3) +
  labs(x='Parameter',y='Value', col='Source') +
  theme_bw() +
  theme(panel.grid.major=element_blank(), panel.grid.minor=element_blank(),
        axis.text=element_text(size=12), axis.title=element_text(size=14),
        legend.text=element_text(size=12), legend.title=element_text(size=14))
```

Fixed-effects parameter estimates are very similar between `ubms` and `JAGS`, as are the uncertainty intervals.
In both cases the intervals overlap the true parameter values.

# Compare random effect estimates

Finally, I did the same for the random effect estimates.
The `ranef` method extracts random effects terms from a fitted `ubms` model.
Although we specified the random effect in `ubms` as an "effects" parameterization, `ranef` automatically adds the random effect to the intercept term to get the complete random intercept at the group level. 
The group-level random intercepts in JAGS are parameter `"gint"`.

```{r}
#Get random effect estimates from ubms
re_ubms <- ranef(ubms_fit, "state", summary=TRUE)[[1]][[1]][,c(1,3,4)]
re_ubms$source <- "ubms"

#Get random effects estimates from JAGS
re_jags <- as.data.frame(jags_sum[grepl("gint", rownames(jags_sum)),c("mean","q2.5","q97.5")])
re_jags$source <- "JAGS"
names(re_jags) <- names(re_ubms)

#Get truth
truth <- data.frame(Estimate=re, `2.5%`=NA, `97.5%`=NA, source="Truth", check.names=FALSE)

#Combine
plot_dat <- rbind(re_ubms, re_jags, truth)
plot_dat$group <- rep(letters[1:26], 3)
levels(plot_dat$source) <- c("Truth","JAGS","ubms")

library(ggplot2)
dodge <- position_dodge(0.4)

#Plot a subset of random effects
plot_dat_sub <- plot_dat[plot_dat$group %in% letters[1:6],]
ggplot(data=plot_dat_sub, aes(x=group, y=Estimate, col=source)) +
  geom_errorbar(aes(ymin=`2.5%`, ymax=`97.5%`), position=dodge) +
  geom_point(position=dodge, size=3) +
  labs(x='Group',y='Random effect value', col='Source') +
  theme_bw() +
  theme(panel.grid.major=element_blank(), panel.grid.minor=element_blank(),
        axis.text=element_text(size=12), axis.title=element_text(size=14),
        legend.text=element_text(size=12), legend.title=element_text(size=14))
```

As with the fixed effects, the estimated random intercepts are very similar between the two methods.
Success!