summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorKen Kellner <ken@kenkellner.com>2018-06-03 13:56:54 -0400
committerKen Kellner <ken@kenkellner.com>2018-06-03 13:56:54 -0400
commitbb28d3156f9b92e281a8e4d661c782c2508e9f9c (patch)
treed73dcbdd412357fb2cf9509909fbbb18064487a2
parent3f7ae2d450b3847ecdbfeebef68753bafd25832e (diff)
Add some notes on MCMC with integrals
-rw-r--r--src/models-with-integrals.Rmd135
1 files changed, 135 insertions, 0 deletions
diff --git a/src/models-with-integrals.Rmd b/src/models-with-integrals.Rmd
new file mode 100644
index 0000000..4b0be11
--- /dev/null
+++ b/src/models-with-integrals.Rmd
@@ -0,0 +1,135 @@
+---
+title: Fitting Bayesian Models with Integrals
+date: 2018-06-03
+output:
+ html_document:
+ theme: journal
+ css: style.css
+ highlight: pygments
+---
+
+Recently I've been working on a model that includes a component of the basic form
+
+$$ \sum_{i=1}^n F(x_i)\int_{X}F(x)dx $$
+
+where $F(x)$ is an arbitrary function that includes parameters I want to obtain estimates of.
+
+# Using R
+
+In `R` a maximum likelihood approach to estimation is possible, using the built-in `integrate()` and `optim()` functions:
+
+```{r,eval=FALSE}
+#Psuedocode example
+#Arbitrary function
+F <- function(x, params){...}
+
+loglik <- function(params, data){
+ #Integral part
+ int_val <- integrate(F, Xmin, Xmax, params)$value
+ #Summation part
+ sum_val <- 0
+ for (i in 1:nrow(data)){
+ sum_val <- sum_val + F(data$x[i], params)
+ }
+ return(-1 * log(sum(sum_val * int_val)))
+}
+
+fit <- optim(inits, loglik, data)
+```
+
+This approach works fine.
+
+# Using Popular MCMC Software
+
+In my specific case, however, I would prefer to fit the model in a Bayesian framework using MCMC.
+The usual approaches are limited: `WinBUGS`, `JAGS`, and `Stan` all lack built-in integral functions.
+Depending on the complexity of the integral, it may be possible to solve it first and then incorporate it into a model in one of these languages.
+Alternatively, you could use some kind of numerical approximation (e.g. a Riemann sum), but that could greatly increase MCMC run time.
+
+The `OpenBUGS` flavor of the `BUGS` language actually does have a `integral` function.
+For example:
+
+```{r, eval=FALSE}
+#OpenBUGS pseudocode
+model {
+ C <- 10000 #Large constant
+ F(x) <- ... #Some expression of parameters
+ inv_val <- integral(F(x), Xmin, Xmax, tol)
+
+ for (i in 1:nobs){
+ sum_val[i] <- ... #Expression from above
+ loglik[i] <- -1 * log(sum_val[i] * int_val)
+ #Zeroes trick for arbitrary likelihood in BUGS
+ mn[i] <- loglik + C
+ dummy[i] <- 0
+ dummy[i] ~ dpois(mn[i])
+ }
+ #Priors, etc.
+}
+```
+
+This works reasonably well in my experience, but has the key limitation that the function $F(x)$ to be integrated has to be able to be expressed in the limited BUGS syntax.
+If it's a complex function that might be impossible, as it was in my case.
+Furthermore, OpenBUGS can be quite frustrating work with due to its infamously cryptic error messages.
+
+# Using NIMBLE
+
+[NIMBLE](https://r-nimble.org/) is a recent `R` package for hierarchical modeling and MCMC.
+It has many features, but the one that caught my attention was its ability to extend existing `BUGS` syntax with user-defined functions and distributions written in `R`.
+For me this was a perfect solution: I could fit my model in a Bayesian framework, leverage my existing knowledge of `BUGS`, and also incorporate my own functions in the more flexible `R` language (including functions with integrals!).
+
+For example, suppose I have an arbitrary function in `R`, `my_func()`, which calculates the value of the integral I'm interested in:
+
+```{r,eval=FALSE}
+#Pseudocode
+my_func <- function(params,Xmin,Xmax){
+ F <- function(x){...} #Expression of params
+ return(integrate(F,Xmin,Xmax)$value)
+}
+```
+
+Since it's written in `R`, I have lots of freedom for defining `F(x)`.
+Next I need to make `NIMBLE` aware of this function, and provide additional detail about the inputs and outputs.
+The `R` function is wrapped in a call to `nimbleRcall` with a similar name.
+
+```{r,eval=FALSE}
+#Pseudocode
+Rmy_func <- nimbleRcall(function(params=double(1), Xmin=integer(0),
+ Xmax=integer(0)){}, Rfun='my_func',
+ returnType = double(0), envir=.GlobalEnv)
+```
+
+The `BUGS` model (nearly identical to the `OpenBUGS` version above) is then written in a wrapper for `NIMBLE`:
+
+```{r, eval=FALSE}
+#Pseudocode
+mod <- nimbleCode({
+ C <- 10000 #Large constant
+ #Call to my custom function
+ int_val <- Rmy_func(params, Xmin, Xmax)
+
+ for (i in 1:nobs){
+ sum_val[i] <- ... #Expression from above
+ loglik[i] <- -1 * log(sum_val[i] * int_val)
+ #Zeroes trick for arbitrary likelihood in BUGS
+ mn[i] <- loglik + C
+ dummy[i] <- 0
+ dummy[i] ~ dpois(mn[i])
+ }
+ #Priors, etc.
+})
+```
+
+The only difference is the call to my custom function which calculates the integral, `Rmy_func`, instead of the built-in `integral` function.
+Finally, run `NIMBLE` after providing other necessary inputs (data, initial values, MCMC run info, etc.):
+
+```{r,eval=FALSE}
+#Pseudocode
+mcmc.out = nimbleMCMC(code = mod, constants = inp_constants,
+ data = inp_data, inits = inp_inits(),
+ monitors = param_names,
+ nchains = 3, niter=3000,nburnin=2500,
+ thin=5,summary =TRUE)
+```
+
+Calling the `R` function from `NIMBLE` unsurprisingly has a MCMC speed penalty, but I found run times to be comparable to `OpenBUGS` for equivalent models.