From bb28d3156f9b92e281a8e4d661c782c2508e9f9c Mon Sep 17 00:00:00 2001 From: Ken Kellner Date: Sun, 3 Jun 2018 13:56:54 -0400 Subject: Add some notes on MCMC with integrals --- src/models-with-integrals.Rmd | 135 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 135 insertions(+) create mode 100644 src/models-with-integrals.Rmd diff --git a/src/models-with-integrals.Rmd b/src/models-with-integrals.Rmd new file mode 100644 index 0000000..4b0be11 --- /dev/null +++ b/src/models-with-integrals.Rmd @@ -0,0 +1,135 @@ +--- +title: Fitting Bayesian Models with Integrals +date: 2018-06-03 +output: + html_document: + theme: journal + css: style.css + highlight: pygments +--- + +Recently I've been working on a model that includes a component of the basic form + +$$ \sum_{i=1}^n F(x_i)\int_{X}F(x)dx $$ + +where $F(x)$ is an arbitrary function that includes parameters I want to obtain estimates of. + +# Using R + +In `R` a maximum likelihood approach to estimation is possible, using the built-in `integrate()` and `optim()` functions: + +```{r,eval=FALSE} +#Psuedocode example +#Arbitrary function +F <- function(x, params){...} + +loglik <- function(params, data){ + #Integral part + int_val <- integrate(F, Xmin, Xmax, params)$value + #Summation part + sum_val <- 0 + for (i in 1:nrow(data)){ + sum_val <- sum_val + F(data$x[i], params) + } + return(-1 * log(sum(sum_val * int_val))) +} + +fit <- optim(inits, loglik, data) +``` + +This approach works fine. + +# Using Popular MCMC Software + +In my specific case, however, I would prefer to fit the model in a Bayesian framework using MCMC. +The usual approaches are limited: `WinBUGS`, `JAGS`, and `Stan` all lack built-in integral functions. +Depending on the complexity of the integral, it may be possible to solve it first and then incorporate it into a model in one of these languages. +Alternatively, you could use some kind of numerical approximation (e.g. a Riemann sum), but that could greatly increase MCMC run time. + +The `OpenBUGS` flavor of the `BUGS` language actually does have a `integral` function. +For example: + +```{r, eval=FALSE} +#OpenBUGS pseudocode +model { + C <- 10000 #Large constant + F(x) <- ... #Some expression of parameters + inv_val <- integral(F(x), Xmin, Xmax, tol) + + for (i in 1:nobs){ + sum_val[i] <- ... #Expression from above + loglik[i] <- -1 * log(sum_val[i] * int_val) + #Zeroes trick for arbitrary likelihood in BUGS + mn[i] <- loglik + C + dummy[i] <- 0 + dummy[i] ~ dpois(mn[i]) + } + #Priors, etc. +} +``` + +This works reasonably well in my experience, but has the key limitation that the function $F(x)$ to be integrated has to be able to be expressed in the limited BUGS syntax. +If it's a complex function that might be impossible, as it was in my case. +Furthermore, OpenBUGS can be quite frustrating work with due to its infamously cryptic error messages. + +# Using NIMBLE + +[NIMBLE](https://r-nimble.org/) is a recent `R` package for hierarchical modeling and MCMC. +It has many features, but the one that caught my attention was its ability to extend existing `BUGS` syntax with user-defined functions and distributions written in `R`. +For me this was a perfect solution: I could fit my model in a Bayesian framework, leverage my existing knowledge of `BUGS`, and also incorporate my own functions in the more flexible `R` language (including functions with integrals!). + +For example, suppose I have an arbitrary function in `R`, `my_func()`, which calculates the value of the integral I'm interested in: + +```{r,eval=FALSE} +#Pseudocode +my_func <- function(params,Xmin,Xmax){ + F <- function(x){...} #Expression of params + return(integrate(F,Xmin,Xmax)$value) +} +``` + +Since it's written in `R`, I have lots of freedom for defining `F(x)`. +Next I need to make `NIMBLE` aware of this function, and provide additional detail about the inputs and outputs. +The `R` function is wrapped in a call to `nimbleRcall` with a similar name. + +```{r,eval=FALSE} +#Pseudocode +Rmy_func <- nimbleRcall(function(params=double(1), Xmin=integer(0), + Xmax=integer(0)){}, Rfun='my_func', + returnType = double(0), envir=.GlobalEnv) +``` + +The `BUGS` model (nearly identical to the `OpenBUGS` version above) is then written in a wrapper for `NIMBLE`: + +```{r, eval=FALSE} +#Pseudocode +mod <- nimbleCode({ + C <- 10000 #Large constant + #Call to my custom function + int_val <- Rmy_func(params, Xmin, Xmax) + + for (i in 1:nobs){ + sum_val[i] <- ... #Expression from above + loglik[i] <- -1 * log(sum_val[i] * int_val) + #Zeroes trick for arbitrary likelihood in BUGS + mn[i] <- loglik + C + dummy[i] <- 0 + dummy[i] ~ dpois(mn[i]) + } + #Priors, etc. +}) +``` + +The only difference is the call to my custom function which calculates the integral, `Rmy_func`, instead of the built-in `integral` function. +Finally, run `NIMBLE` after providing other necessary inputs (data, initial values, MCMC run info, etc.): + +```{r,eval=FALSE} +#Pseudocode +mcmc.out = nimbleMCMC(code = mod, constants = inp_constants, + data = inp_data, inits = inp_inits(), + monitors = param_names, + nchains = 3, niter=3000,nburnin=2500, + thin=5,summary =TRUE) +``` + +Calling the `R` function from `NIMBLE` unsurprisingly has a MCMC speed penalty, but I found run times to be comparable to `OpenBUGS` for equivalent models. -- cgit v1.2.3