summaryrefslogtreecommitdiff
path: root/src/models-with-integrals.Rmd
blob: 4b0be116c8627eef9c9b82d7ad0b013c4d3a085c (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
---
title: Fitting Bayesian Models with Integrals
date: 2018-06-03
output:
  html_document:
    theme: journal
    css: style.css
    highlight: pygments
---

Recently I've been working on a model that includes a component of the basic form

$$ \sum_{i=1}^n F(x_i)\int_{X}F(x)dx $$

where $F(x)$ is an arbitrary function that includes parameters I want to obtain estimates of.

# Using R

In `R` a maximum likelihood approach to estimation is possible, using the built-in `integrate()` and `optim()` functions:

```{r,eval=FALSE}
#Psuedocode example
#Arbitrary function
F <- function(x, params){...}

loglik <- function(params, data){
	#Integral part
	int_val <- integrate(F, Xmin, Xmax, params)$value
	#Summation part
	sum_val <- 0
	for (i in 1:nrow(data)){		
		sum_val <- sum_val + F(data$x[i], params)
	}
	return(-1 * log(sum(sum_val * int_val)))
}

fit <- optim(inits, loglik, data)
```

This approach works fine.

# Using Popular MCMC Software

In my specific case, however, I would prefer to fit the model in a Bayesian framework using MCMC.
The usual approaches are limited: `WinBUGS`, `JAGS`, and `Stan` all lack built-in integral functions.
Depending on the complexity of the integral, it may be possible to solve it first and then incorporate it into a model in one of these languages.
Alternatively, you could use some kind of numerical approximation (e.g. a Riemann sum), but that could greatly increase MCMC run time.

The `OpenBUGS` flavor of the `BUGS` language actually does have a `integral` function.
For example:

```{r, eval=FALSE}
#OpenBUGS pseudocode
model {
	C <- 10000 #Large constant		
	F(x) <- ... #Some expression of parameters
	inv_val <- integral(F(x), Xmin, Xmax, tol)

	for (i in 1:nobs){
		sum_val[i] <- ... #Expression from above
		loglik[i] <- -1 * log(sum_val[i] * int_val)
		#Zeroes trick for arbitrary likelihood in BUGS
		mn[i] <- loglik + C
		dummy[i] <- 0
		dummy[i] ~ dpois(mn[i])
	}
	#Priors, etc.
}
```

This works reasonably well in my experience, but has the key limitation that the function $F(x)$ to be integrated has to be able to be expressed in the limited BUGS syntax.
If it's a complex function that might be impossible, as it was in my case.
Furthermore, OpenBUGS can be quite frustrating work with due to its infamously cryptic error messages.

# Using NIMBLE

[NIMBLE](https://r-nimble.org/) is a recent `R` package for hierarchical modeling and MCMC.
It has many features, but the one that caught my attention was its ability to extend existing `BUGS` syntax with user-defined functions and distributions written in `R`.
For me this was a perfect solution: I could fit my model in a Bayesian framework, leverage my existing knowledge of `BUGS`, and also incorporate my own functions in the more flexible `R` language (including functions with integrals!).

For example, suppose I have an arbitrary function in `R`, `my_func()`, which calculates the value of the integral I'm interested in:

```{r,eval=FALSE}
#Pseudocode
my_func <- function(params,Xmin,Xmax){
	F <- function(x){...} #Expression of params
	return(integrate(F,Xmin,Xmax)$value)
}
```

Since it's written in `R`, I have lots of freedom for defining `F(x)`. 
Next I need to make `NIMBLE` aware of this function, and provide additional detail about the inputs and outputs.
The `R` function is wrapped in a call to `nimbleRcall` with a similar name.

```{r,eval=FALSE}
#Pseudocode
Rmy_func <- nimbleRcall(function(params=double(1), Xmin=integer(0), 
						Xmax=integer(0)){}, Rfun='my_func', 
						returnType = double(0), envir=.GlobalEnv)
```

The `BUGS` model (nearly identical to the `OpenBUGS` version above) is then written in a wrapper for `NIMBLE`:

```{r, eval=FALSE}
#Pseudocode
mod <- nimbleCode({
	C <- 10000 #Large constant		
	#Call to my custom function
	int_val <- Rmy_func(params, Xmin, Xmax)

	for (i in 1:nobs){
		sum_val[i] <- ... #Expression from above
		loglik[i] <- -1 * log(sum_val[i] * int_val)
		#Zeroes trick for arbitrary likelihood in BUGS
		mn[i] <- loglik + C
		dummy[i] <- 0
		dummy[i] ~ dpois(mn[i])
	}
	#Priors, etc.
})
```

The only difference is the call to my custom function which calculates the integral, `Rmy_func`, instead of the built-in `integral` function. 
Finally, run `NIMBLE` after providing other necessary inputs (data, initial values, MCMC run info, etc.):

```{r,eval=FALSE}
#Pseudocode
mcmc.out = nimbleMCMC(code = mod, constants = inp_constants,
					  data = inp_data, inits = inp_inits(),
                      monitors = param_names,
					  nchains = 3, niter=3000,nburnin=2500,
                      thin=5,summary =TRUE)
```

Calling the `R` function from `NIMBLE` unsurprisingly has a MCMC speed penalty, but I found run times to be comparable to `OpenBUGS` for equivalent models.