Bayesian and frequentist computation with application to data from the Medical Birth Registry of Norway
Not peer reviewed
In chapter 1, the field of statistics is discussed in general terms. Then, Bayes’ theorem is presented together with the posterior distribution. Finally, we consider maximum likelihood estimation and its relation to a two-step inference procedure.
The thesis proceeds to introduce several numerical methods in chapter 2. One of the main methods is Hamiltonian Monte Carlo (HMC). In order to understand HMC, which is a type of Markov chain Monte Carlo (MCMC) algorithm, we first study some theory of Markov chains. We then present MCMC and some of its well-known algorithms. As HMC is based on Hamiltonian dynamics, which originate from physics, a short illustration of the dynamics is given. The system of differential equations used in HMC is then discussed, together with several numerical integration schemes and methods for developing them; a numerical integration scheme is a necessary part of the HMC algorithm. Paper 1 covers HMC and the search for more efficient numerical integration schemes to be used in the algorithm.
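To make the role of the numerical integration scheme concrete, the following is a minimal sketch of HMC with the standard leapfrog integrator. It is not the scheme developed in paper 1. The target (a one-dimensional standard normal), the function names, and the step size and trajectory length are all assumptions chosen for illustration.

```python
import math
import random


def U(q):
    """Potential energy: negative log-density of a standard normal (assumed target)."""
    return 0.5 * q * q


def grad_U(q):
    """Gradient of the potential energy."""
    return q


def leapfrog(q, p, grad, step, n_steps):
    """Integrate Hamilton's equations with the leapfrog scheme (unit mass)."""
    p = p - 0.5 * step * grad(q)          # initial half step for the momentum
    for i in range(n_steps):
        q = q + step * p                  # full step for the position
        if i < n_steps - 1:
            p = p - step * grad(q)        # full step for the momentum
    p = p - 0.5 * step * grad(q)          # final half step for the momentum
    return q, p


def hmc(potential, grad, q0, n_samples, step=0.2, n_steps=10, seed=0):
    """Draw n_samples from exp(-potential(q)) using HMC with leapfrog."""
    rng = random.Random(seed)
    q = q0
    samples = []
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)           # resample the momentum
        h_old = potential(q) + 0.5 * p * p
        q_new, p_new = leapfrog(q, p, grad, step, n_steps)
        h_new = potential(q_new) + 0.5 * p_new * p_new
        # Metropolis correction for the integration error in the Hamiltonian
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            q = q_new
        samples.append(q)
    return samples


if __name__ == "__main__":
    draws = hmc(U, grad_U, q0=0.0, n_samples=5000)
    mean = sum(draws) / len(draws)
    var = sum((x - mean) ** 2 for x in draws) / len(draws)
    print(f"sample mean {mean:.3f}, sample variance {var:.3f}")
```

The acceptance step depends only on the error the integrator makes in the Hamiltonian, which is why a more accurate integration scheme can allow longer steps at the same acceptance rate.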
MCMC algorithms perform poorly in some situations, especially with regard to computation time. For certain latent variable models we should therefore consider alternative numerical methods. For these models, one could use integrated nested Laplace approximation (INLA). Although INLA has not been used in the model specification here, we are interested in some of the theory it applies. INLA is an efficient alternative for latent Gaussian models and uses some important properties of Gaussian fields (GFs) and Gaussian Markov random fields (GMRFs). From INLA, we consider more specifically the relation between the computationally demanding covariance matrix and the sparse precision matrix. These two matrices are directly connected to GFs and GMRFs, respectively, and we examine some of the computational advantages that GMRFs have over GFs. We use the Template Model Builder (TMB), which combines automatic differentiation and the Laplace approximation to find maximum likelihood estimates of the parameters from the marginal likelihood. This combination gives TMB the ability to handle complex latent variable models, and it is the methodology used for the spatial models in paper 2 and paper 3.
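The contrast between a dense covariance matrix and a sparse precision matrix can be illustrated with a small example. The sketch below uses a stationary first-order autoregressive process, a textbook one-dimensional GMRF chosen here purely for illustration; it is not one of the models in the thesis, and the function names and parameter values are assumptions.

```python
def ar1_precision(n, phi):
    """Tridiagonal precision matrix of a stationary AR(1) process
    with unit innovation variance (requires n >= 2 and |phi| < 1)."""
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        Q[i][i] = 1.0 if i in (0, n - 1) else 1.0 + phi * phi
        if i + 1 < n:
            Q[i][i + 1] = -phi
            Q[i + 1][i] = -phi
    return Q


def ar1_covariance(n, phi):
    """Dense covariance matrix of the same process: phi^|i-j| / (1 - phi^2)."""
    scale = 1.0 / (1.0 - phi * phi)
    return [[scale * phi ** abs(i - j) for j in range(n)] for i in range(n)]


def matmul(A, B):
    """Plain matrix product of two square lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def count_nonzero(M):
    """Number of entries that are not exactly zero."""
    return sum(1 for row in M for x in row if x != 0.0)


if __name__ == "__main__":
    n, phi = 6, 0.5
    Q = ar1_precision(n, phi)
    S = ar1_covariance(n, phi)
    print("non-zeros in precision: ", count_nonzero(Q))   # 3n - 2 entries
    print("non-zeros in covariance:", count_nonzero(S))   # n * n entries
```

The number of non-zero precision entries grows linearly with n, whereas the covariance matrix has n squared entries; this sparsity is what sparse-matrix algorithms for GMRFs exploit.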
In chapter 3 we look at the field of spatial statistics, including its uses and the different types of spatial data. Spatial data are classified by the kind of location information that is available and by the type of problem we want to model. Moreover, we discuss the stochastic partial differential equation (SPDE) approach, which links GFs and GMRFs using a triangulated grid. We examine this grid, a weak solution to the SPDE, and the use of the approach with spatial models. To understand the SPDE approach we also consider the Matérn covariance function, which is the covariance function for certain GFs. The SPDE approach is essential for the results of paper 2 and paper 3. The spatial and spatio-temporal models are also explained, and some results supplementing paper 3 are presented. Finally, we look at health registry data, which is the type of data used in paper 2 and paper 3.
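For reference, the Matérn covariance function and the SPDE mentioned above are commonly written as follows in the SPDE literature. The parameterization and the symbols (variance \sigma^2, smoothness \nu, scale \kappa, dimension d) are assumptions here and may differ from the notation used in the thesis.

```latex
% Matern covariance between two locations at distance h > 0,
% with K_nu the modified Bessel function of the second kind
\[
  C(h) = \frac{\sigma^2}{2^{\nu-1}\,\Gamma(\nu)}\,(\kappa h)^{\nu} K_{\nu}(\kappa h),
  \qquad C(0) = \sigma^2 .
\]

% SPDE whose stationary solution x(s) on R^d is a GF with Matern covariance,
% driven by Gaussian white noise W(s) with unit variance
\[
  (\kappa^2 - \Delta)^{\alpha/2}\, x(s) = \mathcal{W}(s),
  \qquad \alpha = \nu + d/2 ,
\]

% with the resulting marginal variance
\[
  \sigma^2 = \frac{\Gamma(\nu)}{\Gamma(\nu + d/2)\,(4\pi)^{d/2}\,\kappa^{2\nu}} .
\]
```

In this form, \nu controls the smoothness of the field and \kappa controls how quickly the correlation decays with distance.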