STA 250 :: Advanced Statistical Computing (UCD, Fall 2013)

Code + goodies used in Prof. Baines' STA 250 Course (UC Davis, Fall 2013)



STA 250 :: Homework Policy

For all questions you must show your work. This enables us to understand your thought process, award partial credit, and prevent crude cheating. Please see the code of conduct in the Syllabus for rules about collaborating on homework.

For questions requiring computing, if you use R, Python, or any other programming environment, then you must turn in a printout of your output with your solutions.
In addition, a copy of your code must be uploaded to the appropriate HW directory of your forked GitHub repo within 24 hours of the homework due date.


Homework 03

Due: In Class, Wednesday November 27th

Assigned: Wednesday November 20th

Optimization and EM Module


In this homework you will implement several variants of the EM algorithm discussed in lecture, including the regular EM algorithm, the MCEM algorithm, and the Interwoven EM (IEM) algorithm.

  1. Sync your fork of the course GitHub repo to include the latest updates using the instructions provided here.


  2. In this problem you will implement the most basic optimization algorithms presented in class: bisection and Newton-Raphson.
    1. Write a general function to implement the bisection algorithm. The function should take the following arguments:
      • The function to find the root of
      • The initial interval \((l,u)\) (with \(g(l)g(u)<0\))
      • The tolerance for the convergence criterion
      • Maximum number of iterations
      • A debugging option (e.g., to print status to the user)
    2. Write a general function to implement the Newton-Raphson algorithm. The function should take the following arguments:
      • The function to find the root of
      • The derivative of the function
      • The starting value
      • The tolerance for the convergence criterion
      • Maximum number of iterations
      • A debugging option (e.g., to print status to the user)
    3. Consider the classic linkage problem from Rao (1973), with phenotype probabilities and observed counts: \begin{equation} \begin{array}{ccc} \textrm{Phenotype} & \textrm{Probability} & \textrm{Count} \\ \textrm{AB} & (3-2\theta+\theta^2)/4 & 125 \\ \textrm{Ab} & (2\theta-\theta^2)/4 & 18 \\ \textrm{aB} & (2\theta-\theta^2)/4 & 20 \\ \textrm{ab} & (1-2\theta+\theta^2)/4 & 34 \end{array} \end{equation} Defining \(\lambda=1-2\theta+\theta^2\), the four cell probabilities become \((2+\lambda)/4\), \((1-\lambda)/4\), \((1-\lambda)/4\) and \(\lambda/4\), so the likelihood is seen to be: \begin{equation} L(\lambda) \propto (2+\lambda)^{125}(1-\lambda)^{18+20}\lambda^{34} \end{equation} Reference: Rao, C.R. (1973). Linear Statistical Inference and its Applications (Second Edition). Wiley Series in Probability and Statistics.

      Use both of your functions to find the MLE for \(\lambda\) in the linkage example (illustrative sketches of both root-finders, applied to this problem, appear after the list below).

      What to turn in:

      • Code for your bisection and Newton-Raphson algorithms
      • Output showing the results of applying your code to the linkage problem
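
To make the expected interface concrete, here is a minimal R sketch of a bisection routine with the argument list described in part 1; the function name, argument names, and defaults are illustrative rather than prescribed.

```r
bisection <- function(g, l, u, tol = 1e-10, max_iter = 1000, debug = FALSE) {
  # Require a sign change so that a root is bracketed: g(l)g(u) < 0
  if (g(l) * g(u) >= 0) stop("g(l) and g(u) must have opposite signs")
  for (iter in seq_len(max_iter)) {
    mid <- (l + u) / 2
    if (debug) cat(sprintf("iter %3d: interval = (%.8f, %.8f)\n", iter, l, u))
    # Converged once the bracket is narrower than the tolerance
    if (g(mid) == 0 || (u - l) / 2 < tol) return(mid)
    # Keep whichever half-interval still brackets the root
    if (g(l) * g(mid) < 0) u <- mid else l <- mid
  }
  warning("bisection: maximum number of iterations reached")
  mid
}
```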
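A matching Newton-Raphson sketch follows, together with its application to the linkage problem. The score function below is the derivative of \(\log L(\lambda)\) for the likelihood given above (check the differentiation against your own derivation); since the score is strictly decreasing on \((0,1)\), any interval with a sign change brackets the unique root.

```r
newton_raphson <- function(g, gprime, x0, tol = 1e-10, max_iter = 100,
                           debug = FALSE) {
  x <- x0
  for (iter in seq_len(max_iter)) {
    x_new <- x - g(x) / gprime(x)  # standard Newton-Raphson update
    if (debug) cat(sprintf("iter %3d: x = %.10f\n", iter, x_new))
    if (abs(x_new - x) < tol) return(x_new)
    x <- x_new
  }
  warning("newton_raphson: maximum number of iterations reached")
  x
}

# log L(lambda) = 125 log(2 + lambda) + 38 log(1 - lambda) + 34 log(lambda) + c
score  <- function(lam) 125 / (2 + lam) - 38 / (1 - lam) + 34 / lam
dscore <- function(lam) -125 / (2 + lam)^2 - 38 / (1 - lam)^2 - 34 / lam^2

bisection(score, l = 0.01, u = 0.99)       # MLE of lambda (root of the score)
newton_raphson(score, dscore, x0 = 0.5)    # should agree with bisection
```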
  3. In this question, you will implement EM for a multivariate-t model: \begin{equation} Y_i \sim t_{p}(\mu,\Psi,\nu) , \qquad i=1,\ldots,n \label{om} \end{equation} where \(t_p\) is a multivariate t-distribution in \(p\) dimensions. It turns out that the t-distribution has a convenient representation as a scale mixture of normals with \(\chi^{2}\)-based mixing. Therefore, we can write the model in \eqref{om} as: \begin{align} Y_i | \tau_{i},\theta &\sim N_p\left(\mu,\frac{1}{\tau_{i}}\Psi\right) , \\ \tau_{i} &\sim \frac{\chi^{2}_{\nu}}{\nu} , \end{align} where \(\theta=(\mu,\Psi)\) is the parameter to be estimated (\(\nu\) is fixed), with \(Y_{obs}=(Y_1,\ldots,Y_{n})\) and \(Y_{mis}=(\tau_{1},\ldots,\tau_{n})\). For convenience, define: \begin{equation} \tau_{i}^{(t+1)} = \mathbb{E}\left[\tau_{i}|Y_{obs},\theta^{(t)}\right] = \frac{\nu+p}{\nu+\left(Y_{i}-\mu^{(t)}\right)^{T}\left[\Psi^{(t)}\right]^{-1}\left(Y_{i}-\mu^{(t)}\right)} . \end{equation} Note that there is no \(\sqrt{\cdot}\) for the \(\tau_{i}\)'s in the multivariate formulation used here (unlike in the standard univariate representation of a t-distribution). There are multiple definitions of the multivariate \(t\)-distribution in the literature, although the one used here is standard in the EM and Bayesian communities. (A short simulation sketch based on this representation appears after the list of facts below.)
    1. Derive the EM algorithm for estimating \(\theta\), i.e., find \(\mu^{(t+1)}\) and \(\Psi^{(t+1)}\).
    2. Some possibly useful facts:

      • For \(A\) a \(p\times{}p\) matrix and \(c>0\) a constant: \[ \textrm{det}(cA) = c^{p}\textrm{det}(A) \]
      • If \(X\sim{}\textrm{Gamma}(a,b)\) then: \[ p(x|a,b) \propto x^{a-1}\exp\left\{-bx\right\} , \] for \(a,b>0\) and \(\mathbb{E}\left[X\right]=\frac{a}{b}\).
      • If \(A\) is a positive definite \(p\times{}p\) matrix then: \begin{equation} \textrm{argmax}_{A}\left\{n\log(\textrm{det}(A^{-1})) - \textrm{tr}\left(A^{-1}B\right)\right\} = \frac{1}{n}B . \end{equation}
      • Facts about the \(\textrm{trace}\) operator for scalar \(c\), vector \(x\) and square matrices \(A,B,C\): \begin{gather} \textrm{tr}(c) = c , \qquad \textrm{tr}(AB) = \textrm{tr}(BA) \\ \textrm{tr}(A+B) = \textrm{tr}(A)+\textrm{tr}(B) , \qquad \textrm{tr}(x^{T}Ax) = \textrm{tr}(xx^{T}A) . \end{gather}
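
Before deriving the updates in part 1, it can help to simulate data from the mixture representation above and sanity-check the given E-step weights. Below is a minimal R sketch assuming the mvtnorm package is available; all names (estep_weights, mu_true, and so on) are illustrative.

```r
library(mvtnorm)

set.seed(250)
n <- 200; p <- 2; nu <- 5
mu_true  <- c(1, -1)
Psi_true <- matrix(c(2, 0.5, 0.5, 1), p, p)

# Simulate via the hierarchy: tau_i ~ chisq_nu / nu, then
# Y_i | tau_i ~ N_p(mu, Psi / tau_i)
tau <- rchisq(n, df = nu) / nu
Y   <- t(sapply(tau, function(t_i)
  rmvnorm(1, mean = mu_true, sigma = Psi_true / t_i)))

# E-step weights tau_i^(t+1) = (nu + p) / (nu + squared Mahalanobis
# distance), exactly as given in the problem statement
estep_weights <- function(Y, mu, Psi, nu) {
  p <- length(mu)
  Psi_inv <- solve(Psi)
  d2 <- apply(Y, 1, function(y) crossprod(y - mu, Psi_inv %*% (y - mu)))
  (nu + p) / (nu + d2)
}

w <- estep_weights(Y, mu = colMeans(Y), Psi = cov(Y), nu = nu)
summary(w)  # outlying observations receive small weights
```

Observations far from \(\mu\) (in Mahalanobis distance) are down-weighted in the E-step, which is the mechanism by which the t-model is more robust to outliers than the Gaussian.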


  4. In this question you will implement two different EM algorithms for a hierarchical Poisson model: \begin{equation} Y_i | \lambda_i \sim \textrm{Pois}(\lambda_i) , \qquad \lambda_i | \beta \sim \textrm{Gamma}(1,\beta) ,\label{eq1} \end{equation} where the pdf of a \(\textrm{Gamma}(1,b)\) random variable is defined to be \[ p(x) = \mathcal{I}_{\left\{x>0\right\}}b\exp\left\{-bx\right\} , \] i.e., \(\lambda_i\sim\textrm{Expo}(\beta)\). (A generic EM scaffold that you can adapt for parts (a)-(c) appears after this problem.)
    1. Derive the EM algorithm for \(\beta\) for the model in \eqref{eq1}.
    2. Construct an Ancillary Augmentation (AA) that preserves the observed data log-likelihood of the hierarchical model in \eqref{eq1}. Hint: Use the strategy suggested in class! Derive the corresponding EM algorithm.
    3. Using the Sufficient Augmentation (SA) from (a) and the Ancillary Augmentation (AA) from (b), derive the corresponding Interwoven EM algorithm. You may select either the SA or the AA as A1 (the resulting algorithm will be the same either way for this example).
    4. Derive the observed data log-likelihood according to model \eqref{eq1} and compute the MLE for \(\beta\).
    5. Compare the linear rates of convergence of each of the EM algorithms, and discuss when each algorithm will perform well (or not).
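
Since parts (a)-(c) each produce an iteration of the same fixed-point form, a single generic driver can run all three. Below is a minimal R sketch; run_em, estep, and mstep are hypothetical names, and the actual updates are the ones you derive.

```r
# Generic EM driver: supply the E-step and M-step you derive in (a)-(c)
run_em <- function(beta0, estep, mstep, tol = 1e-8, max_iter = 10000) {
  beta <- beta0
  for (iter in seq_len(max_iter)) {
    beta_new  <- mstep(estep(beta))      # one full EM update for beta
    converged <- abs(beta_new - beta) < tol
    beta <- beta_new
    if (converged) break
  }
  list(beta = beta, iterations = iter)
}

# Data simulated from the hierarchical Poisson model above, useful for
# checking each algorithm (and the MLE from part (d)) against a known truth
set.seed(250)
n <- 500; beta_true <- 2
lambda <- rexp(n, rate = beta_true)  # Gamma(1, beta) is Expo(beta)
y <- rpois(n, lambda)

# Example call (placeholders): run_em(1, estep = my_estep, mstep = my_mstep)
```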

(: Happy Coding! :)