STA 250 :: Advanced Statistical Computing (UCD, Fall 2013)

Code + goodies used in Prof. Baines' STA 250 Course (UC Davis, Fall 2013)



STA 250 :: Homework Policy

For all questions you must show your work. This enables us to understand your thought process, award partial credit, and prevent crude cheating. Please see the code of conduct in the Syllabus for rules about collaborating on homework.

For questions requiring computing, if you use R, Python, or any other programming environment, then you must turn in a printout of your output with your solutions.
In addition, a copy of your code must be uploaded to the appropriate HW directory of your forked GitHub repo within 24 hours of the homework due date.


Homework 03

Due: In Class, Wednesday November 27th

Assigned: Wednesday November 20th

Optimization and EM Module


In this homework you will implement several variants of the EM algorithm discussed in lecture, including the regular EM algorithm, the MCEM algorithm, and the Interwoven EM (IEM) algorithm.

  1. Sync your fork of the course GitHub repo to include the latest updates using the instructions provided here.


  2. In this problem you will implement the most basic optimization algorithms presented in class: bisection and Newton-Raphson.
    1. Write a general function to implement the bisection algorithm. The function should take the following arguments:
      • The function to find the root of
      • The initial interval \((l,u)\) (with \(g(l)g(u)<0\))
      • The tolerance for the convergence criterion
      • Maximum number of iterations
      • A debugging option (e.g., to print status to the user)
    2. Write a general function to implement the Newton-Raphson algorithm. The function should take the following arguments:
      • The function to find the root of
      • The derivative of the function
      • The starting value
      • The tolerance for the convergence criterion
      • Maximum number of iterations
      • A debugging option (e.g., to print status to the user)
    3. Consider the classic linkage problem from Rao (1973), with phenotype probabilities and observed counts: \begin{equation} \begin{array}{ccc} \textrm{Phenotype} & \textrm{Probability} & \textrm{Count} \\ \textrm{AB} & (3-2\theta+\theta^2)/4 & 125 \\ \textrm{Ab} & (2\theta-\theta^2)/4 & 18 \\ \textrm{aB} & (2\theta-\theta^2)/4 & 20 \\ \textrm{ab} & (1-2\theta+\theta^2)/4 & 34 \end{array} \end{equation} Defining \(\lambda=1-2\theta+\theta^2\), the four cell probabilities become \((2+\lambda)/4\), \((1-\lambda)/4\), \((1-\lambda)/4\) and \(\lambda/4\), so the likelihood is seen to be: \begin{equation} L(\lambda) \propto (2+\lambda)^{125}(1-\lambda)^{18+20}\lambda^{34} \end{equation} Reference: Rao, C.R. (1973). Linear Statistical Inference and its Applications (Second Edition). Wiley Series in Probability and Statistics.

      Use both of your functions to find the MLE for \(\lambda\) in the linkage example (illustrative sketches of both root-finders, applied to this problem, appear after the list below).

      What to turn in:

      • Code for your bisection and Newton-Raphson algorithms
      • Output showing the results of applying your code to the linkage problem
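
To make the expected interface concrete, here is a minimal R sketch of a bisection routine with the argument list described in part 1; the function name, argument names, and defaults are illustrative rather than prescribed.

```r
bisection <- function(g, l, u, tol = 1e-10, max_iter = 1000, debug = FALSE) {
  # Require a sign change so that a root is bracketed: g(l)g(u) < 0
  if (g(l) * g(u) >= 0) stop("g(l) and g(u) must have opposite signs")
  for (iter in seq_len(max_iter)) {
    mid <- (l + u) / 2
    if (debug) cat(sprintf("iter %3d: interval = (%.8f, %.8f)\n", iter, l, u))
    # Converged once the bracket is narrower than the tolerance
    if (g(mid) == 0 || (u - l) / 2 < tol) return(mid)
    # Keep whichever half-interval still brackets the root
    if (g(l) * g(mid) < 0) u <- mid else l <- mid
  }
  warning("bisection: maximum number of iterations reached")
  mid
}
```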
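A matching Newton-Raphson sketch follows, together with its application to the linkage problem. The score function below is the derivative of \(\log L(\lambda)\) for the likelihood given above (check the differentiation against your own derivation); since the score is strictly decreasing on \((0,1)\), any interval with a sign change brackets the unique root.

```r
newton_raphson <- function(g, gprime, x0, tol = 1e-10, max_iter = 100,
                           debug = FALSE) {
  x <- x0
  for (iter in seq_len(max_iter)) {
    x_new <- x - g(x) / gprime(x)  # standard Newton-Raphson update
    if (debug) cat(sprintf("iter %3d: x = %.10f\n", iter, x_new))
    if (abs(x_new - x) < tol) return(x_new)
    x <- x_new
  }
  warning("newton_raphson: maximum number of iterations reached")
  x
}

# log L(lambda) = 125 log(2 + lambda) + 38 log(1 - lambda) + 34 log(lambda) + c
score  <- function(lam) 125 / (2 + lam) - 38 / (1 - lam) + 34 / lam
dscore <- function(lam) -125 / (2 + lam)^2 - 38 / (1 - lam)^2 - 34 / lam^2

bisection(score, l = 0.01, u = 0.99)       # MLE of lambda (root of the score)
newton_raphson(score, dscore, x0 = 0.5)    # should agree with bisection
```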
  3. In this question, you will implement EM for a multivariate-t model: \begin{equation} Y_i \sim t_{p}(\mu,\Psi,\nu) , \qquad i=1,\ldots,n \label{om} \end{equation} where \(t_p\) is a multivariate t-distribution in \(p\) dimensions. It turns out that the t-distribution has a convenient representation as a scale mixture of normals with \(\chi^{2}\)-based mixing. Therefore, we can write the model in \eqref{om} as: \begin{align} Y_i | \tau_{i},\theta &\sim N_p\left(\mu,\frac{1}{\tau_{i}}\Psi\right) , \\ \tau_{i} &\sim \frac{\chi^{2}_{\nu}}{\nu} , \end{align} where \(\theta=(\mu,\Psi)\) is the parameter to be estimated (\(\nu\) is fixed), with \(Y_{obs}=(Y_1,\ldots,Y_{n})\) and \(Y_{mis}=(\tau_{1},\ldots,\tau_{n})\). For convenience, define: \begin{equation} \tau_{i}^{(t+1)} = \mathbb{E}\left[\tau_{i}|Y_{obs},\theta^{(t)}\right] = \frac{\nu+p}{\nu+\left(Y_{i}-\mu^{(t)}\right)^{T}\left[\Psi^{(t)}\right]^{-1}\left(Y_{i}-\mu^{(t)}\right)} . \end{equation} Note that there is no \(\sqrt{\cdot}\) for the \(\tau_{i}\)'s in the multivariate formulation used here (unlike in the standard univariate representation of a t-distribution). There are multiple definitions of the multivariate \(t\)-distribution in the literature, although the one used here is standard in the EM and Bayesian communities. (A short simulation sketch based on this representation appears after the list of facts below.)
    1. Derive the EM algorithm for estimating \(\theta\), i.e., find \(\mu^{(t+1)}\) and \(\Psi^{(t+1)}\).
    2. Some possibly useful facts:

      • For \(A\) a \(p\times{}p\) matrix and \(c>0\) a constant: \[ \textrm{det}(cA) = c^{p}\textrm{det}(A) \]
      • If \(X\sim{}\textrm{Gamma}(a,b)\) then: \[ p(x|a,b) \propto x^{a-1}\exp\left\{-bx\right\} , \] for \(a,b>0\) and \(\mathbb{E}\left[X\right]=\frac{a}{b}\).
      • If \(A\) is a positive definite \(p\times{}p\) matrix then: \begin{equation} \textrm{argmax}_{A}\left\{n\log(\textrm{det}(A^{-1})) - \textrm{tr}\left(A^{-1}B\right)\right\} = \frac{1}{n}B . \end{equation}
      • Facts about the \(\textrm{trace}\) operator for scalar \(c\), vector \(x\) and square matrices \(A,B,C\): \begin{gather} \textrm{tr}(c) = c , \qquad \textrm{tr}(AB) = \textrm{tr}(BA) \\ \textrm{tr}(A+B) = \textrm{tr}(A)+\textrm{tr}(B) , \qquad \textrm{tr}(x^{T}Ax) = \textrm{tr}(xx^{T}A) . \end{gather}
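
Before deriving the updates in part 1, it can help to simulate data from the mixture representation above and sanity-check the given E-step weights. Below is a minimal R sketch assuming the mvtnorm package is available; all names (estep_weights, mu_true, and so on) are illustrative.

```r
library(mvtnorm)

set.seed(250)
n <- 200; p <- 2; nu <- 5
mu_true  <- c(1, -1)
Psi_true <- matrix(c(2, 0.5, 0.5, 1), p, p)

# Simulate via the hierarchy: tau_i ~ chisq_nu / nu, then
# Y_i | tau_i ~ N_p(mu, Psi / tau_i)
tau <- rchisq(n, df = nu) / nu
Y   <- t(sapply(tau, function(t_i)
  rmvnorm(1, mean = mu_true, sigma = Psi_true / t_i)))

# E-step weights tau_i^(t+1) = (nu + p) / (nu + squared Mahalanobis
# distance), exactly as given in the problem statement
estep_weights <- function(Y, mu, Psi, nu) {
  p <- length(mu)
  Psi_inv <- solve(Psi)
  d2 <- apply(Y, 1, function(y) crossprod(y - mu, Psi_inv %*% (y - mu)))
  (nu + p) / (nu + d2)
}

w <- estep_weights(Y, mu = colMeans(Y), Psi = cov(Y), nu = nu)
summary(w)  # outlying observations receive small weights
```

Observations far from \(\mu\) (in Mahalanobis distance) are down-weighted in the E-step, which is the mechanism by which the t-model is more robust to outliers than the Gaussian.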


  4. In this question you will implement two different EM algorithms for a hierarchical Poisson model: \begin{equation} Y_i | \lambda_i \sim \textrm{Pois}(\lambda_i) , \qquad \lambda_i | \beta \sim \textrm{Gamma}(1,\beta) ,\label{eq1} \end{equation} where the pdf of a \(\textrm{Gamma}(1,b)\) random variable is defined to be \[ p(x) = \mathcal{I}_{\left\{x>0\right\}}b\exp\left\{-bx\right\} , \] i.e., \(\lambda_i\sim\textrm{Expo}(\beta)\). (A generic EM scaffold that you can adapt for parts (a)-(c) appears after this problem.)
    1. Derive the EM algorithm for \(\beta\) for the model in \eqref{eq1}.
    2. Construct an Ancillary Augmentation (AA) that preserves the observed data log-likelihood of the hierarchical model in \eqref{eq1}. Hint: Use the strategy suggested in class! Derive the corresponding EM algorithm.
    3. Using the Sufficient Augmentation (SA) from (a) and the Ancillary Augmentation (AA) from (b), derive the corresponding Interwoven EM algorithm. You may select either the SA or the AA as A1 (the resulting algorithm will be the same either way for this example).
    4. Derive the observed data log-likelihood according to model \eqref{eq1} and compute the MLE for \(\beta\).
    5. Compare the linear rates of convergence of each of the EM algorithms, and discuss when each algorithm will perform well (or not).
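
Since parts (a)-(c) each produce an iteration of the same fixed-point form, a single generic driver can run all three. Below is a minimal R sketch; run_em, estep, and mstep are hypothetical names, and the actual updates are the ones you derive.

```r
# Generic EM driver: supply the E-step and M-step you derive in (a)-(c)
run_em <- function(beta0, estep, mstep, tol = 1e-8, max_iter = 10000) {
  beta <- beta0
  for (iter in seq_len(max_iter)) {
    beta_new  <- mstep(estep(beta))      # one full EM update for beta
    converged <- abs(beta_new - beta) < tol
    beta <- beta_new
    if (converged) break
  }
  list(beta = beta, iterations = iter)
}

# Data simulated from the hierarchical Poisson model above, useful for
# checking each algorithm (and the MLE from part (d)) against a known truth
set.seed(250)
n <- 500; beta_true <- 2
lambda <- rexp(n, rate = beta_true)  # Gamma(1, beta) is Expo(beta)
y <- rpois(n, lambda)

# Example call (placeholders): run_em(1, estep = my_estep, mstep = my_mstep)
```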

(: Happy Coding! :)