STA 250 :: Advanced Statistical Computing (UCD, Fall 2013)

Code + goodies used in Prof. Baines' STA 250 Course (UC Davis, Fall 2013)



STA 250 :: Syllabus

Advanced Statistical Computation (Baines)

UCD, Fall Quarter, 2013

(Syllabus last updated: 09/14/13)

The course is organized around the following key topics:

  1. "Complex" modeling: Bayesian inference, computational methods, applications
  2. "Big" Data: understanding, approaches, tools, applications
  3. "Fast" computation: methodology, technology, tools, applications

To cover these topics, the course will be broken into four modules: (i) Bayesian Inference and Computation, (ii) Statistics with "Big Data", (iii) Optimization and the EM Algorithm, and, (iv) Efficient Computing: Parallelization and GPUs.

The course is designed to equip students with the basic skills required to tackle challenging problems at the forefront of modern statistical applications. For statistics PhD students, there are many rich research topics in these areas. For master's students, and PhD students from other fields, the course is intended to cultivate practical skills that are required of the modern statistician/data scientist, and can be used in your own field of research or future career.

Before we get into the fun stuff, the first few classes will serve as a "boot camp" to make sure everyone has the mathematical and programming background to tackle the challenges later in the course. We will also use the first few weeks to become familiar with some of the key datasets that we will use throughout the course.

Please complete the pre-course survey!

Logistics

Evaluation

Grading for the course will be broken down with the following weighting:

There is no final exam for the course.

Course Topics

References

Please note that since the topics for the course are taken from a variety of different areas, there will
be no single textbook for the class. Below is a list of useful references. You are not required to
purchase any of these books; they will primarily serve as additional references
for material presented in class.

Tentative Course Schedule

Lecture  Date          Topic                                   Notes
01       Mon 30th Sep  Course Overview, Demos
02       Wed 2nd Oct   Boot Camp -- Basics, R, Python
03       Mon 7th Oct   Boot Camp -- Gauss, Linux, Stats
04       Wed 9th Oct   Bayes I -- Introduction to Bayes        Homework 0 Due
05       Mon 14th Oct  Bayes II -- MCMC/Bayesian Computing
06       Wed 16th Oct  Bayes III -- Inference/Model Checking
07       Mon 21st Oct  Bayes IV -- Applications/Extras
08       Wed 23rd Oct  Big Data I -- Types of "Big" Data
09       Mon 28th Oct  Big Data II -- "Big" data strategies    Homework 1 Due
10       Wed 30th Oct  Big Data III -- "Big" data computation
11       Mon 4th Nov   Big Data IV -- Applications/Extras
12       Wed 6th Nov   EM I -- Introduction to EM
--       Mon 11th Nov  NO CLASS -- VETERANS DAY
13       Wed 13th Nov  EM II -- Variations on EM               Homework 2 Due
14       Mon 18th Nov  EM III -- Parametrization, Convergence
15       Wed 20th Nov  EM IV -- Efficient algorithms
16       Mon 25th Nov  GPUs I -- Overview of GPUs              Homework 3 Due
17       Wed 27th Nov  GPUs II -- Programming GPUs
18       Mon 2nd Dec   GPUs III -- High-level GPU interfaces
19       Wed 4th Dec   GPUs IV -- Applications/Extras
--       Fri 6th Dec                                           Homework 4 Due
--       Mon 9th Dec                                           Final Project Due

Assignments:

The basic outline for homeworks is below. All due dates are subject to change.

Each homework will be followed by a "code-swapping" assignment, whereby each student will be
assigned to write a short critique of the homework code submitted by another student in the
course. The plan is for R users to critique Python code, and Python users to critique
R so that students are exposed to different programming models.

and, last, but by no means least:

Other Course Duties:

Projects:

Below are very basic project descriptions to give you a flavor of what to expect. Full final project descriptions and datasets will be provided later in the course. All final projects require a written report, and possibly an oral presentation.
Students are also welcome to submit their own proposals for final projects. This can be
especially useful for PhD students with specific problems that they would like to address using
the tools learned in class.

  1. Bayesian Project: This project gives you the opportunity to apply your newly acquired knowledge of Bayesian statistics and computational strategies to a complex model for real-world data (provided by the instructor). You will be required to demonstrate, using simulation results, that you can solve the problem effectively, and also to draw conclusions based on real data.

  2. Big Data Project: This project will extend some of the skills developed in the "Big" data module. It will involve a computationally challenging analysis of a "big" dataset, including model development, refinement, verification and application.

  3. EM Project: Throughout the course you will be introduced to the Expectation-Maximization (EM) algorithm, and to many more sophisticated extensions of it. This final project will require you to derive several of these algorithms for a specific statistical model. Once derived, your job will be to implement the algorithms and run simulations to compare their performance. The final report will detail your algorithms, explain any implementation decisions made, and summarize your findings.

  4. GPU Project: In the fourth and final module of the course you will be introduced to Graphics Processing Units (GPUs) and how they can be used for statistical computation. Using the tools you have learned in class, this final project will require you to implement a statistical analysis that makes use of the power of the GPU. You will be required to implement, debug, test, optimize and evaluate your code.


(: Happy Coding! :)