STA 250 :: Advanced Statistical Computing (UCD, Fall 2013)

Code + goodies used in Prof. Baines' STA 250 Course (UC Davis, Fall 2013)



STA 250 :: Homework 00

For all questions you must show your work. This enables us to understand your thought process, give partial credit and prevent crude cheating. Please see the code of conduct in the Syllabus for rules about collaborating on homeworks.

For questions requiring computing, if you use R, Python or any other programming environment then you must turn in a printout of your output with your solutions.
In addition, a copy of your code must be uploaded to your HW0 directory as per Q9 below.


Homework 0 (No Credit -- Practice Only)

Due: In Class, 5:30pm Wed October 9th

Assigned: Wednesday Oct 2nd

Some basic coding problems to get you back in the swing.

  1. Write a program that prints the numbers from 1 to 100. But for multiples of three print "Fizz" instead of the number and for the multiples of five print "Buzz". For numbers which are multiples of both three and five print "FizzBuzz".
    (From: http://www.codinghorror.com/blog/2007/02/why-cant-programmers-program.html)
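One possible approach, sketched in Python under the assumption that the check for multiples of both three and five is done first (as a multiple of 15). The function name `fizzbuzz` is illustrative only and not part of the assignment.

```python
def fizzbuzz(n):
    """Return the FizzBuzz label for a positive integer n."""
    if n % 15 == 0:  # multiple of both 3 and 5; must be tested first
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


if __name__ == "__main__":
    for i in range(1, 101):
        print(fizzbuzz(i))
```

Keeping the labelling logic in a function separate from the printing loop makes it easy to check individual cases.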

  2. Write a program that generates 10,000 uniform random numbers between 0 and \(2\pi\) (call this \(x\)), and 10,000 uniform random numbers between 0 and 1 (call this \(y\)). You will then have 10,000 pairs of random numbers. Transform \((x,y)\) to \((u,v)\) where:

    \[\begin{aligned} u & = y\cdot\cos(x) \\ v & = y\cdot\sin(x) . \end{aligned}\]

    Make a 2D scatterplot of the 10,000 \((u,v)\) pairs.

    What is the distribution of: \(r=\sqrt{u^{2}+v^{2}}\)
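A minimal sketch of the simulation step using only the Python standard library. The function name `simulate_uv` and the fixed seed are assumptions made for reproducibility, not requirements of the question. Plotting is left out to keep the sketch dependency-free; if matplotlib is available, something like `plt.scatter(us, vs, s=1)` could draw the scatterplot.

```python
import math
import random


def simulate_uv(n=10_000, seed=0):
    """Draw x ~ U(0, 2*pi) and y ~ U(0, 1); return the list of (u, v) pairs."""
    rng = random.Random(seed)  # seeded only so that runs are repeatable
    pairs = []
    for _ in range(n):
        x = rng.uniform(0.0, 2.0 * math.pi)
        y = rng.random()
        pairs.append((y * math.cos(x), y * math.sin(x)))
    return pairs


pairs = simulate_uv()
us = [u for u, _ in pairs]
vs = [v for _, v in pairs]

# Radii r = sqrt(u^2 + v^2), one per pair, for studying the distribution of r.
radii = [math.hypot(u, v) for u, v in pairs]
```

A histogram of `radii` can then be compared against the distribution you derive by hand.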

  3. Consider the following snippet:

    Hello, my name is Bob. I am a statistician. I like statistics very much.

    a. Write a program to write every character in the snippet to a separate file (i.e., file out_01.txt would contain the character H, file out_02.txt would contain e, etc.). Note that the comma, the periods and the spaces should also get their own files.

    b. Write a program to combine all files back together into a single file that contains the original sentence. Take care to respect whitespace and punctuation!
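The two steps could be sketched as follows. This assumes the files go into a directory of your choosing and that the combined file gets a name that does not match `out_*.txt`; the helper names `split_to_files` and `combine_files` are hypothetical.

```python
import tempfile
from pathlib import Path

SNIPPET = "Hello, my name is Bob. I am a statistician. I like statistics very much."


def split_to_files(text, out_dir):
    """Write each character of text to its own file: out_01.txt, out_02.txt, ..."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    width = max(2, len(str(len(text))))  # zero-pad the index in the file name
    for i, ch in enumerate(text, start=1):
        (out_dir / f"out_{i:0{width}d}.txt").write_text(ch, encoding="utf-8")


def combine_files(out_dir, combined_path):
    """Concatenate the out_*.txt files in numeric order into a single file."""
    files = sorted(Path(out_dir).glob("out_*.txt"),
                   key=lambda p: int(p.stem.split("_")[1]))
    text = "".join(p.read_text(encoding="utf-8") for p in files)
    Path(combined_path).write_text(text, encoding="utf-8")
    return text


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        split_to_files(SNIPPET, tmp)
        restored = combine_files(tmp, Path(tmp) / "combined.txt")
        print(restored == SNIPPET)
```

Sorting on the integer parsed from the file name, rather than on the name itself, keeps the order correct even if the padding width were ever too small.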

  4. Run boot_camp_demo.py as a batch job on Gauss using the submission script boot_camp_sarray.sh in the Github repo. Follow the instructions in class for how to do this.

  5. Run the Twitter code provided in lecture. Make sure to run the tweet-grabbing portion of code for a sufficient length of time (It is recommended to open another terminal and run ls -alh to check the size of the output file). The README provides full instructions for each of the steps.
    • See how your plot differs from the one shown in lecture 01
    • Modify the code to report the percentage of tweets that had geo-tagged data at the end of the sentiment analysis.
  6. Consider the autoregressive process of order 1, usually called an AR(1) process:
    \[ y_{t} = \rho{}y_{t-1} + \epsilon_{t} , \quad \epsilon_{t}\sim{}N(0,1) , \]

    for \(t=1,2,\ldots,n\). Let \(y_{0}=0\) and the \(\epsilon_{t}\) be independent.

    a. Simulate from this process with \(\rho=0.9\) and \(n=1000\). Plot the resulting series.

    b. Repeat part (a) 200 times, storing the result in a \(1000\times{}200\) matrix. Each column should correspond to a realization of the random process.

    c. Compute the mean of the 200 realizations at each time point \(t=1,2,\ldots,1000\).
    Plot the means.

    d. Compute the variance of the 200 realizations at each time point \(t=1,2,\ldots,1000\).
    Plot the variances.

    e. Compute the mean across time of each of the 200 series \(i=1,2,\ldots,200\).
    Plot the means.

    f. Compute the variance across time of each of the 200 series \(i=1,2,\ldots,200\).
    Plot the variances.

    g. Justify the results you have seen in parts (b) through (f) theoretically.
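The simulation and the four summaries could be sketched as below, using only the standard library. Assumptions: the function names and the seed are illustrative, and the realizations are stored as a list of 200 series (the transpose of the \(1000\times{}200\) matrix asked for in part b), so `zip(*series)` iterates over time points. Plotting is omitted. As a hint for the theory, unrolling the recursion from \(y_{0}=0\) gives \(\textrm{Var}(y_{t})=(1-\rho^{2t})/(1-\rho^{2})\), which can be compared with the simulated variances.

```python
import random


def simulate_ar1(n=1000, rho=0.9, rng=None):
    """Return [y_1, ..., y_n] with y_0 = 0 and y_t = rho * y_{t-1} + eps_t."""
    rng = rng if rng is not None else random.Random()
    y, prev = [], 0.0
    for _ in range(n):
        prev = rho * prev + rng.gauss(0.0, 1.0)
        y.append(prev)
    return y


def simulate_many(n=1000, reps=200, rho=0.9, seed=0):
    """Return `reps` independent series; series[j][t - 1] is y_t in realization j."""
    rng = random.Random(seed)
    return [simulate_ar1(n, rho, rng) for _ in range(reps)]


def mean(xs):
    return sum(xs) / len(xs)


def sample_var(xs):
    """Unbiased sample variance (divides by len(xs) - 1)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


series = simulate_many()

# Parts (c) and (d): summaries across the 200 realizations at each time point.
mean_by_time = [mean(col) for col in zip(*series)]
var_by_time = [sample_var(col) for col in zip(*series)]

# Parts (e) and (f): summaries across time within each realization.
mean_by_series = [mean(s) for s in series]
var_by_series = [sample_var(s) for s in series]
```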

  7. a. Let \(Z\sim{}N(0,1)\). Compute \(\mathbb{E}\left[\exp\left(-Z^{2}\right)\right]\) using Monte Carlo integration.

    b. Let \(Z\sim{}\textrm{Truncated-Normal}(0,1;[-2,1])\). Compute \(\mathbb{E}\left[Z\right]\) using importance sampling.
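A rough sketch of both estimators with the standard library. The question does not say which proposal to use for importance sampling, so the Uniform(-2, 1) proposal here is an assumption, as are the function names, sample sizes and seeds. The weights are self-normalized, which avoids having to compute the normalizing constant of the truncated normal.

```python
import math
import random


def mc_exp_neg_z2(n=100_000, seed=0):
    """Plain Monte Carlo estimate of E[exp(-Z^2)] for Z ~ N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        total += math.exp(-z * z)
    return total / n


def is_truncnorm_mean(n=100_000, a=-2.0, b=1.0, seed=0):
    """Self-normalized importance sampling estimate of E[Z] for a standard
    normal truncated to [a, b], using a Uniform(a, b) proposal (an assumption)."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        z = rng.uniform(a, b)
        # Weight = unnormalized target density / proposal density. The proposal
        # density is constant on [a, b], so it cancels after normalization.
        w = math.exp(-0.5 * z * z)
        num += w * z
        den += w
    return num / den


if __name__ == "__main__":
    print(mc_exp_neg_z2())
    print(is_truncnorm_mean())
```

Both targets have closed forms, so the estimates can be checked against values you work out analytically.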

  8. Let \(x_{ij}\sim{}N(0,1)\) for \(i=1,\ldots,n\) and \(j=1,2\), and \(x_{i0}=1\) for \(i=1,\ldots,n\). Define \(x_{i}=(x_{i0},x_{i1},x_{i2})^{T}\) and \(\beta=(1.2,0.3,-0.9)^{T}\) and let \(\epsilon_{i}\sim{}N(0,1)\) for \(i=1,\ldots,n\).

    Simulate from the linear regression model:

    \[ y_{i} = x_{i}^{T}\beta + \epsilon_{i} , \quad i=1,\ldots,n , \]

    with \(n=100\). Use the bootstrap procedure to estimate \(\textrm{SD}(\hat{\beta})\) based on \(B=1000\) bootstrap resamples. Compare to the asymptotic results reported by lm or computed using the square root of the diagonal elements of \(\hat{\sigma}^{2}(X^{T}X)^{-1}\).
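A dependency-free sketch of one way to set this up. Assumptions, none of which are fixed by the question: rows \((x_{i},y_{i})\) are resampled jointly (the pairs bootstrap), \(\hat{\sigma}^{2}\) is taken as the residual sum of squares divided by \(n-p\), and all function names are hypothetical. Least squares is done through the normal equations with a small Gaussian-elimination solver, which is adequate for a well-conditioned three-column design but is not how one would do it in production code.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented copy of A
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def gram(X):
    """Return X'X as a list of lists."""
    p = len(X[0])
    return [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(len(X[0]))]
    return solve(gram(X), Xty)


def simulate(n=100, beta=(1.2, 0.3, -0.9), rng=None):
    """Simulate (X, y) from the regression model with an intercept column."""
    rng = rng if rng is not None else random.Random()
    X = [[1.0, rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n)]
    y = [sum(b * x for b, x in zip(beta, row)) + rng.gauss(0.0, 1.0) for row in X]
    return X, y


def bootstrap_sd(X, y, B=1000, rng=None):
    """Pairs bootstrap: resample rows with replacement, refit, take SDs."""
    rng = rng if rng is not None else random.Random()
    n = len(y)
    draws = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]
        draws.append(ols([X[i] for i in idx], [y[i] for i in idx]))
    sds = []
    for j in range(len(X[0])):
        col = [d[j] for d in draws]
        m = sum(col) / B
        sds.append(math.sqrt(sum((c - m) ** 2 for c in col) / (B - 1)))
    return sds


def asymptotic_sd(X, y):
    """Square roots of the diagonal of sigma2_hat * (X'X)^{-1}."""
    n, p = len(y), len(X[0])
    beta = ols(X, y)
    rss = sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
              for row, yi in zip(X, y))
    sigma2 = rss / (n - p)
    XtX = gram(X)
    sds = []
    for j in range(p):
        e_j = [1.0 if k == j else 0.0 for k in range(p)]
        sds.append(math.sqrt(sigma2 * solve(XtX, e_j)[j]))  # (X'X)^{-1}[j][j]
    return sds


if __name__ == "__main__":
    rng = random.Random(0)
    X, y = simulate(rng=rng)
    print(bootstrap_sd(X, y, rng=rng))
    print(asymptotic_sd(X, y))
```

A residual bootstrap (resampling fitted residuals while holding \(X\) fixed) would be an equally reasonable reading of the question and may give slightly different standard deviations.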

  9. In this question you will fork the course GitHub repo and upload your code for all previous homework questions to the repo. Go to https://github.com/STA250/Stuff.

(: Happy Coding! :)