Statistical Consulting: Use of modern statistical methods for data analysis with R

Instructor: Larry Goldstein, larry at usc dot edu, KAP 406D.
Office Hours: Monday 12-1:30, Wednesday 9-10.

Grader: Melike Sirlanci Tuysuzoglu, sirlanci at usc.edu, KAP 244.
Office Hours: Thursday 12:00pm-1:00pm and Friday 11:00am-1:00pm

Lecture:  KAP 147, MW 2:00-3:20.

Text and Course Coverage

An Introduction to Statistical Learning, James, Witten, Hastie and Tibshirani

Time permitting, the course will cover the following chapters of the textbook:

Chapter 1: Introduction
Chapter 2: Statistical Learning
Chapter 4: Classification
Chapter 5: Resampling Methods
Chapter 6: Linear Model Selection and Regularization
Chapter 8: Tree-Based Methods
Chapter 10: Unsupervised Learning

Though the main emphasis of the course is on handling real data and using R, the course will also include various mathematical ‘interludes’ that explain, justify and broaden the understanding of the foundations on which some of the methods rest, in particular for techniques that may not have been covered in previous core courses.

Exams and Grading Policy

Grading Policy

  • 30% Homework and in-class assignments. Please drop off a hard copy of your homework, including the results of the coding problems, in Melike’s office during her office hours if possible; otherwise, slide it under her door. The R code that produced the output should be uploaded via Blackboard as .R script files, or in another format that you have cleared with Melike.
  • 30% Midterm exam, Monday, October 9th. Results: n = 21, median = 72, 25th percentile = 65, 75th percentile = 80, mean = 67, SD = 19, high = 93.
  • 30% Final Project: each student will pick a consulting topic, prepare a writeup and make a class presentation. Writeups are to be distributed to the class at the time of the presentation. The presentation should describe the problem considered, why it is of interest, the data available, and the goals of inference. It should then discuss the method of data analysis, the results of that analysis, the conclusions drawn, and how reliable those conclusions are. You may include R code written specifically for the project if you find that it contains some component of interest. There are no preset limits on the length of the writeup, but as a rough guide it could be 4-10 pages, not counting code. Project proposals are due September 19th and must be submitted and approved before a date for your presentation can be fixed.
  • 10% Participation in class, and in presentations of course final projects.

Assignments

1. Chapter 2 Exercises:   Conceptual 1-7, Applied 8-10
2. Chapter 4 Exercises:   Conceptual 1-9, Applied 10-13
3. Chapter 5 Exercises:   Conceptual 1-4, Applied 5-8
4. Chapter 6 Exercises:   Conceptual 1,3,4,5,6,7, Applied 9,11
5. Chapter 8 Exercises:   Conceptual 1-5, Applied 8,10,11
6. Chapter 10 Exercises: Conceptual 2,4,6, Applied 7,8,9

Due Dates: 
1. Sept 12
2. Oct 6
3. Oct 24
4. Nov 7
5. Nov 21
6. Dec 1

Projects

As you begin thinking of a potential project, please keep the following items in mind:

1. The overall question or questions you would like to address.
2. What data you will use and where it can be obtained.
3. What specific predictors are available, roughly how many there are, and how large a sample size you will have.
4. What specific response you would like to predict, and what model and methods you will use to predict it.

Project Schedule

October 18: Ang Mai
October 23: Lijia Wang, Yuqi Wang
October 25: Jinting Liu, Yusheng Wu
October 30: Yujia Deng, Lernik Asserian
November 1: Kai Fan, Tianyu Wang
November 6: Maria Allayioti, Larry Goldstein
November 8: Yuxuan Gu, Chukiat Phonsom
November 13: Guilherme De Sena Brandine, Stella Ma
November 15: Joshua Derenski, Amal Thomas
November 20: Sharma Hiteshi, Larry Goldstein
November 27: Ranran Chen, Lang Wang
November 29: Arash Rahmani, Ajay Halthor

Papers and Posters Connected to Past Projects

  • VirFinder: a novel k-mer based tool for identifying viral sequences from assembled metagenomic data, Microbiome 2017, 5:69.
  • Thacker, I., Muis, K.R., Danielson, R.W., Sinatra, G., Pekrun, R., Winne, P.H., Chevrier, M. (August, 2017). The Influence of Attitudes and Emotions in Learning from Multiple Texts. Poster to be presented to the European Association for Research on Learning and Instruction, Tampere, Finland.
  • Thacker, I., Muis, K.R., Danielson, R.W., Sinatra, G., Pekrun, R., Winne, P.H., Chevrier, M. (April, 2017). The Influence of Attitudes and Emotions in Learning from Multiple Texts. Poster presented at the annual meeting of the American Educational Research Association, San Antonio, USA.