Office: B206-B
Office hours: by appointment (email), especially Tuesday or Thursday morning
or early afternoon, or Tuesday after 6:30pm, or anytime I'm in my office; my
door is always open.
Lectures: Generally Tuesdays and Thursdays, although some
lectures will be on Fridays.
Lectures begin at 3:10pm from August 20 through
September 20 and on September 27.
Lectures begin at 1:00pm starting
September 25, except for September 27.
Lectures beginning at 3:10pm end at 5:00pm, and lectures beginning at 1:00pm
end at 3:00pm.
If you send email to the instructor, please put "ISYE 6740" in the subject line.
``Machine learning'' refers to the use of logical rules of induction and deduction, along with data, to identify salient properties of objects or processes, such as clusters, patterns, or trends. Machine learning is an important part of artificial intelligence, as well as of other areas of data science.
``Statistical learning'' is essentially synonymous with machine learning, but the term ``statistical'' perhaps implies greater emphasis on data. This course will focus on modern methods of statistical data analysis.
We distinguish supervised learning, in which we seek to predict an outcome measure or class based on a sample of input measures, from unsupervised learning, in which we seek to identify and describe relationships and patterns among a sample of input measures. The emphasis in this course is on supervised learning, but the course addresses the elements of both supervised learning and unsupervised learning. It covers essential material for developing new statistical learning algorithms.
The text is An Introduction to Statistical Learning, by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani, published by Springer-Verlag, 2013. ISBN 978-1-4614-7137-0. The website for the text is http://www.StatLearning.com/.
The software used in this course is R, which is free software that can be downloaded from the Comprehensive R Archive Network (CRAN).
No prior experience in R is assumed for this course. A good site for getting started with R, especially for people who are somewhat familiar with SAS or SPSS, is Quick R.
Questions from students during lectures are encouraged, and of course questions after class, in person or by email, are welcome.
Student work in the course (and the relative weighting of this work in the overall grade) will consist of
You are expected to take the exams during the designated time periods.
Students may discuss homework, but the work submitted must be the student's own work.
Because the available class time is not sufficient to cover all of even the most common learning methods, a student may wish to do a project involving methods that are addressed in the text but not covered in class.
The project will require a written report.
See also Section 2.3, beginning on page 42 of the text.
x <- 6                 # assign the value 6 to x
y <- 10                # assign the value 10 to y
z <- sqrt(sqrt(x+y))   # x+y is 16, so z = sqrt(sqrt(16)) = 2
z                      # typing an object's name prints its value
Here
is the R code I used to produce the infamous red and green plots in my
lecture. I have
inserted comments to tell you what I'm doing, but don't try to understand all
of the code.
Complete the questionnaire and email it to me.
Assignment (due Thursday, Aug 30):
HW1. Exercises 2.1 and 2.8 in text
The dataset for 2.8 is at the website for the book. There is a PDF file for the whole book there also.
Show your R code.
Please email me your solutions. One PDF file is preferred.
Solutions, comments
TeX source
Assignment (due Friday, Sep 7):
HW2. Exercises 3.1, 3.8, 3.11, and 3.14 in text
Show your R code.
Please email me your solutions in one PDF file.
Solutions, comments
Some notes about R functions for regression
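As an informal illustration (not part of the posted notes, and using the Auto data from the ISLR package only as an assumed example), a simple linear regression can be fit in R with lm():

library(ISLR)                             # provides the Auto data set used in the text
fit <- lm(mpg ~ horsepower, data = Auto)  # regress mpg on horsepower
summary(fit)                              # coefficients, standard errors, R-squared
plot(Auto$horsepower, Auto$mpg)           # scatterplot of the data
abline(fit, col = "red")                  # add the fitted line to the plot
predict(fit, data.frame(horsepower = 98),
        interval = "confidence")          # prediction with a confidence interval

The summary() output gives the estimated coefficients and their standard errors; see Chapter 3 of the text for interpretation.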
Assignment (due Tuesday, Sep 11):
HW3. Exercises 4.2, 4.3, 4.4(a),(b),(c), and 4.10 in text
Show your R code.
Please email me your solutions in one PDF file.
Solutions, comments
Assignment (due Tuesday, Sep 18):
HW4. Exercises 5.2, 5.3, 5.4, and 5.5 in text
Show your R code.
Please email me your solutions in one PDF file.
Solutions, comments
Exam (through Chapter 4; lecture of September 7)
Kinds of questions to expect
Solutions, comments
Assignment (due Tuesday, Sep 25):
HW5. Exercises 6.3, 6.4, 6.5, and 6.8 in text
Show your R code.
Please email me your solutions in one PDF file.
Solutions, comments
Assignment (due Tuesday, Oct 9):
HW6. Exercises 7.1, 7.5, and 7.6 in text
Show your R code.
Please email me your solutions in one PDF file.
Assignment (due Thursday, Oct 18):
HW7. Exercises 8.2, 8.3, 8.4, and 8.8 in text
Show your R code.
Please email me your solutions in one PDF file.
Support vector machines (Chapter 9 in text)
Lecture notes
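As a rough sketch of the kind of R code involved (this assumes the e1071 package and simulated data; it is not the lecture code), a support vector classifier can be fit as follows:

library(e1071)                       # svm() lives in the e1071 package
set.seed(1)
x <- matrix(rnorm(40), ncol = 2)     # 20 observations in two dimensions
y <- c(rep(-1, 10), rep(1, 10))      # two class labels
x[y == 1, ] <- x[y == 1, ] + 1       # shift one class away from the other
dat <- data.frame(x = x, y = as.factor(y))
svmfit <- svm(y ~ ., data = dat, kernel = "linear", cost = 10)
summary(svmfit)                      # reports the support vectors, etc.
plot(svmfit, dat)                    # decision regions of the fitted classifier

The cost argument controls how much violation of the margin is tolerated; Chapter 9 of the text discusses choosing it by cross-validation (e.g., with tune()).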
Assignment (due Thursday, Oct 25):
HW8. Exercises 9.1, 9.2, 9.3, and 9.7 in text
Show your R code.
Please email me your solutions in one PDF file.
Some notes about linear algebra relevant to PCA
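For a quick connection to R (an assumed example, not taken from the notes above), principal components can be computed with prcomp(), here using the USArrests data that appears in Chapter 10 of the text:

pr.out <- prcomp(USArrests, scale. = TRUE)  # center and scale, then compute the PCs
pr.out$rotation                             # loadings (eigenvectors of the correlation matrix)
pr.out$sdev^2                               # component variances (eigenvalues)
summary(pr.out)                             # proportion of variance explained
biplot(pr.out, scale = 0)                   # observations and loadings in one plot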
Assignment (due Tuesday, Oct 30):
HW9. Exercises 10.2, 10.3, 10.6, and 10.10 in text
Show your R code.
Please email me your solutions in one PDF file.
Review presentations (not necessarily in this order):
Other topics:
Continue with review presentations.
You should describe the data and state your objectives in analyzing it. You can also describe previous work on the dataset.
Then briefly describe the methods that you used. You should use at least two learning methods.
Describe how you proceeded with your analysis. You may want to describe your program(s) and show some code. (You do not have to use R.)
Describe your results and conclusions. There are two types of results for
your project.
One has to do with the analysis itself: what do your results show about the
data? (These address the objectives in analyzing this particular dataset.)
The other type of result concerns the relative performance of the methods
you used. Which performed better? Was there any particular characteristic
of the data you analyzed that might make a particular method perform better?
How would you expect the methods you used to perform in similar learning
problems?
Include any other discussion, conclusions, and references, as relevant.