Machine Learning - Course website

Chris Thornton


Introduction

This course teaches the theory and practice of machine learning using a mixture of demos, lectures and labs.

Instructions for lab sessions

Assessment is based on one programming assignment and an unseen exam. Most of the syllabus material is in the online lecture notes (below), but note-taking and additional reading are strongly advised.

The first meeting for the course will be the first lecture in week 1.

Your first lab will be your first scheduled lab session after the lecture on k-means clustering.

Lectures

Week 1
. Introduction to the topic: data, records, tabulation, explicit v. implicit structure, common applications (pdf)
. Nearest-neighbour methods: Euclidean v. city-block distance, normalization, sparsification, inductive inference (pdf)

Week 2
. k-means clustering: agglomerative clustering, cluster hierarchies, centroids (pdf)
. Naive Bayes classifiers: probabilities, conditional probabilities, probabilistic prediction, Bayes rule, dangers of the independence assumption (pdf)

Week 3
. Cross-validation: classes and attributes, supervised v. unsupervised learning, classification v. regression, error rate, off-training-set error, IID assumption (pdf)
. Knowledge test (pdf)

Week 4
. Information theory: uncertainty, entropy measurement, information, optimal digital codes, redundancy, bits (pdf)
. Decision trees: spherical v. rectangular shapes, bias mismatch, lookup tables (pdf)

Week 5
. Generalized decision-tree learning: information gain, expected entropy, numeric data, thresholds, C4.5 (pdf)
. Knowledge test (pdf)

Week 6
. VC dimension: over-fitting, dataset shattering, 3-point illustration (pdf)
. Perceptrons: linear separation of classes, inner products, boundaries from thresholds, error correction, delta rule, neural networks (pdf)

Week 7
. Multilayer perceptrons: input, hidden and output units, network training, sigmoidal activation, error backpropagation (pdf)
. Knowledge test (pdf)

Week 8
. Support vector machines: max-margin classifiers, non-linear SVMs, the kernel trick (pdf)
. Machine discovery: analogy and relational problems, BACON, structure-mapping (pdf)

Week 9
. Minimum description length: variable independence, checkerboards, XOR, No Free Lunch theorems, Kolmogorov complexity, Occam's Razor (pdf)
. Knowledge test (pdf)

Week 10
. Student-led revision

. Demos

If you have questions or need extra help

If you have questions about the material, the best thing is to put a question to me during a lecture. I try to call for questions several times a session.

If you prefer, you can approach me at the end of a lecture.

If that doesn't work, you can talk to me (or a lab tutor) in your next lab.

If all else fails, find me at the end of a lecture or lab and ask to arrange a special meeting.

Don't send me questions by email. Replying takes far too long, and I usually can't answer individual questions that way anyway, for fear of creating unfairness.

Books

There is no single course text. The lecture notes define the syllabus and the online pages present most (but not all) of the content.

A good introduction to the area is included in Russell and Norvig `Artificial Intelligence: A Modern Approach'. The most up-to-date version of this is the 3rd edition.

Other book-based introductions are listed in the References section below.

Web sites

o http://www.thearling.com/text/dmtechniques/dmtechniques.htm Nice overview of ML techniques and issues. There is also a complete tutorial at http://www.thearling.com/dmintro/dmintro_2.htm

See also online materials for the Tan text.

There's a nice video on ML used for NLP at http://videolectures.net/icml09_smith_spn/

Programming assignment

This assessed assignment for the course must be submitted as a pdf file via Study Direct by 4pm Thursday week 8. The task is to apply machine learning to the so-called `primate-factors' dataset. The background is as follows.

Sometime in the early 1980s, an animal behaviour researcher studying the social behaviour of primates was killed in a car accident while returning from a field trip. In his belongings, police discovered several listings of data, but the notebooks, assumed to contain details about the origins of the data, had all been burned. The longest of the listings is reproduced at ex1-data.txt. This is in tabulated form, and it is generally believed that each line records the values of several variables, followed by a value which is either 1 or 0. This is thought to be a classification value of some sort, but the nature of the class has never been determined. The final 100 lines in the listing are missing this classification value.
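Loading the listing might look something like the sketch below. It assumes the fields are whitespace-separated and that the 0/1 class value is the final field on each labelled line (so the final 100 unlabelled lines have one fewer field); these parsing details are guesses about the file format, so check them against the actual listing first.

```python
def load_primate_factors(path):
    """Split the listing into labelled training cases and the
    unlabelled final cases, assuming whitespace-separated fields."""
    train, test = [], []
    with open(path) as f:
        rows = [line.split() for line in f if line.strip()]
    n_fields = len(rows[0])            # first line is assumed labelled
    for fields in rows:
        values = [float(v) for v in fields]
        if len(values) == n_fields:    # labelled: last value is the 0/1 class
            train.append((values[:-1], int(values[-1])))
        else:                          # unlabelled: one of the final 100 cases
            test.append(values)
    return train, test
```

Separating the cases this way keeps the labelled rows in (attributes, class) form, which most of the methods on the course can consume directly.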

The exercise involves (1) analysing the properties of the data, (2) implementing and then using one or more machine learning methods to derive predicted classification values for the final 100 cases, (3) submitting the predictions online and (4) presenting the results of the experiment in a report.

Programs can be implemented in Java or any similar language, but not MATLAB.

Marks for the assessment will be awarded using the following scheme.

To recap, your report should have a general introduction, an introduction to the method used, an analysis of the training data, and a section on results plus supporting materials (e.g., the code). There should also be a section on limitations and a self-assessment (in terms of the given credit factors). You will get zero marks for any component missing from your report, so it's a good idea to have one section of your report per credit factor. The page requirements are a guide only and are based on use of a normal-sized print font (e.g., 11 point).

The full report should be submitted as a pdf document via Study Direct by 4pm Thursday of week 8. As part of that submission, the predicted output values should be saved as a .txt file whose name is your last name followed by your first name. So if it were me, I would save the file as ThorntonChris.txt. The pdf file should have the same name, but with the .pdf extension rather than .txt. The .txt file should contain the predicted classification values for the 100 test cases in the right order, with each value on a separate line. Viewed using WordPad or Notepad, the file should look something like ThorntonChris.txt.
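Producing the predictions file in that format is a one-liner per case; a minimal sketch (the function name and the sample predictions list are placeholders):

```python
def write_predictions(predictions, path):
    """Write one predicted 0/1 class value per line, in test-set order."""
    with open(path, "w") as f:
        for value in predictions:
            f.write(str(value) + "\n")

# e.g. write_predictions(predicted_classes, "ThorntonChris.txt"),
# where predicted_classes holds the 100 values in listing order
```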

General guidelines for the writing of program reports are below. However, the section requirements above should be treated as definitive.

Usual departmental penalties apply for late submissions.

Students resitting the course should make sure they implement a different method this time around.

Guidelines for producing the report

These are some general guidelines for writing the kind of report needed here.

1. INTRODUCTION

This should be a general introduction to the theory, techniques and general area of work that the program relates to.

2. ANALYSIS OF THE TRAINING DATA

There is no `one right answer' for this section. Your analysis could be fairly formal, involving statistical analyses of correlation and the like, possibly making use of Excel scattergrams. Or it could be more informal, relying on visual inspection of the data and intelligent assumptions about the task from which the data were obtained.

3. RESULTS

This is where you present the results of your study, e.g., levels of training and cross-validation error achieved by your method(s). You may also want to show some sample predicted outputs. You may want to talk about how your program(s) deal with the anomalous cases in the data. If you've used the kNN method and employed cross-validation to determine the best value of k, this is where you would present a graph relating values of k to cross-validation error.
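The kNN-with-cross-validation approach mentioned above can be sketched as follows. The Euclidean distance, ten-fold split, fixed shuffle seed and majority-vote tie-breaking are all choices made for the sketch, not requirements of the assignment; cases are (attribute-list, class) pairs.

```python
import random
from collections import Counter

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def knn_predict(train, query, k):
    # majority vote among the k nearest labelled cases
    nearest = sorted(train, key=lambda case: euclidean(case[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def cv_error(data, k, folds=10):
    # average n-fold cross-validation error rate for a given k
    cases = data[:]
    random.seed(0)                 # fixed shuffle, so runs are repeatable
    random.shuffle(cases)
    errors = 0
    for i in range(folds):
        held_out = cases[i::folds]
        training = [c for j, c in enumerate(cases) if j % folds != i]
        errors += sum(knn_predict(training, x, k) != y for x, y in held_out)
    return errors / len(cases)
```

Evaluating cv_error over a range of odd k values (odd avoids 50/50 vote ties for a two-class problem) gives exactly the k-versus-error data the graph in this section would plot.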

4. ILLUSTRATION - Illustrations of your program(s) working. This could be based on an edited output listing, with annotations to show what is going on. If the program uses a GUI, it could be based on a sequence of annotated screenshots. (If necessary, `screenshots' can be drawn by hand.) However, edit out repetitive chunks from any program output and edit in explanatory comments. Annotate the most significant bits of the output to draw attention to them.

5. SYSTEM OVERVIEW - Explain precisely HOW the program works.

You should discuss the main steps the program goes through to achieve its results. You do not need to describe every line of the program. You should give an overview of what sorts of things are represented, how they are represented, where input comes from, how it is transformed, where output goes to, etc. The overview should refer to a copy of the program attached as appendix-1 (see below). In particular, make sure you describe what kinds of objects are represented, and how you represent them (e.g., using lists, or vectors, or numerical values for variables, or whatever). Explain what problems your program had to solve, and how it solved them. Use figures and diagrams productively to back up the text. Do not rely on an automatic documentation facility (e.g., javadoc) to write the system overview for you.

When writing the overview, don't write as if you are communicating with a tutor. Equally, don't write as if for a novice. Try to write for an audience of fellow students who may be a week or two behind you in their understanding. When describing what a method does, always adhere to the principles of modularity, i.e., start by stating what sorts of inputs the method takes, and what sorts of output it produces. Give an explanation of the relation between input and output, and one or two simple examples to illustrate. If there are no inputs, or no outputs, then say so.

6. CONCLUSIONS

This is where you should assess what you have achieved and discuss any limitations or problems. It's also the place to talk about potential future developments. Don't be afraid to criticise your own program. If you have read about similar programs, or related programs, it may be appropriate to include some comparisons. What lessons, if any, have been learnt? Were your goals achieved? Is there anything you now think you should have done differently?

7. SELF-ASSESSMENT

This is where you present a formal evaluation of the whole submission in terms of the specified marking criteria. For each criterion, note how well you think you've done (and why) and give yourself a mark.

8. BIBLIOGRAPHY

List books, articles, web pages, files etc. considered to be relevant.

9. CODE APPENDIX - The whole program, with comments explaining what the classes represent, what the methods do, what the global variables (if any) are for, etc.

References

Alpaydin, E. (2010). INTRODUCTION TO MACHINE LEARNING (2nd Edition). Cambridge, Massachusetts: MIT Press.

Hand, D., Mannila, H. and Smyth, P. (2001). PRINCIPLES OF DATA MINING. Cambridge, Massachusetts: MIT Press.

Tan, P., Steinbach, M. and Kumar, V. (2006). INTRODUCTION TO DATA MINING. Boston: Pearson/Addison Wesley.