The arrangement is that you go into the lab at the start of the session and begin work straight away, picking up where you left off in the previous session if necessary. Use the first five minutes of each lab to review the last two lectures presented. There should be a list of questions at the end of each lecture page. You can use these to test your knowledge.
It's understood people will progress through the exercises at their own speed. I don't expect everyone to be doing the same exercise at the same time.
People will be on hand (either me or a lab tutor) to answer questions.
If you've not had the lecture on k-means clustering by the time of your first lab, use the time to go through the website and look at the arrangements for assessment. Once you've had the lecture on k-means, proceed with the exercises below.
VAR1   VAR2   CLASS
1.713  1.586  0
0.180  1.786  1
0.353  1.240  1
0.940  1.566  0
1.486  0.759  1
1.266  1.106  0
1.540  0.419  1
0.459  1.799  1
0.773  0.186  1

The problem is to predict a classification for a case where VAR1=0.906 and VAR2=0.606, using the result of k-means clustering with 3 means (i.e., 3 centroids).
If you're learning programming for the first time this term, you may want to solve this problem by hand-simulating the k-means clustering process. You'll need a big piece of paper for this but it shouldn't take more than an hour.
If you already have programming skills then you should aim to implement a k-means clustering program that uses 3 means. Ideally, you should construct the program from scratch using the specification from the lecture. If you're completely stuck, you can base your work on the very simple Java application you'll find here. But note that this program uses only 2 means, not 3. You'll need to modify it to get the desired result.
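If it helps to see the shape of one round of the process, here is a minimal sketch of a single assign-and-update step with 3 means. It is not the program linked above and not necessarily the specification from the lecture. The class and method names (KMeansStep, distance, assign, update) are made up for illustration, and it assumes Euclidean distance, clustering on VAR1 and VAR2 only, and the first three cases as starting centroids.

```java
import java.util.Arrays;

class KMeansStep {

    // Straight-line (Euclidean) distance between two points of equal dimension.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            double diff = a[j] - b[j];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Assignment step: for each point, the index of its closest centroid.
    // Ties go to the lower-numbered centroid.
    static int[] assign(double[][] points, double[][] centroids) {
        int[] owner = new int[points.length]; // every point starts with centroid 0
        for (int i = 0; i < points.length; i++) {
            for (int c = 1; c < centroids.length; c++) {
                if (distance(points[i], centroids[c])
                        < distance(points[i], centroids[owner[i]])) {
                    owner[i] = c;
                }
            }
        }
        return owner;
    }

    // Update step: each centroid moves to the mean of the points it owns.
    // Assumption: a centroid that owns no points stays where it was.
    static double[][] update(double[][] points, int[] owner, double[][] centroids) {
        double[][] moved = new double[centroids.length][];
        for (int c = 0; c < centroids.length; c++) {
            double[] sum = new double[centroids[c].length];
            int count = 0;
            for (int i = 0; i < points.length; i++) {
                if (owner[i] != c) continue;
                count++;
                for (int j = 0; j < sum.length; j++) sum[j] += points[i][j];
            }
            moved[c] = centroids[c].clone();
            if (count > 0) {
                for (int j = 0; j < sum.length; j++) moved[c][j] = sum[j] / count;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        // VAR1 and VAR2 from the lab data (CLASS is left out of the clustering here).
        double[][] points = {
            {1.713, 1.586}, {0.180, 1.786}, {0.353, 1.240},
            {0.940, 1.566}, {1.486, 0.759}, {1.266, 1.106},
            {1.540, 0.419}, {0.459, 1.799}, {0.773, 0.186}
        };
        // Assumption: the first three cases serve as the initial 3 means.
        double[][] centroids = {
            points[0].clone(), points[1].clone(), points[2].clone()
        };
        int[] owner = assign(points, centroids);
        centroids = update(points, owner, centroids);
        System.out.println("Owners after one round: " + Arrays.toString(owner));
        System.out.println("Means after one round:  " + Arrays.deepToString(centroids));
    }
}
```

A full program would repeat these two steps rather than run them once. The number of means is simply the length of the centroids array, so nothing in assign or update is tied to 3.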
Once you've done all this, the next task is to modify the program so that you can set it to run with any number of means.
The final task is to modify the program so that it will automatically handle prediction tasks, such as the one above. You'll need to set things up so that your program can tell the difference between given values of data, and to-be-predicted values (e.g. classifications). It also needs to be able to detect when the model has stabilized, and generate an appropriate prediction at that point.
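One possible way to approach stabilization and prediction is sketched below. Treat it as an illustration under assumptions, not as the method from the lecture. It assumes the model has stabilized once a full round passes with no case changing cluster, that clustering uses only the given input values (VAR1 and VAR2), and that the prediction is the most common CLASS among the given cases in the cluster nearest to the query. The names KMeansPredict, nearest, cluster and predict are hypothetical, and starting from the first k cases is an arbitrary choice.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class KMeansPredict {

    // Index of the centroid closest to p (squared Euclidean distance).
    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double d = 0.0;
            for (int j = 0; j < p.length; j++) {
                double diff = p[j] - centroids[c][j];
                d += diff * diff;
            }
            if (d < bestDist) { bestDist = d; best = c; }
        }
        return best;
    }

    // Repeats assign/update rounds until no point changes cluster.
    // The centroids are modified in place; the final assignment is returned.
    static int[] cluster(double[][] points, double[][] centroids) {
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1); // guarantees the first round counts as a change
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int i = 0; i < points.length; i++) {
                int c = nearest(points[i], centroids);
                if (c != assignment[i]) { assignment[i] = c; changed = true; }
            }
            for (int c = 0; c < centroids.length; c++) {
                double[] sum = new double[centroids[c].length];
                int n = 0;
                for (int i = 0; i < points.length; i++) {
                    if (assignment[i] != c) continue;
                    n++;
                    for (int j = 0; j < sum.length; j++) sum[j] += points[i][j];
                }
                if (n == 0) continue; // assumption: an empty cluster keeps its old mean
                for (int j = 0; j < sum.length; j++) centroids[c][j] = sum[j] / n;
            }
        }
        return assignment;
    }

    // Predicts a label for query: the most common label among the given cases
    // in the query's nearest cluster. Returns -1 if that cluster is empty.
    // Requires k <= points.length, since the first k cases seed the means.
    static int predict(double[] query, double[][] points, int[] labels, int k) {
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = points[c].clone();
        int[] assignment = cluster(points, centroids);
        int target = nearest(query, centroids);
        Map<Integer, Integer> votes = new HashMap<>();
        int bestLabel = -1;
        int bestCount = 0;
        for (int i = 0; i < points.length; i++) {
            if (assignment[i] != target) continue;
            int n = votes.merge(labels[i], 1, Integer::sum);
            if (n > bestCount) { bestCount = n; bestLabel = labels[i]; }
        }
        return bestLabel;
    }

    public static void main(String[] args) {
        // Given values: VAR1 and VAR2 per case, with CLASS held separately.
        double[][] points = {
            {1.713, 1.586}, {0.180, 1.786}, {0.353, 1.240},
            {0.940, 1.566}, {1.486, 0.759}, {1.266, 1.106},
            {1.540, 0.419}, {0.459, 1.799}, {0.773, 0.186}
        };
        int[] labels = {0, 1, 1, 0, 1, 0, 1, 1, 1};
        // To-be-predicted case: CLASS is unknown, so only the inputs are supplied.
        double[] query = {0.906, 0.606};
        System.out.println("Predicted CLASS: " + predict(query, points, labels, 3));
    }
}
```

Keeping the labels in a separate array is one simple way to let the program tell given values apart from values to be predicted. Different starting means can give different clusters, so it is worth checking how sensitive the prediction is to that choice.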
If you complete all of this before week 5, you can go on to the decision tree exercise below. If you haven't finished these tasks by week 5, you should probably call a halt and move on to the main assignment at that point. But get advice from the lab tutor on this.
medium  skiing    design      single   twenties  no   -> highRisk
high    golf      trading     married  forties   yes  -> lowRisk
low     speedway  transport   married  thirties  yes  -> medRisk
medium  football  banking     single   thirties  yes  -> lowRisk
high    flying    media       married  fifties   yes  -> highRisk
low     football  security    single   twenties  no   -> medRisk
medium  golf      media       single   thirties  yes  -> medRisk
medium  golf      transport   married  forties   yes  -> lowRisk
high    skiing    banking     single   thirties  yes  -> highRisk
low     golf      unemployed  married  forties   yes  -> highRisk

Input attributes are (from left to right) income, recreation, job, status, age-group, home-owner.

medium  flying    banking     married  thirties  yes  -> lowRisk
high    speedway  media       single   forties   yes  -> highRisk
low     golf      transport   married  thirties  yes  -> medRisk
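
Before building a tree you'll need the cases in a form your program can work with. The sketch below shows one way to read a case in the format above into its six input attributes and its risk classification. It only covers reading the data, not the tree-building itself. The class name RiskCase and its fields are hypothetical, and it assumes each case is on one line, with whitespace between values and "->" before the classification.

```java
class RiskCase {

    // Attribute names in the left-to-right order given on the lab sheet.
    static final String[] ATTRIBUTES = {
        "income", "recreation", "job", "status", "age-group", "home-owner"
    };

    final String[] inputs; // one value per entry in ATTRIBUTES
    final String risk;     // the classification after the arrow, e.g. "lowRisk"

    RiskCase(String[] inputs, String risk) {
        this.inputs = inputs;
        this.risk = risk;
    }

    // Parses one line such as
    // "high golf trading married forties yes -> lowRisk".
    static RiskCase parse(String line) {
        String[] halves = line.split("->");
        if (halves.length != 2) {
            throw new IllegalArgumentException("Expected exactly one '->' in: " + line);
        }
        String[] inputs = halves[0].trim().split("\\s+");
        if (inputs.length != ATTRIBUTES.length) {
            throw new IllegalArgumentException(
                "Expected " + ATTRIBUTES.length + " input values in: " + line);
        }
        return new RiskCase(inputs, halves[1].trim());
    }

    public static void main(String[] args) {
        RiskCase c = parse("medium skiing design single twenties no -> highRisk");
        for (int i = 0; i < ATTRIBUTES.length; i++) {
            System.out.println(ATTRIBUTES[i] + " = " + c.inputs[i]);
        }
        System.out.println("risk = " + c.risk);
    }
}
```

Holding the classification in its own field keeps it apart from the input attributes, which is the same separation of given and to-be-predicted values that the k-means prediction task asks for.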