Classification vs. Prediction
- Classification:
  - predicts categorical class labels
  - classifies data: constructs a model based on the training set and the values (class labels) of a classifying attribute, then uses the model to classify new data
- Prediction:
  - models continuous-valued functions, i.e., predicts unknown or missing values
- Typical applications:
  - credit approval
  - target marketing
  - medical diagnosis
  - treatment effectiveness analysis
Classification: A Two-Step Process
- Step 1: Model construction (describing a set of predetermined classes)
  - Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
  - The set of tuples used for model construction is the training set
  - The model is represented as classification rules, decision trees, or mathematical formulae
- Step 2: Model usage
  - Estimate the accuracy of the model:
    - The known label of each test sample is compared with the class the model assigns to it
    - The accuracy rate is the percentage of test set samples that are correctly classified by the model
    - The test set is independent of the training set
  - Use the model to classify future or unknown objects
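Step 2's accuracy estimate can be sketched as a holdout evaluation. This is a minimal Python sketch, assuming the model is any callable that maps a sample's attribute values to a class label; the names `holdout_split` and `accuracy_rate`, the shuffling, and the default test fraction of 1/3 are illustrative assumptions, not part of the slides.

```python
import random
from typing import Callable, Hashable, List, Sequence, Tuple

# A sample is (attribute values, known class label).
Sample = Tuple[Sequence[float], Hashable]


def holdout_split(data: List[Sample], test_fraction: float = 1 / 3,
                  seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle a copy of the data and cut it into disjoint training and test sets."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy_rate(model: Callable[[Sequence[float]], Hashable],
                  test_set: List[Sample]) -> float:
    """Percentage of test samples whose known label equals the model's output."""
    if not test_set:
        raise ValueError("test set is empty")
    correct = sum(1 for x, label in test_set if model(x) == label)
    return 100.0 * correct / len(test_set)
```

Fixing `seed` makes the split reproducible. Keeping the two sets disjoint is what makes the accuracy estimate independent of the training data, as the slide requires.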
Classification Process (1): Model Construction
Training Data → Algorithms → Classifier (Model)
IF rank = ‘professor’ OR years > 6
THEN tenured = ‘yes’
Classification Process (2): Use the Model in Prediction
Testing Data → Classifier (accuracy != 100%)
Unseen Data (Jeff, Professor, 4) → Classifier → Tenured?
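Using the rule learned in Process (1) on the unseen tuple can be sketched in Python. The `Faculty` record and the `tenured` function are hypothetical names; the case-insensitive comparison of rank (the rule writes ‘professor’, the tuple writes "Professor") and the "no" outcome when neither condition holds are assumptions, since the slide only states the THEN branch.

```python
from typing import NamedTuple


class Faculty(NamedTuple):
    name: str
    rank: str
    years: int


def tenured(person: Faculty) -> str:
    """Learned rule: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'."""
    # Assumption: rank is matched case-insensitively.
    if person.rank.lower() == "professor" or person.years > 6:
        return "yes"
    return "no"  # assumption: default outcome when neither condition holds


jeff = Faculty("Jeff", "Professor", 4)
print(tenured(jeff))  # yes
```

Jeff has only 4 years of service, so the years condition fails, but his rank alone satisfies the OR and the classifier answers "yes".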
Lazy vs. Eager Learning
- Lazy learning (e.g., instance-based learning): simply stores the training data (or performs only minor processing) and waits until it is given a test tuple
- Eager learning (e.g., decision trees, SVM, NN): given a training set, constructs a classification model before receiving new (e.g., test) data to classify
- Lazy: less time in training but more time in predicting
- Accuracy:
  - Lazy methods effectively use a richer hypothesis space, since they use many local linear functions to form an implicit global approximation to the target function
  - Eager methods must commit to a single hypothesis that covers the entire instance space
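The contrast can be sketched with two tiny Python classifiers. `LazyOneNN` stores the data in `fit()` and does all its work in `predict()`. For the eager side, the sketch swaps in a nearest-centroid classifier, which is not one of the slide's examples (decision trees, SVM, NN) but is the simplest model that commits to a single global hypothesis at training time. Class and method names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


class LazyOneNN:
    """Lazy learner: fit() only stores the data; the work happens in predict()."""

    def fit(self, X: List[Sequence[float]], y: List[Hashable]) -> "LazyOneNN":
        self.X, self.y = list(X), list(y)  # no model is built, just a copy
        return self

    def predict(self, xq: Sequence[float]) -> Hashable:
        # Every query scans all stored examples: O(n) distances per prediction.
        best = min(range(len(self.X)), key=lambda i: math.dist(self.X[i], xq))
        return self.y[best]


class EagerNearestCentroid:
    """Eager learner (stand-in): fit() commits to one centroid per class."""

    def fit(self, X: List[Sequence[float]], y: List[Hashable]) -> "EagerNearestCentroid":
        sums: Dict[Hashable, List[float]] = {}
        counts: Dict[Hashable, int] = defaultdict(int)
        for x, label in zip(X, y):
            total = sums.setdefault(label, [0.0] * len(x))
            for j, v in enumerate(x):
                total[j] += v
            counts[label] += 1
        # Only the centroids are kept; the training data is no longer needed.
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict(self, xq: Sequence[float]) -> Hashable:
        # One distance per class, regardless of how many examples were seen.
        return min(self.centroids, key=lambda c: math.dist(self.centroids[c], xq))
```

On one-attribute data X = [[0], [1], [2], [10]] with labels a, a, b, b, the query 2.2 gets "b" from the lazy learner (its nearest example is 2) but "a" from the centroid model, whose single global summary places class b's center at 6.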
Lazy vs. Eager Learning
It's very similar to a desktop!
- Eager: survey before
- Lazy: desktop
Lazy Learner: Instance-Based Methods
- Instance-based learning: store the training examples and delay the processing ("lazy evaluation") until a new instance must be classified
- Typical approaches:
  - k-nearest neighbor approach: instances are represented as points in a Euclidean space
  - Locally weighted regression: constructs a local approximation
The k-Nearest Neighbor Algorithm
- All instances correspond to points in the n-dimensional space
- The nearest neighbors are defined in terms of Euclidean distance
- The target function can be discrete- or real-valued
- For a discrete-valued target function, k-NN returns the most common value among the k training examples nearest to xq
- Voronoi diagram: the decision surface induced by 1-NN for a typical set of training examples
(Figure: query point xq among positive (+) and negative (-) training examples.)
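The algorithm on this slide can be sketched in Python, with the Euclidean distance d(xi, xj) = sqrt(sum_r (a_r(xi) - a_r(xj))^2) computed by `math.dist`. The slide describes only the discrete-valued case; for a real-valued target this sketch assumes the common choice of averaging the k neighbors' values. Function names and the tie-breaking rule are illustrative assumptions.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def k_nearest(X: List[Sequence[float]], xq: Sequence[float], k: int) -> List[int]:
    """Indices of the k training points closest to xq, nearest first."""
    if not 1 <= k <= len(X):
        raise ValueError("k must be between 1 and the number of training examples")
    # math.dist computes the Euclidean distance between the two points.
    return sorted(range(len(X)), key=lambda i: math.dist(X[i], xq))[:k]


def knn_classify(X: List[Sequence[float]], y: List[Hashable],
                 xq: Sequence[float], k: int) -> Hashable:
    """Discrete-valued target: the most common label among the k nearest neighbors."""
    votes = Counter(y[i] for i in k_nearest(X, xq, k))
    # most_common() lists equal counts in first-seen order, so a tie goes to
    # whichever tied label appears first in the nearest-first ordering.
    return votes.most_common(1)[0][0]


def knn_regress(X: List[Sequence[float]], y: List[float],
                xq: Sequence[float], k: int) -> float:
    """Real-valued target (assumed rule): the mean of the k nearest neighbors' values."""
    return sum(y[i] for i in k_nearest(X, xq, k)) / k
```

With k = 1 the predicted label changes exactly where the nearest training example changes, which is what the Voronoi diagram on the slide depicts.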
Choices of Error Criterion
There are three possible ways to define the error:
1. Squared error over the k nearest neighbors
2. Distance-weighted squared error over the entire set D of training data
3. A combination of 1 and 2
Note: K is a kernel function, i.e., a function inversely related to distance; it is used to assign a weight to each training example.
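The three criteria can be made concrete with locally weighted linear regression on a single input attribute. This Python sketch fits a local line f_hat(x) = a + b*x around the query by weighted least squares, which minimizes the selected criterion exactly; the docstring writes each criterion in the usual textbook form. The Gaussian kernel, its bandwidth `tau`, the default k = 3, and the names `gaussian_kernel` and `lwr_predict` are assumptions for illustration.

```python
import math
from typing import List


def gaussian_kernel(d: float, tau: float) -> float:
    """K(d): a weight that decays with distance d; tau is an assumed bandwidth."""
    return math.exp(-d * d / (2 * tau * tau))


def lwr_predict(xs: List[float], ys: List[float], xq: float,
                criterion: int = 2, k: int = 3, tau: float = 1.0) -> float:
    """Locally weighted linear regression f_hat(x) = a + b*x, evaluated at xq.

    criterion 1: E1 = 1/2 * sum over the k nearest x of (f(x) - f_hat(x))^2
    criterion 2: E2 = 1/2 * sum over all x in D of (f(x) - f_hat(x))^2 * K(d(xq, x))
    criterion 3: E3 = 1/2 * sum over the k nearest x of (f(x) - f_hat(x))^2 * K(d(xq, x))
    """
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i] - xq))
    nearest = set(order[:k])
    weights = []
    for i, x in enumerate(xs):
        in_knn = i in nearest
        if criterion == 1:
            w = 1.0 if in_knn else 0.0
        elif criterion == 2:
            w = gaussian_kernel(abs(x - xq), tau)
        elif criterion == 3:
            w = gaussian_kernel(abs(x - xq), tau) if in_knn else 0.0
        else:
            raise ValueError("criterion must be 1, 2, or 3")
        weights.append(w)

    # Weighted least squares in closed form: minimizing sum w*(y - a - b*x)^2
    # gives b = weighted cov(x, y) / weighted var(x), a = mean_w(y) - b*mean_w(x).
    total = sum(weights)
    if total == 0:
        raise ValueError("all weights are zero; try a larger tau or k")
    mx = sum(w * x for w, x in zip(weights, xs)) / total
    my = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0  # degenerate case: fall back to the weighted mean
    return my + b * (xq - mx)
```

Criterion 1 ignores distance within the neighborhood, criterion 2 lets every training example contribute with a smoothly decaying weight, and criterion 3 keeps the kernel weighting but limits the computation to the k nearest examples.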
Example: x = <humidity, temperature>
- Euclidean distance, discrete target values
- Training data (Humidity, Temperature, Run): 30 25 +, 48 40 -, 80 64 28 50 60
- New instance xq = <40, 30>, run = ?? (we can run inside (+) or outside (-))
- 1-NN (x1): answer: run inside (+)
- 2-NN (x1, x4)
- 3-NN (x1, x2, x4): answer: run inside (+)
- 4-NN (x1, x2, x4, x5)
- 5-NN: answer: run outside (-)
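With a plain majority vote, an even k can end in a tie (one + and one - neighbor). A common refinement, not stated on this slide, weights each neighbor's vote by 1/d². The Python sketch below uses only the first two rows of the table, <30, 25> (+) and <48, 40> (-), computes their Euclidean distances from xq = <40, 30>, and applies the weighted vote; the function names and the 1/d² weighting are assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# First two rows of the table: ((humidity, temperature), run).
TRAIN: List[Tuple[Tuple[float, float], str]] = [
    ((30.0, 25.0), "+"),  # run inside
    ((48.0, 40.0), "-"),  # run outside
]
XQ = (40.0, 30.0)


def ranked_neighbors(train, xq):
    """(distance, label) pairs sorted from nearest to farthest."""
    return sorted((math.dist(x, xq), label) for x, label in train)


def weighted_vote(train, xq, k: int) -> str:
    """Distance-weighted k-NN vote with w = 1 / d^2 (an assumed weighting)."""
    scores: Dict[str, float] = defaultdict(float)
    for d, label in ranked_neighbors(train, xq)[:k]:
        if d == 0:
            return label  # query coincides with a training example
        scores[label] += 1.0 / d ** 2
    return max(scores, key=scores.get)


for d, label in ranked_neighbors(TRAIN, XQ):
    print(f"{label}: d = {d:.2f}")
# +: d = 11.18   (sqrt(10^2 + 5^2) = sqrt(125))
# -: d = 12.81   (sqrt(8^2 + 10^2) = sqrt(164))
print(weighted_vote(TRAIN, XQ, k=2))  # +
```

The + example is closer, so its weight 1/125 exceeds the - example's 1/164 and the 2-NN tie resolves to run inside (+).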