In this project, you will apply two algorithms to a contact lens data set. Answer the questions in the order in which they appear. Some early questions ask you to predict outcomes from your analysis of the data and your understanding of the algorithms. Do not run later steps first in order to answer them.
Write a report containing your responses to the following:
Are the features of the data numerical or categorical?
A supervised learning problem is typically either a regression problem or a classification problem. Which kind of problem is the contact lens problem described in the data set?
Which kind of classification problem is the contact lens problem: binary or multi-class?
How many binary classification problems can be derived from the data set?
For each binary classification problem you can derive from the data set, what is the minimum performance baseline?
Which attribute do you expect to be chosen as the split attribute at the root node?
Run a decision tree classifier on the data and report the results in a confusion matrix.
Extract a rule set from your decision tree.
Run boosted decision trees on the data set.
How did the boosted decision tree compare to the non-boosted decision tree?
Spark supports only two kinds of tree ensembles: random forests and gradient-boosted trees. Which one is a boosting method?
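For reference, the following is a minimal Python sketch of two ideas used in the questions above: a minimum performance baseline (taken here to mean the accuracy of always predicting the most frequent class) and a confusion matrix. The labels are invented for illustration and are not the contact lens data. The helper names majority_baseline, one_vs_rest, and confusion_matrix are hypothetical and do not come from Spark or any other library.

```python
from collections import Counter

# Hypothetical labels, invented for illustration; they are NOT the lens data.
labels = ["a", "a", "a", "b", "b", "c"]


def majority_baseline(ys):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(ys)
    return max(counts.values()) / len(ys)


def one_vs_rest(ys, positive):
    """Relabel a multi-class target as a binary 'positive vs. rest' target."""
    return [y == positive for y in ys]


def confusion_matrix(actual, predicted, classes):
    """Count predictions; rows are actual classes, columns are predicted."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


classes = sorted(set(labels))

# One baseline per derived binary (one-vs-rest) problem.
baselines = {c: majority_baseline(one_vs_rest(labels, c)) for c in classes}

# A constant classifier that always predicts the most common label.
most_common = Counter(labels).most_common(1)[0][0]
predicted = [most_common] * len(labels)
matrix = confusion_matrix(labels, predicted, classes)

print(baselines)
for row in matrix:
    print(row)
```

A trained classifier is only useful if it beats the corresponding baseline, so compare the accuracy read off your own confusion matrices against the baselines you computed for the real data.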
Submit your report as a lenses.pdf file attached on Canvas. When you’re ready, double-check that you have submitted and not just saved a draft.
Practice safe submission! Verify that your HW files were truly submitted correctly, the upload was successful, and that your program runs with no syntax or runtime errors. It is solely your responsibility to turn in your homework and practice this safe submission safeguard.
This procedure helps guard against a few things.