You are a data scientist at a telecom company. Retaining an existing customer is more profitable than losing one and acquiring a new customer to replace them, so your company wants to identify customers who are likely to churn in order to target special offers to them.

In this project, you will apply several algorithms to a customer churn data set. Answer the questions in the order they appear. Do not skip ahead to later steps to answer earlier questions that ask you to predict outcomes based on your analysis of the data and your understanding of the algorithms.

Data: telecom churn data set from the Duke Teradata Center, via Kaggle.

Algorithms:

- SVM
- Neural networks
- PCA
- Clustering

Write a report with brief discussions of the following questions and issues.

What is the dimensionality of the data?

What form do the data take (numerical, categorical)?

Run support vector machines on the churn calibration data using polynomial and RBF kernels.
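A minimal sketch of this step with scikit-learn. The churn calibration data is not bundled here, so `make_classification` stands in for it; replace the synthetic `X, y` with the features and churn labels you load from the Kaggle CSV.

```python
# Compare polynomial and RBF SVM kernels on held-out data.
# Synthetic data stands in for the churn calibration set.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scores = {}
for kernel in ("poly", "rbf"):
    # SVMs are scale-sensitive, so standardize features before fitting.
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel, degree=3))
    clf.fit(X_train, y_train)
    scores[kernel] = clf.score(X_test, y_test)
print(scores)
```

Tune `C`, `gamma`, and `degree` (e.g. with `GridSearchCV`) before comparing the kernels, so neither is handicapped by bad hyperparameters.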

Which kernel works better? Why?

Run PCA to reduce the dimensionality of the churn calibration data and run SVMs on the reduced data as you did for the original data.
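One way to sketch this step: passing a float to `PCA(n_components=...)` keeps the smallest number of components that explains that fraction of the variance (the 95% threshold here is an assumed choice, not a requirement). Synthetic data again stands in for the churn set.

```python
# PCA to reduce dimensionality, then an SVM on the reduced data.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# Keep the smallest number of components explaining >= 95% of the variance.
pca = PCA(n_components=0.95)
X_red = pca.fit_transform(X_scaled)
explained = pca.explained_variance_ratio_.sum()

X_train, X_test, y_train, y_test = train_test_split(X_red, y, random_state=0)
acc = SVC(kernel="rbf").fit(X_train, y_train).score(X_test, y_test)
print(f"{X_red.shape[1]} components explain {explained:.1%}; "
      f"RBF-SVM accuracy {acc:.3f}")
```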

How many principal components did you pick? Why? How much of the variance in the churn data is described by the principal components you chose?

How did the SVMs perform on the reduced data compared to the original data? Why?

Run a neural network algorithm on the full data set and PCA-reduced data.

Experiment with different network structures (e.g. extra hidden layers, extra units). Report the results in graphs that show training time (epochs) versus error rate or accuracy.
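The structure experiment can be sketched with scikit-learn's `MLPClassifier`; its `loss_curve_` attribute gives per-epoch training loss for the requested graphs. The particular layer sizes below are illustrative, and synthetic data stands in for the churn set.

```python
# Train networks of different widths/depths and record per-epoch loss.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

results = {}
for hidden in [(16,), (64,), (64, 64)]:   # structures to compare (assumed)
    net = MLPClassifier(hidden_layer_sizes=hidden, max_iter=500,
                        random_state=0)
    net.fit(X_train, y_train)
    results[hidden] = {
        "train_acc": net.score(X_train, y_train),
        "test_acc": net.score(X_test, y_test),
        "loss_curve": net.loss_curve_,    # per-epoch loss, for plotting
    }

for hidden, r in results.items():
    gap = r["train_acc"] - r["test_acc"]  # large gap suggests overfitting
    print(hidden, f"train={r['train_acc']:.3f} "
                  f"test={r['test_acc']:.3f} gap={gap:.3f}")
```

Plotting each `loss_curve` against epoch number, alongside the train/test accuracy gap, gives the training-time-versus-error graphs the report asks for.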

Which network structures result in the most overfitting?

How is the performance (accuracy, training time) of the neural network affected by dimensionality reduction?

How does the neural network performance compare to the SVMs in terms of accuracy, training time, and need for dimensionality reduction?

Use PCA to reduce the dimensionality of the churn calibration data to visualize the data set in two and three dimensions.
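A sketch of the visualization step (synthetic data stands in for the churn set, and the file name is arbitrary):

```python
# Project onto the first three principal components and plot in 2-D and 3-D.
import matplotlib
matplotlib.use("Agg")                  # headless backend; figure is saved
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X = StandardScaler().fit_transform(X)

pca3 = PCA(n_components=3).fit(X)
X3 = pca3.transform(X)
var2 = pca3.explained_variance_ratio_[:2].sum()  # variance in first two PCs
var3 = pca3.explained_variance_ratio_.sum()      # variance in first three PCs

fig = plt.figure(figsize=(10, 4))
ax2 = fig.add_subplot(1, 2, 1)
ax2.scatter(X3[:, 0], X3[:, 1], c=y, s=8)
ax2.set(xlabel="PC1", ylabel="PC2", title=f"2-D ({var2:.1%} of variance)")
ax3 = fig.add_subplot(1, 2, 2, projection="3d")
ax3.scatter(X3[:, 0], X3[:, 1], X3[:, 2], c=y, s=8)
ax3.set_title(f"3-D ({var3:.1%} of variance)")
fig.savefig("pca_churn.png")
```

Coloring the points by churn label (as above) makes it easier to judge whether any visible clusters line up with churn.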

How much of the variance in the data is described by the first two or three principal components?

What does the visualization tell you about the data? Can you pick out any clusters or get a feel for how many clusters there are?

Run a decision tree on the PCA-reduced data set and extract rules for identifying a churning customer.
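`export_text` prints a fitted tree as if-then threshold rules, which is one way to do the extraction. A shallow tree (the `max_depth=3` below is an assumed choice) keeps the rules short and readable; synthetic data stands in for the PCA-reduced churn set.

```python
# Decision tree on 2-D PCA-reduced data, with rules printed as text.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X2 = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X2, y)
rules = export_text(tree, feature_names=["PC1", "PC2"])
print(rules)   # thresholds on PC1/PC2 that separate churners
```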

Choose a k and cluster the data without the churn labels.
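A sketch of the clustering step. The `k=4` here is only an assumed starting point (try several values); the silhouette score is one quantitative handle on "are the clusters good?". Synthetic data stands in for the churn set.

```python
# k-means on the features only; churn labels are set aside.
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X = StandardScaler().fit_transform(X)

k = 4                                   # assumed choice; compare several k
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
sil = silhouette_score(X, km.labels_)   # near 1 = compact, well-separated
print(f"k={k}: silhouette={sil:.3f}")
```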

Are the clusters good? Why?

Are the clusters consistent with the churn labels; that is, are the members of each cluster all churn or all no-churn?

Use the clusters as labels for the churn data, run a decision tree algorithm, extract rules from the data, and give descriptive names to the labels/clusters. You may use the original churn data or the dimensionality-reduced churn data.
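This step can be sketched as follows: the cluster assignments become the tree's target, the printed rules describe each cluster in terms of the original features (a starting point for naming the clusters), and the per-cluster churn rate shows how each cluster aligns with the true labels. Synthetic data stands in for the churn set, and `f0`, `f1`, ... are placeholder feature names.

```python
# Fit a decision tree that predicts cluster membership, then inspect
# each cluster's churn composition.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier, export_text

X, churn = make_classification(n_samples=1000, n_features=20, random_state=0)
X = StandardScaler().fit_transform(X)

clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Rules that characterize each cluster in terms of the features.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, clusters)
names = [f"f{i}" for i in range(X.shape[1])]
print(export_text(tree, feature_names=names))

# Churn rate per cluster: a mostly-churn cluster flags at-risk members.
for c in sorted(set(clusters)):
    rate = churn[clusters == c].mean()
    print(f"cluster {c}: n={np.sum(clusters == c)}, churn rate={rate:.2f}")
```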

Is there a cluster with many churning customers and a few non-churning customers? If so, what does that suggest about the non-churning customers in that cluster, and how would you recommend your company use this information (e.g., via the rules you extracted from the clusters in the previous step)?

Is there a reason to prefer dimensionality-reduced data over the original data?

Submit your report, titled `<loginID>-churn.pdf`, as an attachment to the assignment on Canvas.