Question 3 (8 marks)
The file Party.csv contains data on a sample of 250 voters, including party preference (Party=1 or 0), Age, Female (gender), Married (marital status), Income (in thousands), Education (years of schooling), and Religion (Religion=1: religious; 0: non-religious).
- Estimate a logistic regression of Party on Age, Female, Married, Income, Education, and Religion with statsmodels. Discuss the significance of each coefficient and the overall model fit.
- Based on the results of Part 1, build the confusion matrix using in-sample predictions. Compute and discuss the prediction accuracy, precision, and recall.
- Based on the results of Part 1, construct two groups of voters: Group A consists of voters whose predicted probability of voting for Party 1 is above 75%, and Group B consists of voters whose predicted probability of voting for Party 0 is above 75%. How many voters are in Group A? How many voters are in Group B? Find the 90% confidence interval of the mean income for each of these two groups of voters. Comment on the results.
- Perform KMeans clustering with Age, Female, Income, Education, and Religion, and use the Elbow curve to justify that the optimal number of clusters is 3. Form 3 clusters and use a crosstab to check whether the clustering outcome reflects party preference. Comment on the results.
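For Part 1, a minimal sketch using the statsmodels formula API (`smf.logit`). Party.csv is not included here, so the helper `simulate_party_data` is a hypothetical stand-in that generates made-up voters from assumed coefficients; swap in `pd.read_csv("Party.csv")` for the real analysis. Coefficient significance is read from the z-test p-values, and overall fit from McFadden's pseudo R-squared (`prsquared`) together with the likelihood-ratio test against the intercept-only model (`llr`, `llr_pvalue`).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def simulate_party_data(n: int = 250, seed: int = 0) -> pd.DataFrame:
    """HYPOTHETICAL stand-in for Party.csv: simulated voters, not the real data."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "Age": rng.integers(18, 81, n),
        "Female": rng.integers(0, 2, n),
        "Married": rng.integers(0, 2, n),
        "Income": np.clip(rng.normal(60, 20, n), 5, None).round(1),  # thousands
        "Education": rng.integers(10, 21, n),
        "Religion": rng.integers(0, 2, n),
    })
    # Assumed coefficients, chosen only so that both Party outcomes occur.
    eta = (-4 + 0.04 * df["Age"] - 0.3 * df["Female"] + 0.4 * df["Married"]
           + 0.01 * df["Income"] + 0.05 * df["Education"] + 0.8 * df["Religion"])
    df["Party"] = rng.binomial(1, 1 / (1 + np.exp(-eta.to_numpy())))
    return df


def fit_party_logit(df: pd.DataFrame):
    """Maximum-likelihood logit of Party on the six regressors."""
    formula = "Party ~ Age + Female + Married + Income + Education + Religion"
    return smf.logit(formula, data=df).fit(disp=0)


df = simulate_party_data()  # for the real data: df = pd.read_csv("Party.csv")
result = fit_party_logit(df)
print(result.summary())

# Coefficient significance: two-sided z-test p-values from the summary table.
print("Significant at 5%:", result.pvalues[result.pvalues < 0.05].index.tolist())

# Model fit: McFadden pseudo R-squared, plus the likelihood-ratio test of
# H0: all slope coefficients are zero (intercept-only model).
print(f"Pseudo R^2 = {result.prsquared:.3f}")
print(f"LR chi2 = {result.llr:.2f}, p-value = {result.llr_pvalue:.4g}")
```

A small LR-test p-value says the regressors jointly improve on the intercept-only model; the pseudo R-squared is not a share of explained variance and is usually much lower than an OLS R-squared.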
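For Part 2, one way to build the confusion matrix and the three metrics by hand, assuming a 0.5 classification cutoff (the question does not state one). The function name `confusion_and_metrics` and the toy inputs are illustrative, not from the assignment; with the Part 1 fit you would pass `df["Party"]` and `result.predict()`.

```python
import numpy as np


def confusion_and_metrics(y_true, p_hat, threshold: float = 0.5):
    """In-sample confusion matrix plus accuracy, precision, recall for class 1."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = (np.asarray(p_hat, dtype=float) > threshold).astype(int)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    # Rows = actual class (0, 1); columns = predicted class (0, 1).
    cm = np.array([[tn, fp], [fn, tp]])
    accuracy = (tp + tn) / cm.sum()
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return cm, accuracy, precision, recall


# Toy, made-up inputs; replace with df["Party"] and result.predict().
cm, acc, prec, rec = confusion_and_metrics(
    [1, 0, 1, 1, 0, 0, 1, 0, 1, 1],
    [0.9, 0.2, 0.4, 0.7, 0.6, 0.1, 0.8, 0.3, 0.35, 0.65],
)
print(cm)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```

Precision is the share of voters predicted as Party 1 who actually are Party 1; recall is the share of actual Party 1 voters that the model identifies. statsmodels' `result.pred_table()` returns the same observed-by-predicted counts and can serve as a cross-check.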
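For Part 3, a sketch that assumes "over 75%" means strictly greater than 0.75 and uses a t-based interval for each group's mean income. Group B is defined through 1 - p, the predicted probability of Party 0. The helpers `split_groups` and `mean_ci` and the toy numbers are hypothetical; in practice you would pass `result.predict()` and `df["Income"]` from Part 1.

```python
import numpy as np
from scipy import stats


def split_groups(p_hat, income, cutoff: float = 0.75):
    """Group A: P(Party=1) > cutoff.  Group B: P(Party=0) = 1 - p_hat > cutoff."""
    p_hat = np.asarray(p_hat, dtype=float)
    income = np.asarray(income, dtype=float)
    return income[p_hat > cutoff], income[(1 - p_hat) > cutoff]


def mean_ci(x, level: float = 0.90):
    """Two-sided t confidence interval for the population mean of x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < 2:
        raise ValueError("need at least two observations for a t interval")
    se = x.std(ddof=1) / np.sqrt(n)
    t_crit = stats.t.ppf(0.5 + level / 2, df=n - 1)  # 0.95 quantile for 90%
    return x.mean() - t_crit * se, x.mean() + t_crit * se


# Toy, made-up inputs; replace with result.predict() and df["Income"].
p_hat = [0.9, 0.8, 0.1, 0.2, 0.5, 0.95, 0.05, 0.6]
income = [80, 70, 40, 45, 60, 90, 35, 55]
group_a, group_b = split_groups(p_hat, income)
ci_a, ci_b = mean_ci(group_a), mean_ci(group_b)
print(f"Group A: n={group_a.size}, 90% CI for mean income = ({ci_a[0]:.2f}, {ci_a[1]:.2f})")
print(f"Group B: n={group_b.size}, 90% CI for mean income = ({ci_b[0]:.2f}, {ci_b[1]:.2f})")
```

Group sizes are `group_a.size` and `group_b.size`. Non-overlapping intervals suggest the two groups differ in mean income; overlapping ones are inconclusive without a formal two-sample test. `DescrStatsW(x).tconfint_mean(alpha=0.10)` from `statsmodels.stats.weightstats` should reproduce the same interval.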
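For Part 4, a sketch with scikit-learn's `KMeans`. Standardizing the five features with `StandardScaler` is our own assumption (the question does not mention scaling); without it, Income and Age would dominate the Euclidean distances. The data frame is a simulated, hypothetical stand-in in which Party is drawn independently of the features, so its crosstab is not expected to show a systematic pattern; with Party.csv, the crosstab is what reveals whether the clusters line up with party preference.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

FEATURES = ["Age", "Female", "Income", "Education", "Religion"]

# HYPOTHETICAL stand-in for Party.csv; replace with pd.read_csv("Party.csv").
rng = np.random.default_rng(1)
n = 250
df = pd.DataFrame({
    "Age": rng.integers(18, 81, n),
    "Female": rng.integers(0, 2, n),
    "Income": np.clip(rng.normal(60, 20, n), 5, None),
    "Education": rng.integers(10, 21, n),
    "Religion": rng.integers(0, 2, n),
    "Party": rng.integers(0, 2, n),  # drawn independently of the features
})

# Assumption: put all features on a common scale before Euclidean k-means.
X = StandardScaler().fit_transform(df[FEATURES])

# Elbow curve data: within-cluster sum of squares (inertia) for k = 1..10.
ks = range(1, 11)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]
for k, w in zip(ks, inertias):
    print(f"k={k:2d}  inertia={w:.1f}")

# Form 3 clusters and cross-tabulate them against party preference.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
table = pd.crosstab(pd.Series(labels, index=df.index, name="Cluster"), df["Party"])
print(table)
```

To draw the Elbow curve, plot `ks` against `inertias` (for example with matplotlib) and look for the k after which inertia stops falling sharply. In the crosstab, clusters dominated by one Party value would indicate that the clustering reflects party preference.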