11. In this problem, you will develop a model to predict whether a given
car gets high or low gas mileage based on the Auto data set.
(a) Create a binary variable, mpg01, that contains a 1 if mpg contains
a value above its median, and a 0 if mpg contains a value below
its median. You can compute the median using the median()
function. Note you may find it helpful to use the data.frame()
function to create a single data set containing both mpg01 and
the other Auto variables.
(b) Explore the data graphically in order to investigate the association between mpg01 and the other features. Which of the other
features seem most likely to be useful in predicting mpg01? Scatterplots and boxplots may be useful tools to answer this question. Describe your findings.
(c) Split the data into a training set and a test set.
(d) Perform LDA on the training data in order to predict mpg01
using the variables that seemed most associated with mpg01 in
(b). What is the test error of the model obtained?
(e) Perform QDA on the training data in order to predict mpg01
using the variables that seemed most associated with mpg01 in
(b). What is the test error of the model obtained?
(f) Perform logistic regression on the training data in order to predict mpg01 using the variables that seemed most associated with
mpg01 in (b). What is the test error of the model obtained?
(g) Perform KNN on the training data, with several values of K, in
order to predict mpg01. Use only the variables that seemed most
associated with mpg01 in (b). What test errors do you obtain?
Which value of K seems to perform the best on this data set?
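
As a starting point for parts (a) and (c), the following is a minimal sketch in base R, not a full solution. It assumes a stand-in data set: the built-in mtcars is used only because it ships with R and has an mpg column, so substitute Auto when working the exercise. The names cars, cars01, train, test, test_error, and baseline_error are illustrative, and the half/half split and the seed are arbitrary choices. Observations exactly equal to the median are coded 0 here, since the exercise does not say how to treat them.

```r
# Stand-in data (assumption): mtcars replaces Auto so the sketch runs anywhere.
cars <- mtcars

# (a) mpg01 is 1 when mpg is above its median and 0 otherwise.
mpg01 <- as.integer(cars$mpg > median(cars$mpg))
cars01 <- data.frame(mpg01 = mpg01, cars)

# (c) Random split: half of the rows for training, the rest for testing.
set.seed(1)  # arbitrary seed, used only to make the split reproducible
train_idx <- sample(nrow(cars01), size = floor(nrow(cars01) / 2))
train <- cars01[train_idx, ]
test <- cars01[-train_idx, ]

# Test error: the fraction of test observations that are misclassified.
test_error <- function(predicted, actual) mean(predicted != actual)

# Baseline for comparison: always predict the majority class of the training set.
majority <- as.integer(mean(train$mpg01) > 0.5)
baseline_error <- test_error(rep(majority, nrow(test)), test$mpg01)
```

The same test_error helper can then be applied to the class predictions from each method in parts (d) through (g), which makes the resulting error rates directly comparable with one another and with the baseline.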