Introducing xda: R package for exploratory data analysis

Posted by oliyiyi on 2016-6-19 07:30:52

(This article was first published on R Language – the data science blog, and kindly contributed to R-bloggers)

This R package contains several tools for initial exploratory analysis of any input dataset. It includes custom functions for plotting the data and for univariate, bivariate and multivariate analysis, which is the first step of any predictive modeling pipeline. The package can be used to get a good sense of a dataset before moving on to building predictive models. You can install the package from GitHub.

The functions currently included in the package are listed below:

numSummary(mydata) automatically detects all numeric columns in the dataframe mydata and provides their summary statistics
charSummary(mydata) automatically detects all character columns in the dataframe mydata and provides their summary statistics
Plot(mydata, dep.var) plots all independent variables in the dataframe mydata against the dependent variable specified by the dep.var parameter
removeSpecial(mydata, vec) replaces all special characters (specified by the vector vec) in the dataframe mydata with NA
bivariate(mydata, dep.var, indep.var) performs bivariate analysis between the dependent variable dep.var and the independent variable indep.var in the dataframe mydata
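To make the removeSpecial(mydata, vec) description more concrete, here is a minimal base-R sketch of the idea. It is not the package's code. The function name remove_special_sketch is hypothetical, and it assumes that "special characters" means whole cell values that match an entry of vec (for example "?" or "-").

```r
# Hypothetical sketch, not xda's implementation.
# Assumption: a cell is "special" when its whole value matches an entry of vec.
remove_special_sketch <- function(df, vec) {
  df[] <- lapply(df, function(col) {
    col[col %in% vec] <- NA   # blank out matching cells, keep the column type
    col
  })
  df
}

# Small made-up data frame for illustration
toy <- data.frame(a = c("1", "?", "3"),
                  b = c("x", "-", "z"),
                  stringsAsFactors = FALSE)
cleaned <- remove_special_sketch(toy, c("?", "-"))
print(cleaned)
```

Assigning through df[] keeps the result a data frame with the original column names.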

Installation

To install the xda package, the devtools package needs to be installed first. To install devtools, please follow instructions here.
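The linked instructions are not reproduced in this post. One common route, sketched here as an assumption, is to install devtools from CRAN only when it is missing. The helper name ensure_package and its installer argument are hypothetical and exist only so the logic can be tried without touching the network.

```r
# Hypothetical helper: install a package only if it cannot already be loaded.
# The installer argument defaults to install.packages and can be swapped out.
ensure_package <- function(pkg, installer = utils::install.packages) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))  # already available, nothing installed
  }
  installer(pkg)
  invisible(TRUE)             # an install was attempted
}

# Uncomment to install devtools from CRAN if it is missing:
# ensure_package("devtools")
```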

Then, use the following commands to install xda:

library(devtools)
install_github("ujjwalkarn/xda")

Usage

For all examples below, the popular iris dataset has been used. The dataset consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor).

library(xda)

## to view a comprehensive summary for all numeric columns in the iris dataset
numSummary(iris)

## n = total number of rows for that variable
## miss = number of rows with missing value
## miss% = percentage of total rows with missing values ((miss/n)*100)
## 5% = 5th percentile value of that variable (value below which 5 percent of the observations may be found)
## the percentile values are helpful in detecting outliers
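For readers who want to see how the columns described above can be derived, the following base-R sketch computes n, miss, miss% and the 5th percentile for every numeric column. It follows the definitions given in the comments and is not the package's own code. The name num_summary_sketch is hypothetical.

```r
# Hypothetical sketch, not xda's implementation.
# Uses the definitions from the post: n = total rows, miss% = (miss / n) * 100.
num_summary_sketch <- function(df) {
  num_cols <- df[sapply(df, is.numeric)]          # keep numeric columns only
  t(sapply(num_cols, function(x) {
    c(n       = length(x),
      miss    = sum(is.na(x)),
      "miss%" = 100 * sum(is.na(x)) / length(x),
      "5%"    = unname(quantile(x, 0.05, na.rm = TRUE)))
  }))
}

num_res <- num_summary_sketch(iris)
print(num_res)
```

The result has one row per numeric column, so the non-numeric Species column does not appear in it.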

## to view a comprehensive summary for all character columns in the iris dataset
charSummary(iris)

## n = total number of rows for that variable
## miss = number of rows with missing values
## miss% = percentage of total rows with missing values ((miss/n)*100)
## unique = number of unique levels of that variable
## note that there is only one character column (Species) in the iris dataset
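The same quantities can be sketched in base R for non-numeric columns. This is an illustration of the definitions above, not the package's code. The name char_summary_sketch is hypothetical, and the sketch assumes factor columns should count as character columns, because Species is stored as a factor in the built-in iris data.

```r
# Hypothetical sketch, not xda's implementation.
# Assumption: factor columns are treated the same as character columns.
char_summary_sketch <- function(df) {
  is_cat <- sapply(df, function(x) is.character(x) || is.factor(x))
  cat_cols <- df[is_cat]
  t(sapply(cat_cols, function(x) {
    c(n       = length(x),
      miss    = sum(is.na(x)),
      "miss%" = 100 * sum(is.na(x)) / length(x),
      unique  = length(unique(x[!is.na(x)])))   # distinct non-missing values
  }))
}

char_res <- char_summary_sketch(iris)
print(char_res)
```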

## to perform bivariate analysis between 'Species' and 'Sepal.Length' in the iris dataset
bivariate(iris,'Species','Sepal.Length')

## bin_Sepal.Length = 'Sepal.Length' variable has been binned into 4 equal intervals (original range is [4.3,7.9])
## for each interval of 'Sepal.Length', the number of samples from each category of 'Species' is shown
## i.e. 39 of the 50 samples of Setosa have Sepal.Length in the range (4.3,5.2], and so on.
## the number of intervals (4 in this case) can be customized (see documentation)
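The binning described above can be approximated in base R with cut() and table(). The sketch below is an illustration under that assumption and is not the package's code. The name bivariate_sketch and the n.bins argument are hypothetical. Counts for values that sit exactly on a bin edge may differ from the package's output.

```r
# Hypothetical sketch, not xda's implementation.
# Bins the independent variable into equal-width intervals with cut(),
# then cross-tabulates the bins against the dependent variable.
bivariate_sketch <- function(df, dep.var, indep.var, n.bins = 4) {
  bins <- cut(df[[indep.var]], breaks = n.bins)
  table(bins, df[[dep.var]],
        dnn = c(paste0("bin_", indep.var), dep.var))
}

biv_tab <- bivariate_sketch(iris, "Species", "Sepal.Length")
print(biv_tab)
```

Changing n.bins gives a finer or coarser view of how the species are spread over Sepal.Length.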

## to plot all other variables against the 'Petal.Length' variable in the iris dataset
Plot(iris,'Petal.Length')
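A rough base-graphics equivalent of this idea is sketched below. It draws one panel per remaining variable with the chosen variable on the y-axis. This is not how the package necessarily draws its plots. The name plot_against_sketch is hypothetical, and the two-column panel layout is an arbitrary choice.

```r
# Hypothetical sketch, not xda's implementation.
# Draws every other column of df against dep.var, one panel per column.
plot_against_sketch <- function(df, dep.var) {
  others <- setdiff(names(df), dep.var)
  old_par <- par(mfrow = c(ceiling(length(others) / 2), 2))
  on.exit(par(old_par))        # restore the previous layout afterwards
  for (v in others) {
    # numeric x gives a scatter plot, factor x gives box plots
    plot(df[[v]], df[[dep.var]], xlab = v, ylab = dep.var,
         main = paste(dep.var, "vs", v))
  }
  invisible(others)
}

# Usage (opens a graphics device):
# plot_against_sketch(iris, "Petal.Length")
```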


The package is under active development, and more functionality will be added soon. It will also be submitted to CRAN in the coming days. Pull requests to add more functions are welcome!
