Original poster: 颜羽瑶

[Learning share] Scagnostics JMP Add-in – A New Way to Explore Your Data

Original post
颜羽瑶 posted on 2014-8-17 21:13:30


Scagnostics (scatterplot diagnostics) was first proposed by John and Paul Tukey and later popularized by Leland Wilkinson in Graph-Theoretic Scagnostics (2005). These analyses were redefined in High-Dimensional Visual Analytics: Interactive Exploration Guided by Pairwise Views of Point Distributions (2006).

The beauty of scagnostics is the ability to visually explore a dataset. JMP has a built-in feature called Scatterplot Matrix (SPLOM), which lets the user compare the relationships between many pairs of variables simultaneously.

However, SPLOMs lose their effectiveness when the number of variables gets too large. Figure 1 shows a portion of the SPLOM report.

Figure 1. SPLOM for Drosophila Aging Data



We explore the Drosophila Aging data, which has 48 observations and 100 numeric variables. Notice in Figure 1 the substantial number of variables in this dataset. This can be overwhelming, and our ability to inspect the data visually suffers. In Figure 1, only about 15% of the actual SPLOM is shown. In a world where our datasets are growing every day, it is imperative to be able to extract meaningful information from the relationships between our variables. That’s where scagnostics comes in! Scagnostics assesses five aspects of scatterplots: outliers, shape, trend, density, and coherence.
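To put the scale in numbers: a full SPLOM needs one panel per unordered pair of variables, so the panel count grows quadratically with the number of variables. A quick check in Python (the helper name `splom_panels` is only for illustration and is not part of JMP):

```python
from math import comb


def splom_panels(n_variables: int) -> int:
    """Number of distinct variable pairs, i.e. off-diagonal SPLOM panels."""
    return comb(n_variables, 2)


# 100 numeric variables, as in the Drosophila Aging data
print(splom_panels(100))  # 4950
```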

This summer, I had the privilege of writing a JMP add-in (available for download with a free SAS profile) that allows the user to interactively explore data using nine graph-theoretic measures. The add-in combines three existing features of JMP: Distribution, Scatterplot Matrix, and Graph Builder. Each point in the scatterplot matrix represents a 2D scatterplot of one pair of variables. When the user selects a point in the scatterplot matrix in the bottom left, Graph Builder shows the corresponding scatterplot for the two variables in the bottom right.
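The idea that each point stands for a whole scatterplot can be sketched outside JMP. The Python below is only an illustration and not the add-in's own code: it builds one row per pair of variables, with each row holding the measure values for that pair. The toy columns, the function names, and the single placeholder measure (`abs_pearson`, which is not one of the nine scagnostic measures) are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def abs_pearson(x, y):
    """Placeholder measure: absolute Pearson correlation of two columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / sqrt(sxx * syy)


def measure_table(columns, measures):
    """Return one row per variable pair; each row becomes one point
    in the 'scatterplot of scatterplots'."""
    rows = []
    for a, b in combinations(columns, 2):
        row = {"x": a, "y": b}
        for name, fn in measures.items():
            row[name] = fn(columns[a], columns[b])
        rows.append(row)
    return rows


# Hypothetical toy data, not the Drosophila Aging table.
columns = {
    "v1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "v2": [2.1, 3.9, 6.2, 8.0, 9.9],
    "v3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
table = measure_table(columns, {"abs_pearson": abs_pearson})

# Selecting a point in the measure plot maps back to a pair of variables,
# whose ordinary scatterplot can then be drawn.
best = max(table, key=lambda row: row["abs_pearson"])
print(best["x"], best["y"])
```

Keeping the variable names in each row is what makes the link back to the ordinary scatterplot possible once a point is selected.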

As an example, one point has already been selected in the SPLOM in Figure 2. The corresponding variables are log2in_Tsp42Ej and log2in_CG6372. For this pair of variables, there are two discernible clusters of data. This is reflected in a high Clumpy value.
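To see why two separated clusters push Clumpy up, it helps to compute a simplified version of the measure by hand. The sketch below is one reading of the minimum-spanning-tree definition in Graph-Theoretic Scagnostics: for each MST edge j, take the smaller of the two subtrees that j joins when only shorter edges are kept (the "runt"), and score 1 - (longest runt edge) / length(j); Clumpy is the largest such score. It is an assumption-laden illustration, not the add-in's code: it skips the binning and outlier removal used in practice, and all function names are made up for the example.

```python
from math import dist


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph.
    Returns a list of (i, j, length) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known connection to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def runt_edge_lengths(start, limit, adjacency):
    """Lengths of MST edges reachable from `start` via edges shorter than `limit`."""
    seen, stack, lengths = {start}, [start], []
    while stack:
        u = stack.pop()
        for v, length in adjacency[u]:
            if length < limit and v not in seen:
                seen.add(v)
                stack.append(v)
                lengths.append(length)
    return lengths


def clumpy(points):
    edges = minimum_spanning_tree(points)
    adjacency = {i: [] for i in range(len(points))}
    for i, j, length in edges:
        adjacency[i].append((j, length))
        adjacency[j].append((i, length))
    score = 0.0
    for i, j, length in edges:
        side_i = runt_edge_lengths(i, length, adjacency)
        side_j = runt_edge_lengths(j, length, adjacency)
        runt = min(side_i, side_j, key=len)  # the smaller side
        if runt:  # a single-vertex runt has no edges and contributes nothing
            score = max(score, 1.0 - max(runt) / length)
    return score


two_clusters = [(0, 0), (1, 0), (0, 1), (1, 1),
                (10, 10), (11, 10), (10, 11), (11, 11)]
evenly_spaced = [(k, 0) for k in range(6)]
print(clumpy(two_clusters))   # roughly 0.92: one long edge bridges two tight groups
print(clumpy(evenly_spaced))  # 0.0: no edge is longer than its neighbours
```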

Figure 2. Scagnostics for Drosophila Aging Data – Clumpy Example



Figure 3 below shows that if we select a point with a high Monotonic value, we can observe a clear association and a strong linear relationship between the variables log2in_alpha_Cat and log2in_CG3430der.
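In Graph-Theoretic Scagnostics, the Monotonic measure is the squared Spearman rank correlation, so it rewards any consistently increasing or decreasing relationship, of which a linear one is a special case. A minimal stdlib-only sketch of that calculation follows; the function names and toy data are assumptions for illustration, not the add-in's code.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def monotonic(x, y):
    """Squared Spearman correlation: Pearson correlation of the ranks, squared."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return (sxy * sxy) / (sxx * syy)


x = [1, 2, 3, 4, 5, 6]
print(monotonic(x, [v ** 3 for v in x]))          # strictly increasing, not linear
print(monotonic(x, [(v - 3.5) ** 2 for v in x]))  # U-shaped, no monotone trend
```

The first call gives a value at the top of the scale even though the relationship is cubic, while the symmetric U shape scores at the bottom.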

Figure 3. Scagnostics for Drosophila Aging Data – Monotonic Example



Another key aspect of scagnostics is outlier detection. Review the Graph Builder plot in Figure 4 below. When we inspect the two variables log2in_CG18178 and log2in_BcDNA_GH04120, we see two data points that visually appear to be outliers. A substantial Outlying value, together with a relatively high Skewed value, supports the notion that this pair of variables has major outliers.
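Both measures can be illustrated from the edge lengths of a minimum spanning tree (MST) over the points. In the sketch below, Outlying is the share of total MST length attached to outlying points, where a point counts as outlying if every MST edge touching it is longer than q75 + 1.5(q75 - q25), and Skewed is (q90 - q50) / (q90 - q10) of the edge lengths. This is a simplified reading of the published definitions with no binning, made-up function names, and toy data. It is not the add-in's code.

```python
from math import dist


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph: (i, j, length) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [float("inf")] * n
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def outlying_and_skewed(points):
    edges = mst_edges(points)
    lengths = sorted(length for _, _, length in edges)
    q10, q25, q50, q75, q90 = (quantile(lengths, q) for q in (0.10, 0.25, 0.50, 0.75, 0.90))
    cutoff = q75 + 1.5 * (q75 - q25)

    # A vertex is treated as an outlier when all of its MST edges exceed the cutoff.
    touching = {i: [] for i in range(len(points))}
    for i, j, length in edges:
        touching[i].append(length)
        touching[j].append(length)
    outliers = {v for v, ls in touching.items() if ls and min(ls) > cutoff}

    outlier_length = sum(l for i, j, l in edges if i in outliers or j in outliers)
    outlying = outlier_length / sum(lengths)
    skewed = (q90 - q50) / (q90 - q10) if q90 > q10 else 0.0
    return outlying, skewed, outliers


# Hypothetical data: a 4x4 grid plus two far-away points (indices 16 and 17).
grid = [(x, y) for x in range(4) for y in range(4)]
with_outliers = grid + [(10, 10), (-8, 5)]
print(outlying_and_skewed(grid))           # no outliers on the bare grid
print(outlying_and_skewed(with_outliers))  # Outlying is roughly 0.55 here
```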

Figure 4. Scagnostics for Drosophila Aging Data – Outlying Example



As we compare the original SPLOM report in Figure 1 with the recursive SPLOM and Graph Builder reports in Figures 2, 3, and 4, we see that the latter give a much more informative analysis.

Now it’s time to download the Scagnostics add-in and begin your own exploration!



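Reply 1
easyspring posted on 2014-10-10 10:07:46
Many thanks to the original poster for sharing.

Reply 2
颜羽瑶 posted on 2014-10-10 22:04:12
Quoting easyspring, posted on 2014-10-10 10:07:
Many thanks to the original poster for sharing.
[victory]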
