Original poster: lzguo568

[Frontiers of the Discipline] R leads RapidMiner, Python catches up, Big Data tools grow, Spark ignites

Posted by lzguo568 on 2016-6-30 08:53:16

Tags: Actian, Apache Spark, Data Mining Software, H2O, Knime, Poll, Python, R, RapidMiner, SQL



R is the most popular overall tool among data miners, although Python usage is growing faster. RapidMiner continues to be the most popular suite for data mining/data science. Hadoop/Big Data tool usage grew to 29%, propelled by a more than threefold increase in Spark usage. Other tools with strong growth include H2O (0xdata), Actian, MLlib, and Alteryx.

By Gregory Piatetsky, KDnuggets.




The 16th annual KDnuggets Software Poll continued to draw huge attention from the analytics and data mining community and vendors, attracting about 2,800 voters, who chose from a record 93 different tools.

R is the most popular overall tool among data miners and data scientists, but Python usage is growing faster and is likely to catch up in 2-3 years. RapidMiner remains the most popular suite for data mining/data science, but it got fewer votes than last year. There was a notable increase in Hadoop/Big Data tool usage (29%, up from 17% in 2014), driven mainly by a jump in Spark, whose usage share more than tripled (see the KDnuggets exclusive interview with Spark creator Matei Zaharia). Other tools with strong growth include H2O (0xdata), Actian, MLlib, and Alteryx.

This report has five sections.

The participation by region was: US/Canada (41.5%), Europe (38.4%), Asia (8.2%), Latin America (6.3%), Australia/NZ (3.1%), Africa/MidEast (2.5%).

Top Analytics Tools and Trends

The top 10 tools by share of users were:
  • R, 46.9% share (38.5% in 2014)
  • RapidMiner, 31.5% (44.2% in 2014)
  • SQL, 30.9% (25.3% in 2014)
  • Python, 30.3% (19.5% in 2014)
  • Excel, 22.9% (25.8% in 2014)
  • KNIME, 20.0% (15.0% in 2014)
  • Hadoop, 18.4% (12.7% in 2014)
  • Tableau, 12.4% (9.1% in 2014)
  • SAS, 11.3% (10.9% in 2014)
  • Spark, 11.3% (2.6% in 2014)


Compared to the 2014 Analytics/Data Mining Software Poll, Tableau and Spark were newcomers to the top 10, displacing Weka and Microsoft SQL Server.

The average number of tools used per voter jumped to 4.8, up from 3.7 in 2014 and 3.0 in 2013.
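Because each voter could select several tools, the per-tool shares in this report add up to far more than 100%; their sum equals the average number of tools per voter. The minimal sketch below illustrates that relationship on a handful of made-up ballots (hypothetical data, not poll results); the helper names `average_tools_per_voter` and `usage_shares` are illustrative only.

```python
from collections import Counter

# Hypothetical ballots: each set lists every tool one voter selected.
ballots = [
    {"R", "Python", "SQL"},
    {"RapidMiner", "Excel"},
    {"R", "Spark", "Hadoop", "Python", "Tableau"},
    {"KNIME"},
]


def average_tools_per_voter(ballots):
    """Mean number of tools selected per voter."""
    return sum(len(b) for b in ballots) / len(ballots)


def usage_shares(ballots):
    """Share of voters who selected each tool (votes for the tool / number of voters)."""
    counts = Counter(tool for b in ballots for tool in b)
    return {tool: n / len(ballots) for tool, n in counts.items()}


avg = average_tools_per_voter(ballots)
shares = usage_shares(ballots)
# Multi-select voting: the shares sum to the average tool count, not to 1.
print(avg, sum(shares.values()))  # 2.75 2.75
```

Read this way, an average of 4.8 tools means the shares of all 93 tools together total roughly 480%.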

The distinction between commercial and free software is becoming harder to draw, as many tools have both a free/community version and a commercial/enterprise version. We classified each tool by the primary type of its latest version, so we put RapidMiner in the commercial category and KNIME in the free software category.

Many vendors asked their users to vote in the poll and even to tweet their votes, but we found no evidence of bot or illegitimate voting and did not have to remove any votes.

This year, 91% of voters used commercial software and 73% used free software. About 27% used only commercial software, and only 9% used only free software. For the first time, a majority (64%) used both free and commercial software, up from 49% in 2014.
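The percentages in this paragraph are linked by inclusion-exclusion. The sketch below assumes, as the figures imply, that every voter used at least one commercial or free tool; under that assumption the 91% and 73% figures alone reproduce the 64% (both), 27% (commercial only), and 9% (free only) figures. The function name `overlap_breakdown` is hypothetical.

```python
def overlap_breakdown(pct_commercial, pct_free):
    """Split voters into both / commercial-only / free-only groups (in percent).

    Assumes every voter used at least one commercial or free tool, so
    P(commercial or free) = 100% and, by inclusion-exclusion,
    P(both) = P(commercial) + P(free) - 100%.
    """
    both = pct_commercial + pct_free - 100
    commercial_only = pct_commercial - both
    free_only = pct_free - both
    return both, commercial_only, free_only


print(overlap_breakdown(91, 73))  # (64, 27, 9)
```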



Among tools with at least 10 votes, the largest increases in share compared to 2014 were for
  • H2O (0xdata), 1210% up, to 2.0% share (55 votes) from 0.2% in 2014
  • Actian, 345% up, to 2.0% (56 votes), from 0.5% in 2014
  • Spark, 326% up, to 11.3% (311), from 2.6% in 2014
  • MLlib, 228% up, to 3.3% (91), from 1.0% in 2014
  • Alteryx, 79% up, to 5.6% (155), from 3.1% in 2014
  • Python, 56% up, to 30.3% (837), from 19.5% in 2014
  • TIBCO Spotfire, 56% up, to 4.3% (119), from 2.8% in 2014
  • Pig, 54% up, to 5.4% (150), from 3.5% in 2014
  • SAS Enterprise Miner, 53% up, to 10.9% (302), from 7.2% in 2014
  • Splunk/Hunk, 49% up, to 1.1% (30), from 0.7% in 2014
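The growth figures above are relative changes in share, (new − old) / old, ranked after dropping tools below a minimum vote count. A minimal sketch of that procedure follows, using four rows copied from the list (2014 share, 2015 share, 2015 votes); the names `pct_change` and `top_gainers` are illustrative. The published percentages were presumably computed from unrounded shares, so recomputing from the rounded values gives somewhat different numbers (for example about +335% for Spark instead of 326%, and +900% for H2O, whose 0.2% share in 2014 is heavily rounded, instead of 1210%).

```python
# tool: (share_2014 %, share_2015 %, votes_2015), copied from the list above
data = {
    "H2O (0xdata)": (0.2, 2.0, 55),
    "Spark": (2.6, 11.3, 311),
    "Python": (19.5, 30.3, 837),
    "Alteryx": (3.1, 5.6, 155),
}


def pct_change(old, new):
    """Relative change in share, in percent."""
    return (new - old) / old * 100


def top_gainers(data, min_votes=10):
    """Tools with at least min_votes votes, sorted by relative share growth."""
    rows = [
        (tool, pct_change(old, new))
        for tool, (old, new, votes) in data.items()
        if votes >= min_votes
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


for tool, change in top_gainers(data):
    print(f"{tool}: {change:+.0f}%")
```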


Tools whose share increased by at least 20% for two years in a row are Alteryx, Hadoop, KNIME, Python, Qlikview, SAS Enterprise Miner, Tableau, and TIBCO Spotfire.
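The "at least 20% for two years in a row" criterion reads as relative growth in share (new ≥ 1.2 × old each year), not a 20-point jump. A minimal sketch of that check, assuming this interpretation; the function name is hypothetical, and since the 2013 shares are not given here, the three-year examples use made-up numbers. The one real example uses Hadoop's 2014 and 2015 shares from the top-10 list.

```python
def grew_every_year(shares, threshold=0.20):
    """True if each consecutive share is at least (1 + threshold) times the previous one.

    shares: usage shares in chronological order, e.g. [2013, 2014, 2015].
    """
    return all(new >= old * (1 + threshold) for old, new in zip(shares, shares[1:]))


# Real data for one year-over-year step: Hadoop 12.7% (2014) -> 18.4% (2015).
print(grew_every_year([12.7, 18.4]))        # True (about +45%)

# Hypothetical three-year series, for illustration only.
print(grew_every_year([10.0, 12.5, 16.0]))  # True (+25%, then +28%)
print(grew_every_year([10.0, 15.0, 16.0]))  # False (second step is under +20%)
```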

New analytics tools that received at least 20 votes in 2015 were
  • scikit-learn, 8.3% (229)
  • Microsoft Azure ML, 3.7% (102)
  • Microsoft Power BI, 3.6% (98)
  • IBM Watson Analytics, 2.1% (57)
  • Ayasdi, 2.0% (56)
  • Dataiku, 2.0% (56)
  • Lexalytics, 1.3% (35)
  • Vowpal Wabbit, 1.3% (35)
  • Microstrategy, 0.9% (24)
  • Amazon Machine Learning, 0.7% (20)




Among tools with at least 20 votes in 2014, the largest declines in 2015 were for the tools below. This probably reflects a combination of declining popularity of free tools such as Orange and the lack of a voter drive for some commercial tools this year.
  • Predixion Software, 90% down, to 0.4% from 3.7%
  • BayesiaLab, 86% down, to 0.6% from 4.1%
  • Alpine Data Labs, 82% down, to 0.5% from 2.7%
  • Oracle Data Miner, 64% down, to 0.8% from 2.2%
  • RapidInsight/Veera, 60% down, to 0.2% from 0.5%
  • Revolution Analytics (now part of Microsoft), 57% down, to 4.0% from 9.1%
  • SAP (including former KXEN), 57% down, to 3.0% from 6.8%
  • Orange, 44% down, to 1.9% from 3.4%
  • Gnu Octave, 41% down, to 2.3% from 3.9%




Hadoop/Big Data Tools

Hadoop/Big Data tool usage jumped to 29% among voters, up from 17% in 2014 and 14% in 2013.

This is probably due to the availability and low cost of many cloud-based Big Data tools. Particularly notable is the jump in Spark's share to 11.3%.

However, most data analysis is still done on "medium" and small data.

Top Hadoop/Big Data tools were
  • Hadoop, 18.4% share (507 votes)
  • Spark, 11.3% (311)
  • Hive, 10.2% (282)
  • SQL on Hadoop tools, 7.2% (198)
  • Pig, 5.4% (150)
  • HBase, 4.6% (127)
  • Other Hadoop/HDFS-based tools, 4.5% (125)
  • MLlib, 3.3% (91)
  • Mahout, 2.8% (76)
  • Datameer, 0.8% (23)
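The shares in this list add up to 68.5%, yet only 29% of voters used any Hadoop/Big Data tool, because many voters used several of them (for example Hadoop together with Hive and Spark). The category figure counts each voter once. A minimal sketch of the difference, on hypothetical ballots (not poll data) and with a partial, illustrative set of the category's tools; the function names are hypothetical.

```python
# Some of the Hadoop/Big Data tools named in the list above (illustrative subset).
BIG_DATA_TOOLS = {"Hadoop", "Spark", "Hive", "Pig", "HBase", "MLlib", "Mahout", "Datameer"}

# Hypothetical ballots, for illustration only.
ballots = [
    {"R", "Hadoop", "Spark", "Hive"},
    {"Python", "Spark"},
    {"RapidMiner"},
    {"SQL", "Excel"},
]


def category_share(ballots, category):
    """Fraction of voters who used at least one tool in the category."""
    return sum(1 for b in ballots if b & category) / len(ballots)


def summed_tool_shares(ballots, category):
    """Sum of per-tool shares; counts a voter once per category tool they used."""
    return sum(len(b & category) for b in ballots) / len(ballots)


print(category_share(ballots, BIG_DATA_TOOLS))      # 0.5
print(summed_tool_shares(ballots, BIG_DATA_TOOLS))  # 1.0
```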



Deep Learning Tools

New this year was a category of Deep Learning Tools; the most popular tools were:
  • Pylearn2 (55 users)
  • Theano (50)
  • Caffe (29)
  • Torch (27)
  • Cuda-convnet (17)
  • Deeplearning4j (12)



However, this category is growing rapidly and the list above is incomplete: the largest count in this category was for other Deep Learning tools (106).


Reply #1: huanghuiqun, posted 2016-6-30 09:01:13

Reply #2: waterhorse, posted 2016-7-1 23:47:45
Very good and informative. Xie xie (thank you)!

Reply #3: dcwang1233, posted 2016-7-2 02:07:01
RapidMiner is quite easy to use.
