Big Data Key Terms, Explained


oliyiyi posted on 2016-8-12 09:28:56


Just getting started with Big Data, or looking to iron out the wrinkles in your current understanding? Check out these 20 Big Data-related terms and their concise definitions.

By Matthew Mayo, KDnuggets.

Big Data. If you've somehow made it to this website and have not heard the term since it first gained momentum toward popularity at least a decade and a half ago, I really don't know what to say.

But just because one has heard the term, or has taken part in (or opposed) its flippant usage, doesn't mean one knows what it actually means or what it fully encompasses. Indeed, trying to exhaustively describe what Big Data is in a single post would be nonsensical, not least because there is no agreed-upon exhaustive description, nor should there be. Collecting some key terms associated with Big Data is not a bad idea, however, as it lays a common foundation from which to work forward.



This post will function slightly differently than other Key Term posts previously presented on KDnuggets, in that it will include a number of definitions for which Key Term posts already exist. In these cases, the definitions will link to these posts for further related key terms to explore; for remaining terms, links will point to associated KDnuggets tags and/or searches in order to facilitate continued investigation.

So, while there may be bigger Big Data experts out there, I humbly offer the following concisely defined terminology as an entry-level base for Big Data. We begin with the white whale itself.

1. Big Data

There are all sorts of popular and academic articles available defining Big Data, and the definitions vary considerably. In the interest of capturing the term's essence, we like this definition: "Data is big when data size becomes part of the problem." Big Data is a moving target, and this definition provides the flexibility required to capture its central characteristic.

Big Data is often characterized by the (originally) 3 Vs, a list which has grown to 4, 5, 6, or more, depending on where you look. I believe the following 6 Vs are enough to explain Big Data at a very high level.

Trying to freshly define the Vs of Big Data seems futile, and so we turn to the ever-authoritative Wikipedia for definitions:

2. Big Data Volume

Volume refers to the quantity of generated and stored data. The size of the data determines the value and potential insight, and whether it can actually be considered big data or not.

3. Big Data Velocity

Velocity is the speed at which the data is generated and processed to meet the demands and challenges that lie in the path of growth and development.

4. Big Data Variety

Variety refers to the type and nature of the data. This helps people who analyze it to effectively use the resulting insight.

5. Big Data Veracity

Veracity is the quality of captured data, which can vary greatly, affecting accurate analysis.

6. Big Data Variability

Variability is the inconsistency of the data set, which can hamper processes to handle and manage it.

7. Big Data Value

Value is tossed around as an important Big Data V from time to time, and I agree with its consideration, especially from a business standpoint. Value refers to what insight can be leveraged from the patterns, processing, and other Big Data-related tasks on the data in question.

8. Cloud Computing

Cloud computing, or what is simply referred to as the cloud, can be defined as an Internet-based computing model that largely offers on-demand access to computing resources. These resources include many things, such as application software, servers, and data centers. Cloud service providers usually adopt a ‘pay-as-you-go’ model, which allows companies to scale their costs as needed. The cloud allows businesses to bypass infrastructure setup costs, which were unavoidable before its advent.



(From Kaushik Pal's Cloud Computing Key Terms, Explained)

9. Predictive Analytics

Technology that learns from experience (data) to predict the future behavior of individuals in order to drive better decisions.

(From Eric Siegel's Predictive Analytics Introductory Key Terms, Explained)

Predictive analytics employ predictive models:

A mechanism that predicts a behavior of an individual, such as click, buy, lie, or die. It takes characteristics (variables) of the individual as input and provides a predictive score as output. The higher the score, the more likely it is that the individual will exhibit the predicted behavior.



(From Eric Siegel's Predictive Analytics Introductory Key Terms, Explained)
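
To make the scoring idea concrete, here is a minimal sketch in Python using scikit-learn; the tiny dataset and its columns (age, past purchases) are invented for illustration and are not from the article:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical individuals: columns are (age, past_purchases); labels
    # mark whether each individual exhibited the behavior ("buy").
    X = np.array([[25, 0], [40, 3], [33, 1], [51, 5], [29, 0], [45, 4]])
    y = np.array([0, 1, 0, 1, 0, 1])

    model = LogisticRegression().fit(X, y)

    # Characteristics in, predictive score out: the estimated probability
    # that a new individual will exhibit the behavior.
    score = model.predict_proba(np.array([[38, 2]]))[0, 1]
    print(f"predictive score: {score:.2f}")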

10. Descriptive Analytics

This form of analytics is descriptive in nature, as its name suggests. Descriptive analytics summarizes data, focusing less on the precise details of every piece of data and more on an overall narrative.
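
As a rough sketch of the distinction, summary statistics over a dataset are descriptive analytics in miniature; the week of site metrics below is made up for illustration:

    import pandas as pd

    # A week of invented site metrics; descriptive analytics cares about
    # the overall shape of these numbers, not any single row.
    df = pd.DataFrame({
        "daily_visits": [120, 98, 143, 110, 165, 130, 102],
        "conversions":  [12, 8, 15, 9, 21, 14, 10],
    })
    print(df.describe())  # count, mean, std, min, quartiles, max per column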

11. Prescriptive Analytics

Prescriptive analytics generally follows prediction, in that actions can be prescribed based on what has been gleaned from predictive modeling.



oliyiyi posted on 2016-8-12 09:29:22
12. Database

Data needs to be curated, coddled, and cared for. It needs to be stored and processed, so that it may be transformed into information, and further refined into knowledge. The mechanism for storing data, subsequently facilitating these transformations, is, clearly, the database.
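
As a minimal sketch of that raw-data-to-information transformation, the snippet below uses Python's built-in sqlite3 module; the sales table and its figures are hypothetical:

    import sqlite3

    conn = sqlite3.connect(":memory:")  # an in-memory database for the sketch
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)",
                     [("north", 120.0), ("south", 75.5), ("north", 98.25)])
    conn.commit()

    # Raw rows in, information out: total sales per region.
    for region, total in conn.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region"):
        print(region, total)
    conn.close()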

13. Data Warehouse

The data warehouse is another potentially elusive term. Han, Kamber & Pei define a data warehouse as a data storage architecture that allows "business executives to systematically organize, understand, and use their data to make strategic decisions." Vague, to be sure, but generally speaking, a data warehouse exhibits these characteristics:

  • it is maintained separately from an organization's operational and transactional databases, which are characterized by frequent access and are used in day-to-day organizational operations
  • it allows for the integration of various disparate application systems
  • it houses, and allows access to, consolidated historical data for processing and analysis

Bill Inmon, the Godfather of the Data Warehouse, gave this original and lasting definition, with which we will conclude:

A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process.

14. ETL

ETL stands for Extract, Transform and Load. This is the process of extracting data from source systems, such as transactional databases, and placing it into data warehouses. If you are familiar with online transactional processing (OLTP) and online analytical processing (OLAP), ETL can be thought of as the bridge between these 2 system types.
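
Here is a minimal sketch of the pattern in Python, with sqlite3 standing in for both the transactional source and the warehouse; the orders table, the aggregation, and the warehouse schema are all hypothetical:

    import sqlite3

    oltp = sqlite3.connect(":memory:")       # stand-in for a transactional source
    warehouse = sqlite3.connect(":memory:")  # stand-in for the data warehouse

    oltp.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    oltp.executemany("INSERT INTO orders VALUES (?, ?)",
                     [("alice", 30.0), ("bob", 20.0), ("alice", 45.0)])

    # Extract: pull raw rows from the source system.
    rows = oltp.execute("SELECT customer, amount FROM orders").fetchall()

    # Transform: aggregate to the grain the warehouse wants.
    totals = {}
    for customer, amount in rows:
        totals[customer] = totals.get(customer, 0.0) + amount

    # Load: place the transformed data into the warehouse.
    warehouse.execute("CREATE TABLE customer_totals (customer TEXT, total REAL)")
    warehouse.executemany("INSERT INTO customer_totals VALUES (?, ?)",
                          totals.items())
    warehouse.commit()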

15. Business Intelligence

And perhaps the most ambiguous term of all (an incredible feat in a set of Big Data terminology definitions) is business intelligence (BI). BI is an unstable, ill-defined set of tools, technologies, and concepts which support business by providing historical, current, and predictive views of its operations. The relationship between BI and data mining, in particular, is a curious one, with various definitions proposing that: BI is a subset of data mining; data mining is a subset of BI; BI is driven by data mining; or BI and data mining are separate and mutually exclusive. So, that settles that.

In the age of data science and Big Data, BI is generally thought to include OLAP, competitive intelligence, benchmarking, reporting, and other business management approaches (all of which tend toward ambiguity in definition as well), and is heavily influenced by the dashboard culture.

16. Apache Hadoop

Apache's Hadoop could almost be single-handedly responsible for the rise of the Big Data Revolution, at least from a software point of view.


Apache Hadoop is an open-source framework for processing large volumes of data in a clustered environment. It uses the simple MapReduce programming model for reliable, scalable, and distributed computing. Both storage and computation are distributed in this framework.

(From Kaushik Pal's Hadoop Key Terms, Explained)
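
The canonical illustration of the MapReduce model is word count. Below is a sketch of the two phases as Python scripts in the style of Hadoop Streaming, which pipes data through stdin/stdout; the file names and the exact job invocation vary by installation:

    # mapper.py -- emit a (word, 1) pair for every word seen
    import sys
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

    # reducer.py -- input arrives sorted by key, so counts for each word
    # can be summed in a single streaming pass
    import sys
    current, count = None, 0
    for line in sys.stdin:
        word, n = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(n)
    if current is not None:
        print(f"{current}\t{count}")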

17. Apache Spark

Apache Spark is a powerful open-source processing engine built around speed, ease of use, and sophisticated analytics, with APIs in Java, Scala, Python, R, and SQL. Spark runs programs up to 100x faster than Apache Hadoop MapReduce in memory, or 10x faster on disk. It can be used to build data applications as a library, or to perform ad-hoc data analysis interactively. Spark powers a stack of libraries including SQL, DataFrames, and Datasets, MLlib for machine learning, GraphX for graph processing, and Spark Streaming. You can combine these libraries seamlessly in the same application. Spark also runs on a laptop, on Apache Hadoop, on Apache Mesos, standalone, or in the cloud. It can access diverse data sources including HDFS, Apache Cassandra, Apache HBase, and S3.

(From Denny Lee and Jules Damji's Apache Spark Key Terms, Explained)
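
A minimal sketch of Spark's DataFrame API from Python (PySpark); the input file and column names here are hypothetical:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sketch").getOrCreate()

    # Read a (hypothetical) CSV of events and list the ten busiest users.
    df = spark.read.csv("events.csv", header=True, inferSchema=True)
    (df.groupBy("user_id")
       .count()
       .orderBy("count", ascending=False)
       .show(10))

    spark.stop()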

18. Internet of Things

The Internet of Things (IoT) is a growing source of Big Data. IoT is:

The concept of allowing internet-based communications to happen between physical objects, sensors, and controllers.

(From Geethika Bhavya Peddibhotla's Internet of Things Key Terms, Explained)
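
A back-of-the-envelope sketch of the concept, using only Python's standard library: a sensor reports a reading to a controller over HTTP. The endpoint URL and payload fields are invented for illustration:

    import json
    import urllib.request

    # A hypothetical temperature sensor posting its latest reading.
    reading = {"device_id": "thermo-01", "temperature_c": 21.7}
    req = urllib.request.Request(
        "http://controller.example.com/readings",  # hypothetical endpoint
        data=json.dumps(reading).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.status)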

19. Machine Learning

Machine learning can be employed for predictive analysis and pattern recognition in Big Data. According to Mitchell, machine learning is "concerned with the question of how to construct computer programs that automatically improve with experience." Machine learning is interdisciplinary in nature, and employs techniques from the fields of computer science, statistics, and artificial intelligence, among others. The main artefacts of machine learning research are algorithms which facilitate this automatic improvement from experience, algorithms which can be applied in such diverse fields as computer vision, artificial intelligence, and data mining.
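
Mitchell's "improve with experience" is easy to see in miniature with scikit-learn: fit a model on labeled examples, then check how well that experience generalizes to data it has not seen. This is a generic sketch, not tied to any particular Big Data system:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Split labeled examples into "experience" (training) and unseen data.
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)

    model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
    print("accuracy on unseen data:", model.score(X_test, y_test))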

20. Data Mining

Fayyad, Piatetsky-Shapiro & Smyth define data mining as "the application of specific algorithms for extracting patterns from data." This demonstrates that, in data mining, the emphasis is on the application of algorithms, as opposed to on the algorithms themselves. We can define the relationship between machine learning and data mining as follows: data mining is a process, during which machine learning algorithms are utilized as tools to extract potentially-valuable patterns held within datasets.


h2h2 posted on 2016-8-12 11:23:15
Thanks for sharing


william9225 (student verified) posted on 2016-8-12 14:03:29 from mobile
Thanks for sharing

