
#1 (original post) Lisrelchen posted on 2015-3-17 23:06:57


Programming Hive

Data Warehouse and Query Language for Hadoop

Book Description
Need to move a relational database application to Hadoop? This comprehensive guide introduces you to Apache Hive, Hadoop's data warehouse infrastructure. You'll quickly learn how to use Hive's SQL dialect, HiveQL, to summarize, query, and analyze large datasets stored in Hadoop's distributed filesystem.

This example-driven guide shows you how to set up and configure Hive in your environment, provides a detailed overview of Hadoop and MapReduce, and demonstrates how Hive works within the Hadoop ecosystem. You'll also find real-world case studies that describe how companies have used Hive to solve unique problems involving petabytes of data.
Book Details

Publisher: O'Reilly Media
By: Edward Capriolo, Dean Wampler, Jason Rutherglen
ISBN: 978-1-4493-1933-5
Year: 2012
Pages: 352
Language: English
File size: 8.7 MB
File format: PDF

#2 jerker posted on 2015-3-17 23:11:03

Alter Database
You can set key-value pairs in the DBPROPERTIES associated with a database using the ALTER DATABASE command. No other metadata about the database can be changed, including its name and directory location:
hive> ALTER DATABASE financials SET DBPROPERTIES ('edited-by' = 'Joe Dba');
There is no way to delete or “unset” a DBPROPERTY.
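A minimal sketch of how one might check the result, assuming the financials database from the example already exists. DESCRIBE DATABASE EXTENDED prints the DBPROPERTIES along with the other database metadata. The last statement rests on the assumption that setting an existing key again overwrites its value, which is worth confirming on your Hive version; 'Jane Dba' is a made-up value.

```sql
-- Set a property on the existing database.
ALTER DATABASE financials SET DBPROPERTIES ('edited-by' = 'Joe Dba');

-- Show the database metadata, including its DBPROPERTIES.
DESCRIBE DATABASE EXTENDED financials;

-- Assumption: re-setting the same key replaces the old value
-- rather than adding a second entry.
ALTER DATABASE financials SET DBPROPERTIES ('edited-by' = 'Jane Dba');
```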


#3 Elena3 posted on 2015-3-17 23:23:45

Creating Tables
The CREATE TABLE statement follows SQL conventions, but Hive’s version offers significant extensions that give you flexibility in where the data files for tables are stored, the formats used, etc. We discussed many of these options in Text File Encoding of Data Values and we’ll return to more advanced options later in Chapter 15. In this section, we describe the other options available for the CREATE TABLE statement, adapting the employees table declaration we used previously in Collection Data Types:
CREATE TABLE IF NOT EXISTS mydb.employees (
  name         STRING COMMENT 'Employee name',
  salary       FLOAT  COMMENT 'Employee salary',
  subordinates ARRAY<STRING> COMMENT 'Names of subordinates',
  deductions   MAP<STRING, FLOAT>
               COMMENT 'Keys are deductions names, values are percentages',
  address      STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
               COMMENT 'Home address')
COMMENT 'Description of the table'
LOCATION '/user/hive/warehouse/mydb.db/employees'
TBLPROPERTIES ('creator'='me', 'created_at'='2012-01-02 10:00:00', ...);
First, note that you can prefix a database name, mydb in this case, if you’re not currently working in the target database.
If you add the option IF NOT EXISTS, Hive will silently ignore the statement if the table already exists. This is useful in scripts that should create a table the first time they run.
This clause has a gotcha you should know about. If the schema you specify differs from the schema of the table that already exists, Hive won’t warn you. If you intend this table to have the new schema, you’ll have to drop the old table, losing your data, and then re-create it. Consider whether you should use one or more ALTER TABLE statements to change the existing table schema instead. See Alter Table for details.
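A rough sketch of the gotcha and one way around it, assuming mydb.employees already exists with the schema shown earlier. The hire_date column is hypothetical and only serves as an illustration.

```sql
-- Inspect the schema Hive already has for the table.
DESCRIBE EXTENDED mydb.employees;

-- Skipped without any warning when mydb.employees exists,
-- even though this column list differs from the stored schema.
CREATE TABLE IF NOT EXISTS mydb.employees (
  name      STRING,
  salary    FLOAT,
  hire_date STRING  -- hypothetical new column
);

-- To evolve the existing table instead, change its metadata in place.
ALTER TABLE mydb.employees ADD COLUMNS (
  hire_date STRING COMMENT 'Hypothetical column added for illustration'
);
```

Running DESCRIBE first makes the mismatch visible before a script silently keeps the old schema.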


#4 suyouwoko posted on 2015-3-18 02:28:44

Partitioned, Managed Tables
The general notion of partitioning data is an old one. It can take many forms, but often it’s used for distributing load horizontally, moving data physically closer to its most frequent users, and other purposes.
Hive has the notion of partitioned tables. We’ll see that they have important performance benefits, and they can help organize data in a logical fashion, such as hierarchically.
We’ll discuss partitioned managed tables first. Let’s return to our employees table and imagine that we work for a very large multinational corporation. Our HR people often run queries with WHERE clauses that restrict the results to a particular country or to a particular first-level subdivision (e.g., state in the United States or province in Canada). First-level subdivision is an actual term, used, for example, at http://www.commondatahub.com/state_source.jsp; we’ll just use the word state for simplicity. Note that the address field already carries state information, which is now redundant but distinct from the state partition. We could remove the state element from address, but there is no ambiguity in queries, since we have to write address.state to project the value inside the address. So, let’s partition the data first by country and then by state:
CREATE TABLE employees (
  name         STRING,
  salary       FLOAT,
  subordinates ARRAY<STRING>,
  deductions   MAP<STRING, FLOAT>,
  address      STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
PARTITIONED BY (country STRING, state STRING);
Partitioning tables changes how Hive structures the data storage. If we create this table in the mydb database, there will still be an employees directory for the table:
hdfs://master_server/user/hive/warehouse/mydb.db/employees
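Below the table directory, Hive creates one nested subdirectory per partition value, named column=value. A short sketch of what that looks like and how queries use it; the country and state values are chosen purely for illustration.

```sql
-- Layout under the table directory (values are illustrative):
--   .../employees/country=CA/state=AB
--   .../employees/country=US/state=IL

-- List the partitions Hive knows about for this table.
SHOW PARTITIONS employees;

-- Filtering on partition columns lets Hive read only the matching directories.
SELECT name, salary
FROM employees
WHERE country = 'US' AND state = 'IL';

-- address.state is the struct field, separate from the state partition column.
SELECT name, address.state
FROM employees
WHERE country = 'US';
```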


#5 suyouwoko posted on 2015-3-18 02:29:16

Changing Columns
You can rename a column, change its position, type, or comment:
ALTER TABLE log_messages
CHANGE COLUMN hms hours_minutes_seconds INT
COMMENT 'The hours, minutes, and seconds part of the timestamp'
AFTER severity;
You have to specify the old name, a new name, and the type, even if the name or type is not changing. The keyword COLUMN is optional as is the COMMENT clause. If you aren’t moving the column, the AFTER other_column clause is not necessary. In the example shown, we move the column after the severity column. If you want to move the column to the first position, use FIRST instead of AFTER other_column.
As always, this command changes metadata only. If you are moving columns, the data must already match the new schema or you must change it to match by some other means.
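For the FIRST variant, a minimal sketch, assuming the column is still named hms and typed INT. The old name, new name, and type are all repeated even though only the position changes.

```sql
-- Move hms to the first position; name and type stay the same.
ALTER TABLE log_messages
CHANGE COLUMN hms hms INT FIRST;

-- Confirm the new column order in the metadata.
DESCRIBE log_messages;
```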


#6 bailihongchen posted on 2015-3-18 12:58:59

Adding Columns
You can add new columns to the end of the existing columns, before any partition columns.
ALTER TABLE log_messages ADD COLUMNS (
app_name STRING COMMENT 'Application name',
session_id BIGINT COMMENT 'The current session id');
The COMMENT clauses are optional, as usual. If any of the new columns are in the wrong position, use an ALTER TABLE table CHANGE COLUMN statement for each one to move it to the correct position.
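Putting the two steps together, a sketch that adds one column and then repositions it, assuming log_messages has a severity column as in the Changing Columns example:

```sql
-- Add the column; it lands after the last existing non-partition column.
ALTER TABLE log_messages ADD COLUMNS (
  app_name STRING COMMENT 'Application name');

-- Move it directly after severity. Only metadata changes;
-- the data files are not rewritten.
ALTER TABLE log_messages
CHANGE COLUMN app_name app_name STRING
COMMENT 'Application name'
AFTER severity;
```

Because the files stay as they are, moving a column only makes sense when the stored data already matches the new order.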


#7 bailihongchen posted on 2015-3-18 13:00:02

Dropping Tables
The familiar DROP TABLE command from SQL is supported:
DROP TABLE IF EXISTS employees;
The IF EXISTS keywords are optional. If not used and the table doesn’t exist, Hive returns an error.
For managed tables, the table metadata and data are deleted.
Note
Actually, if you enable the Hadoop Trash feature, which is not on by default, the data is moved to the .Trash directory in the distributed filesystem for the user, which in HDFS is /user/$USER/.Trash. To enable this feature, set the property fs.trash.interval to a reasonable positive number. It’s the number of minutes between “trash checkpoints”; 1,440 would be 24 hours. While it’s not guaranteed to work for all versions of all distributed filesystems, if you accidentally drop a managed table with important data, you may be able to re-create the table, re-create any partitions, and then move the files from .Trash to the correct directories (using the filesystem commands) to restore the data.
For external tables, the metadata is deleted but the data is not.
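To make the managed versus external difference concrete, here is a small sketch. Both table names and the /data/external_demo path are hypothetical.

```sql
-- Hypothetical managed table: Hive owns the data under its warehouse directory.
CREATE TABLE IF NOT EXISTS managed_demo (line STRING);

-- Hypothetical external table: the data lives at a path Hive does not own.
CREATE EXTERNAL TABLE IF NOT EXISTS external_demo (line STRING)
LOCATION '/data/external_demo';

-- Removes the metadata and the data files of managed_demo
-- (the files go to .Trash instead if the Trash feature is enabled).
DROP TABLE IF EXISTS managed_demo;

-- Removes only the metadata; files under /data/external_demo stay in place.
DROP TABLE IF EXISTS external_demo;
```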


#8 leejwen posted on 2015-4-7 14:52:09

Alter Storage Properties
There are several ALTER TABLE statements for modifying format and SerDe properties.
The following statement changes the storage format for a partition to be SEQUENCEFILE, as we discussed in Creating Tables (see Sequence Files and Chapter 15 for more information):
ALTER TABLE log_messages
PARTITION(year = 2012, month = 1, day = 1)
SET FILEFORMAT SEQUENCEFILE;
The PARTITION clause is required if the table is partitioned.
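One way to check the effect is to look at the partition metadata afterwards. A sketch, assuming the log_messages table is partitioned by year, month, and day as in the statement above:

```sql
-- Change the declared storage format for one partition.
ALTER TABLE log_messages
PARTITION (year = 2012, month = 1, day = 1)
SET FILEFORMAT SEQUENCEFILE;

-- Inspect the partition metadata; the input and output format
-- classes listed there should reflect the change.
DESCRIBE EXTENDED log_messages PARTITION (year = 2012, month = 1, day = 1);
```

Like the other ALTER statements, this changes metadata only. Files already in the partition are not converted, so they need to be in the new format for queries to read them correctly.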
You can specify a new SerDe along with SerDe properties or change the properties for the existing SerDe. The following example specifies that a table will use a Java class named com.example.JSONSerDe to process a file of JSON-encoded records:
ALTER TABLE table_using_JSON_storage
SET SERDE 'com.example.JSONSerDe'
WITH SERDEPROPERTIES (
'prop1' = 'value1',
'prop2' = 'value2');
The SERDEPROPERTIES are passed to the SerDe module (the Java class com.example.JSONSerDe, in this case). Note that both the property names (e.g., prop1) and the values (e.g., value1) must be quoted strings.
The SERDEPROPERTIES feature is a convenient mechanism that SerDe implementations can exploit to permit user customization. We’ll see a real-world example of a JSON SerDe and how it uses SERDEPROPERTIES in JSON SerDe.
The following example demonstrates how to add new SERDEPROPERTIES for the current SerDe:
ALTER TABLE table_using_JSON_storage
SET SERDEPROPERTIES (
'prop3' = 'value3',
'prop4' = 'value4');
You can alter the storage properties that we discussed in Creating Tables:
ALTER TABLE stocks
CLUSTERED BY (exchange, symbol)
SORTED BY (symbol)
INTO 48 BUCKETS;
The SORTED BY clause is optional, but the CLUSTERED BY and INTO … BUCKETS clauses are required. (See also Bucketing Table Data Storage for information on the use of data bucketing.)


#9 ReneeBK posted on 2015-7-16 06:54:40

Deleting or Replacing Columns
The following example removes all the existing columns and replaces them with the new columns specified:
ALTER TABLE log_messages REPLACE COLUMNS (
hours_mins_secs INT COMMENT 'hour, minute, seconds from timestamp',
severity STRING COMMENT 'The message severity',
message STRING COMMENT 'The rest of the message');
This statement effectively renames the original hms column and removes the server and process_id columns from the original schema definition. As for all ALTER statements, only the table metadata is changed.
The REPLACE statement can only be used with tables that use one of the native SerDe modules: DynamicSerDe or MetadataTypedColumnsetSerDe. Recall that the SerDe determines how records are parsed into columns (deserialization) and how a record’s columns are written to storage (serialization).
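REPLACE COLUMNS can also be used to drop a single column by listing every column you want to keep. The sketch below assumes the original non-partition columns were hms, severity, server, process_id, and message; the column order, types, and the server comment are assumptions, not taken from the excerpt.

```sql
-- Assumed original non-partition columns (order and types are assumptions):
--   hms INT, severity STRING, server STRING, process_id INT, message STRING

-- Keep everything except process_id by listing the remaining columns.
ALTER TABLE log_messages REPLACE COLUMNS (
  hms      INT    COMMENT 'hour, minute, seconds from timestamp',
  severity STRING COMMENT 'The message severity',
  server   STRING COMMENT 'Assumed column: originating server',
  message  STRING COMMENT 'The rest of the message');
```

Since only metadata changes, positional formats such as delimited text still match columns to fields by position. Dropping a column from the middle therefore shifts every later column onto the wrong field unless the data files are rewritten to match.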


#10 Nicolle posted on 2015-7-16 08:25:09

Loading Data into Managed Tables
(Content hidden automatically: the author was banned or the post was deleted.)
