Original poster: Lisrelchen

Programming Hive

Programming Hive: Data Warehouse and Query Language for Hadoop

Book Description
Need to move a relational database application to Hadoop? This comprehensive guide introduces you to Apache Hive, Hadoop's data warehouse infrastructure. You'll quickly learn how to use Hive's SQL dialect, HiveQL, to summarize, query, and analyze large datasets stored in Hadoop's distributed filesystem.

This example-driven guide shows you how to set up and configure Hive in your environment, provides a detailed overview of Hadoop and MapReduce, and demonstrates how Hive works within the Hadoop ecosystem. You'll also find real-world case studies that describe how companies have used Hive to solve unique problems involving petabytes of data.

Book Details
  • Publisher: O'Reilly Media
  • By: Edward Capriolo, Dean Wampler, Jason Rutherglen
  • ISBN: 978-1-4493-1933-5
  • Year: 2012
  • Pages: 352
  • Language: English
  • File size: 8.7 MB
  • File format: PDF

2
jerker posted on 2015-3-17 23:11:03
Alter Database

You can set key-value pairs in the DBPROPERTIES associated with a database using the ALTER DATABASE command. No other metadata about the database can be changed, including its name and directory location:

    hive> ALTER DATABASE financials SET DBPROPERTIES ('edited-by' = 'Joe Dba');

There is no way to delete or “unset” a DBPROPERTY.
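
To check that the property was recorded, the database can be described in extended form. This is only a minimal sketch: it assumes the financials database from the statement above exists, the value 'Jane Dba' is a made-up illustration, and the exact output layout varies between Hive versions.

```sql
-- Show the database description together with its DBPROPERTIES.
DESCRIBE DATABASE EXTENDED financials;

-- Assumption: setting the same key again overwrites the earlier value.
-- Because a key cannot be unset, overwriting is the only way to revise it.
ALTER DATABASE financials SET DBPROPERTIES ('edited-by' = 'Jane Dba');
```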

3
Elena3 posted on 2015-3-17 23:23:45
Creating Tables

The CREATE TABLE statement follows SQL conventions, but Hive’s version offers significant extensions that control where the data files for tables are stored, the formats used, and so on. We discussed many of these options in Text File Encoding of Data Values and we’ll return to more advanced options later in Chapter 15. In this section, we describe the other options available for the CREATE TABLE statement, adapting the employees table declaration we used previously in Collection Data Types:

    CREATE TABLE IF NOT EXISTS mydb.employees (
      name         STRING COMMENT 'Employee name',
      salary       FLOAT  COMMENT 'Employee salary',
      subordinates ARRAY<STRING> COMMENT 'Names of subordinates',
      deductions   MAP<STRING, FLOAT>
                   COMMENT 'Keys are deductions names, values are percentages',
      address      STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
                   COMMENT 'Home address')
    COMMENT 'Description of the table'
    LOCATION '/user/hive/warehouse/mydb.db/employees'
    TBLPROPERTIES ('creator'='me', 'created_at'='2012-01-02 10:00:00', ...);

First, note that you can prefix a database name, mydb in this case, if you’re not currently working in the target database.

If you add the option IF NOT EXISTS, Hive will silently ignore the statement if the table already exists. This is useful in scripts that should create a table the first time they run.

However, the clause has a gotcha. If the schema specified differs from the schema in the table that already exists, Hive won’t warn you. If your intention is for this table to have the new schema, you’ll have to drop the old table, losing your data, and then re-create it. Consider whether to use one or more ALTER TABLE statements to change the existing table schema instead. See Alter Table for details.
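
Because IF NOT EXISTS hides a schema mismatch, it can help to look at what the metastore actually holds before relying on it. The following is a hedged sketch, assuming the mydb.employees table above already exists; the hire_date column is a hypothetical name used only for illustration.

```sql
-- See which schema the metastore actually holds for the table.
DESCRIBE EXTENDED mydb.employees;

-- Hypothetical change: add a column in place instead of dropping
-- and re-creating the table. hire_date is not part of the original example.
ALTER TABLE mydb.employees ADD COLUMNS (
  hire_date STRING COMMENT 'Hypothetical hire date'
);
```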

4
suyouwoko posted on 2015-3-18 02:28:44
Partitioned, Managed Tables

The general notion of partitioning data is an old one. It can take many forms, but often it’s used for distributing load horizontally, moving data physically closer to its most frequent users, and other purposes.

Hive has the notion of partitioned tables. We’ll see that they have important performance benefits, and they can help organize data in a logical fashion, such as hierarchically.

We’ll discuss partitioned managed tables first. Let’s return to our employees table and imagine that we work for a very large multinational corporation. Our HR people often run queries with WHERE clauses that restrict the results to a particular country or to a particular first-level subdivision (e.g., state in the United States or province in Canada). (First-level subdivision is an actual term, used here, for example: http://www.commondatahub.com/state_source.jsp.) We’ll just use the word state for simplicity.

We have redundant state information in the address field. It is distinct from the state partition. We could remove the state element from address. There is no ambiguity in queries, since we have to use address.state to project the value inside the address. So, let’s partition the data first by country and then by state:

    CREATE TABLE employees (
      name         STRING,
      salary       FLOAT,
      subordinates ARRAY<STRING>,
      deductions   MAP<STRING, FLOAT>,
      address      STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
    )
    PARTITIONED BY (country STRING, state STRING);

Partitioning tables changes how Hive structures the data storage. If we create this table in the mydb database, there will still be an employees directory for the table:

    hdfs://master_server/user/hive/warehouse/mydb.db/employees
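
Below that directory, Hive keeps one subdirectory per partition, named key=value (for example country=US/state=IL). The sketch below shows how the partitions can be listed and used to limit a query. It assumes the employees table above has been loaded with data, and the country and state values are illustrative.

```sql
-- List the partitions that currently exist for the table.
SHOW PARTITIONS employees;

-- Restrict the listing to one country.
SHOW PARTITIONS employees PARTITION (country = 'US');

-- Filtering on the partition columns lets Hive read only the matching
-- directories, e.g. .../employees/country=US/state=IL
SELECT name, salary
FROM employees
WHERE country = 'US' AND state = 'IL';
```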

5
suyouwoko posted on 2015-3-18 02:29:16
Changing Columns

You can rename a column, change its position, type, or comment:

    ALTER TABLE log_messages
    CHANGE COLUMN hms hours_minutes_seconds INT
    COMMENT 'The hours, minutes, and seconds part of the timestamp'
    AFTER severity;

You have to specify the old name, a new name, and the type, even if the name or type is not changing. The keyword COLUMN is optional as is the COMMENT clause. If you aren’t moving the column, the AFTER other_column clause is not necessary. In the example shown, we move the column after the severity column. If you want to move the column to the first position, use FIRST instead of AFTER other_column.

As always, this command changes metadata only. If you are moving columns, the data must already match the new schema or you must change it to match by some other means.
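
The FIRST variant described above could look like the following sketch. It assumes log_messages has a severity column of type STRING; the data files are not rewritten, so they must already have severity as their first field.

```sql
-- Move severity to the first position. The name and type are repeated
-- unchanged because CHANGE COLUMN always requires both.
ALTER TABLE log_messages
CHANGE COLUMN severity severity STRING
FIRST;
```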

6
bailihongchen posted on 2015-3-18 12:58:59
Adding Columns

You can add new columns to the end of the existing columns, before any partition columns.

    ALTER TABLE log_messages ADD COLUMNS (
      app_name   STRING COMMENT 'Application name',
      session_id BIGINT COMMENT 'The current session id');

The COMMENT clauses are optional, as usual. If any of the new columns are in the wrong position, use an ALTER TABLE table CHANGE COLUMN statement for each one to move it to the correct position.
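
As an illustration of that follow-up step, the sketch below checks the column order and then repositions one of the new columns. It assumes log_messages also has a severity column; the target position is chosen only for the example, and the change affects metadata only.

```sql
-- Confirm the new columns and their order.
DESCRIBE log_messages;

-- Reposition app_name directly after severity.
ALTER TABLE log_messages
CHANGE COLUMN app_name app_name STRING
COMMENT 'Application name'
AFTER severity;
```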

7
bailihongchen posted on 2015-3-18 13:00:02
Dropping Tables

The familiar DROP TABLE command from SQL is supported:

    DROP TABLE IF EXISTS employees;

The IF EXISTS keywords are optional. If not used and the table doesn’t exist, Hive returns an error.

For managed tables, the table metadata and data are deleted.

Note
Actually, if you enable the Hadoop Trash feature, which is not on by default, the data is moved to the .Trash directory in the distributed filesystem for the user, which in HDFS is /user/$USER/.Trash. To enable this feature, set the property fs.trash.interval to a reasonable positive number. It’s the number of minutes between “trash checkpoints”; 1,440 would be 24 hours. While it’s not guaranteed to work for all versions of all distributed filesystems, if you accidentally drop a managed table with important data, you may be able to re-create the table, re-create any partitions, and then move the files from .Trash to the correct directories (using the filesystem commands) to restore the data.

For external tables, the metadata is deleted but the data is not.
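
The difference between the two table kinds can be sketched as follows. The table ext_log_messages and the path /data/ext_log_messages are hypothetical names introduced only for this illustration.

```sql
-- Hypothetical external table over files that Hive does not own.
CREATE EXTERNAL TABLE IF NOT EXISTS ext_log_messages (
  line STRING
)
LOCATION '/data/ext_log_messages';

-- Removes only the metadata; the files under /data/ext_log_messages stay.
DROP TABLE IF EXISTS ext_log_messages;

-- For a managed table the same statement also deletes the data
-- (or moves it to .Trash when fs.trash.interval is a positive number).
DROP TABLE IF EXISTS employees;
```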

8
leejwen posted on 2015-4-7 14:52:09
Alter Storage Properties

There are several ALTER TABLE statements for modifying format and SerDe properties.

The following statement changes the storage format for a partition to be SEQUENCEFILE, as we discussed in Creating Tables (see Sequence Files and Chapter 15 for more information):

    ALTER TABLE log_messages
    PARTITION(year = 2012, month = 1, day = 1)
    SET FILEFORMAT SEQUENCEFILE;

The PARTITION clause is required if the table is partitioned.

You can specify a new SerDe along with SerDe properties or change the properties for the existing SerDe. The following example specifies that a table will use a Java class named com.example.JSONSerDe to process a file of JSON-encoded records:

    ALTER TABLE table_using_JSON_storage
    SET SERDE 'com.example.JSONSerDe'
    WITH SERDEPROPERTIES (
      'prop1' = 'value1',
      'prop2' = 'value2');

The SERDEPROPERTIES are passed to the SerDe module (the Java class com.example.JSONSerDe, in this case). Note that both the property names (e.g., prop1) and the values (e.g., value1) must be quoted strings.

The SERDEPROPERTIES feature is a convenient mechanism that SerDe implementations can exploit to permit user customization. We’ll see a real-world example of a JSON SerDe and how it uses SERDEPROPERTIES in JSON SerDe.

The following example demonstrates how to add new SERDEPROPERTIES for the current SerDe:

    ALTER TABLE table_using_JSON_storage
    SET SERDEPROPERTIES (
      'prop3' = 'value3',
      'prop4' = 'value4');

You can alter the storage properties that we discussed in Creating Tables:

    ALTER TABLE stocks
    CLUSTERED BY (exchange, symbol)
    SORTED BY (symbol)
    INTO 48 BUCKETS;

The SORTED BY clause is optional, but the CLUSTERED BY and INTO … BUCKETS clauses are required. (See also Bucketing Table Data Storage for information on the use of data bucketing.)
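
As a more concrete illustration of SERDEPROPERTIES, the sketch below changes the field delimiter of a text table. It assumes a hypothetical table csv_messages stored with Hive's default text SerDe, which reads the field.delim property. Like the other ALTER statements it changes metadata only, so the existing files must already be comma-delimited.

```sql
-- Tell the SerDe to split fields on commas from now on.
ALTER TABLE csv_messages
SET SERDEPROPERTIES ('field.delim' = ',');

-- Inspect the storage format and SerDe parameters recorded for the table.
DESCRIBE FORMATTED csv_messages;
```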

9
ReneeBK posted on 2015-7-16 06:54:40
Deleting or Replacing Columns

The following example removes all the existing columns and replaces them with the new columns specified:

    ALTER TABLE log_messages REPLACE COLUMNS (
      hours_mins_secs INT    COMMENT 'hour, minute, seconds from timestamp',
      severity        STRING COMMENT 'The message severity',
      message         STRING COMMENT 'The rest of the message');

This statement effectively renames the original hms column and removes the server and process_id columns from the original schema definition. As for all ALTER statements, only the table metadata is changed.

The REPLACE statement can only be used with tables that use one of the native SerDe modules: DynamicSerDe or MetadataTypedColumnsetSerDe. Recall that the SerDe determines how records are parsed into columns (deserialization) and how a record’s columns are written to storage (serialization).

10
Nicolle posted on 2015-7-16 08:25:09

Loading Data into Managed Tables

Notice: the author has been banned or deleted, so the content is automatically hidden.
