[Hadoop] Sqoop: a tool for importing and exporting data between relational databases and HDFS/Hive

LiZara posted on 2018-4-7 18:41:09

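1. What is Sqoop
Sqoop is a tool developed by Cloudera, a company that provides Hadoop technical support, for importing and exporting data between relational databases and HDFS or Hive.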
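2. Sqoop features

A major highlight of Sqoop is that it uses Hadoop MapReduce to import data from relational databases into HDFS.

Sqoop's architecture is very simple. It integrates with Hive, HBase, and Oozie, and transfers data through MapReduce jobs, which provides parallelism and fault tolerance.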
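
As a rough illustration of the parallel transfer described above, the sketch below prints (but does not run) an import command that splits the work across several map tasks. The flags are taken from the `sqoop help import` output later in this post; the host, database, user, table, split column, and target directory are all hypothetical placeholders.

```shell
#!/bin/sh
# All connection details below are hypothetical placeholders.
JDBC_URL="jdbc:mysql://dbhost:3306/shop"
TABLE="orders"
SPLIT_COLUMN="order_id"   # column used to divide rows among map tasks
MAPPERS=4                 # number of parallel map tasks

# Print the import command (dry run); nothing is executed against a database.
build_parallel_import() {
  echo "sqoop import" \
    "--connect $JDBC_URL" \
    "--username etl_user -P" \
    "--table $TABLE" \
    "--split-by $SPLIT_COLUMN" \
    "--num-mappers $MAPPERS" \
    "--target-dir /user/etl/$TABLE"
}

build_parallel_import
```

Printing the command first makes it easy to review before pasting it into a shell on a node where Sqoop is installed.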

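Sqoop interacts with relational databases mainly through JDBC. In theory, any database that supports JDBC can exchange data with HDFS through Sqoop.
However, only a small number have been officially tested by Sqoop, as follows: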

Database      Version    --direct support     Connect string matches
HSQLDB        1.8.0+     No                   jdbc:hsqldb:*//
MySQL         5.0+       Yes                  jdbc:mysql://
Oracle        10.2.0+    No                   jdbc:oracle:*//
PostgreSQL    8.3+       Yes (import only)    jdbc:postgresql://

Older versions may also be supported, but they have not been tested.
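
To make the connect-string column of the table concrete, here is a minimal sketch that checks a JDBC URL against those four patterns. The function name `tested_database` and the sample URLs are made up for illustration; this only mirrors the table and is not how Sqoop itself selects a connection manager.

```shell
#!/bin/sh
# Map a JDBC connect string to the tested database it matches in the table.
# Prints "untested" when none of the four patterns applies.
tested_database() {
  case "$1" in
    jdbc:hsqldb:*//*)    echo "HSQLDB" ;;
    jdbc:mysql://*)      echo "MySQL" ;;
    jdbc:oracle:*//*)    echo "Oracle" ;;
    jdbc:postgresql://*) echo "PostgreSQL" ;;
    *)                   echo "untested" ;;
  esac
}

# Hypothetical connect strings, for illustration only.
tested_database "jdbc:mysql://dbhost:3306/shop"
tested_database "jdbc:sqlserver://dbhost:1433"
```

A URL reported as "untested" may still work through its JDBC driver; it simply falls outside the officially tested list.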
For performance reasons, Sqoop also provides a faster data access mechanism than JDBC, which is enabled with --direct.
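
A small sketch of how --direct might be applied selectively: the flag is appended only for connect strings that the table above marks as direct-capable (MySQL, and PostgreSQL for imports). The helper names `direct_flag` and `build_import`, and the connection details, are hypothetical; the commands are printed rather than executed.

```shell
#!/bin/sh
# Print "--direct" for connect strings the table marks as direct-capable
# (MySQL; PostgreSQL for imports only), otherwise print an empty line.
direct_flag() {
  case "$1" in
    jdbc:mysql://*|jdbc:postgresql://*) echo "--direct" ;;
    *) echo "" ;;
  esac
}

# Build an import command for a connect string ($1) and table ($2).
# Dry run: the command is printed, not executed.
build_import() {
  url="$1"
  table="$2"
  flag=$(direct_flag "$url")
  echo "sqoop import --connect $url --table $table${flag:+ $flag}"
}

# Hypothetical connection details, for illustration only.
build_import "jdbc:mysql://dbhost:3306/shop" "orders"
build_import "jdbc:oracle:thin:@//dbhost:1521/ORCL" "ORDERS"
```

The first command ends with --direct, while the Oracle one falls back to plain JDBC.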
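3. Common Sqoop commands

.... omitted -------- for details, see the attached file and the content below

Sqoop installation and deployment ... omitted ----- for details, see the attached file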

[root@hadoop ~]# sqoop help import
Warning: /home/sqoop/sqoop/bin/../../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/sqoop/sqoop/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/sqoop/sqoop/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/03/12 18:01:40 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
--connect <jdbc-uri>                      Specify JDBC connect
                                             string
--connection-manager <class-name>          Specify connection manager
                                             class name
--connection-param-file <properties-file> Specify connection
                                             parameters file
--driver <class-name>                      Manually specify JDBC
                                             driver class to use
--hadoop-home <hdir>                      Override
                                             $HADOOP_MAPRED_HOME_ARG
--hadoop-mapred-home <dir>                Override
                                             $HADOOP_MAPRED_HOME_ARG
--help                                     Print usage instructions
-P                                           Read password from console
--password <password>                      Set authentication
                                             password
--password-alias <password-alias>          Credential provider
                                             password alias
--password-file <password-file>             Set authentication
                                             password file path
--relaxed-isolation                         Use read-uncommitted
                                             isolation for imports
--skip-dist-cache                         Skip copying jars to
                                             distributed cache
--username <username>                      Set authentication
                                             username
--verbose                                  Print more information
                                             while working

Import control arguments:
--append                                                 Imports data in append mode
--as-avrodatafile                                        Imports data to Avro data files
--as-parquetfile                                         Imports data to Parquet files
--as-sequencefile                                        Imports data to SequenceFiles
--as-textfile                                            Imports data as plain text (default)
--autoreset-to-one-mapper                                Reset the number of mappers to one mapper if no split key available
--boundary-query <statement>                             Set boundary query for retrieving max and min value of the primary key
--columns <col,col,col...>                               Columns to import from table
--compression-codec <codec>                              Compression codec to use for import
--delete-target-dir                                      Imports data in delete mode
--direct                                                 Use direct import fast path
--direct-split-size <n>                                  Split the input stream every 'n' bytes when importing in direct mode
-e,--query <statement>                                   Import results of SQL 'statement'
--fetch-size <n>                                         Set number 'n' of rows to fetch from the database when more rows are needed
--inline-lob-limit <n>                                   Set the maximum size for an inline LOB
-m,--num-mappers <n>                                     Use 'n' map tasks to import in parallel
--mapreduce-job-name <name>                              Set name for generated mapreduce job
--merge-key <column>                                     Key column to use to join results
--split-by <column-name>                                 Column of the table used to split work units
--table <table-name>                                     Table to read
--target-dir <dir>                                       HDFS plain table destination
--validate                                               Validate the copy using the configured validator
--validation-failurehandler <validation-failurehandler>  Fully qualified class name for ValidationFailureHandler
--validation-threshold <validation-threshold>            Fully qualified class name for ValidationThreshold
--validator <validator>                                  Fully qualified class name for the Validator
--warehouse-dir <dir>                                    HDFS parent for table destination
--where <where clause>                                   WHERE clause to use during import
-z,--compress                                            Enable compression

Incremental import arguments:
--check-column <column>       Source column to check for incremental
                              change
--incremental <import-type> Define an incremental import of type
                              'append' or 'lastmodified'
--last-value <value>          Last imported value in the incremental
                              check column
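
The three incremental import arguments above are meant to be used together. The following sketch prints an incremental import command in 'append' mode; the function name `build_incremental_import`, the connection details, the check column, and the last value are hypothetical placeholders chosen for illustration.

```shell
#!/bin/sh
# Print an incremental import command (dry run, nothing is executed).
#   $1 = import type ('append' or 'lastmodified', per the help text)
#   $2 = check column
#   $3 = last imported value of the check column
build_incremental_import() {
  case "$1" in
    append|lastmodified) ;;
    *) echo "unsupported import type: $1" >&2; return 1 ;;
  esac
  echo "sqoop import" \
    "--connect jdbc:mysql://dbhost:3306/shop" \
    "--username etl_user -P" \
    "--table orders" \
    "--target-dir /user/etl/orders" \
    "--incremental $1" \
    "--check-column $2" \
    "--last-value $3"
}

# Example: fetch only rows whose order_id is greater than 1000.
build_incremental_import append order_id 1000
```

In this sketch only rows beyond the given last value would be imported, so the next run would pass the highest value imported so far.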
Attachment: Sqoop介绍、安装与操作.pdf (1.13 MB, requires 1 forum coin)