This demonstrates a single-node Hadoop cluster using the Cloudera Virtual Machine. Cloudera has packaged a Hadoop installation and Cloudera Manager in a QuickStart virtual machine, so people can learn Hadoop without the hassles of installing it and dealing with different operating systems.
Downloads
Download VirtualBox, a virtualization software package, matching the operating system of your host machine: https://www.virtualbox.org/wiki/Downloads
Download the Cloudera QuickStart Virtual Machine (VM): http://www.cloudera.com/content/dev-center/en/home/developer-admin-resources/quickstart-vm.html
Import and start the VM
In VirtualBox Manager, click File -> Import Appliance, enter the file path of the Cloudera VM in the prompt window, and import the Cloudera VM.
After importing, start the VM in VirtualBox Manager by right-clicking it and selecting "Start". (If you prefer the command line on the host, see the sketch below.)
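The same import and start can be done on the host machine with VirtualBox's VBoxManage command-line tool. A minimal sketch; the appliance file name and VM name below are placeholders, so substitute the actual name of your download and the name shown by "list vms":

#on the host machine: import the downloaded appliance (file name is a placeholder)
VBoxManage import cloudera-quickstart-vm.ova
#list registered VMs to find the exact name VirtualBox gave the import
VBoxManage list vms
#start the VM by name (adjust to the name shown by the previous command)
VBoxManage startvm "Cloudera QuickStart VM"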
Compile wordcount.jar
Now we are inside the VM. Open a terminal.
#print the working directory, to check the current working directory
[cloudera@localhost ~]$ pwd
/home/cloudera
#if not there, change to this directory:
[cloudera@localhost ~]$ cd /home/cloudera/
#note the machine's name is localhost, which is what we want; other hostnames can cause problems later.
First, open the gedit text editor, copy and paste the Java program at https://www.cloudera.com/content/cloudera-content/cloudera-docs/HadoopTutorial/CDH4/Hadoop-Tutorial/ht_wordcount1_source.html into gedit, and save the file as /home/cloudera/WordCount.java
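In case that link becomes unavailable: the program at that page is the classic WordCount v1.0 built on the old org.apache.hadoop.mapred API (consistent with the mapred.JobClient lines in the job log below). A sketch of what WordCount.java should look like; prefer the linked original if it differs:

package org.myorg;

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WordCount {

    //mapper: split each input line into words, emit (word, 1) for each word
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            StringTokenizer tokenizer = new StringTokenizer(value.toString());
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    //reducer (also used as combiner): sum the counts for each word
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        //args[0] is the input folder on HDFS, args[1] the output folder
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}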
#export CLASSPATH so the compiler can find the Hadoop libraries
[cloudera@localhost ~]$ export CLASSPATH=/usr/lib/hadoop/client-0.20/*:/usr/lib/hadoop/*
#display the value of CLASSPATH
[cloudera@localhost ~]$ echo $CLASSPATH
/usr/lib/hadoop/client-0.20/*:/usr/lib/hadoop/*
#make a directory to store the compiled classes
[cloudera@localhost ~]$ mkdir wordcount_classes
#compile WordCount.java, placing the classes in the wordcount_classes directory
[cloudera@localhost ~]$ javac -d wordcount_classes/ WordCount.java
#make the .jar file, which will be used to run the word count job in Hadoop
[cloudera@localhost ~]$ jar -cvf wordcount.jar -C wordcount_classes/ .
added manifest
adding: org/(in = 0) (out= 0)(stored 0%)
adding: org/myorg/(in = 0) (out= 0)(stored 0%)
adding: org/myorg/WordCount.class(in = 1546) (out= 749)(deflated 51%)
adding: org/myorg/WordCount$Map.class(in = 1938) (out= 798)(deflated 58%)
adding: org/myorg/WordCount$Reduce.class(in = 1611) (out= 649)(deflated 59%)
#list the files in the current directory; wordcount.jar should now appear
[cloudera@localhost ~]$ ls
Put some files on HDFS
This is a word frequency count job, so we need some text files from which words will be counted. We create some short text files in the current directory and then put them in the Hadoop Distributed File System (HDFS); the text files must be on HDFS for the Hadoop job to read them.
#create a text file with content "Hello World Bye World" and save it as file0
[cloudera@localhost ~]$ echo "Hello World Bye World" >file0
#create a text file with content "Hello Hadoop Bye Hadoop" and save it as file1
[cloudera@localhost ~]$ echo "Hello Hadoop Bye Hadoop" >file1
#make a new directory "wordcount" on HDFS under /user/cloudera/
[cloudera@localhost ~]$ hadoop fs -mkdir /user/cloudera/wordcount
#make an "input" directory on HDFS under the new /user/cloudera/wordcount directory
[cloudera@localhost ~]$ hadoop fs -mkdir /user/cloudera/wordcount/input
#put file0 into the HDFS directory /user/cloudera/wordcount/input
[cloudera@localhost ~]$ hadoop fs -put file0 /user/cloudera/wordcount/input
#put file1 into the HDFS directory /user/cloudera/wordcount/input
[cloudera@localhost ~]$ hadoop fs -put file1 /user/cloudera/wordcount/input
Run the Hadoop job
In the terminal, run the Hadoop job, supplying the .jar file, the main class, the input folder path on HDFS, and the output folder path on HDFS.
Important: the output folder must not already exist on HDFS; the job will fail with an error if it does. Hadoop creates this folder for you during the run, so make sure the specified output folder does not exist on HDFS, and delete it if it does.
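A quick check before launching the job, for example like this (on older Hadoop releases the equivalent of "-rm -r" is "hadoop fs -rmr"):

#list the wordcount directory to see whether an output folder is left over from a previous run
[cloudera@localhost ~]$ hadoop fs -ls /user/cloudera/wordcount
#if the output folder exists, remove it
[cloudera@localhost ~]$ hadoop fs -rm -r /user/cloudera/wordcount/output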
[cloudera@localhost ~]$ hadoop jar wordcount.jar org.myorg.WordCount /user/cloudera/wordcount/input /user/cloudera/wordcount/output
14/03/15 11:56:11 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/03/15 11:56:12 INFO mapred.FileInputFormat: Total input paths to process : 2
14/03/15 11:56:12 INFO mapred.JobClient: Running job: job_201403151136_0001
14/03/15 11:56:13 INFO mapred.JobClient: map 0% reduce 0%
14/03/15 11:56:24 INFO mapred.JobClient: map 100% reduce 0%
14/03/15 11:56:30 INFO mapred.JobClient: map 100% reduce 100%
14/03/15 11:56:31 INFO mapred.JobClient: Job complete: job_201403151136_0001
14/03/15 11:56:31 INFO mapred.JobClient: Counters: 33
14/03/15 11:56:31 INFO mapred.JobClient: File System Counters
14/03/15 11:56:31 INFO mapred.JobClient: FILE: Number of bytes read=71
14/03/15 11:56:31 INFO mapred.JobClient: FILE: Number of bytes written=481817
14/03/15 11:56:31 INFO mapred.JobClient: FILE: Number of read operations=0
14/03/15 11:56:31 INFO mapred.JobClient: FILE: Number of large read operations=0
14/03/15 11:56:31 INFO mapred.JobClient: FILE: Number of write operations=0
14/03/15 11:56:31 INFO mapred.JobClient: HDFS: Number of bytes read=290
14/03/15 11:56:31 INFO mapred.JobClient: HDFS: Number of bytes written=31
14/03/15 11:56:31 INFO mapred.JobClient: HDFS: Number of read operations=5
14/03/15 11:56:31 INFO mapred.JobClient: HDFS: Number of large read operations=0
14/03/15 11:56:31 INFO mapred.JobClient: HDFS: Number of write operations=2
14/03/15 11:56:31 INFO mapred.JobClient: Job Counters
14/03/15 11:56:31 INFO mapred.JobClient: Launched map tasks=2
14/03/15 11:56:31 INFO mapred.JobClient: Launched reduce tasks=1
14/03/15 11:56:31 INFO mapred.JobClient: Data-local map tasks=2
14/03/15 11:56:31 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=14671
14/03/15 11:56:31 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=3756
14/03/15 11:56:31 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/03/15 11:56:31 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/03/15 11:56:31 INFO mapred.JobClient: Map-Reduce Framework
14/03/15 11:56:31 INFO mapred.JobClient: Map input records=2
14/03/15 11:56:31 INFO mapred.JobClient: Map output records=8
14/03/15 11:56:31 INFO mapred.JobClient: Map output bytes=78
14/03/15 11:56:31 INFO mapred.JobClient: Input split bytes=244
14/03/15 11:56:31 INFO mapred.JobClient: Combine input records=8
14/03/15 11:56:31 INFO mapred.JobClient: Combine output records=6
14/03/15 11:56:31 INFO mapred.JobClient: Reduce input groups=4
14/03/15 11:56:31 INFO mapred.JobClient: Reduce shuffle bytes=97
14/03/15 11:56:31 INFO mapred.JobClient: Reduce input records=6
14/03/15 11:56:31 INFO mapred.JobClient: Reduce output records=4
14/03/15 11:56:31 INFO mapred.JobClient: Spilled Records=12
14/03/15 11:56:31 INFO mapred.JobClient: CPU time spent (ms)=1060
14/03/15 11:56:31 INFO mapred.JobClient: Physical memory (bytes) snapshot=400343040
14/03/15 11:56:31 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1997451264
14/03/15 11:56:31 INFO mapred.JobClient: Total committed heap usage (bytes)=281878528
14/03/15 11:56:31 INFO mapred.JobClient: org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
14/03/15 11:56:31 INFO mapred.JobClient: BYTES_READ=46
Examine the word count results, which are stored in a text file in the output folder on HDFS.
#list the files in the output folder (on HDFS)
[cloudera@localhost ~]$ hadoop fs -ls /user/cloudera/wordcount/output
Found 3 items
-rw-r--r-- 3 cloudera cloudera 0 2014-03-15 11:56 /user/cloudera/wordcount/output/_SUCCESS
drwxr-xr-x - cloudera cloudera 0 2014-03-15 11:56 /user/cloudera/wordcount/output/_logs
-rw-r--r-- 3 cloudera cloudera 31 2014-03-15 11:56 /user/cloudera/wordcount/output/part-00000
#examine the word frequency file
[cloudera@localhost ~]$ hadoop fs -cat /user/cloudera/wordcount/output/part-00000
Bye 2
Hadoop 2
Hello 2
World 2
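If you want a copy of the results on the local filesystem, hadoop fs -get copies a file out of HDFS (hadoop fs -getmerge would instead concatenate all part files into one local file). The local file name wordcount_results.txt below is chosen here just for illustration:

#copy the result file from HDFS to the local working directory
[cloudera@localhost ~]$ hadoop fs -get /user/cloudera/wordcount/output/part-00000 wordcount_results.txt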
References
[1] https://www.cloudera.com/content/cloudera-content/cloudera-docs/HadoopTutorial/CDH4/Hadoop-Tutorial/ht_wordcount1.html
[2] http://bangforeheadonbrickwall.wordpress.com/2013/01/29/making-the-cloudera-hadoop-wordcount-tutorial-work/
[3] http://stackoverflow.com/questions/16556182/clouderas-cdh4-wordcount-hadoop-tutorial-issues