Q1. What exactly is Hadoop?
Q2. What are the 5 Vs of Big Data?
Q3. Give some examples of unstructured data.
Q4. Tell me about the Hadoop file system and processing framework.
Q5. What is the High Availability feature in Hadoop 2?
Q6. What is Federation?
Q7. What is metadata?
Q8. What are the main components in the Hadoop ecosystem and what are their functions?
Q9. Tell me some major benefits of Hadoop.
Q10. How is Hadoop cost-effective?
Q11. What is the block size in Hadoop?
Q12. What is the NameNode port number?
Q13. What is the default replication factor in HDFS?
Q14. What is the command to change the replication factor?
Q15. Name two of the most commonly used commands in HDFS.
Q16. What are the common types of NoSQL databases?
Q17. Give an example of a document database.
Q18. Give examples of columnar databases.
Q19. Tell me about the execution modes of Apache Pig.
Q20. How would you import data from MySQL into HDFS?
Q21. Which Hadoop features are extended to its ecosystem components?
A1. Hadoop is an open-source Big Data framework for storing and processing huge volumes of different types of data in parallel across clusters of commodity hardware.
A2. Volume – the size of the data.
Velocity – the speed at which data is generated and changes.
Variety – the different types of data: structured, semi-structured, and unstructured.
Veracity – the trustworthiness and quality of the data.
Value – the usefulness of the data for business insight.
A3. Images, videos, audio files, etc.
A4. The Hadoop file system is called HDFS – the Hadoop Distributed File System. It consists of a NameNode, DataNodes, and a Secondary NameNode.
The Hadoop processing framework is known as MapReduce. It splits work into Map and Reduce tasks that are scheduled in parallel across the cluster to achieve efficiency.
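As an illustration of the Map and Reduce phases (using standard Unix tools, not Hadoop itself), the classic word-count flow can be sketched as a pipeline: `tr` plays the mapper (emitting one word per line), `sort` stands in for the shuffle, and `uniq -c` acts as the reducer. The sample input is made up for the example.

```shell
echo "big data big hadoop data big" |
  tr ' ' '\n' |   # map: emit one (word) record per line
  sort |          # shuffle: group identical keys together
  uniq -c         # reduce: count the occurrences of each key
```

Each stage works on a stream of records independently, which is exactly what lets real MapReduce distribute the stages across many machines.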
A5. Hadoop 2 introduces a Standby (passive) NameNode so that the NameNode is no longer a single point of failure. This results in High Availability of the Hadoop cluster.
A6. Federation is introduced in Hadoop 2 to support multiple NameNodes in a Hadoop cluster, each managing a portion of the filesystem namespace. This makes the NameNode layer horizontally scalable and allows the cluster to handle a much larger amount of metadata.
A7. Metadata is data about data. The NameNode manages the metadata of the Hadoop cluster – information about the files and directories stored in HDFS.
A8. Here is a list of Hadoop ecosystem components –
1. HDFS – distributed file system
2. MapReduce – Java-based programming paradigm for parallel processing
3. Pig – to process and analyse structured and semi-structured data
4. Hive – to process and analyse structured data using SQL-like queries
5. HBase – NoSQL database
6. Sqoop – to import/export structured data between HDFS and relational databases
7. Oozie – workflow scheduler
A9. Some major benefits of Hadoop are –
a. Cost-Effective
b. Ability to handle multiple data types
c. Ability to handle big data
d. Common platform for machine learning, business intelligence, data warehousing, etc.
A10. Hadoop runs on commodity hardware and is open-source, so it provides a cost-effective solution on both the hardware and software fronts.
A11. The default block size in Hadoop 1 is 64 MB and in Hadoop 2 it is 128 MB.
A12. It is 50070 – the default NameNode web UI port in Hadoop 2 (changed to 9870 in Hadoop 3).
A13. The default replication factor is 3.
A14. The replication factor can be changed using the setrep command.
A15. The get and put commands, which copy files out of and into HDFS respectively.
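Assuming a running cluster, these HDFS commands can be sketched as follows; the file and directory names here are hypothetical placeholders:

```shell
hdfs dfs -put report.csv /user/demo/report.csv   # copy a local file into HDFS
hdfs dfs -get /user/demo/report.csv ./copy.csv   # copy an HDFS file back to the local filesystem
hdfs dfs -setrep -w 2 /user/demo/report.csv      # change the file's replication factor to 2
```

The -w flag on setrep makes the command wait until the new replication level is actually reached.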
A16. These are –
a. Columnar database.
b. Document database.
c. Graph database.
d. Key-value database.
A17. MongoDB is a widely used document database.
A18. Cassandra and HBase.
A19. Pig can be executed in two modes: local mode, which runs against the local filesystem, and MapReduce mode, which runs on the Hadoop cluster against HDFS.
A20. Using Sqoop, which transfers data between relational databases such as MySQL and HDFS.
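A typical Sqoop import can be sketched as below; the host, database, credentials, table, and target directory are all placeholders for this example:

```shell
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username dbuser -P \
  --table orders \
  --target-dir /user/demo/orders \
  --num-mappers 4
```

Sqoop turns the import into a MapReduce job; --num-mappers controls how many parallel tasks read slices of the table.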
A21. High Availability, horizontal scalability, and replication/data redundancy.