Dear Readers, welcome to Hadoop Objective Questions and Answers. These multiple-choice questions have been designed to acquaint you with the kinds of questions you may encounter in a job interview on Hadoop, and they are equally useful for campus placement tests. In my experience, good interviewers rarely plan to ask any particular question; still, these model questions come up frequently in the online technical tests and interviews of many IT and non-IT companies.
Which of the following best describes a SequenceFile?
A. A SequenceFile contains a binary encoding of an arbitrary number of homogeneous Writable objects.
B. A SequenceFile contains a binary encoding of an arbitrary number of heterogeneous Writable objects.
C. A SequenceFile contains a binary encoding of an arbitrary number of WritableComparable objects, in sorted order.
D. A SequenceFile contains a binary encoding of an arbitrary number of key-value pairs. Each key must be the same type. Each value must be the same type.
Answer: D
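For readers who want to see answer D in code, here is a minimal sketch using the Hadoop Java API. The file name and the key/value types (Text keys, IntWritable values) are illustrative assumptions, not part of the question:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class SequenceFileDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path path = new Path("demo.seq"); // illustrative path

            // Every key is a Text and every value an IntWritable,
            // matching the "same type" requirement in answer D.
            SequenceFile.Writer writer =
                SequenceFile.createWriter(fs, conf, path, Text.class, IntWritable.class);
            writer.append(new Text("hadoop"), new IntWritable(1));
            writer.append(new Text("hive"), new IntWritable(2));
            writer.close();

            // Read the pairs back in insertion order.
            SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
            Text key = new Text();
            IntWritable value = new IntWritable();
            while (reader.next(key, value)) {
                System.out.println(key + "\t" + value);
            }
            reader.close();
        }
    }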
Is there a special input format for map files?
A. Yes, but only in Hadoop 0.22+.
B. Yes, there is a special format for map files.
C. No, but sequence file input format can read map files.
D. Both B and C are correct answers.
Answer: C
How do you configure a MapReduce job so that a single map task processes each input file, regardless of how many blocks the input file occupies?
A. Increase the parameter that controls minimum split size in the job configuration.
B. Write a custom MapRunner that iterates over all key-value pairs in the entire file.
C. Set the number of mappers equal to the number of input files you want to process.
D. Write a custom FileInputFormat and override the method isSplitable to always return false.
Answer: D
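A minimal sketch of answer D, assuming the new (org.apache.hadoop.mapreduce) Java API; the class name is illustrative. Returning false from isSplitable forces each file into a single split, and therefore into a single map task:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    // Illustrative name: an input format whose files are never split.
    public class WholeFileTextInputFormat extends TextInputFormat {
        @Override
        protected boolean isSplitable(JobContext context, Path file) {
            return false; // one file = one split = one map task
        }
    }

A driver would then select it with job.setInputFormatClass(WholeFileTextInputFormat.class).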
Input files are divided into splits before being processed. Which statement describes how a RecordReader handles a line that crosses a split boundary?
A. Input file splits may cross line breaks. A line that crosses file splits is ignored.
B. The input file is split exactly at the line breaks, so each RecordReader will read a series of complete lines.
C. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReaders of both splits containing the broken line.
D. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader of the split that contains the end of the broken line.
E. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader of the split that contains the beginning of the broken line.
Answer: E
Which statement best describes the relationship between MapReduce and Pig?
A. Pig provides additional capabilities that allow certain types of data manipulation not possible with MapReduce.
B. Pig provides no additional capabilities to MapReduce. Pig programs are executed as MapReduce jobs via the Pig interpreter.
C. Pig programs rely on MapReduce but are extensible, allowing developers to do special-purpose processing not provided by MapReduce.
D. Pig provides the additional capability of allowing you to control the flow of multiple MapReduce jobs.
Answer: D
A. Pig
B. Hue
C. Hive
D. Flume
E. Sqoop
F. Oozie
G. fuse-dfs
Answer: C,E
A. Pig
B. Hue
C. Hive
D. Sqoop
E. Oozie
F. Flume
G. Hadoop Streaming
Answer: C
Workflows expressed in Oozie can contain:
A. Iterative repetition of MapReduce jobs until a desired answer or state is reached.
B. Sequences of MapReduce and Pig jobs. These are limited to linear sequences of actions with exception handlers but no forks.
C. Sequences of MapReduce jobs only; no Pig or Hive tasks or jobs. These MapReduce sequences can be combined with forks and path joins.
D. Sequences of MapReduce and Pig. These sequences can be combined with other actions including forks, decision points, and path joins.
Answer: D
A. Hue
B. Pig
C. Hive
D. Oozie
E. HBase
F. Flume
G. Sqoop
Answer: E
A. Oozie
B. Sqoop
C. Flume
D. Hadoop Streaming
Answer: D
Which of these scenarios makes the entire HDFS cluster unavailable?
A. Map or reduce tasks that are stuck in an infinite loop.
B. HDFS is almost full.
C. The NameNode goes down.
D. A DataNode is disconnected from the cluster.
E. MapReduce jobs that are causing excessive memory swaps.
Answer: C
A. JobTracker failure
B. TaskTracker failure
C. DataNode failure
D. NameNode failure
E. Secondary NameNode failure
Answer: A
A. Combine
B. Group (a.k.a. 'shuffle')
C. Reduce
D. Write
Answer: A
A. Ranger
B. Longhorn
C. Lonestar
D. Spur
Answer: A
A. True
B. False
Answer: B
A. Java
B. C
C. FORTRAN
D. Python
Answer: A
A. Hadoop
B. Twister
C. Phoenix
Answer: C
What is the main advantage of using a distributed filesystem with MapReduce?
A. Data represented in a distributed filesystem is already sorted.
B. Distributed filesystems must always be resident in memory, which is much faster than disk.
C. Data storage and processing can be co-located on the same node, so that most input data relevant to Map or Reduce will be present on local disks or cache.
D. A distributed filesystem makes random access faster because of the presence of a dedicated node serving file metadata.
Answer: C
What input does a single invocation of the reduce method receive?
A. One key and a list of all values associated with that key.
B. One key and a list of some values associated with that key.
C. An arbitrarily sized list of key/value pairs.
Answer: A
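Answer A is visible directly in the reduce method signature of the Java API. A minimal sketch, assuming Text keys and IntWritable values (illustrative types):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // One call receives one key and an iterator over all of its values.
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }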
A. Split
B. Map
C. Combine
Answer: A
What is the default HDFS block size?
A. 512 bytes
B. 64 MB
C. 1024 KB
D. None of the above
Answer: B
A. -show
B. -help
C. -?
D. None of the above
Answer: B
What does RPC stand for?
A. Remote processing call
B. Remote process call
C. Remote procedure call
D. None of the above
Answer: C
A. open()
B. access()
C. select()
D. None of the above
Answer: A
A. Two
B. Four
C. Three
D. None of the above
Answer: A
Which programming languages can be used to write MapReduce programs?
A. The most common programming language is Java, but scripting languages are also supported via Hadoop Streaming.
B. Any programming language that can comply with the MapReduce concept can be supported.
C. Only Java is supported, since Hadoop was written in Java.
D. Currently Map Reduce supports Java, C, C++ and COBOL.
Answer: A
What are sequence files, and why are they important?
A. Sequence files are binary format files that are compressed and are splittable. They are often used in high-performance map-reduce jobs.
B. Sequence files are a type of file in the Hadoop framework that allow data to be sorted.
C. Sequence files are intermediate files that are created by Hadoop after the map step
D. Both B and C are correct
Answer: A
What are map files, and why are they important?
A. Map files are stored on the namenode and capture the metadata for all blocks on a particular rack. This is how Hadoop is "rack aware".
B. Map files are the files that show how the data is distributed in the Hadoop cluster.
C. Map files are generated by Map-Reduce after the reduce step. They show the task distribution during job execution
D. Map files are sorted sequence files that also have an index. The index allows fast data look up.
Answer: D
How can you use binary data in MapReduce?
A. Binary data can be used directly by a map-reduce job. Often binary data is added to a sequence file.
B. Binary data cannot be used by the Hadoop framework. Binary data should be converted to a Hadoop-compatible format prior to loading.
C. Binary data can be used in map-reduce only with very limited functionality. It cannot be used as a key, for example.
D. Hadoop can freely use binary files with map-reduce jobs so long as the files have headers.
Answer: A
What is a map-side join?
A. Map-side join is done in the map phase and done in memory.
B. Map-side join is a technique in which data is eliminated at the map step.
C. Map-side join is a form of map-reduce API which joins data from different locations.
D. None of these answers are correct.
Answer: A
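To make answer A concrete, here is a hedged sketch of an in-memory map-side join in the Java API. The lookup file name, the comma-separated record layout, and the class name are illustrative assumptions:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class MapSideJoinMapper extends Mapper<LongWritable, Text, Text, Text> {
        private final Map<String, String> lookup = new HashMap<String, String>();

        @Override
        protected void setup(Context context) throws IOException {
            // Load the small table into memory once per task; "lookup.csv" is
            // an illustrative local file (often shipped via the distributed cache).
            BufferedReader in = new BufferedReader(new FileReader("lookup.csv"));
            String line;
            while ((line = in.readLine()) != null) {
                String[] parts = line.split(",", 2); // id,name
                lookup.put(parts[0], parts[1]);
            }
            in.close();
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Join each big-table record against the in-memory small table.
            String[] fields = value.toString().split(",", 2); // id,rest
            String name = lookup.get(fields[0]);
            if (name != null) {
                context.write(new Text(fields[0]), new Text(name + "," + fields[1]));
            }
        }
    }

The small table is loaded once per task in setup(); each call to map() then joins a record against the in-memory table, with no shuffle required.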
What is a reduce-side join?
A. Reduce-side join is a technique to eliminate data from the initial data set at the reduce step.
B. Reduce-side join is a technique for merging data from different sources based on a specific key.
C. Reduce-side join is a set of API to merge data from different sources.
D. None of these answers are correct
Answer: B
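A hedged sketch of answer B: records from two sources are emitted under their join key and tagged with their origin, and the reducer merges matching records. The record layouts and the "C:"/"O:" tag scheme are illustrative assumptions:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Tags each customer record with "C:"; a sibling mapper would tag orders "O:".
    public class CustomerTagMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",", 2); // custId,rest
            context.write(new Text(fields[0]), new Text("C:" + fields[1]));
        }
    }

    class JoinReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // All records sharing a join key arrive together; merge by tag.
            List<String> customers = new ArrayList<String>();
            List<String> orders = new ArrayList<String>();
            for (Text v : values) {
                String s = v.toString();
                if (s.startsWith("C:")) customers.add(s.substring(2));
                else orders.add(s.substring(2));
            }
            for (String c : customers) {
                for (String o : orders) {
                    context.write(key, new Text(c + "," + o));
                }
            }
        }
    }

In the driver, each source is wired to its tagging mapper with MultipleInputs.addInputPath(job, path, TextInputFormat.class, CustomerTagMapper.class).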
What is Pig?
A. Pig is a subset of the Hadoop API for data processing.
B. Pig is a part of the Apache Hadoop project that provides a C-like scripting language interface for data processing.
C. Pig is a part of the Apache Hadoop project. It is a "PL-SQL" interface for data processing in a Hadoop cluster.
D. PIG is the third most popular form of meat in the US behind poultry and beef.
Answer: B
How can you disable the reduce step?
A. The Hadoop administrator has to set the number of reducer slots to zero on all slave nodes. This will disable the reduce step.
B. It is impossible to disable the reduce step, since it is a critical part of the Map-Reduce abstraction.
C. A developer can always set the number of reducers to zero. That will completely disable the reduce step.
D. While you cannot completely disable reducers, you can set output to one. There needs to be at least one reduce step in the Map-Reduce abstraction.
Answer: C
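Answer C is a single line of driver code. A minimal sketch (class and job name are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class MapOnlyDriver {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "map-only job");
            // Zero reducers: map output is written directly to HDFS,
            // and the shuffle/sort step is skipped entirely.
            job.setNumReduceTasks(0);
            // ... set mapper, input/output formats and paths as usual ...
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }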
Why would a developer create a map-reduce job without reducers?
A. Developers should design Map-Reduce jobs without reducers only if no reduce slots are available on the cluster.
B. Developers should never design Map-Reduce jobs without reducers. An error will occur upon compile.
C. There is a CPU intensive step that occurs between the map and reduce steps. Disabling the reduce step speeds up data processing.
D. It is not possible to create a map-reduce job without at least one reduce step. A developer may decide to limit to one reducer for debugging purposes.
Answer: C
What is the default input format?
A. The default input format is XML. Developers can specify other input formats as appropriate if XML is not the correct input.
B. There is no default input format. The input format always should be specified.
C. The default input format is a sequence file format. The data needs to be preprocessed before using the default input format.
D. The default input format is TextInputFormat with byte offset as a key and entire line as a value.
Answer: D
How can you override the default input format?
A. In order to override the default input format, the Hadoop administrator has to change the default settings in the config file.
B. In order to override the default input format, a developer has to set the new input format on the job config before submitting the job to a cluster.
C. The default input format is controlled by each individual mapper, and each line needs to be parsed individually.
D. None of these answers are correct.
Answer: B
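A short driver sketch tying the two answers above together: by default the job uses TextInputFormat (LongWritable byte-offset keys, Text line values), and a developer can swap in another format on the job before submission. KeyValueTextInputFormat is an illustrative alternative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;

    public class InputFormatDriver {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "custom input format");
            // Without this call the job uses TextInputFormat:
            // keys are LongWritable byte offsets, values are Text lines.
            job.setInputFormatClass(KeyValueTextInputFormat.class);
            // ... set mapper, reducer, and paths as usual ...
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }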
What are the common problems with map-side joins?
A. The most common problem with map-side joins is introducing a high level of code complexity. This complexity has several downsides: increased risk of bugs and performance degradation. Developers are cautioned to rarely use map-side joins.
B. The most common problem with map-side joins is a lack of available map slots, since map-side joins require a lot of mappers.
C. The most common problems with map-side joins are out-of-memory exceptions on slave nodes.
D. The most common problem with map-side joins is not clearly specifying the primary index in the join. This can lead to very slow performance on large datasets.
Answer: C
Which is faster, a map-side join or a reduce-side join, and why?
A. Both techniques have about the same performance expectations.
B. Reduce-side join, because the join operation is done on HDFS.
C. Map-side join is faster because the join operation is done in memory.
D. Reduce-side join, because it is executed on the namenode, which will have a faster CPU and more memory.
Answer: C
Do configuration settings made through the Java API override the values in configuration files?
A. No. The configuration settings in the configuration file take precedence.
B. Yes. The configuration settings made using the Java API take precedence.
C. It depends on when the developer reads the configuration file. If it is read first, then no.
D. Only global configuration settings are captured in configuration files on the namenode. There are only a very few job parameters that can be set using the Java API.
Answer: B
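A minimal sketch of answer B, using a real property name (mapreduce.job.reduces) for illustration; a value set on the Configuration before job submission wins over the same key in the XML files, unless the file marks the property final:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ConfigPrecedenceDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration(); // loads *-site.xml defaults
            // Overrides whatever value the XML files carry for this key.
            conf.set("mapreduce.job.reduces", "4");
            Job job = Job.getInstance(conf, "config precedence demo");
            System.out.println(job.getConfiguration().get("mapreduce.job.reduces"));
        }
    }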
What is Avro?
A. Avro is a Java serialization library.
B. Avro is a Java compression library.
C. Avro is a Java library that creates splittable files.
D. None of these answers are correct
Answer: A
Can Avro data files be processed by MapReduce?
A. Yes, Avro was specifically designed for data processing via Map-Reduce.
B. Yes, but additional extensive coding is required
C. No, Avro was specifically designed for data storage only
D. Avro specifies metadata that allows easier data access. This data cannot be used as part of mapreduce execution, rather input specification only.
Answer: A
What is the distributed cache?
A. The distributed cache is a special component on the namenode that will cache frequently used data for faster client response. It is used during the reduce step.
B. The distributed cache is a special component on the datanode that will cache frequently used data for faster client response. It is used during the map step.
C. The distributed cache is a component that caches Java objects.
D. The distributed cache is a component that allows developers to deploy jars for Map-Reduce processing.
Answer: D
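A hedged sketch of shipping a jar and a data file to every task node through the distributed cache, using the Hadoop 2 Job API; the HDFS paths are illustrative assumptions:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;

    public class CacheDriver {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "distributed cache demo");
            // Ship a jar and a data file to every task node before tasks start.
            job.addArchiveToClassPath(new Path("/libs/lookup-lib.jar"));
            job.addCacheFile(new URI("/data/lookup.csv#lookup.csv"));
            // ... set mapper, reducer, and paths as usual ...
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

The "#lookup.csv" fragment creates a symlink of that name in each task's working directory.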
What is the best performance one can expect from a Hadoop cluster?
A. The best performance expectation one can have is measured in seconds. This is because Hadoop can only be used for batch processing.
B. The best performance expectation one can have is measured in milliseconds. This is because Hadoop executes in parallel across so many machines.
C. The best performance expectation one can have is measured in minutes. This is because Hadoop can only be used for batch processing.
D. It depends on the design of the map-reduce program, how many machines are in the cluster, and the amount of data being retrieved.
Answer: A
What is Writable?
A. Writable is a Java interface that needs to be implemented for streaming data to remote servers.
B. Writable is a Java interface that needs to be implemented for HDFS writes.
C. Writable is a Java interface that needs to be implemented for MapReduce processing.
D. None of these answers are correct.
Answer: C
What are Writable data types optimized for?
A. Writable data types are specifically optimized for network transmissions.
B. Writable data types are specifically optimized for file system storage
C. Writable data types are specifically optimized for map-reduce processing
D. Writable data types are specifically optimized for data retrieval
Answer: A
Can custom data types be used with MapReduce?
A. No, Hadoop does not provide techniques for custom datatypes.
B. Yes, but only for mappers.
C. Yes, custom data types can be implemented as long as they implement writable interface.
D. Yes, but only for reducers.
Answer: C
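A minimal sketch of answer C: an illustrative Point2D value type that implements Writable so instances can be serialized between map and reduce tasks. A type used as a key would implement WritableComparable instead:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Writable;

    // Illustrative custom value type for MapReduce.
    public class Point2D implements Writable {
        private double x;
        private double y;

        public Point2D() {} // required: Hadoop instantiates it via reflection

        public Point2D(double x, double y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeDouble(x);
            out.writeDouble(y);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            x = in.readDouble();
            y = in.readDouble();
        }
    }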
Can a MapReduce job have multiple input paths?
A. Yes, but only in Hadoop 0.22+.
B. No, Hadoop always operates on one input directory.
C. Yes, developers can add any number of input paths.
D. Yes, but the limit is currently capped at 10 input paths.
Answer: C
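Answer C in driver form (a minimal sketch; the paths are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class MultiInputDriver {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "multiple inputs");
            // Each call adds another input path; any number may be added.
            FileInputFormat.addInputPath(job, new Path("/data/2022"));
            FileInputFormat.addInputPath(job, new Path("/data/2023"));
            FileInputFormat.addInputPath(job, new Path("/data/2024"));
            // ... set mapper, reducer, and output path as usual ...
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }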
How are Mapper instances created to process input records?
A. The JobTracker calls the TaskTracker's configure() method, then its map() method and finally its close() method.
B. The TaskTracker spawns a new Mapper to process all records in a single input split.
C. The TaskTracker spawns a new Mapper to process each key-value pair.
D. The JobTracker spawns a new Mapper to process all records in a single file.
Answer: B
When does a Reducer's reduce method begin executing?
A. Reducers start copying intermediate key-value pairs from each Mapper as soon as it has completed. The programmer can configure in the job what percentage of the intermediate data should arrive before the reduce method begins.
B. Reducers start copying intermediate key-value pairs from each Mapper as soon as it has completed. The reduce method is called only after all intermediate data has been copied and sorted.
C. Reduce methods and map methods all start at the beginning of a job, in order to provide optimal performance for map-only or reduce-only jobs.
D. Reducers start copying intermediate key-value pairs from each Mapper as soon as it has completed. The reduce method is called as soon as the intermediate key-value pairs start to arrive.
Answer: B
A. 6
B. 3
C. 1
D. 0
E. 5
Answer: B
You need to load a small lookup file into memory in every Mapper before any input records are processed. What is the best way to accomplish this?
A. Serialize the data file, insert it in the JobConf object, and read the data into memory in the configure method of the mapper.
B. Place the data file in the DistributedCache and read the data into memory in the map method of the mapper.
C. Place the data file in the DataCache and read the data into memory in the configure method of the mapper.
D. Place the data file in the DistributedCache and read the data into memory in the configure method of the mapper.
Answer: D
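A hedged sketch of answer D using the Hadoop 2 API, where setup() plays the role of the old configure() method. The stop-word file and the cache symlink are illustrative assumptions:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.HashSet;
    import java.util.Set;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class StopWordMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private final Set<String> stopWords = new HashSet<String>();

        // setup() runs once per task, before the first map() call, so the
        // cached file is read exactly once rather than once per record.
        @Override
        protected void setup(Context context) throws IOException {
            // "stopwords.txt" is an illustrative symlink created by
            // job.addCacheFile(new URI("/data/stopwords.txt#stopwords.txt")).
            BufferedReader in = new BufferedReader(new FileReader("stopwords.txt"));
            String line;
            while ((line = in.readLine()) != null) {
                stopWords.add(line.trim());
            }
            in.close();
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String word : value.toString().split("\\s+")) {
                if (!word.isEmpty() && !stopWords.contains(word)) {
                    context.write(new Text(word), new LongWritable(1));
                }
            }
        }
    }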
In what order are the values presented to the reduce method for a given key?
A. The values are in sorted order.
B. The values are arbitrarily ordered, and the ordering may vary from run to run of the same MapReduce job.
C. The values are arbitrarily ordered, but multiple runs of the same MapReduce job will always have the same ordering.
D. Since the values come from mapper outputs, the reducers will receive contiguous sections of sorted values.
Answer: B
A. Processor and network I/O
B. Disk I/O and network I/O
C. Processor and RAM
D. Processor and disk I/O
Answer: B
A Reducer sums the integer values associated with each key. Can this Reducer also be used as the job's Combiner?
A. Yes, because the sum operation is both associative and commutative and the input and output types to the reduce method match.
B. No, because the sum operation in the reducer is incompatible with the operation of a Combiner.
C. No, because the Reducer and Combiner are separate interfaces.
D. No, because the Combiner is incompatible with a mapper which doesn’t use the same data type for both the key and value.
E. Yes, because Java is a polymorphic object-oriented language and thus reducer code can be reused as a combiner.
Answer: A
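A minimal driver sketch of answer A. TokenMapper is defined inline for completeness, and the combiner line reuses the SumReducer class sketched earlier in this document:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;

    public class WordCountDriver {
        public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                for (String w : value.toString().split("\\s+")) {
                    if (!w.isEmpty()) context.write(new Text(w), ONE);
                }
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCountDriver.class);
            job.setMapperClass(TokenMapper.class);
            // Safe reuse as a combiner: sum is associative and commutative,
            // and the reducer's input types match its output types.
            job.setCombinerClass(SumReducer.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            // ... set input/output paths as usual ...
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }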
A. TaskTracker
B. NameNode
C. DataNode
D. JobTracker
E. Secondary NameNode
Answer: D
A. HBase
B. Hue
C. Pig
D. Hive
E. Oozie
F. Flume
G. Sqoop
Answer: A
A. BDBInputFormat
B. KeyValueTextInputFormat
C. SequenceFileInputFormat
D. SequenceFileAsTextInputFormat
Answer: C