Free Access to the CCAH CCA-500 Exam Question Bank

Cloudera Certified Administrator for Apache Hadoop (CCAH) Questions and Answers


Question 1

You have just run a MapReduce job to filter user messages to only those of a selected geographical region. The output for this job is in a directory named westUsers, located just below your home directory in HDFS. Which command gathers these into a single file on your local file system?

Options:

hadoop fs -getmerge -R westUsers.txt

hadoop fs -getmerge westUsers westUsers.txt

hadoop fs -cp westUsers/* westUsers.txt

hadoop fs -get westUsers westUsers.txt
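`hadoop fs -getmerge <src-dir> <local-dst>` concatenates the files in an HDFS directory into a single file on the local file system. The sketch below imitates that behavior on a purely local directory so the semantics are easy to see; `getmerge_local` is a hypothetical helper, not Hadoop code, and the `part-r-*` and `_SUCCESS` file names simply reflect typical MapReduce output.

```python
import os
import tempfile


def getmerge_local(src_dir: str, dst_file: str) -> int:
    """Concatenate every regular file in src_dir, in sorted name order,
    into dst_file, and return how many files were merged.

    Hypothetical local stand-in for `hadoop fs -getmerge`; it never
    talks to HDFS.
    """
    names = sorted(
        n for n in os.listdir(src_dir)
        if os.path.isfile(os.path.join(src_dir, n))
    )
    with open(dst_file, "wb") as out:
        for name in names:
            with open(os.path.join(src_dir, name), "rb") as part:
                out.write(part.read())
    return len(names)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "westUsers")
        os.mkdir(src)
        # Simulated reducer outputs plus the empty _SUCCESS marker file.
        for name, text in [("part-r-00000", "alice\n"),
                           ("part-r-00001", "bob\n"),
                           ("_SUCCESS", "")]:
            with open(os.path.join(src, name), "w") as f:
                f.write(text)
        dst = os.path.join(tmp, "westUsers.txt")
        count = getmerge_local(src, dst)
        with open(dst) as f:
            print(count, repr(f.read()))
```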

Question 2

You have recently converted your Hadoop cluster from a MapReduce 1 (MRv1) architecture to a MapReduce 2 (MRv2) on YARN architecture. Your developers are accustomed to specifying the number of map and reduce tasks (resource allocation) when they run jobs. A developer wants to know how to specify the number of reduce tasks when a specific job runs. Which method should you tell that developer to use?

Options:

MapReduce version 2 (MRv2) on YARN abstracts resource allocation away from the idea of “tasks” into memory and virtual cores, thus eliminating the need for a developer to specify the number of reduce tasks, and indeed preventing the developer from specifying the number of reduce tasks.

In YARN, resource allocation is a function of megabytes of memory in multiples of 1024 MB. Thus, they should specify the amount of memory resource they need by executing -D mapreduce.reduce.memory.mb=2048

In YARN, the ApplicationMaster is responsible for requesting the resources required for a specific launch. Thus, executing -D yarn.applicationmaster.reduce.tasks=2 will specify that the ApplicationMaster launch two task containers on the worker nodes.

Developers specify reduce tasks in exactly the same way for both MapReduce version 1 (MRv1) and MapReduce version 2 (MRv2) on YARN. Thus, executing -D mapreduce.job.reduces=2 will specify two reduce tasks.

In YARN, resource allocation is a function of virtual cores specified by the ApplicationManager making requests to the NodeManager, where a reduce task is handled by a single container (and thus a single virtual core). Thus, the developer needs to specify the number of virtual cores to the NodeManager by executing -p yarn.nodemanager.cpu-vcores=2
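Hadoop's generic options let a driver that runs through `ToolRunner` accept `-D property=value` pairs on the command line, so a setting such as `-D mapreduce.job.reduces=2` applies to a single run (the same value can also be set in code with `Job.setNumReduceTasks`). The Python sketch below only illustrates how such pairs are split out of the argument list into a configuration map; `parse_generic_options` is a hypothetical helper, not Hadoop's `GenericOptionsParser`.

```python
from typing import Dict, List, Tuple


def parse_generic_options(argv: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Split `-D key=value` and `-Dkey=value` pairs out of argv.

    Returns (properties, remaining_args). Values stay strings here; Hadoop's
    Configuration converts them when they are read (e.g. getInt).
    Hypothetical helper illustrating the idea of generic options.
    """
    props: Dict[str, str] = {}
    rest: List[str] = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "-D" and i + 1 < len(argv):
            # Form: -D key=value (pair in the next token)
            key, sep, value = argv[i + 1].partition("=")
            if not sep:
                raise ValueError(f"expected key=value after -D, got {argv[i + 1]!r}")
            props[key] = value
            i += 2
            continue
        if arg.startswith("-D") and len(arg) > 2:
            # Form: -Dkey=value (no space)
            key, sep, value = arg[2:].partition("=")
            if not sep:
                raise ValueError(f"expected key=value in {arg!r}")
            props[key] = value
        else:
            rest.append(arg)
        i += 1
    return props, rest


if __name__ == "__main__":
    props, rest = parse_generic_options(
        ["-D", "mapreduce.job.reduces=2", "/data/input", "/data/output"])
    print(props, rest)
    # {'mapreduce.job.reduces': '2'} ['/data/input', '/data/output']
```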

Question 3

Which is the default scheduler in YARN?

Options:

YARN doesn’t configure a default scheduler; you must first assign an appropriate scheduler class in yarn-site.xml

Capacity Scheduler

Fair Scheduler

FIFO Scheduler
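The ResourceManager's scheduler is chosen by the `yarn.resourcemanager.scheduler.class` property in `yarn-site.xml`. Which class is the default depends on the distribution: stock Apache Hadoop 2.x sets the CapacityScheduler in `yarn-default.xml`, while Cloudera's CDH 5 defaults to the Fair Scheduler. The sketch below generates the `<property>` element for each choice; `scheduler_property` and the `SCHEDULERS` keys are hypothetical names, while the class names are the ones shipped with Hadoop 2.x.

```python
import xml.etree.ElementTree as ET

# Fully qualified scheduler classes shipped with Hadoop 2.x YARN.
SCHEDULERS = {
    "fifo": "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler",
    "capacity": "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler",
    "fair": "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler",
}


def scheduler_property(kind: str) -> str:
    """Return a yarn-site.xml <property> element selecting the given scheduler."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "yarn.resourcemanager.scheduler.class"
    ET.SubElement(prop, "value").text = SCHEDULERS[kind]
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    print(scheduler_property("fair"))
```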

Question 4

Your cluster is configured with HDFS and MapReduce version 2 (MRv2) on YARN. What is the result when you execute: hadoop jar SampleJar MyClass on a client machine?

Options:

SampleJar.jar is sent to the ApplicationMaster, which allocates a container for SampleJar.jar

SampleJar.jar is placed in a temporary directory in HDFS

SampleJar.jar is sent directly to the ResourceManager

SampleJar.jar is serialized into an XML file which is submitted to the ApplicationMaster
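When `hadoop jar` submits a MapReduce job on YARN, the client uploads the job's resources, including the JAR (stored as `job.jar`), to a per-job staging directory in HDFS before asking the ResourceManager to start an ApplicationMaster. The staging root comes from `yarn.app.mapreduce.am.staging-dir`, whose Hadoop 2.x default is `/tmp/hadoop-yarn/staging`. The helper names below are hypothetical, and the exact directory layout is an assumption worth verifying on your own cluster.

```python
import posixpath

# Assumed Hadoop 2.x default of yarn.app.mapreduce.am.staging-dir.
DEFAULT_STAGING_ROOT = "/tmp/hadoop-yarn/staging"


def job_id(cluster_timestamp: int, sequence: int) -> str:
    """Format a MapReduce job ID such as job_1400000000000_0001."""
    return f"job_{cluster_timestamp}_{sequence:04d}"


def staged_jar_path(user: str, jid: str, root: str = DEFAULT_STAGING_ROOT) -> str:
    """HDFS path where the submitted JAR is assumed to be copied as job.jar."""
    return posixpath.join(root, user, ".staging", jid, "job.jar")


if __name__ == "__main__":
    jid = job_id(1400000000000, 1)
    print(staged_jar_path("training", jid))
    # /tmp/hadoop-yarn/staging/training/.staging/job_1400000000000_0001/job.jar
```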

Question 5

You have a Hadoop cluster running HDFS, and a gateway machine external to the cluster from which clients submit jobs. What do you need to do in order to run Impala on the cluster and submit jobs from the command line of the gateway machine?

Options:

Install the impalad daemon, statestored daemon, and catalogd daemon on each machine in the cluster, and the impala shell on your gateway machine

Install the impalad daemon, the statestored daemon, the catalogd daemon, and the impala shell on your gateway machine

Install the impalad daemon and the impala shell on your gateway machine, and the statestored daemon and catalogd daemon on one of the nodes in the cluster

Install the impalad daemon on each machine in the cluster, the statestored daemon and catalogd daemon on one machine in the cluster, and the impala shell on your gateway machine

Install the impalad daemon, statestored daemon, and catalogd daemon on each machine in the cluster and on the gateway node
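In the usual Impala deployment, an `impalad` runs on every node that stores data so queries can read blocks locally, a single `statestored` and a single `catalogd` serve the whole cluster, and `impala-shell` is installed wherever users type queries. The sketch below checks a proposed layout against those rules; the host names and the `check_layout` helper are hypothetical.

```python
from typing import Dict, List, Set


def check_layout(layout: Dict[str, Set[str]], cluster: Set[str], gateway: str) -> List[str]:
    """Return a list of problems with a proposed Impala component layout.

    layout maps host name -> set of components installed on that host.
    Rules encode the usual deployment: impalad on every cluster host,
    exactly one statestored and one catalogd inside the cluster, and
    impala-shell on the gateway.
    """
    problems: List[str] = []
    for host in sorted(cluster):
        if "impalad" not in layout.get(host, set()):
            problems.append(f"{host}: missing impalad")
    for daemon in ("statestored", "catalogd"):
        hosts = [h for h in cluster if daemon in layout.get(h, set())]
        if len(hosts) != 1:
            problems.append(f"expected one {daemon} in the cluster, found {len(hosts)}")
    if "impala-shell" not in layout.get(gateway, set()):
        problems.append(f"{gateway}: missing impala-shell")
    return problems


if __name__ == "__main__":
    cluster = {"node1", "node2", "node3"}
    layout = {
        "node1": {"impalad", "statestored", "catalogd"},
        "node2": {"impalad"},
        "node3": {"impalad"},
        "gateway": {"impala-shell"},
    }
    print(check_layout(layout, cluster, "gateway"))  # []
```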

Question 6

A user comes to you, complaining that when she attempts to submit a Hadoop job, it fails. There is a directory in HDFS named /data/input. The JAR is named j.jar, and the driver class is named DriverClass.

She runs the command:

hadoop jar j.jar DriverClass /data/input /data/output

The error message returned includes the line:

PriviligedActionException as:training (auth:SIMPLE) cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException:

Input path does not exist: file:/data/input

What is the cause of the error?

Options:

The user is not authorized to run the job on the cluster

The output directory already exists

The name of the driver has been spelled incorrectly on the command line

The directory name is misspelled in HDFS

The Hadoop configuration files on the client do not point to the cluster
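The `file:` scheme in `Input path does not exist: file:/data/input` is the telling detail. A path without a scheme is resolved against the client's default file system (`fs.defaultFS` in `core-site.xml`), which falls back to `file:///` when the client configuration does not point at a NameNode. The sketch below mimics that qualification step; `qualify` is a hypothetical helper, and `hdfs://mynamenode:8020` is only an example NameNode address.

```python
from urllib.parse import urlparse


def qualify(path: str, default_fs: str) -> str:
    """Resolve an absolute path against a default file system URI.

    Paths that already carry a scheme are returned unchanged. The output
    imitates how Hadoop prints qualified paths (file:/x, hdfs://host:port/x).
    """
    if urlparse(path).scheme:
        return path
    if not path.startswith("/"):
        raise ValueError("this sketch only handles absolute paths")
    fs = urlparse(default_fs)
    if fs.netloc:
        return f"{fs.scheme}://{fs.netloc}{path}"
    return f"{fs.scheme}:{path}"


if __name__ == "__main__":
    # Client without cluster configuration: fs.defaultFS falls back to file:///
    print(qualify("/data/input", "file:///"))  # file:/data/input
    # Client configured for the cluster (example NameNode address)
    print(qualify("/data/input", "hdfs://mynamenode:8020"))  # hdfs://mynamenode:8020/data/input
```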

Question 7

You are running a Hadoop cluster with a NameNode on host mynamenode. What are two ways to determine available HDFS space in your cluster?

Options:

Run hdfs fs -du / and locate the DFS Remaining value

Run hdfs dfsadmin -report and locate the DFS Remaining value

Run hdfs dfs / and subtract DFS Used from Configured Capacity

Connect to http://mynamenode:50070/dfshealth.jsp and locate the DFS Remaining value
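`hdfs dfsadmin -report` begins with a cluster-wide summary of lines in the form `Label: <bytes> (<human-readable size>)`, including `Configured Capacity`, `Present Capacity`, `DFS Remaining`, and `DFS Used`. The parser below pulls those byte counts out of report text. The sample report is made up for illustration, and the regular expression assumes that line format.

```python
import re
from typing import Dict

# Matches summary lines such as "DFS Remaining: 750000000 (715.26 MB)".
# "DFS Used%" lines do not match because the label must be followed by ":".
_FIELD = re.compile(
    r"^(Configured Capacity|Present Capacity|DFS Remaining|DFS Used):\s+(\d+)",
    re.MULTILINE,
)

# Made-up report text in the dfsadmin -report summary format.
SAMPLE = """Configured Capacity: 1000000000 (953.67 MB)
Present Capacity: 900000000 (858.31 MB)
DFS Remaining: 750000000 (715.26 MB)
DFS Used: 150000000 (143.05 MB)
DFS Used%: 16.67%
"""


def summary_bytes(report: str) -> Dict[str, int]:
    """Map each capacity label to its byte count from the cluster summary."""
    values: Dict[str, int] = {}
    for label, number in _FIELD.findall(report):
        # Per-DataNode sections later in the report repeat some labels,
        # so keep only the first (cluster-wide) occurrence of each.
        values.setdefault(label, int(number))
    return values


if __name__ == "__main__":
    v = summary_bytes(SAMPLE)
    print(v["DFS Remaining"])  # 750000000
    # Configured Capacity minus DFS Used ignores non-DFS usage on the disks,
    # so it overstates the space HDFS can still use.
    print(v["Configured Capacity"] - v["DFS Used"])  # 850000000
```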

Question 8

You need to analyze 60,000,000 images stored in JPEG format, each of which is approximately 25 KB. Because your Hadoop cluster isn’t optimized for storing and processing many small files, you decide to take the following actions:

1. Group the individual images into a set of larger files

2. Use the set of larger files as input for a MapReduce job that processes them directly with Python using Hadoop Streaming.

Which data serialization system gives the flexibility to do this?

Options:

CSV

XML

HTML

Avro

SequenceFiles

JSON
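The scale explains why grouping matters: 60,000,000 files of about 25 KB is roughly 1.5 TB of data, and under the widely cited rule of thumb of about 150 bytes of NameNode heap per file or block object, 60 million single-block files need about 120 million objects, or roughly 18 GB of heap. The sketch below shows the grouping idea with a hypothetical container of length-prefixed (file name, bytes) records. It is neither the SequenceFile nor the Avro on-disk format, both of which add headers, sync markers, and optional compression; it only illustrates packing many small binary blobs into one large file and reading them back.

```python
import io
import struct
from typing import BinaryIO, Iterable, Iterator, Tuple

# Hypothetical record layout, for illustration only:
#   4-byte big-endian key length, key bytes (UTF-8 file name),
#   4-byte big-endian value length, value bytes (raw image data).
_LEN = struct.Struct(">I")


def write_records(out: BinaryIO, records: Iterable[Tuple[str, bytes]]) -> int:
    """Append (name, data) records to out and return how many were written."""
    count = 0
    for name, data in records:
        key = name.encode("utf-8")
        out.write(_LEN.pack(len(key)))
        out.write(key)
        out.write(_LEN.pack(len(data)))
        out.write(data)
        count += 1
    return count


def _read_exact(src: BinaryIO, n: int) -> bytes:
    buf = src.read(n)
    if len(buf) != n:
        raise EOFError("truncated record")
    return buf


def read_records(src: BinaryIO) -> Iterator[Tuple[str, bytes]]:
    """Yield (name, data) records until the end of the container."""
    while True:
        header = src.read(_LEN.size)
        if not header:
            return  # clean end of container
        if len(header) != _LEN.size:
            raise EOFError("truncated record")
        key = _read_exact(src, _LEN.unpack(header)[0]).decode("utf-8")
        (value_len,) = _LEN.unpack(_read_exact(src, _LEN.size))
        yield key, _read_exact(src, value_len)


def namenode_heap_estimate(num_files: int, blocks_per_file: int = 1,
                           bytes_per_object: int = 150) -> int:
    """Rule-of-thumb NameNode heap in bytes: one object per file plus one per block."""
    return num_files * (1 + blocks_per_file) * bytes_per_object


if __name__ == "__main__":
    buf = io.BytesIO()
    n = write_records(buf, [("img0001.jpg", b"\xff\xd8 fake jpeg 1"),
                            ("img0002.jpg", b"\xff\xd8 fake jpeg 2")])
    buf.seek(0)
    print(n, [name for name, _ in read_records(buf)])
    print(namenode_heap_estimate(60_000_000))  # 18000000000
```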

Question 9

You are running a Hadoop cluster with all monitoring facilities properly configured.

Which scenario will go undetected?

Options:

HDFS is almost full

The NameNode goes down

A DataNode is disconnected from the cluster

Map or reduce tasks that are stuck in an infinite loop

MapReduce jobs are causing excessive memory swaps
