
Cloudera CCA-500 Cloudera Certified Administrator for Apache Hadoop (CCAH) Exam Practice Test

Page: 1 / 6
Total 60 questions

Cloudera Certified Administrator for Apache Hadoop (CCAH) Questions and Answers

Question 1

You have just run a MapReduce job to filter user messages to only those of a selected geographical region. The output for this job is in a directory named westUsers, located just below your home directory in HDFS. Which command gathers these into a single file on your local file system?

Options:

A.

hadoop fs -getmerge -R westUsers.txt

B.

hadoop fs -getmerge westUsers westUsers.txt

C.

hadoop fs -cp westUsers/* westUsers.txt

D.

hadoop fs -get westUsers westUsers.txt
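As background for this scenario, the idea of gathering a job's many output part files into one local file can be sketched in plain Python. This is a local-filesystem illustration only, not Hadoop's implementation: the function name `merge_parts`, the `part-*` naming pattern, and the sample contents are assumptions made for the example.

```python
import glob
import os
import tempfile


def merge_parts(src_dir, dest_file, pattern="part-*"):
    """Concatenate files matching `pattern` in src_dir into dest_file.

    Files are appended in sorted name order. Returns the number of parts merged.
    """
    part_paths = sorted(glob.glob(os.path.join(src_dir, pattern)))
    with open(dest_file, "wb") as out:
        for path in part_paths:
            with open(path, "rb") as part:
                out.write(part.read())
    return len(part_paths)


def demo():
    """Build a throwaway 'westUsers' directory with three parts and merge them."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "westUsers")
        os.mkdir(src)
        for i, text in enumerate(["alice\n", "bob\n", "carol\n"]):
            with open(os.path.join(src, f"part-r-{i:05d}"), "w") as f:
                f.write(text)
        dest = os.path.join(tmp, "westUsers.txt")
        count = merge_parts(src, dest)
        with open(dest) as f:
            return count, f.read()
```

Calling `demo()` returns the number of parts merged together with the merged text, which makes it easy to check that every part ended up in the single output file.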

Question 2

You have recently converted your Hadoop cluster from a MapReduce 1 (MRv1) architecture to a MapReduce 2 (MRv2) on YARN architecture. Your developers are accustomed to specifying the number of map and reduce tasks (resource allocation) when they run jobs. A developer wants to know how to specify the number of reduce tasks when a specific job runs. Which method should you tell that developer to implement?

Options:

A.

MapReduce version 2 (MRv2) on YARN abstracts resource allocation away from the idea of “tasks” into memory and virtual cores, thus eliminating the need for a developer to specify the number of reduce tasks, and indeed preventing the developer from specifying the number of reduce tasks.

B.

In YARN, resource allocation is a function of megabytes of memory in multiples of 1024 MB. Thus, they should specify the amount of memory resource they need by executing -D mapreduce-reduces.memory-mb-2048

C.

In YARN, the ApplicationMaster is responsible for requesting the resources required for a specific launch. Thus, executing -D yarn.applicationmaster.reduce.tasks=2 will specify that the ApplicationMaster launch two task containers on the worker nodes.

D.

Developers specify reduce tasks in the exact same way for both MapReduce version 1 (MRv1) and MapReduce version 2 (MRv2) on YARN. Thus, executing -D mapreduce.job.reduces=2 will specify reduce tasks.

E.

In YARN, resource allocation is a function of virtual cores specified by the ApplicationManager making requests to the NodeManager, where a reduce task is handled by a single container (and thus a single virtual core). Thus, the developer needs to specify the number of virtual cores to the NodeManager by executing -p yarn.nodemanager.cpu-vcores=2
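For readers who want to experiment with per-job settings, here is a minimal Python sketch that assembles a `hadoop jar` command line with `-D key=value` generic options placed between the driver class and the application arguments. It assumes the driver parses generic options (for example through `ToolRunner`); the helper name `build_hadoop_jar_cmd`, the JAR name `myjob.jar`, and the class name `MyDriver` are hypothetical, and the sketch only builds the argument list without running anything.

```python
def build_hadoop_jar_cmd(jar, main_class, app_args, conf=None):
    """Return an argv list for `hadoop jar`, with -D properties before app args.

    `conf` is an optional dict of Hadoop configuration properties.
    """
    cmd = ["hadoop", "jar", jar, main_class]
    for key, value in (conf or {}).items():
        # Each property becomes a separate "-D key=value" pair.
        cmd += ["-D", f"{key}={value}"]
    cmd += list(app_args)
    return cmd


# Hypothetical job: names and paths are placeholders for illustration.
example_cmd = build_hadoop_jar_cmd(
    "myjob.jar",
    "MyDriver",
    ["in", "out"],
    conf={"mapreduce.job.reduces": 2},
)
```

The resulting list could be passed to `subprocess.run` on a machine with a configured Hadoop client; that step is left out here so the sketch stays runnable anywhere.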

Question 3

Which is the default scheduler in YARN?

Options:

A.

YARN doesn’t configure a default scheduler; you must first assign an appropriate scheduler class in yarn-site.xml

B.

Capacity Scheduler

C.

Fair Scheduler

D.

FIFO Scheduler

Question 4

Your cluster is configured with HDFS and MapReduce version 2 (MRv2) on YARN. What is the result when you execute: hadoop jar SampleJar MyClass on a client machine?

Options:

A.

SampleJar.jar is sent to the ApplicationMaster, which allocates a container for SampleJar.jar

B.

SampleJar.jar is placed in a temporary directory in HDFS

C.

SampleJar.jar is sent directly to the ResourceManager

D.

SampleJar.jar is serialized into an XML file which is submitted to the ApplicationMaster

Question 5

You have a Hadoop cluster running HDFS, and a gateway machine external to the cluster from which clients submit jobs. What do you need to do in order to run Impala on the cluster and submit jobs from the command line of the gateway machine?

Options:

A.

Install the impalad daemon, statestored daemon, and catalogd daemon on each machine in the cluster, and the impala shell on your gateway machine

B.

Install the impalad daemon, the statestored daemon, the catalogd daemon, and the impala shell on your gateway machine

C.

Install the impalad daemon and the impala shell on your gateway machine, and the statestored daemon and catalogd daemon on one of the nodes in the cluster

D.

Install the impalad daemon on each machine in the cluster, the statestored daemon and catalogd daemon on one machine in the cluster, and the impala shell on your gateway machine

E.

Install the impalad daemon, statestored daemon, and catalogd daemon on each machine in the cluster and on the gateway node

Question 6

A user comes to you, complaining that when she attempts to submit a Hadoop job, it fails. There is a directory in HDFS named /data/input. The JAR is named j.jar, and the driver class is named DriverClass.

She runs the command:

hadoop jar j.jar DriverClass /data/input /data/output

The error message returned includes the line:

PriviledgedActionException as:training (auth:SIMPLE) cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/data/input

What is the cause of the error?

Options:

A.

The user is not authorized to run the job on the cluster

B.

The output directory already exists

C.

The name of the driver has been spelled incorrectly on the command line

D.

The directory name is misspelled in HDFS

E.

The Hadoop configuration files on the client do not point to the cluster
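To read the `file:/data/input` prefix in the error message above, it helps to see how a path without a scheme can pick one up from the client's default file system setting (`fs.defaultFS`). The Python sketch below is a simplified model, not Hadoop's actual path resolution: the helper name `qualify` is hypothetical, and the NameNode address `hdfs://mynamenode:8020` is an assumed example value.

```python
from urllib.parse import urlparse


def qualify(path, default_fs):
    """Return `path` unchanged if it already has a scheme.

    Otherwise prefix it with the default file system URI. This mimics,
    in a very reduced form, how a bare path is resolved on the client.
    """
    if urlparse(path).scheme:
        return path
    return default_fs.rstrip("/") + path


# With a local default file system, a bare path resolves locally.
local_result = qualify("/data/input", "file:///")

# With an HDFS default (assumed address), the same path resolves to HDFS.
hdfs_result = qualify("/data/input", "hdfs://mynamenode:8020")
```

Comparing `local_result` and `hdfs_result` shows that the same command-line argument can refer to two different locations depending on client-side configuration.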

Question 7

You are running a Hadoop cluster with a NameNode on host mynamenode. What are two ways to determine available HDFS space in your cluster?

Options:

A.

Run hdfs fs -du / and locate the DFS Remaining value

B.

Run hdfs dfsadmin -report and locate the DFS Remaining value

C.

Run hdfs dfs / and subtract NDFS Used from configured Capacity

D.

Connect to http://mynamenode:50070/dfshealth.jsp and locate the DFS Remaining value
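Several of the options above refer to a "DFS Remaining" value. As a small illustration of pulling such a value out of report-style text, here is a Python sketch. The sample text is made up for the example (the numbers are not from a real cluster, and a real report contains more fields), and the helper name `dfs_remaining_bytes` is hypothetical.

```python
import re

# Illustrative, made-up report text; not captured from a real cluster.
SAMPLE_REPORT = """\
Configured Capacity: 1000000000 (953.67 MB)
Present Capacity: 800000000 (762.94 MB)
DFS Remaining: 600000000 (572.20 MB)
DFS Used: 200000000 (190.73 MB)
"""


def dfs_remaining_bytes(report_text):
    """Return the first 'DFS Remaining' byte count in the text, or None."""
    match = re.search(r"^DFS Remaining:\s*(\d+)", report_text, re.MULTILINE)
    return int(match.group(1)) if match else None


remaining = dfs_remaining_bytes(SAMPLE_REPORT)
```

Only the first match is returned, on the assumption that the cluster-wide summary appears before any per-node sections.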

Question 8

You need to analyze 60,000,000 images stored in JPEG format, each of which is approximately 25 KB. Because your Hadoop cluster isn’t optimized for storing and processing many small files, you decide to do the following actions:

1. Group the individual images into a set of larger files

2. Use the set of larger files as input for a MapReduce job that processes them directly with Python using Hadoop streaming.

Which data serialization system gives the flexibility to do this?

Options:

A.

CSV

B.

XML

C.

HTML

D.

Avro

E.

SequenceFiles

F.

JSON
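The scale of the small-files problem in this question can be worked out with a few lines of Python arithmetic. The 128 MB block size is an assumption (a common HDFS default, but configurable), and 1 KB is taken as 1024 bytes; the figures are rough estimates because the question only gives an approximate image size.

```python
import math

NUM_IMAGES = 60_000_000
IMAGE_BYTES = 25 * 1024            # "approximately 25 KB" per image
BLOCK_BYTES = 128 * 1024 * 1024    # assumed HDFS block size

# Total raw data volume across all images.
total_bytes = NUM_IMAGES * IMAGE_BYTES

# Stored individually, each image is its own file with at least one block.
blocks_if_separate = NUM_IMAGES

# Packed into large container files, the data fills far fewer blocks.
blocks_if_packed = math.ceil(total_bytes / BLOCK_BYTES)

print(f"total data: {total_bytes / 1e12:.3f} TB")
print(f"blocks if stored separately: {blocks_if_separate:,}")
print(f"blocks if packed: {blocks_if_packed:,}")
```

Under these assumptions the data volume is modest, but the count of files and blocks differs by several orders of magnitude between the two layouts, which is why grouping the images is the first step in the scenario.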

Question 9

You are running a Hadoop cluster with all monitoring facilities properly configured.

Which scenario will go undetected?

Options:

A.

HDFS is almost full

B.

The NameNode goes down

C.

A DataNode is disconnected from the cluster

D.

Map or reduce tasks that are stuck in an infinite loop

E.

MapReduce jobs are causing excessive memory swaps
