DS-200 Exam Questions
68 real DS-200 exam questions with expert-verified answers and explanations. Page 1 of 2.
- Question #1
What is the result of the following command (the database username is foo and password is bar)? $ sqoop list-tables --connect jdbc:mysql://localhost/databasename --table...
- Question #2
You are running a Hadoop cluster with all monitoring facilities properly configured. Which scenario will go undetected?
- Question #3
Which of the following scenarios makes HDFS unavailable?
- Question #4
There are 20 patients with acute lymphoblastic leukemia (ALL) and 32 patients with acute myeloid leukemia (AML), both variants of a blood cancer. The makeup of the groups as follow...
- Question #5
There are 20 patients with acute lymphoblastic leukemia (ALL) and 32 patients with acute myeloid leukemia (AML), both variants of a blood cancer. The makeup of the groups as follow...
- Question #6
There are 20 patients with acute lymphoblastic leukemia (ALL) and 32 patients with acute myeloid leukemia (AML), both variants of a blood cancer. The makeup of the groups as follow...
- Question #7
There are 20 patients with acute lymphoblastic leukemia (ALL) and 32 patients with acute myeloid leukemia (AML), both variants of a blood cancer. The makeup of the groups as follow...
- Question #8
You have a large m x n data matrix M. You decide you want to perform dimension reduction/clustering on your data and have decided to use the singular value decomposition (SVD; also...
- Question #9
You have a large m x n data matrix M. You decide you want to perform dimension reduction/clustering on your data and have decided to use the singular value decomposition (SVD; also...
- Question #10
Many machine learning algorithms involve finding the global minimum of a convex loss function, primarily because:
- Question #11
You have a large m x n data matrix M. You decide you want to perform dimension reduction/clustering on your data and have decided to use the singular value decomposition (SVD; also...
- Question #12
Which two techniques should you use to avoid overfitting a classification model to a data set?
- Question #13
You are building a k-nearest neighbor classifier (k-NN) on a labeled set of points in a high-dimensional space. You determine that the classifier has a large error on the training...
- Question #14
Which best describes the primary function of Flume?
- Question #15
You have a directory containing a number of comma-separated files. Each file has three columns and each filename has a .csv extension. You want to have a single tab-separated file...
- Question #16
What are three benefits of running feature selection analysis before fitting a classification model?
- Question #17
When optimizing a function using stochastic gradient descent, how frequently should you update your estimate of the gradient?
- Question #18
In what format are web server log files usually generated and how must you transform them in order to make them usable for analysis in Hadoop?
- Question #19
Which recommender system technique is domain specific?
- Question #20
You are about to sample a 100-dimensional unit cube. To adequately sample any single given dimension, you need only capture 10 points. How many points do you need in order to sample...
- Question #21
You have acquired a new data source of millions of customer records, and you've loaded this data into HDFS. Prior to analysis, you want to change all customer registration to the same dat...
- Question #22
A company has 20 software engineers working on a project. Over the past week, the team has fixed 100 bugs. Although the average number o...
- Question #23
A company has 20 software engineers working on a project. Over the past week, the team has fixed 100 bugs. Although the average number o...
- Question #24
In what way can Hadoop be used to improve the performance of Lloyd's algorithm for k-means clustering on large data sets?
- Question #25
You have a data file that contains two trillion records, one record per line (comma separated). Each record lists two friends and a unique message sent between them. Their names will...
- Question #26
You have just run a MapReduce job to filter user messages to only those of a selected geographical region. The output for this job is in a directory named westUsers, located just belo...
- Question #27
A function is convex if the line segment between two points, a and b, is greater than or equal to the value of the function for a ≤ x ≤ b. Which two functions are convex?
- Question #28
You need to analyze 60,000,000 images stored in JPEG format, each of which is approximately 25 KB. Because your Hadoop cluster isn't optimized for storing and processing many small...
- Question #29
You have user profile records in an OLTP database that you want to join with web server logs which you have already ingested into HDFS. What is the best way to acquire the user pro...
- Question #30
You are building a system to perform outlier detection for a large online retailer. You need to build a system to detect if the total dollar value of sales are outside the norm for...
- Question #31
When setting the HDFS block size, which of the following considerations is the least important?
- Question #32
What is the standard configuration of slave nodes in a Hadoop cluster?
- Question #33
Consider the following sample from a distribution that contains a continuous X and label Y that is either A or B: Which is the best cut point for X if you want to discretize these...
- Question #34
Consider the following sample from a distribution that contains a continuous X and label Y that is either A or B: Which is the best choice of cut points for X if you want to discre...
- Question #35
You want to understand more about how users browse your public website. For example, you want to know which pages they visit prior to placing an order. You have a server farm of 200 we...
- Question #36
You have a large file of N records (one per line), and want to randomly sample 10% of them. You have two functions that are perfect random number generators (though they are a bit sl...
- Question #37
You have a large file of N records (one per line), and want to randomly sample 10% of them. You have two functions that are perfect random number generators (though they are a bit sl...
- Question #38
You have a large file of N records (one per line), and want to randomly sample 10% of them. You have two functions that are perfect random number generators (though they are a bit sl...
- Question #39
You have a large file of N records (one per line), and want to randomly sample 10% of them. You have two functions that are perfect random number generators (though they are a bit sl...
- Question #40
Assuming the trends shown in this chart continue, what would we expect the value of the revenue to be in Q1 of 2013?
- Question #41
From historical data, you know that 50% of students who take Cloudera's Introduction to Data Science: Building Recommender Systems training course pass this exam, while only 25% o...
- Question #42
From historical data, you know that 50% of students who take Cloudera's Introduction to Data Science: Building Recommender Systems training course pass this exam, while only 25% o...
- Question #43
You want to build a classification model to identify spam comments on a blog. You decide to use the words in the comment text as inputs to your model. Which criteria should you use...
- Question #44
Given the following sample of numbers from a distribution: 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89. What are the five numbers that summarize this distribution (the five number summary...
- Question #45
What additional capability does Ganglia provide that Hadoop alone does not provide?
- Question #46
Given the following sample of numbers from a distribution: 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89. How do high-level languages like Apache Hive and Apache Pig efficiently calculate ap...
- Question #47
What is the best way to determine the learning rate parameters for stochastic gradient descent when the distribution of the input data shifts over time?
- Question #48
Which two machine learning algorithms should you consider as likely to benefit from discretizing continuous features?
- Question #49
You've built a model that has ten different variables with complicated independence relationships between them, and both continuous and discrete variables that have complicated, mu...
- Question #50
What is one limitation encountered by all systems that employ collaborative filtering and use preferences as input in order to output product recommendations to consumers?
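
Question #44 gives its sample in full, so the computation it asks about can be sketched in a few lines. The snippet below is a minimal illustration and not the exam's official solution: the helper `five_number_summary` is a hypothetical name, and it assumes the median-of-halves convention for quartiles (the median is left out of both halves when the sample size is odd). Other conventions, such as linear interpolation, give different quartile values for the same data.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a sample of at least two values.

    Assumption: quartiles are the medians of the lower and upper halves,
    with the overall median excluded from both halves when n is odd.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]              # values before the middle position
    upper = data[half + (n % 2):]    # values after it (skips the median if n is odd)
    return (data[0], median(lower), median(data), median(upper), data[-1])


# Sample quoted in Question #44.
sample = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
summary = five_number_summary(sample)
print(summary)  # (1, 2, 8, 34, 89) under the median-of-halves assumption
```

Because the quartile convention changes the result, the values printed here hold only under the stated assumption.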