E20-065 Exam Questions
63 real E20-065 exam questions with expert-verified answers and explanations. Page 1 of 2.
- Question #1
Which scenario would be ideal for processing Hadoop data with Hive?
- Question #2
The naive Bayes classifier is trained over 1600 movie reviews and then tested over 400 reviews. Here is the resulting confusion matrix: 190 (TP), 10 (FN), 80 (FP), 120 (TN). What are the...
- Question #3
Why would a company decide to use HBase to replace an existing relational database?
- Question #5
Which graph structure would best model the relationship between job seekers and employers?
- Question #6
What is an ideal use case for HDFS?
- Question #7
A marketing team creates a graph using a square for each data point, where the length of each side is set to the data value. The data values are 10 and 20. What is the lie factor o...
- Question #8
How does Latent Dirichlet Allocation (LDA) interpret a document?
- Question #9
What is a key beneficial characteristic of the Random Forest algorithm?
- Question #10
What advantage does replication provide while storing a file in HDFS?
- Question #11
Which library is NOT part of the Apache Spark distribution?
- Question #12
In which step in the visualization lifecycle would you determine how the raw data is stored?
- Question #13
What runs more efficiently because of Apache Tez?
- Question #14
What is an important simulation design consideration?
- Question #15
In a social network, what does it mean for a node to have a high degree but low betweenness?
- Question #16
A hotel chain runs a simulation on room pricing. They want to estimate revenue, per hotel, within +/- $10 with 95% confidence (Zα/2 = 1.96). The estimated revenue standard deviation...
- Question #17
What is NOT a category of a NoSQL data store?
- Question #18
What are two visualization tools used for trivariate data?
- Question #19
You are analyzing written transcripts of focus groups conducted on product X. Your approach is to use TF-IDF for your analysis. What combination of TF-IDF scores should you examine...
- Question #22
What is a characteristic of the trigram language model?
- Question #23
Refer to the exhibit. Assuming the node index starts at 1, what is the out-degree of node 3 in the adjacency matrix shown?
- Question #24
What describes how nodes in a social network are similar to each other in characteristics?
- Question #25
In multinomial logistic regression, what is used to calculate the probability of an outcome occurring?
- Question #26
What best describes tokenization?
- Question #27
What does YARN provide over and above MapReduce?
- Question #28
Consider the two sentences below. "I mailed my credit card application to the bank." "We walked along the river bank until we came to a waterwheel." What type of NLP ambiguity might occu...
- Question #29
What is a characteristic of lemmatization?
- Question #30
What elements are needed to determine the time complexity of finding all the cliques of size k in social network analysis?
- Question #31
What is a random subspace of features, as used by Random Forests?
- Question #34
What is an effective use of color in visualization?
- Question #35
How can you improve processing performance in Hive?
- Question #36
What are key characteristics of regular lattices?
- Question #37
What are the major components of the YARN architecture?
- Question #38
Refer to the exhibit. In the graph, which edge would be considered a weak tie?
- Question #39
Refer to the exhibit. If two of the communities are re-designated to be one community, how does that change the network characteristics?
- Question #40
A simulation to compare two different sales models yields different results for the same set of input variables in different runs. What is the likely cause?
- Question #41
What is a characteristic of stemming?
- Question #42
What best describes the meaning behind the phrase "Six Degrees of Separation"?
- Question #43
What are three of the eight visual variables?
- Question #44
After a client submits a job request to the YARN ResourceManager, what happens next?
- Question #45
What is a characteristic of Spark?
- Question #46
In a typical multinomial logistic regression analysis involving six outcomes (class labels) and three input variables, how many model parameters will need to be estimated?
- Question #47
Which Hadoop File System shell command copies data from a local file system into HDFS?
- Question #48
What do first-order and second-order Markov processes have in common concerning next word prediction?
- Question #49
Given an input vector of features, a Random Forests model performs a classification task and ends in a tie. How does the model handle this outcome?
- Question #50
Which HDFS feature protects against user errors causing accidental loss of data?
- Question #51
What process must address acoustic ambiguity in NLP?
- Question #52
A data engineer is asked to process several large datasets using MapReduce. Upon initial inspection, the engineer realizes that there are complex interdependencies between the datas...
- Question #53
What is a characteristic of stop words?
- Question #54
Which is NOT a tenet of the Apache Pig Philosophy?
- Question #55
What is a property of a good color model for ordinal data?