Databricks
DATABRICKS-CERTIFIED-PROFESSIONAL-DATA-SCIENTIST · Question #61
Question
You are building a text classification model to determine whether a book written by HadoopExam Learning Resources is about Hadoop or cloud computing. To cut down the size of the feature space, you use the mutual information of each word with the label (hadoop or cloud) to select the 1000 best features as input to a Naive Bayes model. When you compare the performance of a model built with the 250 best features to a model built with the 1000 best features, you notice that the model with only 250 features performs slightly better on the test data. What would help you choose better features for your model?
Options
- A. Include least mutual information with other selected features as a feature selection criterion
- B. Include the number of times each of the words appears in the book in your model
- C. Decrease the size of our training data
- D. Evaluate a model that only includes the top 100 words
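To make the setup in the question concrete, here is a minimal sketch of ranking words by their mutual information with the class label. It is an illustration under stated assumptions, not the exam's reference method: the toy corpus, the binary word-presence representation, and the function names `mutual_information` and `top_k_words` are all hypothetical.

```python
import math
from collections import Counter


def mutual_information(docs, labels, word):
    """Mutual information (in bits) between presence of `word` and the label.

    Assumes each document is a set of words, so the feature is binary
    (word present / word absent).
    """
    n = len(docs)
    joint = Counter((word in doc, label) for doc, label in zip(docs, labels))
    word_counts = Counter(word in doc for doc in docs)
    label_counts = Counter(labels)

    mi = 0.0
    for (present, label), count in joint.items():
        p_xy = count / n
        p_x = word_counts[present] / n
        p_y = label_counts[label] / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


def top_k_words(docs, labels, k):
    """Return the k words with the highest mutual information with the label.

    Ties are broken alphabetically so the result is deterministic.
    """
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda w: (-mutual_information(docs, labels, w), w))
    return ranked[:k]


# Hypothetical toy corpus: each document is reduced to its set of words.
docs = [
    {"hdfs", "mapreduce", "cluster"},
    {"hdfs", "yarn", "cluster"},
    {"elastic", "scaling", "cluster"},
    {"elastic", "provisioning", "cluster"},
]
labels = ["hadoop", "hadoop", "cloud", "cloud"]

selected = top_k_words(docs, labels, k=2)
print(selected)
```

Note that this ranking scores each word against the label on its own; it does not look at how the selected words relate to one another. In the toy corpus, "cluster" appears in every document and so scores zero, while "hdfs" and "elastic" both separate the two classes perfectly.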