
MLS-C01, question #191

Modeling

Question

A data scientist is using the Amazon SageMaker Neural Topic Model (NTM) algorithm to build a model that recommends tags from blog posts. The raw blog post data is stored in an Amazon S3 bucket in JSON format. During model evaluation, the data scientist discovered that the model recommends certain stopwords such as "a," "an," and "the" as tags to certain blog posts, along with a few rare words that are present only in certain blog entries. After a few iterations of tag review with the content team, the data scientist notices that the rare words are unusual but feasible. The data scientist also must ensure that the tag recommendations of the generated model do not include the stopwords. What should the data scientist do to meet these requirements?

Options

A. Use the Amazon Comprehend entity recognition API operations. Remove the detected …
B. Run the SageMaker built-in principal component analysis (PCA) algorithm with the blog …
C. Use the SageMaker built-in Object Detection algorithm instead of the NTM algorithm for …
D. Remove the stopwords from the blog post data by using the CountVectorizer function in …
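
For context on the technique named in option D, the following is a minimal sketch of stopword removal with CountVectorizer. It assumes the option refers to scikit-learn's `sklearn.feature_extraction.text.CountVectorizer`, since the option text is cut off before it names a library. The `posts` list is hypothetical sample data standing in for the JSON blog posts in S3, and the sketch does not cover reading from or writing back to the bucket.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical sample posts; the real data would be parsed from JSON in S3.
posts = [
    "The quick guide to an Amazon SageMaker endpoint",
    "A note on the Neural Topic Model and a rare word: zeugma",
]

# stop_words="english" applies scikit-learn's built-in English stopword list.
# min_df is left at its default (1), so words that appear in only one post
# are kept rather than pruned as rare.
vectorizer = CountVectorizer(stop_words="english")

# Sparse document-term matrix of word counts, one row per post.
counts = vectorizer.fit_transform(posts)

# The learned vocabulary: every token that survived stopword filtering.
vocabulary = set(vectorizer.vocabulary_)
```

In this sketch the stopwords never enter the vocabulary, so they cannot surface as topic words, while a rare term such as "zeugma" is still retained. The resulting count matrix is a bag-of-words representation, which is the kind of input NTM is generally trained on.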


Topics

#Text Preprocessing #Stopword Removal #Neural Topic Model (NTM) #CountVectorizer
