MLS-C01 Question #291: Real Exam Question with Answer & Explanation
The correct answers are A: Latent Dirichlet allocation (LDA) and C: Neural topic modeling (NTM).
Question
A pharmaceutical company performs periodic audits of clinical trial sites to quickly resolve critical findings. The company stores audit documents in text format. Auditors have requested help from a data science team to quickly analyze the documents. The auditors need to discover the 10 main topics within the documents to prioritize and distribute the review work among the auditing team members. Documents that describe adverse events must receive the highest priority. A data scientist will use statistical modeling to discover abstract topics and to provide a list of the top words for each category to help the auditors assess the relevance of the topic. Which algorithms are best suited to this scenario? (Choose two.)
Options
- A. Latent Dirichlet allocation (LDA)
- B. Random forest classifier
- C. Neural topic modeling (NTM)
- D. Linear support vector machine
- E. Linear regression
Explanation
Correct answers: A, C

This is an unsupervised topic modeling problem: the goal is to discover latent topics in a corpus of text documents without labeled training data.

(A) Latent Dirichlet allocation (LDA) is the classical probabilistic generative model for topic modeling. LDA assumes each document is a mixture of topics and each topic is a distribution over words, so it produces interpretable lists of top words per topic, which is exactly what the auditors need to assess relevance.

(C) Neural topic modeling (NTM) is Amazon SageMaker's built-in neural network-based topic modeling algorithm. It scales well to large document corpora and produces similar top-words-per-topic output.

Both LDA and NTM are unsupervised and output a specified number of topics (10 in this case) with associated top words. Options B (random forest classifier), D (linear support vector machine), and E (linear regression) are all supervised algorithms that require labeled data, so they are not suited to unsupervised topic discovery.
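To make the "top words per topic" output concrete, here is a minimal sketch of LDA fitted with a toy collapsed Gibbs sampler in plain Python. It is an illustration only, not the SageMaker implementation: the function names `fit_lda` and `top_words`, the hyperparameter values, and the six short "audit" documents are all hypothetical, and a real project would more likely use a built-in or library algorithm.

```python
import random

def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA. `docs` is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables: tokens per (document, topic), per (topic, word), per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Resample its topic: P(topic | doc) * P(word | topic).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1

    return vocab, topic_word, doc_topic

def top_words(vocab, topic_word, n=3):
    """Return the n most frequent words assigned to each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]

# Hypothetical, made-up audit snippets (already tokenized).
docs = [
    "patient reported adverse event after dose".split(),
    "serious adverse event patient hospitalized".split(),
    "adverse event dose reduced patient recovered".split(),
    "consent form missing signature date".split(),
    "consent form signature incomplete".split(),
    "missing consent form date signature".split(),
]

vocab, topic_word, doc_topic = fit_lda(docs, num_topics=2)
topics = top_words(vocab, topic_word, n=3)
for k, words in enumerate(topics):
    print(f"Topic {k}: {', '.join(words)}")
```

In the scenario from the question, the number of topics would be 10 rather than 2, and the auditors would read each topic's word list to spot the one describing adverse events. The SageMaker built-in LDA and NTM algorithms take the topic count through a `num_topics` hyperparameter.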