Your team is working on an NLP research project to predict political affiliation of authors based on articles they have written. You have a large training dataset that is structured like this: You fol

Sign in or unlock PROFESSIONAL-MACHINE-LEARNING-ENGINEER to reveal the answer and full explanation for question #24. The question stem and answer options stay visible for context.

Submitted by stefanr· Apr 18, 2026Data processing and feature engineering

Question

Your team is working on an NLP research project to predict political affiliation of authors based on articles they have written. You have a large training dataset that is structured like this: You followed the standard 80%-10%-10% data distribution across the training, testing, and evaluation subsets. How should you distribute the training examples across the train-test-eval subsets while maintaining the 80-10-10 proportion?

Exhibit

Options

ADistribute texts randomly across the train-test-eval subsets:
BDistribute authors randomly across the train-test-eval subsets: (*)
CDistribute sentences randomly across the train-test-eval subsets:
DDistribute paragraphs of texts (i.e., chunks of consecutive sentences) across the train-test-eval

Unlock PROFESSIONAL-MACHINE-LEARNING-ENGINEER to see the answer

You've previewed enough free PROFESSIONAL-MACHINE-LEARNING-ENGINEER questions. Unlock PROFESSIONAL-MACHINE-LEARNING-ENGINEER for full answers, explanations, the timed quiz mode, progress tracking, and the master PDF. Question stem and options stay visible so you can still see what's on the exam.

Unlock PROFESSIONAL-MACHINE-LEARNING-ENGINEER - $49.99 / 30 days Sign in

Topics

#Data Splitting#Data Leakage#Dataset Preparation#Train-Test-Eval

Full PROFESSIONAL-MACHINE-LEARNING-ENGINEER Practice