PROFESSIONAL-DATA-ENGINEER Question #211: Real Exam Question with Answer & Explanation
Question
You work on a regression problem in a natural language processing domain, and you have 100M labeled examples in your dataset. You have randomly shuffled your data and split your dataset into train and test samples (in a 90/10 ratio). After training the neural network and evaluating your model on the test set, you discover that the root-mean-squared error (RMSE) of your model is twice as high on the train set as on the test set. How should you improve the performance of your model?
Options
- A. Increase the share of the test sample in the train-test split.
- B. Try to collect more data and increase the size of your dataset.
- C. Try out regularization techniques (e.g., dropout or batch normalization) to avoid overfitting.
- D. Increase the complexity of your model by, e.g., introducing an additional layer or increasing the size of the vocabularies or n-grams used.
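
For reference, the comparison described in the question stem can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original question: the `rmse` helper and the toy label and prediction values are hypothetical, chosen only so that the train RMSE comes out twice the test RMSE, as in the scenario.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical toy data (assumption): every train prediction is off by 2,
# every test prediction is off by 1.
train_true, train_pred = [1.0, 2.0, 3.0, 4.0], [3.0, 0.0, 5.0, 2.0]
test_true, test_pred = [1.0, 2.0], [2.0, 1.0]

train_rmse = rmse(train_true, train_pred)
test_rmse = rmse(test_true, test_pred)
ratio = train_rmse / test_rmse

print(f"train RMSE = {train_rmse:.2f}")
print(f"test RMSE  = {test_rmse:.2f}")
print(f"train/test ratio = {ratio:.2f}")
```

Computing the metric on both splits with the same function keeps the two numbers directly comparable, which is what the question's train-versus-test comparison relies on.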