MLA-C01 Question #106: Real Exam Question with Answer & Explanation

The correct answer is B: Decrease the learning rate. Reduce the batch size.

ML Model Development

Question

A data scientist is working on optimizing a model during the training process by varying multiple parameters. The data scientist observes that, during multiple runs with identical parameters, the loss function converges to different, yet stable, values. What should the data scientist do to improve the training process?

Options

  • A. Increase the learning rate. Keep the batch size the same.
  • B. Decrease the learning rate. Reduce the batch size.
  • C. Decrease the learning rate. Keep the batch size the same.
  • D. Do not change the learning rate. Increase the batch size.

Explanation

Decreasing the learning rate makes the optimizer take smaller, more controlled steps. This addresses why runs with identical parameters settle at different loss values: with a high learning rate, each update is a large jump, so small stochastic differences between runs (data shuffling, dropout, etc.) can send the optimizer into different local minima. Reducing the batch size at the same time adds gradient noise that acts as a mild regularizer, helping the optimizer escape sharp, narrow minima and settle more consistently in flatter minima that tend to generalize better.
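The link between batch size and gradient noise can be checked numerically. The sketch below is only an illustration under stated assumptions: the per-example gradients are synthetic draws from a standard normal distribution, and `minibatch_gradient_std` is a hypothetical helper, not part of any library. It estimates how much the averaged minibatch gradient fluctuates for several batch sizes.

```python
import random
import statistics


def minibatch_gradient_std(per_example_grads, batch_size, trials=1000, seed=0):
    """Empirical standard deviation of the mean gradient over random minibatches."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        # Draw one minibatch without replacement and average its gradients.
        batch = rng.sample(per_example_grads, batch_size)
        means.append(sum(batch) / batch_size)
    return statistics.pstdev(means)


# Assumption: synthetic per-example gradients for a single weight.
data_rng = random.Random(42)
grads = [data_rng.gauss(0.0, 1.0) for _ in range(10_000)]

for b in (8, 32, 128, 512):
    print(f"batch_size={b:4d}  gradient std={minibatch_gradient_std(grads, b):.4f}")
```

The printed standard deviation should shrink roughly in proportion to 1/sqrt(batch size), which is the sense in which smaller batches give noisier updates.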

Why the distractors are wrong:

  • A - Increasing the learning rate amplifies the instability, making convergence even more sensitive and erratic.
  • C - Lowering the learning rate alone helps but leaves the batch size unchanged; large batches have low gradient variance, so the optimizer can stay locked into whichever sharp minimum it reaches first, and results remain inconsistent.
  • D - Leaving the learning rate unchanged ignores the root cause; increasing the batch size reduces gradient noise but cannot compensate for a learning rate that is too large to converge reliably.
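One way to compare the options empirically is to repeat a run over several random seeds and measure how far the final loss values spread. The sketch below is a toy harness, not a real training job: the one-dimensional loss with two minima, the Gaussian gradient noise, and the names `run_sgd` and `final_loss_spread` are all assumptions made for illustration. It is not meant to prove which option is correct, only to show how the symptom in the question (different stable losses across identical runs) can be measured.

```python
import random
import statistics


def loss(w):
    # Assumed toy loss: non-convex, with two minima of different depth.
    d = w * w - 1.0
    return d * d + 0.3 * w


def grad(w):
    # Exact derivative of the toy loss above.
    return 4.0 * w * (w * w - 1.0) + 0.3


def run_sgd(lr, batch_size, seed, steps=500, noise_std=2.0):
    """Run noisy gradient descent once and return the final loss."""
    rng = random.Random(seed)
    w = rng.uniform(-1.5, 1.5)  # random initialization differs per seed
    for _ in range(steps):
        # Averaging per-example noise mimics a minibatch gradient estimate.
        noise = sum(rng.gauss(0.0, noise_std) for _ in range(batch_size)) / batch_size
        w -= lr * (grad(w) + noise)
    return loss(w)


def final_loss_spread(lr, batch_size, seeds=range(20)):
    """Mean and standard deviation of the final loss across seeds."""
    finals = [run_sgd(lr, batch_size, s) for s in seeds]
    return statistics.mean(finals), statistics.pstdev(finals)


for lr, batch_size in [(0.05, 64), (0.01, 64), (0.01, 8)]:
    mean, spread = final_loss_spread(lr, batch_size)
    print(f"lr={lr}  batch_size={batch_size}  mean={mean:.4f}  spread={spread:.4f}")
```

In a real project the same pattern applies: fix every hyperparameter, vary only the seed, and treat a large spread in final loss as the signal that the learning rate or batch size needs adjusting.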

Memory tip: Think "inconsistent training = overcorrecting steps + over-smoothed gradients." Cure it with smaller steps (lower LR) and fresher, noisier feedback (smaller batches). The combination tames the optimizer without killing its ability to explore.

Topics

#Model Training #Hyperparameter Tuning #Learning Rate #Batch Size
