MLS-C01 Question #149: Real Exam Question with Answer & Explanation

The correct answers are A and D: increase the batch size by a factor of 8, and increase the learning rate by a factor of 8. Multi-GPU training that runs slower than single-GPU training often stems from inefficient GPU utilization, which a larger batch size can mitigate. To maintain test accuracy with the larger batch, the learning rate should be scaled proportionally.

ML Implementation and Operations

Question

A manufacturer has deployed an array of 50,000 sensors throughout its plant to predict failures in components. Data Scientists have built a long short-term memory (LSTM) model in Gluon and are training it using the Amazon SageMaker API. The Data Scientists are training the model using a time series with 10 million examples. Training is currently taking 100 hours and Data Scientists are attempting to speed it up by using multiple GPUs. However, when they modified the code to use 8 GPUs, it is running slightly slower than on 1 GPU. The current hyperparameter settings are:

  Hyperparameter          Value
  Batch size              128
  Clip gradient           10
  Autoregressive window   160
  Learning rate           0.01
  Epochs                  80

Which of the following changes together are recommended to speed up training on 8 GPUs while maintaining test accuracy? (Select TWO)

Options

  • A. Increase the batch size by a factor of 8
  • B. Increase the clip gradient by 8
  • C. Increase the autoregressive window by a factor of 8
  • D. Increase the learning rate by a factor of 8
  • E. Decrease the number of epochs by 20

Explanation

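In data-parallel training, each batch is split across the GPUs. With the batch size left at 128, each of the 8 GPUs processes only 16 examples per step, so the devices are underutilized and the overhead of synchronizing gradients across GPUs can outweigh the gain from parallelism. Increasing the batch size by a factor of 8 (to 1,024) restores 128 examples per GPU per step. A larger batch also means fewer weight updates per epoch, so the learning rate is typically scaled by the same factor (from 0.01 to 0.08) to maintain test accuracy.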
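The arithmetic behind this can be sketched in a few lines of Python. This is a minimal illustration, assuming plain data parallelism in which the global batch is split evenly across GPUs, and assuming the linear scaling rule for the learning rate. The function names are hypothetical and are not part of Gluon or the SageMaker API.

```python
# Hypothetical helpers illustrating batch size and learning rate scaling.
# Assumption: the global batch is split evenly across GPUs (data parallelism).

def scale_for_gpus(batch_size, learning_rate, num_gpus):
    """Linear scaling rule: multiply batch size and learning rate by the GPU count."""
    return batch_size * num_gpus, learning_rate * num_gpus


def per_gpu_batch(global_batch, num_gpus):
    """Number of examples each GPU processes per step."""
    return global_batch // num_gpus


def steps_per_epoch(num_examples, global_batch):
    """Weight updates per epoch (ceiling division)."""
    return -(-num_examples // global_batch)


NUM_EXAMPLES = 10_000_000  # from the question
NUM_GPUS = 8

# Settings from the question: batch size 128, learning rate 0.01.
new_batch, new_lr = scale_for_gpus(128, 0.01, NUM_GPUS)

print("per-GPU batch, unscaled:", per_gpu_batch(128, NUM_GPUS))        # 16
print("per-GPU batch, scaled:  ", per_gpu_batch(new_batch, NUM_GPUS))  # 128
print("steps/epoch, unscaled:  ", steps_per_epoch(NUM_EXAMPLES, 128))        # 78125
print("steps/epoch, scaled:    ", steps_per_epoch(NUM_EXAMPLES, new_batch))  # 9766
print("scaled learning rate:   ", new_lr)
```

With the original settings each GPU sees only 16 examples per step. After scaling, each GPU sees 128 again and an epoch needs roughly one eighth as many synchronized updates.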

Common mistakes.

  • B. Raising the gradient clipping threshold does not change how much work each GPU does per step, so it does not address the multi-GPU inefficiency, and a looser threshold can leave the model more prone to exploding gradients.
  • C. Increasing the autoregressive window lengthens the input sequence and raises the computational load per example, which is likely to slow training down rather than speed it up.
  • E. Decreasing the number of epochs reduces total training time but does nothing to fix the multi-GPU inefficiency, and it often costs test accuracy because the model may not have enough time to converge.
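To see why option B has no bearing on training speed, here is a minimal sketch of gradient clipping. It assumes the "clip gradient" setting clamps each gradient element to the range [-threshold, threshold]. Norm-based clipping is another common variant, and the question does not say which one is used. The function name is hypothetical.

```python
# Hypothetical illustration of element-wise gradient clipping.
# Assumption: each gradient element is clamped to [-threshold, threshold].

def clip_elementwise(grads, threshold):
    """Clamp every gradient value into [-threshold, threshold]."""
    return [max(-threshold, min(threshold, g)) for g in grads]


grads = [0.5, -25.0, 12.0]

# Threshold from the question (10): the two large values are clamped.
clipped_10 = clip_elementwise(grads, 10.0)

# Threshold raised by a factor of 8 (80): nothing is clamped.
clipped_80 = clip_elementwise(grads, 80.0)

print(clipped_10)  # [0.5, -10.0, 10.0]
print(clipped_80)  # [0.5, -25.0, 12.0]
```

Either threshold costs the same single pass over the gradients, so changing it alters the size of the updates but not the time per step.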

Concept tested. Distributed deep learning training optimization (batch size, learning rate scaling)

Reference. https://aws.amazon.com/blogs/machine-learning/train-deep-learning-models-faster-with-large-batches-on-amazon-sagemaker/

Topics

#Distributed Training, #Hyperparameter Optimization, #Model Training Performance, #Batch Size Scaling
