SECTION 7.4 References
• The paper that introduced the Adam optimizer: Diederik P. Kingma and Jimmy Ba, “Adam: A Method for Stochastic Optimization” (2014), https://arxiv.org/abs/1412.6980 (a minimal usage sketch follows this list)
• Resources for DeepSpeed and Colossal AI, two libraries for multi-GPU training: https://github.com/microsoft/DeepSpeed and https://github.com/hpcaitech/ColossalAI (see the initialization sketch after this list)
• The DeepSpeed team’s pipeline parallelism tutorial and related research: https://www.deepspeed.ai/tutorials/pipeline; Yanping Huang et al., “GPipe: Efficient Training of Giant Neural Networks Using Pipeline Parallelism” (2018), https://arxiv.org/abs/1811.06965 (see the pipeline sketch at the end of this section)
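
For quick reference, here is a minimal sketch of how Adam is typically instantiated in PyTorch. The toy model, data, and learning rate are illustrative placeholders, though the betas and eps shown match both PyTorch's defaults and the values recommended in the paper:

```python
import torch

# Toy model and data, purely for illustration.
model = torch.nn.Linear(10, 1)
inputs = torch.randn(32, 10)
targets = torch.randn(32, 1)

# Adam keeps per-parameter running estimates of the first moment (mean)
# and second moment (uncentered variance) of the gradients; betas are the
# decay rates of those estimates, and eps guards against division by zero.
optimizer = torch.optim.Adam(
    model.parameters(), lr=1e-3, betas=(0.9, 0.999), eps=1e-8
)

loss = torch.nn.functional.mse_loss(model(inputs), targets)
loss.backward()        # populate .grad on each parameter
optimizer.step()       # Adam update using the moment estimates
optimizer.zero_grad()  # clear gradients before the next iteration
```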
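
As a rough orientation before browsing the DeepSpeed repository, the sketch below shows its typical entry point, deepspeed.initialize. The config values are assumptions for illustration, not recommendations, and a script like this is normally launched with the deepspeed command-line launcher so that one process runs per GPU:

```python
import torch
import deepspeed

model = torch.nn.Linear(10, 1)  # stand-in for a real network

# Illustrative config: the batch size and ZeRO stage are assumptions;
# see the repository's documentation for the full config schema.
ds_config = {
    "train_batch_size": 32,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
    "zero_optimization": {"stage": 1},
}

# deepspeed.initialize wraps the model in an engine that handles
# distributed data parallelism, optimizer state sharding, and so on.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

# Training then goes through the engine's own backward/step methods:
#   loss = compute_loss(model_engine(batch))
#   model_engine.backward(loss)
#   model_engine.step()
```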
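
Finally, to make the idea behind the pipeline tutorial and the GPipe paper concrete, here is a hedged sketch using DeepSpeed's PipelineModule, which partitions an ordered list of layers into stages placed on different GPUs and streams micro-batches through them. The layer sizes, stage count, and config values are arbitrary assumptions, and the script assumes it is launched across at least two GPUs with the deepspeed launcher:

```python
import torch
import deepspeed
from deepspeed.pipe import PipelineModule

# Ordered layer list; PipelineModule splits it into `num_stages`
# contiguous partitions, each assigned to its own GPU.
layers = [
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 10),
]

net = PipelineModule(
    layers=layers,
    loss_fn=torch.nn.CrossEntropyLoss(),
    num_stages=2,  # two pipeline stages, GPipe-style
)

# Each batch of 32 is split into micro-batches of 8; while stage 1
# processes one micro-batch, stage 0 can already work on the next,
# which is the overlap that pipeline parallelism exploits.
engine, _, _, _ = deepspeed.initialize(
    model=net,
    model_parameters=net.parameters(),
    config={
        "train_batch_size": 32,
        "train_micro_batch_size_per_gpu": 8,
        "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
    },
)

# One call runs the forward and backward passes for all micro-batches
# of a batch and then applies the optimizer step:
#   loss = engine.train_batch(data_iter)
```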