CMU's "Machine Learning in Production" (MLIP) / "AI Engineering" Course

9bow · Jan 31, 2025, 3:30 AM

Introduction to CMU's Machine Learning in Production and AI Engineering Course

CMU (Carnegie Mellon University) offers Machine Learning in Production (MLIP) / AI Engineering, a practice-oriented course that goes beyond building machine learning models to cover the full process of turning them into real products and deploying them. It covers the core concepts of AI engineering and MLOps and includes topics that matter in practice, such as responsible AI, safety, security, fairness, and explainability. Delivering an AI system as a real service requires attention to MLOps, data quality management, continuous deployment (CD), scalability, and more. The course treats these topics in depth, drawing on recent technology and research, with the goal of building the skills needed to operate AI systems in real production environments.

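MLIP and AI Engineering Course Overview

Overview of the structure of CMU's "Machine Learning in Production" (MLIP) / "AI Engineering" course

The MLIP (Machine Learning in Production) course (MLIP: 17-445/17-645/17-745, AI Engineering: 11-695) is offered in the Spring 2025 semester and is designed to teach how to operate machine learning models reliably in real production environments. Rather than data science theory, it focuses on how software engineering and machine learning work together, which makes it a good fit for anyone interested in building and operating ML-powered products. The course explores how AI systems are used in practice through a variety of case studies.

For example, it analyzes real-world cases such as automatic speech recognition and real-time translation, smart medical diagnosis systems, recommendation systems (e.g., movie recommendations), and smart transportation systems and autonomous driving, and studies how to apply the lessons to AI systems. Students also learn to deploy and monitor models effectively with current MLOps tools such as Apache Kafka, Docker, Jenkins, Prometheus, and Grafana.

The course requires basic machine learning knowledge (e.g., experience with scikit-learn) and Python programming skills, but it can be taken without prior software engineering experience.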

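Course Structure

Key Topics

The course covers the following core topics:

Designing machine learning systems for production
- Building safe systems that account for the possibility of model mistakes
- Designing the user interface and the overall system architecture
MLOps and deployment strategies
- Building continuous integration (CI) and continuous deployment (CD) pipelines
- Experimental rollouts through A/B testing and canary deployment
Data quality and model maintenance
- Detecting and responding to data drift
- Data quality assessment and automated testing techniques
Scalability and large-scale data processing
- Designing systems that handle large volumes of data and user requests
- Comparing batch processing and stream processing architectures
Responsible AI development
- Fairness, explainability, privacy, and security considerations
- Addressing algorithmic bias and implementing ethical AI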
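As an illustration of the data drift item in the list above, here is a minimal sketch that is not taken from the course materials. It compares a reference sample of one numeric feature (for example, ratings seen at training time) against a recent sample using the two-sample Kolmogorov-Smirnov statistic, which is the largest gap between the two empirical distribution functions. The function names `ks_statistic` and `has_drifted` and the threshold of 0.2 are assumptions chosen for the example; a real system would calibrate the threshold or use a proper significance test.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Return the largest absolute gap between the two empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    if not ref or not cur:
        raise ValueError("both samples must be non-empty")
    gap = 0.0
    # Empirical CDFs only change at observed values, so checking those suffices.
    for x in set(ref) | set(cur):
        f_ref = bisect_right(ref, x) / len(ref)  # share of reference values <= x
        f_cur = bisect_right(cur, x) / len(cur)  # share of current values <= x
        gap = max(gap, abs(f_ref - f_cur))
    return gap


def has_drifted(reference, current, threshold=0.2):
    """Flag drift when the KS statistic exceeds a hypothetical threshold."""
    return ks_statistic(reference, current) > threshold


if __name__ == "__main__":
    training_ratings = [3, 4, 4, 5, 3, 4]
    recent_ratings = [1, 2, 1, 2, 2, 1]
    print(has_drifted(training_ratings, recent_ratings))
```

A check like this would typically run on a schedule against recent production inputs, with an alert or retraining job triggered when it fires.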

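Hands-on Work and Project

Students carry out a hands-on project in which they build a movie recommendation system that serves recommendations to 1 million users. This gives them experience deploying and operating an AI model in a realistic production environment.

The course also aims to teach the skills needed in practice through a range of current MLOps tools, including Apache Kafka, Docker, Jenkins, Prometheus, Grafana, and MLflow.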
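To make the canary deployment topic concrete for a service like the movie recommender, the following is a minimal sketch of one common way to split traffic. It is not the course's reference implementation. Each user ID is hashed to a stable bucket in [0, 1), and users whose bucket falls below a configured fraction are routed to the new model. The names `canary_bucket` and `choose_model`, the salt value, and the 5% default are all assumptions made for the example.

```python
import hashlib


def canary_bucket(user_id: str, salt: str = "canary-exp-1") -> float:
    """Map a user ID to a stable pseudo-random value in [0, 1)."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # Use 48 bits so the quotient is exactly representable and always below 1.0.
    return int.from_bytes(digest[:6], "big") / 2**48


def choose_model(user_id: str, canary_fraction: float = 0.05) -> str:
    """Route a fixed share of users to the canary model, the rest to stable."""
    if not 0.0 <= canary_fraction <= 1.0:
        raise ValueError("canary_fraction must be between 0 and 1")
    return "canary" if canary_bucket(user_id) < canary_fraction else "stable"


if __name__ == "__main__":
    for uid in ["user-1", "user-2", "user-3"]:
        print(uid, choose_model(uid, canary_fraction=0.5))
```

Hashing instead of random sampling keeps each user on the same model across requests, which makes the metrics of the two groups comparable. Changing the salt reshuffles the assignment for a new experiment.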

Full Course Schedule

Date	Topic	Book Chapter	Reading	Assignment due
Mon, Jan 13	Introduction and Motivation (md, pdf)	1
Wed, Jan 15	From Models to AI-Enabled Systems (md, pdf)	2,4,5	Building Intelligent Systems, Ch. 4, 5, 7, 8
Fri, Jan 17	Calling, securing, and creating APIs: Flask
Mon, Jan 20	MLK Day, no classes
Wed, Jan 22	Gathering Requirements (md, pdf)	6	The World and the Machine
Fri, Jan 24	Stream processing: Apache Kafka
Mon, Jan 27	Planning for Mistakes (md, pdf)	7	Building Intelligent Systems, Ch. 6, 8, 24	I1: ML Product
Wed, Jan 29	Model Quality (md, pdf)	15	Building Intelligent Systems, Ch. 19
Fri, Jan 31	Collaboration with git
Mon, Feb 3	Fostering Interdisciplinary (Student) Teams (md, pdf)		Meetings	I2: Requirements
Wed, Feb 5	Behavioral Model Testing (md, pdf)	15	Beyond Accuracy: Behavioral Testing of NLP Models with CheckList
Fri, Feb 7	Model testing
Mon, Feb 10	Toward Architecture and Design (md, pdf)	8,9,11	Building Intelligent Systems, Ch. 18 and Choosing the Right Machine Learning Algorithm
Wed, Feb 12	Deploying a Model (md, pdf)	10	Building Intelligent Systems, Ch. 13 and Machine Learning Design Patterns, Pat. 16
Fri, Feb 14	Containers: Docker
Mon, Feb 17	Testing and Experimenting in Production (md, pdf)	19	Building Intelligent Systems, Chs. 14 and 15	M1: Modeling and First Deployment
Wed, Feb 19	Data Quality (md, pdf)	16	Data Cascades in High-Stakes AI
Fri, Feb 21	Continuous Integration: Jenkins
Mon, Feb 24	Automating and Testing ML Pipelines (md, pdf)	11,18,19	The ML Test Score
Wed, Feb 26	Midterm 1
Fri, Feb 28	No lab (happy spring break)
Mon, Mar 3	Spring break, no classes
Wed, Mar 5	Spring break, no classes
Fri, Mar 7	Spring break, no classes
Mon, Mar 10	Scaling the System (md, pdf)	12	Big Data: Principles and best practices of scalable realtime data systems, Ch. 1
Wed, Mar 12	Planning for Operations (md, pdf)	13	Operationalizing machine learning: An interview study
Fri, Mar 14	Monitoring: Prometheus, Grafana
Mon, Mar 17	Versioning, Provenance, and Reproducibility (md, pdf)	24	Hidden Technical Debt in Machine Learning Systems	M2: Infrastructure Quality
Wed, Mar 19	Process & Technical Debt (md, pdf)	20	Collaboration Challenges in Building ML-Enabled Systems
Fri, Mar 21	Pipeline automation: MLflow
Mon, Mar 24	Intro to Ethics + Fairness (md, pdf)	23,26	Algorithmic Accountability: A Primer	I3: MLOps Tools
Wed, Mar 26	Measuring Fairness (md, pdf)	26	Human Perceptions of Fairness in Algorithmic Decision Making
Fri, Mar 28	Container orchestration: Kubernetes
Mon, Mar 31	Building Fairer Systems (md, pdf)	26	Improving Fairness in Machine Learning Systems
Wed, Apr 2	AVAILABLE (md, pdf)	25
Fri, Apr 4	Carnival, no classes
Mon, Apr 7	Explainability (md, pdf)	29	Interpretability Podcast or equivalent article	M3: Monitoring and CD
Wed, Apr 9	Transparency & Accountability (md, pdf)	28	Google chapter on Explainability and Trust
Fri, Apr 11	Model Explainability Tools
Mon, Apr 14	Security and Privacy (md, pdf)	27	Building Intelligent Systems, Ch. 25, and The Top 10 Risks of Machine Learning Security	I4: Explainability
Wed, Apr 16	Safety (md, pdf)		Practical Solutions for Machine Learning Safety in Autonomous Vehicles
Fri, Apr 18	LLM Jailbreaking
Mon, Apr 21	Explainability Discussion / Summary / Review
Wed, Apr 23	Midterm 2
Fri, Apr 25	No lab			M4: Fairness, Security and Feedback Loops
TBD	Final Project Presentations (5:30-8:30pm in GHC 4401)			Final report

License

The course materials (lecture slides, assignments, etc.) are released under a Creative Commons license, so anyone can view and use them on GitHub.

Course Homepage

https://mlip-cmu.github.io/s2025/

Course Textbook: Machine Learning in Production: From Models to Products

https://mlip-cmu.github.io/book/

Course GitHub Repository

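This post is based on a summary produced with a GPT model, so some content may differ from the content or intent of the original. If the topic interests you, please consult the original as well! If you notice anything awkward or incorrect while reading, please let us know in a comment.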
