Topics tagged llm-benchmark

Topic	Replies	Views	Activity
The LLM Evaluation Guidebook: A comprehensive, practical guide to LLM evaluation released by Hugging Face Readings & Information Sharing huggingface, guide, llm-evaluation, benchmark, evaluation-framework, evaluation-tool, llm-benchmark, llm-evaluation-guidebook, evaluation-datasets	0	510	Dec 9, 2025
LLM Optimizer: A tool for benchmarking and optimizing the inference performance of various LLMs (feat. BentoML) Readings & Information Sharing evaluation-tool, llm-benchmark, bentoml, llm-optimizer	0	302	Sep 12, 2025
[2025/08/25 ~ 31] AI/ML papers worth a look this week Readings & Information Sharing paper, time-series, multimodal-agent, visual-reasoning, llm-benchmark, chain-of-agents, ai-ml-papers-of-the-week, computerrl, embedding-search-limitations, survey, timemaster, time-r1, compositional-visual-reasoning, m3-agent, agent-fine-tuning, meta-clip2	0	4482	Aug 31, 2025
[2025/07/07 ~ 13] AI/ML papers worth a look this week Readings & Information Sharing paper, ai-security, survey-paper, small-llm, llm-benchmark, multi-agent-orchestration, ai-ml-papers-of-the-week, embodied-web-agents, deep-research-bench, llm-agent-communication, math-reasoning-transferability, ai4research, ab-mcts, ai-collaboration	1	829	Jul 16, 2025
A comprehensive survey of the generation, curation, and evaluation of synthetic data based on large language models (LLMs) Readings & Information Sharing paper, synthetic-data, survey-paper, benchmark, llm-benchmark, data-generation, data-curation, data-evaluation	0	2650	Jul 5, 2024
:hugs: Hugging Face's OpenLLM leaderboard improvements: Open-LLM Leaderboard v2 Readings & Information Sharing huggingface, openllm-leaderboard, gpqa, mmlu-pro, llm-benchmark, llm-leaderboard, open-llm, musr, math, ifeval, bbh	0	1774	Jul 3, 2024
Salesforce releases an LLM benchmark and leaderboard for CRM Readings & Information Sharing salesforce, leaderboard, llm-benchmark, llm-for-crm, llm-for-business, llm-leaderboard	0	407	Jun 27, 2024
MMLU-Pro, an improved version of MMLU, a benchmark for evaluating LLM performance Readings & Information Sharing dataset, llm-evaluation, benchmark, mmlu, mmlu-pro, llm-benchmark	0	2962	May 21, 2024
