Martin Marek

30 Arbeiten121 Zitationen

Relevante Arbeiten

Meistzitierte Publikationen im Bereich Gesundheit & MedTech

Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful

2025 · 4 Zit. · ArXiv.org