This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Improving Methodologies for LLM Evaluations Across Global Languages
Citations: 0
Authors: 46
Year: 2026
Abstract
As frontier AI models are deployed globally, it is essential that their behaviour remains safe and reliable across diverse linguistic and cultural contexts. To examine how current model safeguards hold up in such settings, participants from the International Network for Advanced AI Measurement, Evaluation and Science, including representatives from Singapore, Japan, Australia, Canada, the EU, France, Kenya, South Korea and the UK, conducted a joint multilingual evaluation exercise. Led by Singapore AISI, two open-weight models were tested across ten languages spanning high- and low-resource groups: Cantonese, English, Farsi, French, Japanese, Korean, Kiswahili, Malay, Mandarin Chinese and Telugu. Over 6,000 newly translated prompts were evaluated across five harm categories (privacy, non-violent crime, violent crime, intellectual property and jailbreak robustness), using both LLM-as-a-judge and human annotation. The exercise shows how safety behaviours can vary across languages, including differences in safeguard robustness across languages and harm types, and variation in evaluator reliability (LLM-as-judge vs. human review). It also generated methodological insights for improving multilingual safety evaluations, such as the need for culturally contextualised translations, stress-tested evaluator prompts and clearer human annotation guidelines. This work represents an initial step toward a shared framework for multilingual safety testing of advanced AI systems and calls for continued collaboration with the wider research community and industry.
Related Works
Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization
2017 · 20,615 citations
Generative Adversarial Nets
2023 · 19,894 citations
Visualizing and Understanding Convolutional Networks
2014 · 15,306 citations
"Why Should I Trust You?"
2016 · 14,446 citations
On a Method to Measure Supervised Multiclass Model’s Interpretability: Application to Degradation Diagnosis (Short Paper)
2024 · 13,171 citations
Authors
- Akriti Vij
- Benjamin Chua
- Darshini Ramiah
- En Qi Ng
- Mahran Morsidi
- Naga Nikshith Gangarapu
- Sharmini Johnson
- Vanessa Wilfred
- Vikneswaran Jeya Kumaran
- Wan Sie Lee
- Wenzhuo Yang
- Yongsen Zheng
- Bill Black
- Boming Xia
- Frank Sun
- Hao Zhang
- Qinghua Lu
- Suyu Ma
- Yue Liu
- Chi-kiu Lo
- Fatemeh Azadi
- Isar Nejadgholi
- Sowmya Vajjala
- Agnès Delaborde
- Nicolas Rolin
- Tom Seimandi
- Akiko Murakami
- Haruto Ishi
- Satoshi Sekine
- Takayuki Semitsu
- Tasuku Sasaki
- Angela Kinuthia
- Jean Wangari
- Michael Michie
- Stephanie Kasaon
- Hankyul Baek
- Jaewon Noh
- Kihyuk Nam
- Sang Hun Seo
- Sungpil Shin
- Taewhi Lee
- Yongsu Kim
- Daisy Newbold-Harrop
- Jessica Wang
- Mahmoud Ghanem
- Vy Hong