
Wharton Faculty Platform


Evaluating the Performance of Large Language Models via Debates

June 13, 2025 | Seyed Hamed Hassani

©2025 The Wharton School, The University of Pennsylvania