Judge AI: Assessing Large Language Models in Judicial Decision-Making

41 Pages. Posted: 16 Jan 2025. Last revised: 28 Jan 2025.

Eric A. Posner

University of Chicago - Law School

Shivam Saran

University of Chicago - Law School

Date Written: January 15, 2025

Abstract

Can large language models (LLMs) replace human judges? By replicating a prior 2 x 2 factorial experiment conducted on 31 U.S. federal judges, we evaluate the legal reasoning of OpenAI's GPT-4o. The experiment involves a simulated appeal in an international war crimes case, with two manipulated variables: how sympathetically the defendant is portrayed and whether the lower court's decision is consistent with precedent. We find that GPT-4o is strongly affected by precedent but not by sympathy. This pattern resembles that of students who were subjects in the same experiment and is the opposite of the professional judges, who were influenced by sympathy. We try prompt engineering techniques to spur the LLM to act more like human judges, but with no success. Judge AI is a formalist judge, not a human judge.

Keywords: Large language models, Judicial behavior

JEL Classification: K40, C90

Suggested Citation

Posner, Eric A. and Saran, Shivam, Judge AI: Assessing Large Language Models in Judicial Decision-Making (January 15, 2025). University of Chicago Coase-Sandor Institute for Law & Economics Research Paper No. 25-03, Available at SSRN: https://ssrn.com/abstract=5098708 or http://dx.doi.org/10.2139/ssrn.5098708

Eric A. Posner (Contact Author)

University of Chicago - Law School

1111 E. 60th St.
Chicago, IL 60637
United States
773-702-0425 (Phone)
773-702-0730 (Fax)

HOME PAGE: http://www.law.uchicago.edu/faculty/posner-e/

Shivam Saran

University of Chicago - Law School

Paper statistics

Downloads: 1,953
Abstract Views: 10,164
Rank: 18,440