The latest generation of generative artificial intelligence has proven capable of acing law school final exams, according to a recent study. OpenAI’s new model, called o3, earned grades ranging from A+ to B on eight spring finals at the University of Maryland Francis King Carey School of Law. The findings, published in a new paper on SSRN, show significant improvement from earlier versions of ChatGPT, which scored lower grades in similar tests in 2022 and 2023.
Improvement from Previous Models
Previous versions of OpenAI’s ChatGPT, which took law school finals in earlier studies, scored B’s, C’s, and even a D. However, o3 has shown marked progress. Unlike earlier ChatGPT models, which generate responses to queries immediately, o3 is a reasoning model: it drafts tentative answers and evaluates multiple approaches before settling on a final response. This more deliberate approach is credited with its strong performance on the law exams.
How o3 Performed
The study, conducted by seven law professors at the University of Maryland, tested o3 on exams graded on the same curve applied to students. The AI model received an A+ in Constitutional Law, Professional Responsibility, and Property. It scored an A in Income Taxation and an A- in Criminal Procedure. In other areas, it earned a B+ in Secured Transactions and Torts, and a B in Administrative Law. o3 performed well on both multiple-choice and essay questions.
Challenges for o3
Despite its impressive performance, o3 had limitations. Its relatively low grade in Administrative Law stemmed from its lack of knowledge of the 2024 U.S. Supreme Court decision in Loper Bright Enterprises v. Raimondo, which overturned the Chevron doctrine; the ruling came after o3’s knowledge cutoff date. Additionally, o3 performed worse when given access to professors’ notes, as it was “distracted” by the excess information.
Future Experiments and Concerns
The study’s authors are already considering a follow-up experiment to explore AI’s potential as a cheating tool. They plan to instruct the AI to occasionally make spelling and grammar mistakes to see whether its work could pass for that of a real student, an experiment that underscores growing concerns about AI and academic integrity.
Conclusions and Next Steps
While OpenAI did not comment on the study, its findings suggest that AI is rapidly approaching human-level performance in certain academic fields, raising both opportunities and challenges for the future of education. Further studies will likely delve deeper into the implications of AI’s involvement in assessments and its impact on academic systems.