AI Models Achieve Passing Scores on Rigorous CFA Level III Exam, Study Finds
In a significant development for both the financial industry and artificial intelligence, a new study has demonstrated that advanced large language models (LLMs) are now capable of passing the CFA Level III exam, an assessment renowned for its difficulty and widely regarded as the capstone credential for investment management professionals.
NYU Stern and Goodfin Lead Comprehensive AI Evaluation
The research, a collaboration between the NYU Stern School of Business and Goodfin, an AI wealth platform specializing in private market investments, aimed to rigorously assess the capabilities of LLMs in highly specialized and demanding domains like finance. The study, titled "Advanced Financial Reasoning at Scale: A Comprehensive Evaluation of Large Language Models on CFA Level III," benchmarked 23 leading AI models against the CFA Level III exam, a credential widely regarded as the gold standard in investment management.
Frontier Models Demonstrate Advanced Reasoning Capabilities
The findings reveal that cutting-edge LLMs are not only performing well but are now achieving scores that meet or exceed the estimated passing threshold for the CFA Level III exam. This marks a substantial leap from previous research, which indicated that while LLMs could pass Levels I and II, Level III remained a significant hurdle. The study highlights that OpenAI's o4-mini achieved a composite score of 79.1%, and Google's Gemini 2.5 Flash scored 77.3%. These results suggest a marked improvement in the reasoning, quantitative analysis, and strategic thinking abilities of these models.
Essay Performance Differentiates Top Models
While many of the evaluated LLMs demonstrated proficiency in the multiple-choice sections of the exam, the study identified that only a select few models excelled in the essay portion. This part of the exam is particularly challenging as it requires a deeper level of analysis, synthesis of information, and strategic application of financial concepts, closely mirroring the real-world tasks performed by human financial professionals. The ability of these models to articulate reasoned responses to complex, open-ended questions is a critical indicator of their advanced capabilities.
The Impact of Prompting Strategies on Performance
The research underscored the significant impact of prompting techniques on AI performance, particularly in the essay sections. The study found that "chain-of-thought" prompting, a method that asks the LLM to explain its reasoning step by step before giving a final answer, boosted essay accuracy by an average of 15 percentage points. This indicates that how questions are posed to AI models can critically influence the quality and accuracy of their responses, a crucial factor for effective deployment in practical financial applications.
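To make the comparison concrete, the sketch below shows one way direct and chain-of-thought prompting could be implemented against a chat-completion API. The prompt wording, the answer_essay helper, and the "o4-mini" model identifier are illustrative assumptions; the study's actual prompts and configuration are not reproduced here.

```python
# A minimal sketch of direct vs. chain-of-thought prompting for an essay item.
# The prompt wording and the "o4-mini" model ID are assumptions for illustration,
# not the study's actual setup.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

DIRECT_PROMPT = (
    "You are a CFA Level III candidate. Answer the constructed-response "
    "question below concisely.\n\n{question}"
)

COT_PROMPT = (
    "You are a CFA Level III candidate. Reason step by step: identify the "
    "relevant curriculum concepts, work through any calculations, and only "
    "then state your final answer.\n\n{question}"
)

def answer_essay(question: str, use_cot: bool = True, model: str = "o4-mini") -> str:
    """Return the model's essay response, with or without chain-of-thought prompting."""
    template = COT_PROMPT if use_cot else DIRECT_PROMPT
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": template.format(question=question)}],
    )
    return response.choices[0].message.content
```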
LLM as a Judge: A Stricter Evaluator
To grade the essay responses, Professor Srikanth Jagabathula, a key researcher from NYU Stern, used another LLM as a judge. The AI judge was given the essay response, the correct answer, contextual information about the question, and a grading rubric. Interestingly, the study found that the LLM judge was often stricter than human graders, awarding fewer points for the same responses. This suggests that an appropriately prompted AI judge can apply a rigorous and consistent evaluation standard, in some cases grading less leniently than humans.
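The grading setup described above can be sketched as a single judging call that bundles the question context, rubric, reference answer, and candidate answer into one prompt. The prompt text, the JSON output convention, and the model choice below are assumptions made for illustration, not the study's actual implementation.

```python
# Sketch of an LLM-as-a-judge grading call: the judge sees the candidate response,
# the reference answer, the question context, and a rubric, as described in the
# article. Prompt wording, output format, and model name are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are grading a CFA Level III constructed-response answer.

Question context:
{context}

Grading rubric (maximum {max_points} points):
{rubric}

Reference answer:
{reference}

Candidate answer:
{candidate}

Award points strictly according to the rubric. Respond with JSON:
{{"points_awarded": <number>, "justification": "<one or two sentences>"}}"""

def grade_essay(context, rubric, reference, candidate, max_points, model="gpt-4.1"):
    """Ask the judge model to score a single essay response against the rubric."""
    response = client.chat.completions.create(
        model=model,
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            context=context, rubric=rubric, reference=reference,
            candidate=candidate, max_points=max_points)}],
    )
    return json.loads(response.choices[0].message.content)
```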
CFA Institute Emphasizes Holistic Qualifications
In response to the study's findings, Chris Wiese, managing director of education at the CFA Institute, acknowledged the advancements in AI but reiterated that passing the exams is only one component of earning the CFA designation. The charter also requires 4,000 hours of qualifying work experience, professional references, adherence to a strict code of ethics, and completion of practical skills modules. Wiese stated, "Without knowing the details of how this study was conducted, we can only note that at CFA Institute, we continue to believe that a combination of trust, human relationships, sound ethical judgment and professionalism are as important as ever in financial markets." He also noted the Institute's commitment to keeping its members informed about AI's growing utility in the investment management field.
AI as an Augmentation, Not a Replacement
When questioned about whether an LLM could replace a human CFA professional, Professor Jagabathula expressed caution. While acknowledging the rapid development of AI capabilities, he pointed to preliminary results from a smaller-scale user study. This study indicated that while LLMs are adept at providing precise answers to specific questions, they often struggle with capturing unstated context and can face challenges in building user trust. Currently, the consensus is that LLMs are powerful tools that can significantly augment the abilities of financial professionals, rather than fully replace them. The jury remains out on the long-term implications for job replacement.
Cost-Effectiveness and Model Selection
The study also delved into the cost-effectiveness of different models and prompting strategies. The models achieving the highest accuracy often carried the highest computational costs, in some cases running up to 11 times more expensive than lower-cost alternatives, a critical trade-off for financial institutions considering AI deployment. Models like Gemini 2.5 Flash offered a more balanced performance-to-cost ratio. The research also provided guidance on model selection, noting that while frontier models excel, specialized financial models such as Palmyra-fin delivered respectable performance at lower cost, making them strategic options for specific applications.
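One simple way to reason about this trade-off is to normalize accuracy by cost. In the illustration below, the composite scores are the ones reported in the study (79.1% and 77.3%); the dollar figures are placeholders chosen only to mirror the roughly 11x cost gap mentioned above, not actual pricing.

```python
# Illustrative accuracy-per-cost comparison. Scores come from the article;
# the per-run cost figures are hypothetical placeholders that only reflect
# the reported ~11x gap between the priciest and cheaper models.
models = {
    "o4-mini":          {"score": 79.1, "cost_usd": 11.0},  # hypothetical cost
    "gemini-2.5-flash": {"score": 77.3, "cost_usd": 1.0},   # hypothetical cost
}

for name, m in models.items():
    print(f"{name}: {m['score']:.1f}% composite at ${m['cost_usd']:.2f} per run "
          f"-> {m['score'] / m['cost_usd']:.1f} score points per dollar")
```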
Future of Wealth Management and AI Integration
The implications of AI mastering the CFA Level III exam extend to the future of wealth management. Companies like Goodfin are leveraging these advancements to democratize access to sophisticated financial products and advice. Their platform utilizes AI agents to manage client onboarding, provide product information, conduct market research, tailor guidance, and execute investment actions. This research validates the potential for AI to redefine wealth management, making institutional-grade financial expertise more accessible and affordable to a broader audience. As AI continues to evolve, its role in augmenting human expertise and transforming client experiences in the financial sector is expected to grow significantly.
Limitations and Future Research
The study acknowledges certain limitations, including the use of mock exam questions rather than official CFA materials and the inherent challenges in fully capturing the nuances of human professional judgment through automated evaluation. Future research directions include improving AI self-assessment calibration, expanding evaluations to other financial certifications, and further exploring the integration of AI with human advisors. The findings emphasize the need for continued human oversight in financial decision-making, positioning AI as a powerful assistive tool rather than an autonomous replacement for financial professionals.