{"id":31270,"date":"2025-10-29T12:38:18","date_gmt":"2025-10-29T12:38:18","guid":{"rendered":"https:\/\/naijaglobalnews.org\/?p=31270"},"modified":"2025-10-29T12:38:18","modified_gmt":"2025-10-29T12:38:18","slug":"we-need-a-new-turing-test-to-assess-ais-real-world-knowledge","status":"publish","type":"post","link":"https:\/\/naijaglobalnews.org\/?p=31270","title":{"rendered":"We need a new Turing test to assess AI\u2019s real-world knowledge"},"content":{"rendered":"<p>Artificial intelligence (AI) models can perform as well as humans on law exams when answering multiple-choice, short-answer and essay questions (A. Blair-Stanek et al. Preprint at SSRN https:\/\/doi.org\/p89q; 2025), but they struggle to perform real-world legal tasks. Some lawyers have learnt that the hard way, and have been fined for filing AI-generated court briefs that misrepresented principles of law and cited non-existent cases. The same is true in other fields. For example, AI models can pass the gold-standard test in finance \u2014 the Chartered Financial Analyst exam \u2014 yet score poorly on simple tasks required of entry-level financial analysts (see go.nature.com\/42tbrgb).<\/p>\n<p>Whenever assessments measure the intended skill inaccurately, it is considered a proxy failure. For example, a lawyer who scored A+ on an exam would be expected to avoid the kinds of error that an AI tool with a similar score might make in a real-world scenario. 
Better tests are urgently required to help guide the use of AI in complex, high-stakes situations.<\/p>\n<p>One promising idea emerged in March at an Association for the Advancement of Artificial Intelligence workshop in Philadelphia, Pennsylvania: through extensive interaction, a specialist can tell whether an AI system genuinely understands or is merely imitating understanding.<\/p>\n<p>Imagine an AI model attempting to \u2018pass\u2019 an interview with an acclaimed legal scholar such as Cass Sunstein at Harvard University in Cambridge, Massachusetts. Sunstein\u2019s expert probing would be a better measure of the model\u2019s legal knowledge than a standardized test or automatically scored benchmark. Passing the \u2018Sunstein test\u2019 would require an AI tool to display true legal mastery, being able to wade through ambiguity and contradiction, and not just answer multiple-choice questions or write an essay.<\/p>\n<p>One might ask: why not simply test an AI model\u2019s legal readiness with task-specific benchmarks, similar to those used in medicine for checking an AI tool\u2019s ability to take notes for a physician? The goal, however, is not to test an AI tool\u2019s ability to perform a specific legal task, or even a long list of them, but to test whether it has general-purpose legal knowledge that it can exercise systematically when performing any task.<\/p>\n<p>I am not suggesting that Sunstein, or any single authority, should be appointed as the arbiter of AI expertise. The goal is to build systems that leading legal specialists broadly agree demonstrate genuine, trustworthy legal knowledge. A \u2018robo-lawyer\u2019 would need to cope in a diverse range of interviews with panels of experts \u2014 ranging from tax and constitutional lawyers to clerks, traffic officers and legal-aid workers. 
Such an approach would reduce issues around individual or ideological bias and avoid the trap of AI models merely mimicking one person\u2019s style.<\/p>\n<p>Could a machine reach human levels of expertise, subtlety and ethics? Only specialists can say. But imagine a US Supreme Court justice grilling an AI robo-lawyer in public. That would get everyone\u2019s attention. It would be a spectacle much like multinational technology corporation IBM\u2019s 2011 challenge on the US television quiz programme Jeopardy!. The company pitted its supercomputer Watson against human champions to demonstrate how far machine reasoning and natural-language processing had come.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Artificial intelligence (AI) models can perform as well as humans on law exams when answering multiple-choice, short-answer and essay questions (A. Blair-Stanek et al. Preprint at SSRN https:\/\/doi.org\/p89q; 2025), but they struggle to perform real-world legal tasks. Some lawyers have learnt that the hard way, and have been fined for filing AI-generated court briefs 
that<\/p>\n","protected":false},"author":1,"featured_media":31271,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[58],"tags":[4690,735,4036,10192,76,9756],"class_list":{"0":"post-31270","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-science","8":"tag-ais","9":"tag-assess","10":"tag-knowledge","11":"tag-realworld","12":"tag-test","13":"tag-turing"},"_links":{"self":[{"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=\/wp\/v2\/posts\/31270","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=31270"}],"version-history":[{"count":0,"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=\/wp\/v2\/posts\/31270\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=\/wp\/v2\/media\/31271"}],"wp:attachment":[{"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=31270"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=31270"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=31270"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}