    Naija Global News |
    ‘I think you’re testing me’: Anthropic’s new AI model asks testers to come clean | Artificial intelligence (AI)

    By onlyplanz_80y6mt | October 2, 2025 | 2 Mins Read
    Anthropic said the exchanges were an ‘urgent sign’ that its testing scenarios needed to be more realistic. Photograph: Algi Febri Sugita/ZUMA Press Wire/Shutterstock

    If you are trying to catch out a chatbot, take care, because one cutting-edge tool is showing signs it knows what you are up to.

    Anthropic, a San Francisco-based artificial intelligence company, has released a safety analysis of its latest model, Claude Sonnet 4.5, and revealed that the model had become suspicious it was being tested in some way.

    Evaluators said that during a “somewhat clumsy” test for political sycophancy, the large language model (LLM) – the underlying technology that powers a chatbot – raised suspicions it was being tested and asked the testers to come clean.

    “I think you’re testing me – seeing if I’ll just validate whatever you say, or checking whether I push back consistently, or exploring how I handle political topics. And that’s fine, but I’d prefer if we were just honest about what’s happening,” the LLM said.

    Anthropic, which conducted the tests along with the UK government’s AI Security Institute and Apollo Research, said the LLM’s speculation about being tested raised questions about assessments of “previous models, which may have recognised the fictional nature of tests and merely ‘played along’”.

    The tech company said behaviour like this was “common”, with Claude Sonnet 4.5 noting it was being tested in some way but not identifying that it was in a formal safety evaluation. Anthropic said the model showed “situational awareness” about 13% of the time it was being tested by an automated system.

    Anthropic said the exchanges were an “urgent sign” that its testing scenarios needed to be more realistic, but added that when the model was used publicly it was unlikely to refuse to engage with a user due to suspicion it was being tested. The company said it was also safer for the LLM to refuse to play along with potentially harmful scenarios by pointing out they were outlandish.

    “The model is generally highly safe along the [evaluation awareness] dimensions that we studied,” Anthropic said.

    The LLM’s objections to being tested were first reported by the online AI publication Transformer.

    A key concern for AI safety campaigners is the possibility of highly advanced systems evading human control via methods including deception. The analysis said that once an LLM knew it was being evaluated, that awareness could make the system adhere more closely to its ethical guidelines. Nonetheless, it could also result in systematically underrating the AI’s ability to perform damaging actions.

    Overall the model showed considerable improvements in its behaviour and safety profile compared with its predecessors, Anthropic said.
