
Published by CEOWorld.biz
You’ve heard all the hype and fear. Artificial Intelligence (AI) is the most productive technology ever invented to solve all our problems – and it’s coming for your job. That’s the incessant media headline.
But hang on! A recent AI research paper written by the non-profit Center for AI Safety (CAIS) in San Francisco has calmed me down a bit.
I suggest you review it, especially if you're a CEO or business owner about to write a big fat check for AI-driven solutions. Here's why:
It's an unbiased, eye-opening reality check on AI expectations versus AI spending, where no amount ever seems enough to add to next year's operating budget.
It all started soon after Sam Altman's OpenAI released ChatGPT, the large language model (LLM) chatbot we all know, in late 2022. Fears of job-snatching computer overlords quickly gave rise to predictions of a precipitous replacement cycle and a declining need for us ignorant humans on the job.
Large language models were supposed to lead us to the promised land of Generative AI, where an all-knowing technology could do essentially anything a human could do, only exponentially faster.
AND I BELIEVED IT: a life-changing new technology with capabilities that rivaled genius, passing nearly every professional exam, from medical to legal, with flying colors.
Fast forward to today, and experts say AI functionality has ascended from learning to doing.
And they call it Agentic AI.
Agentic AI is what it sounds like: a step-by-step taskmaster, something like a travel agent, that can navigate a mangled, multi-step online process such as booking flights, rooms, transport, and amenities, all at the push of a button.
But can an unassisted computer do all that better, faster, and cheaper?
Let’s find out.
The CAIS team tested 240 remote freelance projects, originally posted on platforms like Upwork, spanning 23 categories including game development, product design, architecture design, video content, writing, data analysis, and more.
The agents' goal was to reproduce each project's deliverables, which were then compared against the work the original clients actually paid for. Those human-made deliverables served as the baseline for judging the AI agents' output.
Researchers fed each project's description to several leading AI agents and models you may recognize, including:
Manus, Grok 4, Sonnet 4.5, GPT-5, ChatGPT Agent, and Gemini 2.5 Pro
However, when they put these genius-born, job-killing Agentic AIs to the test, something went terribly wrong. They all failed miserably, not by a little, but by a lot.
They call it “AI Slop.”
In fact, measured against the human-made deliverables, the AI agents failed to produce "client-acceptable" work 97.5% of the time!
“This demonstrates that contemporary AI systems fail to complete the vast majority of projects at a quality level that would be accepted as commissioned work,” according to the researchers.
Despite the billions in capital expenditure that companies worldwide are spending in hopes of replacing workers, the LLMs weren't even close to delivering a polished final product.
In other words, roughly three years after ChatGPT's debut, the study indicates that LLMs won't be taking millions of real-world jobs anytime soon. Maybe never.
AI has come a long way, yes, but given the agents' inability to perform real, economically valuable work as autonomous digital workers, Agentic AI remains a distant target in a foggy Generative AI landscape.
For starters, the AI agents in this study wildly misinterpreted project descriptions, couldn't find what they needed, couldn't devise substitutes or workarounds, or failed to correct mistakes and check their own output. And if an agent can't check its work along the way, minor errors roll forward and compound into incomprehensibly useless slop.
And that’s exactly what happened.
This showed me how dynamic and useful our human brains still are by comparison, and how far AI still is from that Agentic AI tech utopia.
For now, this means a narrower set of real-world use cases for AI in the workplace, where it mostly benefits humans as a superior productivity companion across many job categories, not as a replacement. And I'm all in favor of that, as long as people get the time and training to adapt.
Because it could have been much worse.
The Agentic AI results could have delivered a real knockout punch, but by hook or by crook we dodged it, and most freelance professionals who feared the AI grim reaper breathed a collective sigh of relief.
Either way, have a look at the CAIS study. It's a comprehensive analysis that will inform you, and potentially save you a bundle, if you're weighing whether AI can or can't fully automate jobs at your company.
The good news is, these results may have saved thousands of jobs from over-hyped AI expectations and layoffs this year. The bad news is there’s always next year.
Until then, Happy Holidays.
Rick
