AI Agents Make Poor Freelancers

Despite impressive advancements, artificial intelligence agents still struggle significantly with online freelance tasks, challenging the notion that AI will soon replace office workers en masse.

The Remote Labor Index, a new metric developed by researchers at Scale AI and the Center for AI Safety (CAIS), a non-profit organization, evaluates the capability of leading AI models to automate economically valuable tasks.

Researchers assigned a range of simulated freelance jobs to several leading AI agents and found that even the best performer completed fewer than 3 percent of the tasks, earning $1,810 out of a possible $143,991. Among the tools tested, Manus, from a Chinese startup of the same name, was the most capable, followed closely by Grok from xAI, Claude from Anthropic, ChatGPT from OpenAI, and Gemini from Google.

“I hope this provides a more accurate understanding of AI capabilities,” states Dan Hendrycks, director of CAIS. He notes that although some AI agents have made substantial progress in recent years, this improvement may not continue at the same pace.

Remarkable advancements in AI have stirred assumptions that it could soon outpace human intelligence and eliminate large numbers of jobs. In March, Dario Amodei, CEO of Anthropic, claimed that within months, 90 percent of coding tasks could be automated.

Earlier waves of AI development also prompted predictions of job displacement that proved wrong, such as the anticipated replacement of radiologists by AI algorithms.

The researchers developed a variety of freelance tasks by collaborating with verified Upwork workers. The tasks span graphic design, video editing, game development, and administrative work such as data scraping. For each task, the researchers supplied a detailed job description, a directory of the necessary files, and an example of a finished project produced by a human.

Hendrycks mentions that while AI models have improved in coding, mathematics, and logical reasoning, they still face challenges when using different tools and executing complex, multi-step tasks. “They lack long-term memory storage and cannot learn continuously from experiences. They are unable to acquire skills on the job like humans can,” he explains.

This analysis serves as a counterpoint to a work benchmark introduced by OpenAI in September, called GDPval, which claims to measure economically valuable work. According to GDPval, advanced AI models like GPT-5 are nearing human abilities across 220 tasks related to various office jobs. OpenAI has not issued a comment on this matter.
