OpenAI Asks Contractors to Submit Work From Past Jobs to Benchmark AI Agent Performance

OpenAI is recruiting third-party contractors to submit real assignments and tasks from their current or former jobs, and is using the data to evaluate how well its upcoming AI models perform, according to documents from OpenAI and the training-data firm Handshake AI obtained by WIRED.
The initiative appears to be part of OpenAI's effort to establish a human benchmark for a range of tasks against which its AI models can be measured. In September, the company launched a new evaluation process designed to compare its AI models' performance with that of human professionals across industries. OpenAI says this work is essential for tracking its progress toward AGI, which it defines as an AI system that outperforms humans at most economically valuable work.
“We’ve engaged individuals from various professions to assist in gathering real-world tasks based on those performed in your full-time roles, enabling us to assess AI models’ performance on these tasks,” states a confidential document from OpenAI. “Transform long-term or intricate work (taking hours or days) you’ve done in your role into a task.”
OpenAI is asking contractors to describe tasks they have completed in their current or previous jobs and to submit genuine examples of that work, according to an OpenAI presentation reviewed by WIRED. Each example should be “a concrete output (not a summary, but the actual file), such as a Word document, PDF, PowerPoint, Excel, image, or repository,” the presentation notes. OpenAI adds that workers can also submit fabricated examples that show how they would realistically respond in a given situation.
Neither OpenAI nor Handshake AI provided comment.
According to the OpenAI presentation, real-world tasks consist of two parts: the task request (the instructions given by a manager or colleague) and the task deliverable (the actual output produced). The company repeatedly stresses that the examples contractors share should reflect “real, on-the-job work” the person has “actually accomplished.”
One example in the OpenAI presentation shows a task from a “Senior Lifestyle Manager at a luxury concierge firm for ultra-high-net-worth clients.” The task is to “draft a concise, 2-page PDF summary of a 7-day yacht trip itinerary to the Bahamas for a family visiting for the first time,” with additional details about the family's interests and preferred itinerary structure. The “experienced human deliverable” then shows what the contractor would produce: an actual Bahamas itinerary created for a client.
OpenAI instructs contractors to strip corporate intellectual property and any personally identifiable information from the work files they upload. In a section labeled “Important reminders,” OpenAI advises workers to “remove or anonymize any: personal information, proprietary or confidential data, material nonpublic information (e.g., internal strategies, unreleased product specifics).”
One document reviewed by WIRED references a ChatGPT tool called “Superstar Scrubbing” that offers guidance on removing confidential data.
Evan Brown, an intellectual property lawyer at Neal & McDevitt, tells WIRED that AI labs taking in confidential information from contractors at this scale could face claims of trade secret misappropriation. Contractors who hand over documents from previous employers, even after scrubbing them, may risk violating nondisclosure agreements or disclosing trade secrets.
“The AI lab places a lot of trust in its contractors to determine what is and isn’t confidential,” Brown says. “If something slips through, are the AI labs really vetting what qualifies as a trade secret? It appears the AI lab is exposing itself to significant risk.”
