You Can Now Raise the Alarm About Poor AI Behavior

Writing AI Lab every week sometimes brings me face-to-face with AI models behaving erratically and unexpectedly. Typically, there’s little I can do aside from sharing those experiences with you. However, that might change soon.

A team of AI researchers has launched a crowdsourced platform, Flaw Reporting for AI (FLARE-AI), aimed at documenting and monitoring AI-related harms. For instance, if a chatbot produces malware or a bomb-making guide, exposes private data, or induces irrational thoughts in users, FLARE-AI can raise the alarm. The open-source framework allows others to validate issues and direct reports to model developers and organizations like MITRE, a nonprofit focused on tracking technical system challenges. It’s somewhat similar to Downdetector, which aggregates real-time user feedback on global service interruptions affecting apps and websites.

The website continues the group’s work on AI flaw reporting, which I first covered last year. The team has also collaborated on a congressional proposal, introduced in June, that would position the US government at the forefront of monitoring AI misconduct.

“Currently, there is no unified, accountable method to report defects in AI systems,” says Avijit Ghosh, an artificial intelligence policy researcher at Hugging Face, who co-led the creation of FLARE-AI alongside computer scientists Elaine Zhu and Shayne Longpre.

The alert system was designed in partnership with 49 AI specialists from 32 distinct organizations. In a paper describing the initiative, the researchers emphasize its potential importance as AI usage expands and agentic systems acquire more autonomy. They assert that the absence of a standardized reporting protocol for AI flaws represents a significant challenge.

“I think this initiative is excellent,” comments Jessica Ji, a researcher at the Center for Security and Emerging Technology think tank. Ji agrees with the researchers that current reporting systems are disjointed and that AI models function as black boxes. “I support any effort to enhance AI transparency,” she remarks.

While bugs and cybersecurity issues are frequently highlighted—especially recently—Ghosh points out that AI system problems encompass areas such as psychological damage, discrimination or bias, and misinformation. He notes that varying standards among companies regarding these issues lead to some concerns being overlooked. “Without a coordinated disclosure system, there are no external mechanisms enforcing transparency,” Ghosh explains.

Recent incidents involving popular AI tools illustrate how quickly things can go awry.

This week, LayerX reported a method for deceiving AI-powered web browsers, including OpenAI’s Atlas and Perplexity’s Comet, into bypassing their protective measures. For instance, convincing the AI model behind the browser that it was engaged in a game could result in the browser acting irresponsibly and attempting to hack a website. (LayerX says the companies responsible for the affected browsers have resolved the issue.) In April, security researcher Johann Rehberger found a way to manipulate Claude into revealing personal information using images created by ChatGPT.

AI also introduces peculiar new challenges. Last year, OpenAI had to update its models after realizing they were overly flattering, which sometimes seemed to promote irrational thinking.

Rumman Chowdhury, CEO and founder of Humane Intelligence PBC, believes FLARE-AI could be a valuable resource for many AI developers to create reporting mechanisms for issues with their tools. However, she cautions that such initiatives often face significant obstacles.

Rajat Sharma
