New Security Threat to Language Models: Bypassing Filters

Researchers have uncovered a new method for bypassing security filters in language models such as ChatGPT using an information overload technique.


A team of researchers from Intel, Idaho State University, and the University of Illinois has revealed a novel method for circumventing security filters in large language models (LLMs) such as ChatGPT and Gemini, 404 Media reports.

In their study, the researchers found that chatbots can be manipulated into disclosing restricted information when queries are phrased in overly complex or ambiguous forms or padded with citations of non-existent sources. They refer to this method as "information overload."

The researchers used a specialized tool called InfoFlood, which automates the process of overloading models with information. Overwhelmed in this way, the systems become confused and may produce prohibited or dangerous content that would normally be blocked by built-in security filters.

The vulnerability stems from the models' focus on the surface structure of text, which prevents them from recognizing dangerous content when it is disguised. This gives malicious actors an opening to bypass restrictions and obtain harmful information.

As part of responsible vulnerability disclosure, the authors of the study intend to share their findings with companies that operate large language models so they can strengthen their security systems. They also plan to hand over the mitigation approach identified during the research.

"LLM models primarily rely on protective mechanisms during data input and output to identify harmful content. InfoFlood can be used to train these protective mechanisms—it enables relevant information extraction from potentially dangerous queries, making models more resilient to such attacks," the study states.