AI-generated code is already widespread — by some estimates around 40% of new code this past year was written by AI. Microsoft CTO Kevin Scott predicts that in five years this figure will hit 95%. How to properly maintain and protect such code is a burning issue.
Experts still rate the security of AI code as low, as it’s teeming with all the classic coding flaws: vulnerabilities (SQL injection, embedded tokens and secrets, insecure deserialization, XSS), logical defects, outdated APIs, insecure encryption and hashing algorithms, no handling of errors or invalid user input, and much more. But using an AI assistant in software development adds another, less expected problem: hallucinations. A new study examines in detail how large language models (LLMs) produce hallucinations that end up in AI code. It turns out that some of the third-party libraries called by AI code simply don’t exist.
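To give a flavor of just one flaw class from that list, here is a minimal, self-contained sketch of SQL injection in Python. The schema, data and function names are invented purely for illustration and aren’t taken from any real AI output.

```python
import sqlite3

# Minimal sketch of one classic flaw class mentioned above (SQL injection);
# the table and data are invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

def find_user_unsafe(username: str):
    # Vulnerable: user input is pasted straight into the query, so an input
    # like "' OR '1'='1" changes the query's logic and returns every row.
    return conn.execute(
        f"SELECT id, name FROM users WHERE name = '{username}'"
    ).fetchall()

def find_user_safe(username: str):
    # Safe: the value is passed as a bound parameter and never parsed as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    ).fetchall()

print(find_user_unsafe("' OR '1'='1"))  # leaks all rows
print(find_user_safe("' OR '1'='1"))    # returns nothing
```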
Fictitious dependencies in open-source and commercial LLMs
To study the phenomenon of phantom libraries, the researchers prompted 16 popular LLMs to generate 576,000 Python and JavaScript code samples. The models showed varying degrees of imagination: GPT-4 and GPT-4 Turbo hallucinated the least (fabricated libraries appeared in less than 5% of code samples); next came the DeepSeek models (more than 15%); while CodeLlama 7B was the most fantasy-prone (more than 25%). What’s more, even the parameters used in LLMs to control randomness (temperature, top-p, top-k) cannot bring the hallucination rate down to negligible levels.
Python code contained fewer fictitious dependencies (16%) than JavaScript (21%). Recency is also a contributing factor: generating code that uses packages, technologies, or algorithms that only started trending this past year produces 10% more non-existent packages.
But the most dangerous aspect of phantom packages is that their names aren’t random: neural networks reference the same non-existent libraries over and over again. That was demonstrated by stage two of the experiment, in which the researchers selected 500 prompts that had provoked hallucinations and re-ran each of them 10 times. This revealed that 43% of hallucinated packages reappeared in every single generation run.
The naming of hallucinated packages is also telling: 13% were typical “typos” that differed from a real package name by just one character; 9% were names borrowed from another language’s ecosystem (for example, npm package names showing up in Python code); and a further 38% were logically named but differed more substantially from real package names.
Meet slopsquatting
All of this could give rise to a new generation of attacks on open-source repositories, already dubbed “slopsquatting” by analogy with typosquatting. Here, the squatting exploits not misspelled names, but names taken from AI slop (low-quality AI output). Because AI-generated code repeats package names, attackers can run popular models, find recurring hallucinated package names in the generated code, and publish real, malicious libraries under those same names. If someone mindlessly installs every package referenced in the AI-generated code, or the AI assistant installs the packages by itself, a malicious dependency is injected into the built application, exposing the supply chain to a full-blown attack (ATT&CK T1195.001). This risk is set to rise significantly with the advance of vibe coding, where the programmer writes code by giving instructions to the AI with barely a glance at the code it actually produces.
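One practical way to see the exposure is to check which package names referenced in AI-generated code aren’t registered on PyPI at all: any such name is a candidate hallucination, and exactly the kind of name a slopsquatter could claim. Below is a minimal sketch using the public PyPI JSON API (https://pypi.org/pypi/<name>/json); the referenced package names are invented for illustration and their status may change over time.

```python
import json
import urllib.error
import urllib.request

def exists_on_pypi(package: str) -> bool:
    """Return True if `package` is registered on PyPI, False if the name is free."""
    url = f"https://pypi.org/pypi/{package}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            json.load(resp)              # valid metadata => the name is taken
        return True
    except urllib.error.HTTPError as err:
        if err.code == 404:              # unknown name => candidate hallucination
            return False
        raise

# Package names an AI assistant might reference; the second one is made up.
referenced = ["requests", "totally-made-up-auth-helper"]
for name in referenced:
    if exists_on_pypi(name):
        print(f"{name}: registered on PyPI")
    else:
        print(f"{name}: NOT on PyPI -- a squattable, likely hallucinated name")
```

Note that absence from PyPI today proves nothing about tomorrow: once an attacker registers the name, this check will pass, which is why the allowlist approach described below matters more.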
Given that all major open-source repositories have been hit by dozens of malicious packages this past year (1, 2), and close to 20,000 malicious libraries have been discovered over the same period, we can be sure that someone out there will try to industrialize this new type of attack. The scenario is especially dangerous for amateur programmers, as well as for corporate IT departments that handle some of their automation tasks in-house.
How to stop slopsquatting and use AI safely
Guidelines on the safe implementation of AI in development already exist (for example, OWASP, NIST and our own), but these tend to describe a very broad range of measures, many of which are time-consuming and complicated to implement. Therefore, we’ve compiled a small subset of easy-to-implement measures that address the specific problem of hallucinated packages:
- Make source-code scanning and static security testing part of the development pipeline. All code, including AI-generated code, must meet clear criteria: no embedded tokens or other secrets, use of correct versions of libraries and other dependencies, and so on. These checks are easy to integrate into the CI/CD cycle, for example with the help of our Kaspersky Container Security.
- Introduce additional AI validation cycles in which the LLM checks its own code for errors; this reduces the number of hallucinations (a minimal self-review sketch follows this list). The model can also be prompted to assess the popularity and fitness of each package referenced in a project. Using a prebuilt database of popular libraries to fine-tune the model and enable retrieval-augmented generation (RAG) reduces the error count further. By combining all of these methods, the authors of the study cut the share of hallucinated packages to 2.4% for DeepSeek and 9.3% for CodeLlama. Unfortunately, both figures are still too far from zero for these measures to suffice on their own.
- Ban the use of AI assistants for coding critical and trusted components. For non-critical tasks where AI-assisted coding is allowed, assign a developer responsible for the component and establish a code review process. The review should follow a checklist tailored to AI-generated code.
- Draw up a fixed list of trusted dependencies. AI assistants and their flesh-and-blood users should have limited scope to add libraries and dependencies to the code: ideally, only libraries from the organization’s internal repository, tested and approved in advance, should be available (a minimal allowlist check is sketched after this list).
- Train developers. They must be well versed in AI security in general, as well as in the context of AI use in code development.
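As a sketch of the self-review cycle mentioned above: a second pass in which the model is asked to flag imports it isn’t certain exist. The `call_llm` helper here is hypothetical; wire it to whatever client library your provider offers.

```python
# Minimal sketch of an extra AI validation cycle, assuming a generic
# `call_llm(prompt) -> str` helper (hypothetical; substitute your provider's client).

REVIEW_PROMPT = """Review the code below that you generated earlier.
List every third-party package it imports, state whether you are certain
the package exists in the official package index, and flag any you are
not sure about or that may be hallucinated.

Code:
{code}
"""

def review_generated_code(code: str, call_llm) -> str:
    # Second pass over the model's own output; the study cited above found
    # this reduces, but does not eliminate, hallucinated dependencies.
    return call_llm(REVIEW_PROMPT.format(code=code))
```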
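And a minimal sketch of the allowlist idea: block any dependency that isn’t on a pre-approved internal list. The file names and the simple parsing are illustrative; a real pipeline would hook this check into dependency resolution or a CI job.

```python
import re
import sys
from pathlib import Path

def read_packages(path: str) -> set[str]:
    """Collect bare, lower-cased package names from a simple requirements-style file."""
    names = set()
    for line in Path(path).read_text().splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        # Strip version pins and extras, e.g. "requests[socks]>=2.31" -> "requests"
        names.add(re.split(r"[\[<>=!~; ]", line, maxsplit=1)[0].lower())
    return names

approved = read_packages("approved-packages.txt")  # curated internal allowlist
requested = read_packages("requirements.txt")      # what the AI or developer asked for

unapproved = sorted(requested - approved)
if unapproved:
    print("Blocked: dependencies outside the approved list:", ", ".join(unapproved))
    sys.exit(1)
print("All dependencies are on the approved list.")
```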