GHSA-6fvq-23cw-5628

Source

https://github.com/advisories/GHSA-6fvq-23cw-5628

Import Source

https://github.com/github/advisory-database/blob/main/advisories/github-reviewed/2025/10/GHSA-6fvq-23cw-5628/GHSA-6fvq-23cw-5628.json

JSON Data

https://api.test.osv.dev/v1/vulns/GHSA-6fvq-23cw-5628

Aliases

CVE-2025-61620

Related

CGA-7mqq-hv83-c65c

Published

2025-10-07T21:35:22Z

Modified

2025-10-07T21:57:21.465970Z

Severity

6.5 (Medium) CVSS_V3 - CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H

Summary

vLLM: Resource-Exhaustion (DoS) through Malicious Jinja Template in OpenAI-Compatible Server

Details

Summary

A resource-exhaustion (denial-of-service) vulnerability exists in multiple endpoints of vLLM's OpenAI-Compatible Server because clients can specify Jinja templates via the chat_template and chat_template_kwargs parameters. An attacker who can supply these parameters to the API can cause a service outage by exhausting CPU and/or memory resources.

Details

When using an LLM as a chat model, the conversation history must be rendered into a text input for the model. In Hugging Face Transformers, this rendering is performed using a Jinja template. The OpenAI-Compatible Server launched by vllm serve exposes a chat_template parameter that lets users specify that template. In addition, the server accepts a chat_template_kwargs parameter to pass extra keyword arguments to the rendering function.

Because Jinja templates support programming-language-like constructs (loops, nested iterations, etc.), a crafted template can consume extremely large amounts of CPU and memory and thereby trigger a denial-of-service condition.

Importantly, simply forbidding the chat_template parameter does not fully mitigate the issue. The implementation constructs a dictionary of keyword arguments for apply_hf_chat_template and then updates that dictionary with the user-supplied chat_template_kwargs via dict.update. Since dict.update can overwrite existing keys, an attacker can place a chat_template key inside chat_template_kwargs to replace the template that will be used by apply_hf_chat_template.

# vllm/entrypoints/openai/serving_engine.py#L794-L816
_chat_template_kwargs: dict[str, Any] = dict(
    chat_template=chat_template,
    add_generation_prompt=add_generation_prompt,
    continue_final_message=continue_final_message,
    tools=tool_dicts,
    documents=documents,
)
_chat_template_kwargs.update(chat_template_kwargs or {})

request_prompt: Union[str, list[int]]
if isinstance(tokenizer, MistralTokenizer):
    ...
else:
    request_prompt = apply_hf_chat_template(
        tokenizer=tokenizer,
        conversation=conversation,
        model_config=model_config,
        **_chat_template_kwargs,
    )
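The override can be reproduced with plain dictionaries, independent of vLLM. The following is a minimal sketch: build_kwargs_unsafe mirrors the merge pattern in the excerpt above, while build_kwargs_guarded and RESERVED_KEYS are hypothetical names showing one possible guard. It is not necessarily the approach taken in the actual fix.

```python
from typing import Any, Optional

# Keys the server sets itself. Hypothetical list for illustration only.
RESERVED_KEYS = frozenset({"chat_template"})


def build_kwargs_unsafe(
    chat_template: Optional[str],
    chat_template_kwargs: Optional[dict[str, Any]],
) -> dict[str, Any]:
    """Mirror the merge pattern from the excerpt: user kwargs are applied last."""
    kwargs: dict[str, Any] = dict(chat_template=chat_template)
    kwargs.update(chat_template_kwargs or {})  # user keys overwrite server keys
    return kwargs


def build_kwargs_guarded(
    chat_template: Optional[str],
    chat_template_kwargs: Optional[dict[str, Any]],
) -> dict[str, Any]:
    """Refuse user kwargs that collide with server-controlled keys."""
    user_kwargs = chat_template_kwargs or {}
    clashes = RESERVED_KEYS.intersection(user_kwargs)
    if clashes:
        raise ValueError(f"reserved keys in chat_template_kwargs: {sorted(clashes)}")
    kwargs: dict[str, Any] = dict(chat_template=chat_template)
    kwargs.update(user_kwargs)
    return kwargs
```

With the unsafe merge, a server-side template is silently replaced by whatever the client placed under the chat_template key; the guarded variant raises instead, while still passing through unrelated keys.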

Impact

If an OpenAI-Compatible Server exposes endpoints that accept chat_template or chat_template_kwargs from untrusted clients, an attacker can submit a malicious Jinja template (directly or by overriding chat_template inside chat_template_kwargs) that consumes excessive CPU and/or memory. This can result in a resource-exhaustion denial-of-service that renders the server unresponsive to legitimate requests.
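Operators who cannot upgrade immediately may want to detect both delivery paths in front of the server. The sketch below assumes a hypothetical gateway that has already parsed the JSON request body into a dictionary; template_override_fields is an illustrative helper, not part of vLLM.

```python
from typing import Any


def template_override_fields(payload: dict[str, Any]) -> list[str]:
    """Return the request fields that carry a caller-supplied chat template.

    Covers both paths described in the advisory: the top-level
    chat_template parameter and a chat_template key nested inside
    chat_template_kwargs.
    """
    found: list[str] = []
    if payload.get("chat_template") is not None:
        found.append("chat_template")
    nested = payload.get("chat_template_kwargs")
    if isinstance(nested, dict) and "chat_template" in nested:
        found.append("chat_template_kwargs.chat_template")
    return found
```

A gateway could reject any untrusted request for which this function returns a non-empty list, leaving other chat_template_kwargs entries untouched.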

Fixes

https://github.com/vllm-project/vllm/pull/25794

Database specific

{
    "severity": "MODERATE",
    "cwe_ids": [
        "CWE-20",
        "CWE-400",
        "CWE-770"
    ],
    "nvd_published_at": null,
    "github_reviewed_at": "2025-10-07T21:35:22Z",
    "github_reviewed": true
}

Affected packages

PyPI / vllm

Package

Name: vllm
Purl: pkg:pypi/vllm

Affected ranges

Type: ECOSYSTEM
Events:
Introduced: 0.5.1
Fixed: 0.11.0
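The range above is half-open: versions from 0.5.1 up to, but not including, 0.11.0 are affected. A minimal sketch of that check follows, using only the standard library. The helper names are hypothetical, and the parser is a simplification that reads only the leading dotted-integer release segment, so suffixes such as .post1 are ignored and pre-release tags are not ordered correctly.

```python
import re

INTRODUCED = (0, 5, 1)  # first affected release per the advisory
FIXED = (0, 11, 0)      # first fixed release per the advisory


def release_tuple(version: str) -> tuple[int, ...]:
    """Parse the leading release segment, e.g. '0.5.3.post1' -> (0, 5, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    # Pad short versions such as '0.11' so tuple comparison behaves as expected.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_affected(version: str) -> bool:
    """True if the version lies in the range [INTRODUCED, FIXED)."""
    return INTRODUCED <= release_tuple(version) < FIXED
```

For an installed package, the version string could be obtained with importlib.metadata.version("vllm") and passed to is_affected.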

Affected versions

0.*

0.5.1

0.5.2

0.5.3

0.5.3.post1

0.5.4

0.5.5

0.6.0

0.6.1

0.6.1.post1

0.6.1.post2

0.6.2

0.6.3

0.6.3.post1

0.6.4

0.6.4.post1

0.6.5

0.6.6

0.6.6.post1

0.7.0

0.7.1

0.7.2

0.7.3

0.8.0

0.8.1

0.8.2

0.8.3

0.8.4

0.8.5

0.8.5.post1

0.9.0

0.9.0.1

0.9.1

0.9.2

0.10.0

0.10.1

0.10.1.1

0.10.2