🐛 Describe the bug
Summary
Repeated calls to torchvision.io.image.decode_jpeg() on a malformed JPEG cause near-linear RSS growth until OOM. Normal JPEGs do not show this behavior. This looks like an error-path memory leak in the CPU JPEG decode path.
I checked past issues #3613 and #4378; those reports concern GPU/nvJPEG memory leaks. This report is CPU-only: the leak occurs on the error path when decoding malformed JPEGs, and RSS grows linearly even after gc.collect() and malloc_trim(0).
This issue mirrors a report I previously filed through the repo’s GitHub Security Advisory (private), including PoC and malformed JPEG samples. Since there has been no maintainer response for over 90 days, I’m posting a public issue to ensure the problem is visible and can be tracked.
For responsible disclosure, I will not publish the malformed JPEG samples here. I can provide them privately to maintainers, or they can review the samples already attached in the Security Advisory thread.
Reproduction
Command:
python poc.py case1.jpg --repeat 50 --mode RGB --quiet
Modes tested: UNCHANGED / RGB / GRAY (all leak to varying degrees)
PoC script:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os, sys, argparse, contextlib, gc, ctypes

os.environ.setdefault("OMP_NUM_THREADS", "1")
os.environ.setdefault("MKL_NUM_THREADS", "1")
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "")

import torch, torchvision
from torchvision.io import ImageReadMode
from torchvision.io.image import decode_jpeg


@contextlib.contextmanager
def swallow_stderr(enable=True):
    if not enable:
        yield
        return
    sys.stderr.flush()
    fd = sys.stderr.fileno()
    old = os.dup(fd)
    try:
        with open(os.devnull, "wb") as null:
            os.dup2(null.fileno(), fd)
        yield
    finally:
        os.dup2(old, fd)
        os.close(old)


def rss_hwm_kb():
    rss = hwm = None
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                rss = int(line.split()[1])
            elif line.startswith("VmHWM:"):
                hwm = int(line.split()[1])
    return rss, hwm


def main():
    ap = argparse.ArgumentParser()
    ap.add_argument("unit", help="the case path")
    ap.add_argument("--repeat", type=int, default=50)
    ap.add_argument("--mode", choices=["UNCHANGED", "RGB", "GRAY"], default="RGB")
    ap.add_argument("--quiet", action="store_true")
    args = ap.parse_args()

    print("torch:", torch.__version__)
    print("torchvision:", torchvision.__version__)
    print("cuda_available:", torch.cuda.is_available())

    with open(args.unit, "rb") as f:
        data = f.read()

    mode = {
        "UNCHANGED": ImageReadMode.UNCHANGED,
        "RGB": ImageReadMode.RGB,
        "GRAY": ImageReadMode.GRAY,
    }[args.mode]

    # reduce noise
    u8 = torch.frombuffer(bytearray(data), dtype=torch.uint8).contiguous()
    libc = ctypes.CDLL("libc.so.6")
    torch.set_num_threads(1)

    print(f"[repro] unit={args.unit} bytes={len(data)} repeat={args.repeat} mode={args.mode}")
    for i in range(1, args.repeat + 1):
        try:
            with swallow_stderr(args.quiet):
                _ = decode_jpeg(u8, mode=mode)
        except Exception:
            # Bad JPEG will come here: this is exactly where we need to verify if there is an 'error path leak'
            pass

        # Try to recycle the 'non leaking' parts as much as possible
        gc.collect()
        try:
            libc.malloc_trim(0)
        except Exception:
            pass

        rss, hwm = rss_hwm_kb()
        print(f"[{i}/{args.repeat}] VmRSS={rss/1024:.1f} MB VmHWM={hwm/1024:.1f} MB", flush=True)


if __name__ == "__main__":
    main()
Observed results
Normal JPEG: RSS stabilizes at about 269 MB after repeated calls.
Malformed JPEG: RSS grows roughly linearly to about 5 GB after 50 iterations (see logs below).
Normal case:
torch: 2.9.0+cpu
torchvision: 0.24.0+cpu
cuda_available: False
...
[45/50] VmRSS=269.0 MB VmHWM=270.9 MB
[46/50] VmRSS=269.0 MB VmHWM=270.9 MB
[47/50] VmRSS=269.0 MB VmHWM=270.9 MB
[48/50] VmRSS=269.0 MB VmHWM=270.9 MB
[49/50] VmRSS=269.0 MB VmHWM=270.9 MB
[50/50] VmRSS=269.0 MB VmHWM=270.9 MB
Malformed case:
torch: 2.9.0+cpu
torchvision: 0.24.0+cpu
cuda_available: False
[1/50] VmRSS=363.8 MB VmHWM=366.2 MB
[2/50] VmRSS=457.4 MB VmHWM=457.4 MB
[3/50] VmRSS=551.1 MB VmHWM=551.1 MB
[4/50] VmRSS=644.7 MB VmHWM=644.7 MB
[5/50] VmRSS=738.3 MB VmHWM=738.3 MB
[6/50] VmRSS=831.9 MB VmHWM=831.9 MB
[7/50] VmRSS=925.6 MB VmHWM=925.6 MB
...
[45/50] VmRSS=4483.3 MB VmHWM=4483.3 MB
[46/50] VmRSS=4576.9 MB VmHWM=4576.9 MB
[47/50] VmRSS=4670.6 MB VmHWM=4670.6 MB
[48/50] VmRSS=4764.2 MB VmHWM=4764.2 MB
[49/50] VmRSS=4857.8 MB VmHWM=4857.8 MB
[50/50] VmRSS=4951.4 MB VmHWM=4951.4 MB
The memory growth can also be confirmed with htop.
For case 1, memory usage reaches about 5 GB; for case 2, it exceeds 100 GB.
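To put a number on "roughly linear", the malformed-case log can be reduced to an average growth per call. The snippet below is a small sketch that parses a few lines copied from that log; the helper names (`parse_rss`, `growth_per_iteration`) are made up for illustration and are not part of the PoC.

```python
import re

# A few lines copied verbatim from the malformed-case log in this report.
LOG = """\
[1/50] VmRSS=363.8 MB VmHWM=366.2 MB
[2/50] VmRSS=457.4 MB VmHWM=457.4 MB
[3/50] VmRSS=551.1 MB VmHWM=551.1 MB
[49/50] VmRSS=4857.8 MB VmHWM=4857.8 MB
[50/50] VmRSS=4951.4 MB VmHWM=4951.4 MB
"""

LINE_RE = re.compile(r"\[(\d+)/\d+\] VmRSS=([\d.]+) MB")


def parse_rss(log):
    """Return (iteration, rss_mb) pairs found in PoC output."""
    return [(int(m.group(1)), float(m.group(2))) for m in LINE_RE.finditer(log)]


def growth_per_iteration(samples):
    """Average RSS growth in MB per call between the first and last sample."""
    (first_iter, first_rss), (last_iter, last_rss) = samples[0], samples[-1]
    return (last_rss - first_rss) / (last_iter - first_iter)


samples = parse_rss(LOG)
rate = growth_per_iteration(samples)
print(f"average growth: {rate:.1f} MB per decode_jpeg() call")
```

For case 1 this works out to about 93.6 MB per call, matching the step between consecutive log lines. A near-constant step is what one would expect if each failed decode leaves a similarly sized allocation unreleased; that is an inference from the logs, not a confirmed root cause.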
Sample files
I can provide the malformed samples to maintainers privately.
Impact
If a service decodes untrusted user-provided JPEGs, an attacker could repeatedly submit crafted malformed images to exhaust memory and trigger DoS.
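As a rough illustration of the impact, the figures from the logs in this report can be turned into an estimate of how many malformed requests would exhaust a given memory budget. This is a sketch under two assumptions that are not from the report: the 16 GiB budget is hypothetical, and the growth is assumed to stay linear beyond the 50 iterations measured. The function name `calls_to_exhaust` is also made up for illustration.

```python
import math

# Figures taken from the logs in this report (case 1, RGB mode).
BASELINE_MB = 269.0                # steady-state RSS with a normal JPEG
LEAK_PER_CALL_MB = 457.4 - 363.8   # RSS delta between iterations 1 and 2

# Hypothetical memory budget for a decoding service (assumption).
MEMORY_BUDGET_MB = 16 * 1024


def calls_to_exhaust(budget_mb, baseline_mb, leak_mb):
    """Number of failed decodes after which RSS would reach the budget,
    assuming a constant leak per call."""
    return math.ceil((budget_mb - baseline_mb) / leak_mb)


calls = calls_to_exhaust(MEMORY_BUDGET_MB, BASELINE_MB, LEAK_PER_CALL_MB)
print(f"about {calls} malformed requests would exhaust a {MEMORY_BUDGET_MB} MB budget")
```

Under these assumptions, fewer than 200 requests with the case 1 sample would be enough, and a sample that leaks more per call (such as case 2) would need proportionally fewer.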
Versions
torch: 2.9.0+cpu
torchvision: 0.24.0+cpu (also reproduced on 0.25.0)