AUTHOR=Tang Guangfa, Cai Shanshan, Meng Xiangjun, Huo SiYan, Wang Mengbo, Lu Zichen, Chen Zhuokang, Luo XiaoLing
TITLE=High-fidelity medical image generation: controllable synthesis of high-resolution medical images via hierarchical fusion in vector-quantized generative networks
JOURNAL=Frontiers in Physics
VOLUME=13
YEAR=2025
URL=https://www.frontiersin.org/journals/physics/articles/10.3389/fphy.2025.1661146
DOI=10.3389/fphy.2025.1661146
ISSN=2296-424X
ABSTRACT=
Objective: High-resolution medical images are scarce, and existing image generation methods perform poorly at high resolutions: they struggle to represent small lesions, lose detailed information, distort anatomical structure, incur high computational cost, and suffer mode collapse. This study aims to develop a novel generative framework that addresses the challenges of high-resolution medical image generation.

Methods: Clinical X-ray data from 255 patients and a public dataset of 1,657 lung CT images containing lung nodules were collected. We propose a medical image generation method built on a two-route synthesis strategy. The foreground route uses SinGAN, a generative model trained on a single lesion image, to create new lesion configurations and structures while preserving the original patch distribution. The background route uses HiResMed VQGAN, a high-fidelity Vector-Quantized Generative Adversarial Network into which a hierarchical dual-path fusion block (HDFB) is integrated, trained on the collected data. The HDFB combines a dual-path learning strategy: a residual path with skip connections captures hierarchical dependencies and multi-scale textures, while a multi-scale convolutional feedforward feature extraction module (MSConvFE) preserves low-level anatomical features through localized detail enhancement.
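The HDFB's dual-path idea can be illustrated with a toy NumPy stand-in. This is a sketch only: `box_filter` replaces the learned convolutions, and `hdfb_sketch` and its equal fusion weights are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def box_filter(x, k):
    # Same-size k x k mean filter via zero padding; stands in for a learned conv.
    pad = k // 2
    xp = np.pad(x, pad)
    out = np.zeros_like(x, dtype=float)
    H, W = x.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = xp[i:i + k, j:j + k].mean()
    return out

def hdfb_sketch(x):
    # Residual path: skip connection preserves the input's hierarchical features.
    residual = x + box_filter(x, 3)
    # Multi-scale feedforward path (MSConvFE stand-in): two receptive-field sizes.
    multi_scale = 0.5 * (box_filter(x, 3) + box_filter(x, 5))
    # Hierarchical fusion of the two paths (equal weights are an assumption).
    return 0.5 * (residual + multi_scale)

img = np.random.rand(16, 16)
fused = hdfb_sketch(img)   # same spatial size as the input
```

The output keeps the input's spatial resolution, mirroring how the block slots into the VQGAN encoder-decoder without changing feature-map shapes.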
Finally, lesion locations in historical data serve as prior knowledge to guide where the synthesized lesions are fused into the background image, yielding a high-resolution synthetic medical image with small lesions. We compared our method with a denoising diffusion model (DDM), StyleSwin, VQGAN, and SinGAN using Fréchet Inception Distance (FID), learned perceptual image patch similarity (LPIPS), peak signal-to-noise ratio (PSNR), and structural similarity (SSIM). Two urologists participated in a visual Turing test to assess perceptual fidelity.

Results: The experimental results demonstrate that the proposed method achieves state-of-the-art performance, reducing FID by 43.3% (145.64 vs. 256.11) and LPIPS by 5% (0.48 vs. 0.51), raising PSNR by 4% (59.03 vs. 56.54) and SSIM by 6% (0.67 vs. 0.63), and accelerating training convergence by 83% relative to the baseline VQGAN. Clinicians misclassified 55% of synthetic images as real, validating their anatomical fidelity.

Conclusion: This study proposes a method for generating high-resolution medical images with small lesions. It not only ensures high-quality lesion generation but also allows control over the number and location of lesions. Moreover, the architecture enhances the detailed quality of anatomical structures and improves computational efficiency during training.
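The relative improvements quoted in the Results can be recomputed from the raw scores; small deviations from the quoted round figures (e.g. 43.3% for FID, 5% for LPIPS) come from rounding.

```python
def rel_change(ours, baseline):
    # Signed percentage change of our score relative to the baseline.
    return (ours - baseline) / baseline * 100.0

fid = rel_change(145.64, 256.11)   # lower is better: about -43.1%
lpips = rel_change(0.48, 0.51)     # lower is better: about -5.9%
psnr = rel_change(59.03, 56.54)    # higher is better: about +4.4%
ssim = rel_change(0.67, 0.63)      # higher is better: about +6.3%
```

Note that for FID and LPIPS a negative change is an improvement, while for PSNR and SSIM a positive change is.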