Diffusion Models · Vision

Adaptive Image Composition of two Diffusion Models

An adaptive image composition framework designed to address data scarcity in medical X-ray datasets, particularly for cases involving implants where labeled data is limited. The approach originates from a practical need in medical imaging applications, where implant-containing X-rays are significantly underrepresented compared to normal scans. To tackle this imbalance, diffusion models are leveraged to generate realistic composite images by synthesizing implant structures within existing anatomical contexts.

The framework, DiffusionMix, combines two Denoising Diffusion Probabilistic Models (DDPMs): one specialized for background generation, and the other for object (e.g., implant) generation together with a corresponding segmentation mask, enabling controlled and realistic image composition. By jointly denoising and iteratively merging the outputs of both models with a resampling mechanism, the method allows the object and background to interact during generation, producing naturally blended results. This design supports efficient and flexible dataset generation for downstream tasks such as detection, segmentation, and anomaly analysis in medical imaging.
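The joint denoising described above can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: `toy_reverse_step` is a hypothetical stand-in for a real DDPM reverse step (which would use learned noise prediction and a noise schedule), and the function names are invented. The key idea it demonstrates is merging the two branches with the object mask at every timestep.

```python
import numpy as np

def toy_reverse_step(x, t, denoise_fn, rng):
    # One toy reverse-diffusion step: move the sample toward the model's
    # prediction and add timestep-scaled noise. A stand-in for real DDPM
    # posterior sampling, used here only to show the control flow.
    return denoise_fn(x) + 0.1 * t * rng.standard_normal(x.shape)

def diffusionmix_sample(shape, mask, obj_denoise, bg_denoise, steps=10, seed=0):
    # Jointly denoise two branches from a shared noisy canvas, merging
    # them with the object mask at every timestep so the object and the
    # background can influence each other as they form.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)
    for t in range(steps, 0, -1):
        x_obj = toy_reverse_step(x, t, obj_denoise, rng)  # object branch
        x_bg = toy_reverse_step(x, t, bg_denoise, rng)    # background branch
        x = mask * x_obj + (1.0 - mask) * x_bg            # per-step merge
    return x
```

Because the merged sample is fed back into both branches at the next step, each model conditions on content produced by the other, which is what yields the blended boundaries.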


Visualization Results

Result figure 1

In DiffusionMix, two Denoising Diffusion Probabilistic Models (DDPMs) are used, one of which is trained with a 4-channel input (RGB + mask). The object diffusion model generates an object image along with its corresponding segmentation mask within a given bounding box, while the background diffusion model synthesizes the rest of the scene. The outputs from the two models are progressively merged during the denoising process, allowing interaction between the object and background and resulting in a naturally blended composition.
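The mask-guided placement within a bounding box amounts to alpha compositing with the generated mask channel. A minimal sketch, assuming the object model's output is an H x W x 4 array (RGB plus mask) sized to the box; the function name is illustrative:

```python
import numpy as np

def composite_in_bbox(background, obj_rgbm, bbox):
    # Alpha-composite a generated object (RGB + mask, H x W x 4) into the
    # background inside a bounding box, using the generated mask channel
    # as the blending weight.
    y0, x0, y1, x1 = bbox
    rgb, mask = obj_rgbm[..., :3], obj_rgbm[..., 3:4]
    out = background.copy()
    out[y0:y1, x0:x1] = mask * rgb + (1.0 - mask) * out[y0:y1, x0:x1]
    return out
```

In DiffusionMix this merge happens at every denoising step rather than once at the end, which is what lets the background adapt around the object instead of simply being pasted over.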



Result figure 2

Visual results for composing anomalies (object diffusion) with normal images (background diffusion) on the MVTec dataset. During denoising, the object and background influence each other, gradually adapting to reflect mutual changes. For example, in the Hazelnut category, anomalies blend into normal hazelnut images while the surrounding background adjusts accordingly. Notably, for crack anomalies, even when the anomaly deviates from the hazelnut's natural round shape, the hazelnut structure wraps around it, producing realistic results. Similarly, in the Carpet category, woven patterns connect seamlessly between the anomaly and the background, demonstrating organic integration. This behavior is enabled by the resampling process, in which anomalies and backgrounds are generated interactively, allowing for more coherent and natural compositions.
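The resampling that drives this mutual adaptation follows the pattern popularized by RePaint: at a given timestep, alternate a reverse (denoising) step with a forward (re-noising) step several times before advancing. A toy sketch with hypothetical step functions, showing only the control flow:

```python
def resample(x, t, reverse_step, forward_step, jumps=3):
    # RePaint-style resampling at timestep t: repeatedly denoise the
    # merged sample and re-noise it back to the same timestep, giving the
    # object and background content several chances to harmonize before
    # the sampler moves on to t - 1.
    for _ in range(jumps):
        x = reverse_step(x, t)  # one reverse (denoising) step
        x = forward_step(x, t)  # one forward (re-noising) step back to t
    return reverse_step(x, t)
```

Each extra jump lets information propagate across the object/background seam once more, which is why crack anomalies and carpet weaves end up structurally continuous rather than pasted-looking.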