Exploring Real&Synthetic Dataset and Linear Attention in Image Restoration

1Shanghai Jiao Tong University, 2Youtu Lab, 3Zhejiang University

Abstract

Image restoration (IR), which aims to recover high-quality images from degraded inputs, is a crucial task in modern image processing. Recent advances in deep learning, particularly with Convolutional Neural Networks (CNNs) and Transformers, have significantly improved image restoration performance. However, existing methods lack a unified training benchmark that specifies the training iterations and configurations. Additionally, we construct an image complexity evaluation metric based on the gray-level co-occurrence matrix (GLCM) and find that there exists a bias between the image complexity distributions of commonly used IR training and testing datasets, leading to suboptimal restoration results. We therefore construct a new large-scale IR dataset called ReSyn, which uses a novel image filtering method based on image complexity to achieve a balanced complexity distribution, and which contains both real and AIGC synthetic images. From the perspective of measuring a model's convergence and restoration capability, we establish a unified training standard that specifies the training iterations and configurations for image restoration models. Furthermore, we explore how to enhance the performance of transformer-based image restoration models with linear attention mechanisms. We propose RWKV-IR, a novel image restoration model that incorporates linear-complexity RWKV into a transformer-based restoration structure and enables both global and local receptive fields. Instead of directly integrating Vision-RWKV into the transformer architecture, we replace the original Q-Shift in RWKV with a novel depth-wise convolution shift (DC-Shift), which effectively models local dependencies and, combined with bi-directional attention, achieves both globally and locally aware linear attention.
Moreover, we propose a Cross-Bi-WKV module that combines two Bi-WKV modules with different scanning orders to achieve balanced attention over the horizontal and vertical directions. Extensive quantitative and qualitative experiments demonstrate the effectiveness and competitive performance of our RWKV-IR model.

ReSyn Dataset

The diversity analysis of our ReSyn dataset. It contains both real and synthetic images from a variety of data sources and covers a wide range of resolutions.

Some Images from ReSyn Dataset

Some images from our ReSyn dataset, containing both real and synthetic images.

Image Complexity Analysis

Complexity Distribution

The complexity distributions of different datasets. The distributions of the training datasets DIV2K and DF2K are clearly skewed, containing more low-complexity images. Our ReSyn dataset balances low- and high-complexity images via image filtering based on the newly proposed GLCM image complexity measure.

Figure: per-dataset GLCM complexity distributions for Urban100, Manga109, BSD100, DIV2K, DF2K, and our ReSyn dataset.

IComplexity = ENT - ENE + DISS

where ENT, ENE, and DISS denote the entropy, energy, and dissimilarity of the image's gray-level co-occurrence matrix (GLCM), respectively.
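The complexity measure can be sketched directly from standard GLCM statistics. The following minimal numpy implementation is an illustration, not the paper's exact code: the quantization level (16 bins) and the single horizontal pixel offset are assumptions.

```python
import numpy as np

def glcm_complexity(gray, levels=16):
    """IComplexity = ENT - ENE + DISS over a GLCM.

    A minimal sketch assuming 16-level quantization and one
    horizontal offset; the paper's exact settings may differ.
    """
    # Quantize the grayscale image into `levels` bins.
    q = np.clip((gray.astype(np.float64) / 256.0 * levels).astype(int),
                0, levels - 1)
    # Co-occurrence counts for horizontally adjacent pixel pairs.
    left, right = q[:, :-1].ravel(), q[:, 1:].ravel()
    glcm = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(glcm, (left, right), 1.0)
    p = glcm / glcm.sum()                    # joint probability
    nz = p[p > 0]
    ent = -np.sum(nz * np.log2(nz))          # ENT: entropy
    ene = np.sum(p ** 2)                     # ENE: energy (uniformity)
    i, j = np.indices(p.shape)
    diss = np.sum(p * np.abs(i - j))         # DISS: dissimilarity
    return ent - ene + diss
```

A perfectly flat image scores -1 (zero entropy and dissimilarity, maximal energy), while noisy or textured images score higher, matching the intuition that complexity grows with disorder.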

Complexity-PSNR Relation Analysis

Image complexity indicates how hard an image is to restore.

PSNR (x2 SR on Urban100) can be predicted by the proposed GLCM image complexity and by BPP. For each predictor, we sort the images and compute the Pearson correlation (rho) with PSNR. Compared to BPP, GLCM complexity correlates more strongly with PSNR.
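The correlation protocol above reduces to computing a Pearson coefficient between each predictor and per-image PSNR. A small self-contained helper (the sample values in the test are illustrative, not measured results):

```python
import numpy as np

def pearson_rho(x, y):
    """Pearson correlation coefficient between a predictor (e.g.
    GLCM complexity or BPP) and per-image PSNR values."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum()
                 / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))
```

A rho near -1 would mean higher-complexity images systematically restore to lower PSNR, which is the relation the figure measures.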

Data Collection and Shuffle

The images used to train image restoration models must have high pixel-level quality. To this end, we divide the filtering process into three steps. 1) First, images with resolution smaller than 800x800 are discarded, since images must be down-sampled for super-resolution tasks; this removes most low-quality images. 2) Second, to remove blurry or noise-degraded images, we follow the blur and noise suppression process proposed in LSDIR: the remaining images undergo blur detection via the variance of the image Laplacian and flat-region detection via the Sobel filter. 3) Third, all images are filtered by the GLCM complexity metric (defined above) to ensure a balanced distribution: we ensure that the number of images with complexity below zero equals the number above zero, yielding a dataset with a balanced image complexity distribution. Note that images from different sources are filtered individually.
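The first two steps can be sketched as a simple per-image predicate. This is a hedged illustration: the Laplacian-variance blur test is standard, but `blur_thresh` is a hypothetical value, not the threshold used for ReSyn, and the Sobel flat-region test is omitted for brevity.

```python
import numpy as np

def laplacian_variance(gray):
    # Variance of the 4-neighbour discrete Laplacian over interior
    # pixels; low values indicate a blurry image.
    g = gray.astype(np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())

def keep_image(gray, min_size=800, blur_thresh=100.0):
    # Step 1: resolution filter; step 2: blur filter.
    # `blur_thresh` is an assumed value for illustration only.
    h, w = gray.shape
    if h < min_size or w < min_size:
        return False
    return laplacian_variance(gray) > blur_thresh
```

Step 3 would then keep equal counts of surviving images with GLCM complexity below and above zero, applied separately per data source.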

Final Image Complexity Shuffle

Model Framework

Our framework consists of three stages: shallow feature extraction, deep feature enhancement, and HQ image reconstruction. For deep feature enhancement, a series of Global&Local Linear attention Layers (GLLLs, based on RWKV) followed by a Conv Block is used. Each GLLL contains several GLLB blocks, each of which comprises a linear-complexity attention module and a channel mix module.
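A shape-level sketch of the three-stage pipeline, assuming a long residual connection around the deep-enhancement stage as is common in transformer-based SR. All module bodies, names, and the layer count here are illustrative stand-ins, not the paper's architecture.

```python
import numpy as np

def shallow_extract(img, dim=4):
    # Stand-in for the shallow conv: lift RGB to `dim` feature channels.
    h, w, _ = img.shape
    return np.zeros((h, w, dim)) + img.mean(axis=-1, keepdims=True)

def glll_layer(feat):
    # Stand-in for a Global&Local Linear attention Layer (GLLL):
    # GLLB blocks (linear attention + channel mix) with a residual.
    return feat + 0.1 * np.tanh(feat)

def reconstruct(feat, scale=2):
    # Stand-in for HQ reconstruction: nearest upsample, project to RGB.
    up = feat.repeat(scale, axis=0).repeat(scale, axis=1)
    return up[..., :3]

def rwkv_ir(img, num_layers=4):
    feat = shallow_extract(img)
    deep = feat
    for _ in range(num_layers):
        deep = glll_layer(deep)
    deep = deep + feat          # assumed long residual before reconstruction
    return reconstruct(deep)
```

The point of the sketch is the data flow: features stay at LR resolution through the GLLL stack, and upsampling happens only once, in the reconstruction stage.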

RWKV-IR Sub-Block

DC-Shift

Different shift methods. The Q-Shift is a simple channel replacement operation using the four neighboring pixels, while our DC-Shift is a depth-wise convolution leveraging the surrounding pixels in a k×k neighborhood.
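The contrast between the two shifts can be made concrete in numpy. Q-Shift assigns each quarter of the channels the value of one of the four neighbours; DC-Shift instead aggregates a full k×k neighbourhood per channel with learned weights (randomly supplied here, since the learned values are not part of this sketch).

```python
import numpy as np

def q_shift(x):
    """Q-Shift (Vision-RWKV): each channel quarter takes the value of
    one neighbouring pixel (left/right/up/down), with zero padding."""
    h, w, c = x.shape
    out = np.zeros_like(x)
    q = c // 4
    out[:, 1:, 0*q:1*q] = x[:, :-1, 0*q:1*q]   # from left neighbour
    out[:, :-1, 1*q:2*q] = x[:, 1:, 1*q:2*q]   # from right neighbour
    out[1:, :, 2*q:3*q] = x[:-1, :, 2*q:3*q]   # from upper neighbour
    out[:-1, :, 3*q:] = x[1:, :, 3*q:]         # from lower neighbour
    return out

def dc_shift(x, kernels):
    """DC-Shift sketch: a depth-wise k x k convolution per channel.
    `kernels` has shape (c, k, k); weights here are placeholders."""
    h, w, c = x.shape
    k = kernels.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros_like(x)
    for ch in range(c):
        for i in range(k):
            for j in range(k):
                out[:, :, ch] += kernels[ch, i, j] * xp[i:i+h, j:j+w, ch]
    return out
```

With an identity kernel (1 at the centre, 0 elsewhere) DC-Shift reduces to a no-op, showing that Q-Shift's hard replacement is a special, non-learnable case of what the depth-wise convolution can express.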

Cross-Bi-WKV

Illustration of Cross-Bi-WKV, which consists of two cross scanning Bi-WKV modules.
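The two scanning orders can be sketched as flattening the feature map row-major (horizontal scan) versus column-major (vertical scan) before the Bi-WKV sequence model. The Bi-WKV body below is a cheap bidirectional stand-in, and averaging as the fusion rule is an assumption, not the paper's stated combination.

```python
import numpy as np

def bi_wkv_stub(tokens):
    """Stand-in for Bi-WKV: bidirectional running means over the token
    sequence. The real module uses RWKV linear attention."""
    n = len(tokens)
    fwd = np.cumsum(tokens, axis=0) / np.arange(1, n + 1)[:, None]
    bwd = (np.cumsum(tokens[::-1], axis=0)[::-1]
           / np.arange(n, 0, -1)[:, None])
    return 0.5 * (fwd + bwd)

def cross_bi_wkv(feat):
    """Cross-Bi-WKV sketch: one Bi-WKV over a row-major scan, one over
    a column-major scan, fused (here: averaged) so neither the
    horizontal nor the vertical direction dominates."""
    h, w, c = feat.shape
    row_tokens = feat.reshape(h * w, c)                     # horizontal scan
    out_row = bi_wkv_stub(row_tokens).reshape(h, w, c)
    col_tokens = feat.transpose(1, 0, 2).reshape(h * w, c)  # vertical scan
    out_col = (bi_wkv_stub(col_tokens).reshape(w, h, c)
               .transpose(1, 0, 2))
    return 0.5 * (out_row + out_col)
```

In a row-major scan, vertically adjacent pixels sit `w` tokens apart while horizontal neighbours are adjacent; the column-major scan reverses this, which is why combining both balances the two directions.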



Super-Resolution Experiments

Classical SR Experiments (100K training iters)

Quantitative comparison of classic image super-resolution with state-of-the-art methods under 100K training iterations. The best and second-best results are in red and blue.

Classical SR Experiments (500K training iters)

Quantitative comparison of classic image super-resolution with state-of-the-art methods under 500K training iterations.

Lightweight SR Experiments (50K training iters)

Quantitative comparison of lightweight image super-resolution with state-of-the-art methods under 50K training iterations. The best results are in red.

Lightweight SR Experiments (500K training iters)

Quantitative comparison of lightweight image super-resolution with state-of-the-art methods under 500K training iterations.