ReSyn Dataset
The diversity analysis of our ReSyn dataset. It contains both real and synthetic images from a variety of data sources and covers a wide range of resolutions.

Image restoration (IR), which aims to recover high-quality images from degraded inputs, is a crucial task in modern image processing. Recent advancements in deep learning, particularly with Convolutional Neural Networks (CNNs) and Transformers, have significantly improved image restoration performance. However, existing methods lack a unified training benchmark that specifies the training iterations and configurations. Additionally, we construct an image complexity evaluation metric using the gray-level co-occurrence matrix (GLCM) and find that there exists a bias between the image complexity distributions of commonly used IR training and testing datasets, leading to suboptimal restoration results. Therefore, we construct a new large-scale IR dataset called ReSyn, which uses a novel image filtering method based on image complexity to achieve a balanced image complexity distribution, and contains both real and AIGC synthetic images. From the perspective of measuring a model's convergence ability and restoration capability, we construct a unified training standard that specifies the training iterations and configurations for image restoration models. Furthermore, we explore how to enhance the performance of transformer-based image restoration models built on linear attention mechanisms. We propose RWKV-IR, a novel image restoration model that incorporates the linear-complexity RWKV into the transformer-based image restoration structure and enables both global and local receptive fields. Instead of directly integrating Vision-RWKV into the transformer architecture, we replace the original Q-Shift in RWKV with a novel Depth-wise Convolution shift (DC-Shift), which effectively models local dependencies and is further combined with bi-directional attention to achieve linear attention that is both globally and locally aware.
Moreover, we propose a Cross-Bi-WKV module that combines two Bi-WKV modules with different scanning orders to achieve a balanced attention for horizontal and vertical directions. Extensive quantitative and qualitative experiments demonstrate the effectiveness and competitive performance of our RWKV-IR model.
Sample images from our ReSyn dataset, containing both real and synthetic images.
The complexity distributions of different datasets. The complexity distributions of the training datasets DIV2K and DF2K have a typical shift, containing more images of low complexity. Our ReSyn dataset balances the distribution of low and high complexity images by image filtering based on the newly proposed GLCM image complexity measure.
Datasets compared: Urban100, Manga109, BSD100, DIV2K, DF2K, and ReSyn.
Image complexity indicates how difficult an image is to restore.
The images used for image restoration model training need to have high pixel-level quality. To this end, we divide the shuffle process into three steps. 1) Firstly, images with a resolution smaller than 800x800 are discarded, since for super-resolution tasks the images must be down-sampled; this removes most low-quality images. 2) Secondly, to remove blurry or noisy degraded images, we follow the blur- and noise-suppression process proposed in LSDIR: the remaining images undergo blur detection via the variance of the image Laplacian and flat-region detection via the Sobel filter. 3) Thirdly, all images are shuffled according to the GLCM complexity metric (detailed below) to ensure a balanced distribution: we keep equal numbers of images with complexity values below and above zero, yielding a dataset with a balanced image-complexity distribution. It should be mentioned that images from different sources are filtered individually.
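The three filtering steps above can be sketched in code. This is an illustrative numpy sketch, not the paper's implementation: the blur threshold, the quantization to 16 gray levels, and the use of co-occurrence entropy as the GLCM complexity score are all assumptions for demonstration (the paper's exact metric and thresholds are not specified here).

```python
import numpy as np

def laplacian_variance(gray):
    """Blur score: variance of the image Laplacian (higher = sharper)."""
    k = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=np.float64)
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(3):          # naive 'valid' 3x3 convolution
        for j in range(3):
            out += k[i, j] * gray[i:h - 2 + i, j:w - 2 + j]
    return out.var()

def glcm_complexity(gray, levels=16):
    """Toy GLCM-based complexity: entropy of the horizontal co-occurrence matrix."""
    q = (gray.astype(np.float64) / 256 * levels).astype(int).clip(0, levels - 1)
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (q[:, :-1].ravel(), q[:, 1:].ravel()), 1)  # horizontal pairs
    p = glcm / glcm.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def keep_image(gray, min_size=800, blur_thresh=100.0):
    """Steps 1 and 2 of the filtering pipeline (thresholds are assumed)."""
    h, w = gray.shape
    if h < min_size or w < min_size:            # step 1: resolution filter
        return False
    if laplacian_variance(gray) < blur_thresh:  # step 2: blur suppression
        return False
    return True  # step 3 then balances the survivors by glcm_complexity(...)
```

In step 3, the surviving images would be binned by their complexity score and sampled so that low- and high-complexity images are equally represented.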
Final Image Complexity Shuffle
Our framework consists of three stages: shallow feature extraction, deep feature enhancement, and HQ image reconstruction. For deep feature enhancement, a series of Global & Local Linear attention Layers (GLLLs, based on RWKV) followed by a Conv Block is used. Each GLLL contains several GLLB blocks, each comprising a linear-complexity attention module and a channel-mix module.
Different shift methods. Q-Shift is a simple channel-replacement operation using the four neighboring pixels, while our DC-Shift is a depth-wise convolution leveraging all surrounding pixels in a k×k neighborhood.
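The contrast between the two shift operations can be made concrete with a small numpy sketch. This is an illustrative simplification under assumed conventions (channel-first tensors, a four-way quarter-channel split for Q-Shift, zero padding for DC-Shift); the actual RWKV-IR layers are learned modules, not these hand-written loops.

```python
import numpy as np

def q_shift(x):
    """Q-Shift: each quarter of the channels is replaced by one neighboring
    pixel (left, right, up, down) -- a pure channel-replacement operation."""
    c = x.shape[0] // 4
    out = x.copy()
    out[0 * c:1 * c, :, 1:] = x[0 * c:1 * c, :, :-1]   # take left neighbor
    out[1 * c:2 * c, :, :-1] = x[1 * c:2 * c, :, 1:]   # take right neighbor
    out[2 * c:3 * c, 1:, :] = x[2 * c:3 * c, :-1, :]   # take upper neighbor
    out[3 * c:4 * c, :-1, :] = x[3 * c:4 * c, 1:, :]   # take lower neighbor
    return out

def dc_shift(x, kernels):
    """DC-Shift sketch: a k×k depth-wise convolution, so every channel mixes
    its full k×k neighborhood; `kernels` has shape (C, k, k) and would be
    learned in the real model."""
    C, H, W = x.shape
    k = kernels.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x)
    for i in range(k):
        for j in range(k):
            out += kernels[:, i, j][:, None, None] * xp[:, i:i + H, j:j + W]
    return out
```

The key difference: Q-Shift only copies single neighboring pixels into fixed channel groups, while DC-Shift aggregates the whole k×k neighborhood with learnable per-channel weights, giving a richer local receptive field.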
Illustration of Cross-Bi-WKV, which consists of two Bi-WKV modules with crossed scanning orders.