SWViT-RRDB: Shifted Window Vision Transformer Integrating Residual in Residual Dense Block for Remote Sensing Super-Resolution

Mohamed Ramzy Ibrahim*, Roberto Benavente Vidal, Daniel Ponsa Mussarra, Felipe Lumbreras Ruiz

*Corresponding author for this work

Research output: Chapter in Book › Chapter › Research › peer-review

1 Citation (Scopus)

Abstract

Remote sensing applications, which are affected by acquisition season and sensor variety, require high-resolution images. Transformer-based models have improved satellite image super-resolution, but they are less effective than convolutional neural networks (CNNs) at extracting the local details that are crucial for image clarity. This paper introduces SWViT-RRDB, a new deep learning model for satellite imagery super-resolution. By combining a transformer with convolution and attention blocks, SWViT-RRDB overcomes limitations of existing models and better represents small objects in satellite images. The model uses a pipeline of residual fusion group (RFG) blocks to combine multi-headed self-attention (MSA) with the residual in residual dense block (RRDB), merging global and local image information for better super-resolution. Additionally, an overlapping cross-attention block (OCAB) enhances fusion and allows interaction between neighboring pixels while maintaining long-range pixel dependencies across the image. The SWViT-RRDB model and its larger variants outperform state-of-the-art (SoTA) models in terms of PSNR and SSIM on two different satellite datasets.
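The abstract names shifted-window self-attention as the transformer half of each RFG block but gives no implementation details. The following is a minimal sketch that assumes the Swin Transformer scheme: attention runs inside non-overlapping M × M windows, and alternate layers cyclically shift the feature map by M/2 before partitioning. It illustrates only the partitioning step that lets pixels on either side of a window border end up in the same window. The function names, the 8 × 8 grid and M = 4 are hypothetical. The attention computation itself, the attention mask that Swin applies to wrapped-around pixels, the RRDB branch and the OCAB are all omitted.

```python
from typing import Dict, List

Grid = List[List[int]]


def cyclic_shift(x: Grid, shift: int) -> Grid:
    """Roll a 2-D grid up and left by `shift` (negative values roll back)."""
    h, w = len(x), len(x[0])
    return [[x[(i + shift) % h][(j + shift) % w] for j in range(w)] for i in range(h)]


def window_partition(x: Grid, m: int) -> List[Grid]:
    """Split an H x W grid into non-overlapping m x m windows in row-major order."""
    h, w = len(x), len(x[0])
    assert h % m == 0 and w % m == 0, "H and W must be multiples of the window size"
    return [
        [row[wj:wj + m] for row in x[wi:wi + m]]
        for wi in range(0, h, m)
        for wj in range(0, w, m)
    ]


def window_reverse(windows: List[Grid], m: int, h: int, w: int) -> Grid:
    """Reassemble windows produced by window_partition into an H x W grid."""
    out = [[0] * w for _ in range(h)]
    per_row = w // m
    for k, win in enumerate(windows):
        wi, wj = (k // per_row) * m, (k % per_row) * m
        for i in range(m):
            for j in range(m):
                out[wi + i][wj + j] = win[i][j]
    return out


def window_of(x: Grid, m: int) -> Dict[int, int]:
    """Map each pixel id to the index of the window that contains it."""
    return {pid: k for k, win in enumerate(window_partition(x, m)) for row in win for pid in row}


# Hypothetical example: label every pixel of an 8 x 8 feature map with a unique id.
H = W = 8
M = 4
ids = [[i * W + j for j in range(W)] for i in range(H)]

regular = window_of(ids, M)                         # regular window layer
shifted = window_of(cyclic_shift(ids, M // 2), M)   # shifted window layer

a, b = 3 * W + 3, 4 * W + 4  # ids of pixels (3, 3) and (4, 4), diagonal across a border
print(regular[a] == regular[b])  # False: separated by the regular window border
print(shifted[a] == shifted[b])  # True: the shift places them in the same window
```

Alternating regular and shifted layers in this way is what would give a windowed transformer cross-window interaction without computing attention over the whole image at once.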
Original language: English
Title of host publication: 19th International Conference on Computer Vision Theory and Applications (VISAPP'2024)
Pages: 575-582
Number of pages: 8
Volume: 3
ISBN (Electronic): 978-989-758-679-8
Publication status: Published - 1 Jan 2024

Publication series

Name: Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications

Keywords

  • Computer Vision
  • Deep Learning
  • Remote Sensing
  • Super-Resolution
