Search CORE


AIM 2020 challenge on video extreme super-resolution: methods and results

Author: Timofte Radu
Huang Zhiwu
Fuoli Dario
Publication venue
Publication date
Field of study

This paper reviews the video extreme super-resolution challenge associated with the AIM 2020 workshop at ECCV 2020. Common scaling factors for learned video super-resolution (VSR) do not exceed 4. In this range, missing information can be restored well, especially in HR videos, where the high-frequency content mostly consists of texture details. The task in this challenge is to upscale videos by an extreme factor of 16, which results in more severe degradations that also affect the structural integrity of the videos. A single pixel in the low-resolution (LR) domain corresponds to 256 (16 × 16) pixels in the high-resolution (HR) domain. Due to this massive information loss, it is hard to accurately restore the missing information. Track 1 is set up to gauge the state of the art for such a demanding task, where fidelity to the ground truth is measured by PSNR and SSIM. Perceptually higher quality can be achieved at the expense of fidelity by generating plausible high-frequency content. Track 2 therefore aims at generating visually pleasing results, which are ranked according to human perception, as evaluated by a user study. In contrast to single image super-resolution (SISR), VSR can benefit from additional information in the temporal domain. However, this also imposes an additional requirement, as the generated frames need to be temporally consistent.
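The 256-pixel figure follows from the factor applying along both image axes (16 × 16), and Track 1 ranks fidelity with PSNR (alongside SSIM, not shown here). A minimal sketch of both computations, assuming NumPy, 8-bit frames with a peak value of 255, and a mean squared error taken over all pixels and channels; the challenge's exact evaluation protocol (color space, border cropping) is not stated in this abstract and may differ.

```python
import math

import numpy as np

SCALE = 16  # extreme upscaling factor used in the challenge


def hr_pixels_per_lr_pixel(scale: int) -> int:
    """Each LR pixel covers a scale x scale block of HR pixels."""
    return scale * scale


def psnr(reference: np.ndarray, estimate: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two frames of equal shape."""
    ref = reference.astype(np.float64)
    est = estimate.astype(np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)


print(hr_pixels_per_lr_pixel(SCALE))  # 256
hr = np.full((64, 64, 3), 100, dtype=np.uint8)
restored = hr.copy()
restored[0, 0, 0] = 110  # a single wrong value barely moves PSNR
print(round(psnr(hr, restored), 2))
```

Because PSNR only measures pixel-wise error, it rewards conservative, blurry reconstructions, which is why the perceptual Track 2 is ranked by a user study instead.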

Southampton (e-Prints Soton)

NTIRE 2020 challenge on video quality mapping: methods and results

Author: Timofte Radu
Huang Zhiwu
Danelljan Martin
Fuoli Dario
Publication venue
Publication date: 01/01/2020
Field of study

This paper reviews the NTIRE 2020 challenge on video quality mapping (VQM), which addresses the problem of mapping quality from a source video domain to a target video domain. The challenge includes both a supervised track (track 1) and a weakly-supervised track (track 2) for two benchmark datasets. In particular, track 1 offers a new Internet video benchmark, requiring algorithms to learn the mapping from more compressed videos to less compressed videos in a supervised training manner. In track 2, algorithms are required to learn the quality mapping from one device to another when their quality varies substantially and weakly aligned video pairs are available. For track 1, a total of 7 teams competed in the final test phase, demonstrating novel and effective solutions to the problem. For track 2, several existing methods are evaluated, showing promising solutions to the weakly-supervised video quality mapping problem.

Southampton (e-Prints Soton)

LocalViT: Analyzing Locality in Vision Transformers

Author: Van Goo Luc
Timofte Radu
Benini Luca
Cao Jiezhang
Zhang Kai
Li Yawei
Magno Michele
Publication venue
Publication date: 01/01/2023
Field of study

The aim of this paper is to study the influence of locality mechanisms in vision transformers. Transformers originated from machine translation and are particularly good at modelling long-range dependencies within a long sequence. Although the global interaction between the token embeddings can be well modelled by the self-attention mechanism of transformers, what is lacking is a locality mechanism for information exchange within a local region. In this paper, the locality mechanism is systematically investigated through carefully designed controlled experiments. We add locality to vision transformers by introducing depth-wise convolution into the feed-forward network. This seemingly simple solution is inspired by the comparison between feed-forward networks and inverted residual blocks. The importance of locality mechanisms is validated in two ways: 1) a wide range of design choices (activation function, layer placement, expansion ratio) are available for incorporating locality mechanisms, and proper choices can lead to a performance gain over the baseline; and 2) the same locality mechanism is successfully applied to vision transformers with different architecture designs, which shows the generalization of the locality concept. For ImageNet2012 classification, the locality-enhanced transformers outperform the baselines Swin-T [1], DeiT-T [2] and PVT-T [3] by 1.0%, 2.6% and 3.1%, respectively, with a negligible increase in the number of parameters and computational effort. Code is available at https://github.com/ofsoundof/LocalViT
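A minimal NumPy sketch of a locality-enhanced feed-forward network, assuming the inverted-residual pattern the abstract refers to: a 1×1 expansion, a 3×3 depth-wise convolution applied after the tokens are rearranged into their 2-D grid, and a 1×1 projection. The function names, the GELU placement, the zero padding and the random weights are illustrative assumptions; this is not the code from the LocalViT repository.

```python
import numpy as np


def gelu(x):
    """Tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def depthwise_conv3x3(x, kernels):
    """x: (H, W, C) feature map; kernels: (3, 3, C), one 3x3 filter per channel.
    Zero padding keeps the spatial size; channels are never mixed."""
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w, :] * kernels[dy, dx, :]
    return out


def local_ffn(tokens, grid_hw, w_expand, dw_kernels, w_project):
    """tokens: (N, D) patch tokens with N = H * W (no class token).
    w_expand: (D, E), dw_kernels: (3, 3, E), w_project: (E, D), E = expansion * D."""
    h, w = grid_hw
    hidden = gelu(tokens @ w_expand)                       # 1x1 conv == per-token linear layer
    hidden = hidden.reshape(h, w, -1)                      # restore the 2-D token grid
    hidden = gelu(depthwise_conv3x3(hidden, dw_kernels))   # exchange info within 3x3 neighbourhoods
    return hidden.reshape(h * w, -1) @ w_project           # project back to width D


rng = np.random.default_rng(0)
D, E, H, W = 8, 32, 4, 4
tokens = rng.normal(size=(H * W, D))
params = (rng.normal(size=(D, E)), rng.normal(size=(3, 3, E)), rng.normal(size=(E, D)))
out = local_ffn(tokens, (H, W), *params)
print(out.shape)  # (16, 8)
```

Because the depth-wise kernel only mixes a 3×3 neighbourhood within each channel, perturbing one token changes only the outputs of its spatial neighbours, which is the local information exchange the plain token-wise feed-forward network lacks.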

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

An Efficient Recurrent Adversarial Framework for Unsupervised Real-Time Video Enhancement

Author: Paudel Danda Pani
Timofte Radu
Van Gool Luc
Huang Zhiwu
Fuoli Dario
Publication venue
Publication date: 11/12/2022
Field of study

Video enhancement is a challenging problem, more so than still-image enhancement, mainly due to its high computational cost, larger data volumes and the difficulty of achieving consistency in the spatio-temporal domain. In practice, these challenges are often coupled with a lack of example pairs, which inhibits the application of supervised learning strategies. To address these challenges, we propose an efficient adversarial video enhancement framework that learns directly from unpaired video examples. In particular, our framework introduces new recurrent cells that consist of interleaved local and global modules for implicit integration of spatial and temporal information. The proposed design allows our recurrent cells to efficiently propagate spatio-temporal information across frames and reduces the need for high-complexity networks. Our setting enables learning from unpaired videos in a cyclic adversarial manner, where the proposed recurrent units are employed in all architectures. Efficient training is accomplished by introducing a single discriminator that learns the joint distribution of the source and target domains simultaneously. The enhancement results demonstrate clear superiority of the proposed video enhancer over state-of-the-art methods in terms of visual quality, quantitative metrics, and inference speed. Notably, our video enhancer is capable of enhancing over 35 frames per second of Full HD video (1080x1920).
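To illustrate why recurrent propagation keeps the cost low, here is a toy sketch in which each frame is processed exactly once and information from earlier frames reaches later ones only through a hidden state. Everything in it is hypothetical: a 3×3 box mean stands in for a learned local module, a per-channel frame mean for a learned global module, and there are no learned weights, no adversarial training and no discriminator. It is not the architecture proposed in the paper.

```python
import numpy as np


def box_mean3x3(x):
    """Stand-in local module: per-channel 3x3 mean with edge padding; x is (H, W, C)."""
    h, w, _ = x.shape
    p = np.pad(x, ((1, 1), (1, 1), (0, 0)), mode="edge")
    return sum(p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)) / 9.0


def recurrent_cell(frame, state, mix=0.5, global_weight=0.1):
    """Hypothetical cell: fuse the frame with the hidden state, then mix local and global context."""
    fused = mix * frame + (1.0 - mix) * state                  # carry information from earlier frames
    local = box_mean3x3(fused)                                  # local module: 3x3 neighbourhood
    global_ctx = fused.mean(axis=(0, 1), keepdims=True)         # global module: one value per channel
    out = (1.0 - global_weight) * local + global_weight * global_ctx
    return out, out                                             # (enhanced frame, next hidden state)


def enhance_video(frames):
    """Single causal pass: every frame is processed once, in order."""
    state = np.zeros_like(frames[0], dtype=float)
    outputs = []
    for frame in frames:
        enhanced, state = recurrent_cell(frame, state)
        outputs.append(enhanced)
    return outputs


rng = np.random.default_rng(1)
video = [rng.random((5, 6, 3)) for _ in range(4)]
enhanced = enhance_video(video)
print(len(enhanced), enhanced[0].shape)  # 4 (5, 6, 3)
```

Because the work per frame is constant and no window of neighbouring frames is re-read, the runtime of this sketch grows linearly with video length.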

arXiv.org e-Print Archive

ETHzürich Repository for Publications and Research Data

Southampton (e-Prints Soton)

Generative Flows with Invertible Attentions

Author: Sukthanker Rhea Sanjay
Timofte Radu
Van Gool Luc
Huang Zhiwu
Kumar Suryansh
Publication venue
Publication date: 01/01/2022
Field of study

Flow-based generative models have shown an excellent ability to explicitly learn the probability density function of data via a sequence of invertible transformations. Yet, while attention has led to breakthroughs in other domains, learning attention in generative flows remains understudied. To fill this gap, this paper introduces two types of invertible attention mechanisms, i.e., map-based and transformer-based attention, for both unconditional and conditional generative flows. The key idea is to exploit a masked scheme for these two attention mechanisms to learn long-range data dependencies in the context of generative flows. The masked scheme yields invertible attention modules with tractable Jacobian determinants, enabling their seamless integration at any position of flow-based models. The proposed attention mechanisms lead to more efficient generative flows, owing to their capability of modeling long-range data dependencies. Evaluation on multiple image synthesis tasks shows that the proposed attention flows result in efficient models and compare favorably against state-of-the-art unconditional and conditional generative flows.
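The masked scheme is what keeps the Jacobian determinant tractable. The sketch below shows the same principle with a generic masked affine coupling rather than the paper's attention modules: entries selected by a binary mask pass through unchanged and condition an elementwise affine transform of the remaining entries, so the Jacobian is triangular up to a permutation and its log-determinant is simply the sum of the log-scales. The conditioners `scale_net` and `shift_net` are hypothetical placeholders for learned networks (an attention module could play this role).

```python
import numpy as np


def masked_affine_forward(x, mask, scale_net, shift_net):
    """x: (D,) input; mask: (D,) array of 0/1. Masked-in entries (1) are copied;
    the others get an affine transform conditioned only on the copied entries."""
    keep = x * mask
    free = 1.0 - mask
    s = scale_net(keep) * free          # log-scales, zero on copied entries
    t = shift_net(keep) * free
    y = keep + free * (x * np.exp(s) + t)
    log_det = float(np.sum(s))          # triangular Jacobian: log|det J| = sum of log-scales
    return y, log_det


def masked_affine_inverse(y, mask, scale_net, shift_net):
    """Exact inverse: copied entries are identical in x and y, so s and t can be recomputed."""
    keep = y * mask
    free = 1.0 - mask
    s = scale_net(keep) * free
    t = shift_net(keep) * free
    return keep + free * (y - t) * np.exp(-s)


# Hypothetical conditioners standing in for learned (e.g. attention-based) networks.
rng = np.random.default_rng(0)
A, B = rng.normal(size=(6, 6)), rng.normal(size=(6, 6))
scale_net = lambda z: np.tanh(A @ z)
shift_net = lambda z: B @ z

mask = np.array([1, 0, 1, 0, 1, 0], dtype=float)
x = rng.normal(size=6)
y, log_det = masked_affine_forward(x, mask, scale_net, shift_net)
print(np.allclose(masked_affine_inverse(y, mask, scale_net, shift_net), x))  # True
```

No matter how expressive the conditioners are, inverting the layer and evaluating its log-determinant never requires inverting them, which is what allows such modules to be placed anywhere in a flow.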

ETHzürich Repository for Publications and Research Data

Southampton (e-Prints Soton)

Institutional Knowledge at Singapore Management University

Trilevel neural architecture search for efficient single image super-resolution

Author: Wu Yan
Timofte Radu
Van Gool Luc
Huang Zhiwu
Sukthanker Rhea Sanjay
Kumar Suryansh
Publication venue
Publication date: 01/01/2022
Field of study

Southampton (e-Prints Soton)

The Vid3oC and IntVID Datasets for Video Super Resolution and Quality Mapping

Author: Kim Sohyeong
Timofte Radu
Fuoli Dario
Gu Shuhang
Danelljan Martin
Li Guanju
Huang Zhiwu
Publication venue
Publication date: 01/01/2019
Field of study

The current rapid advancement of computational hardware has opened the door for deep networks to be applied to real-time video processing, even on consumer devices. Appealing tasks include video super-resolution, compression artifact removal, and quality enhancement. These problems require high-quality datasets that can be used for training and benchmarking. In this work, we therefore introduce two video datasets aimed at a variety of tasks. First, we propose the Vid3oC dataset, containing 82 simultaneous recordings of 3 camera sensors. It was recorded with a multi-camera rig, including a high-quality DSLR camera, a high-end smartphone, and a stereo camera sensor. Second, we introduce the IntVID dataset, containing over 150 high-quality videos crawled from the Internet. The datasets were employed for the AIM 2019 challenges on video super-resolution and quality mapping.

Southampton (e-Prints Soton)

Crossref

Institutional Knowledge at Singapore Management University

Learned Image Signal Processing Pipeline for Mobile Cameras

Author: Elezabi Omar
Publication venue
Publication date: 01/01/2023
Field of study

The image signal processing (ISP) pipeline is a crucial part of the image creation process. This pipeline consists of a handcrafted and complex sequence of image-processing tasks that process the raw image from the camera sensor and produce the final RGB image. Because of the hardware limitations imposed by their compact size, mobile phones rely on ISPs that have become more advanced and complex to overcome these limitations. In recent years, a new research direction has proposed replacing this complex hand-crafted pipeline with an end-to-end learning-based ISP using deep learning. This is achieved by training a deep learning network to process the raw image of a phone camera by imitating the output of a DSLR camera. This approach has shown promising results without the need for the long and complex process of a handcrafted conventional ISP. However, it is still a research direction with many limitations and problems compared to the conventional ISPs used in mobile cameras today. To reach production-level accuracy and robustness with this approach, much work is needed to address its issues. In this work, we tried to improve the current state of learning-based ISPs by addressing some of their main problems. We worked on night image rendering using a learning-based ISP network. We proposed an efficient network that was trained without the need for annotated data. Our proposed approach was one of the top 10 solutions in the NTIRE 2023 Challenge on Night Photography Rendering. We also worked on problems of ISP datasets such as alignment and availability. We proposed a novel idea to create a fully aligned, high-quality synthetic ISP dataset from a weakly aligned ISP dataset. Our experiments show that we get better performance by training on our synthetic dataset than by training directly on the weakly aligned dataset, which shows the effectiveness of our pipeline.
We also showed the ability of our pipeline to generate a new synthetic dataset from DSLR RGB images alone. Lastly, we addressed the problem of missing global information in learned ISP networks. We proposed a novel color module that utilizes the global information from the full raw image in addition to local information from the input raw patch. Our module is a general module that can be integrated with any ISP network to improve its color reproduction accuracy. We achieved state-of-the-art performance by utilizing our simple and efficient color module with a simple ISP network. We showed that just by utilizing the global information from the full image, we can greatly improve the performance of ISP networks.
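Why a patch alone can mislead color reproduction can be illustrated with a classical gray-world white balance, used here only as a stand-in for the proposed learned color module: gains estimated from the full image recover the illuminant, while gains estimated from a single patch are biased by that patch's content. This sketch assumes demosaiced linear RGB inputs and a spatially uniform illuminant; the function names and the toy scene are hypothetical.

```python
import numpy as np


def gray_world_gains(rgb):
    """Per-channel gains that equalise the channel means (gray-world assumption)."""
    means = rgb.reshape(-1, 3).mean(axis=0)
    return means.mean() / means


def white_balance_patch(patch, context):
    """Correct `patch` with gains estimated from `context` (full image or the patch itself)."""
    return patch * gray_world_gains(context)


# Toy scene: two regions whose average reflectance is neutral gray.
reflectance = np.zeros((4, 8, 3))
reflectance[:, :4] = [0.6, 0.4, 0.5]
reflectance[:, 4:] = [0.4, 0.6, 0.5]
illuminant = np.array([1.2, 1.0, 0.8])       # warm cast applied to the whole scene
image = reflectance * illuminant
patch = image[:, :4]                          # a patch-based network only "sees" this crop

print(gray_world_gains(image))   # approximately [0.833, 1.0, 1.25], i.e. 1 / illuminant
print(gray_world_gains(patch))   # biased by the patch content
```

With full-image context the patch is restored to its true reflectance, whereas the patch-only estimate pushes it toward neutral gray, which is the kind of color error that global information is meant to prevent.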

NTNU Open (Norwegian University of Science and Technology)

Author Instructions

Author: Instructions Author
Publication venue
Publication date: 04/11/2013
Field of study

Crossref

Cartographic Perspectives (E-Journal - North American Cartographic Information Society, NACIS)

Efficient Image Restoration on Mobile Devices with Deep Learning

Author: Ihnatov Andrii
Publication venue
Publication date: 01/01/2022
Field of study

ETHzürich Repository for Publications and Research Data