When Good and Reproducible Results are a Giant with Feet of Clay: The Importance of Software Quality in NLP
Despite its crucial role in research experiments, code correctness is often presumed solely based on the perceived quality of results. This assumption, however, comes with the risk of erroneous outcomes and, in turn, potentially misleading findings. To mitigate this risk, we posit that the current focus on reproducibility should go hand in hand with the emphasis on software quality. We support our arguments with a case study in which we identify and fix three bugs in widely used implementations of the state-of-the-art Conformer architecture. Through experiments on speech recognition and translation in various languages, we demonstrate that the presence of bugs does not prevent the achievement of good and reproducible results, which however can lead to incorrect conclusions that potentially misguide future research. As countermeasures, we release pangoliNN, a library dedicated to testing neural models, and propose a Code-quality Checklist, with the goal of promoting coding best practices and improving software quality within the NLP community
Real-time flood maps forecasting for dam-break scenarios with a transformer-based deep learning model
This paper presents a purely data-driven deep-learning approach for flood maps forecasting. For the first time in this context a Transformer-based algorithm is employed to address one of the main issues in early-warning systems for flood propagation, i.e., the long computational times required to forecast the inundation evolution in real time. The proposed model, named “FloodSformer”, is trained to extract the spatiotemporal information from a short sequence of water depth maps and predict the water depth map at one subsequent instant. Then, to forecast a sequence of future maps, we employ an autoregressive procedure based on the trained surrogate model. The method was applied to both synthetic dam-break scenarios and to a real case study, specifically the ideal failure of the Parma River dam (Italy). The training and testing datasets were generated numerically from two-dimensional hydraulic simulations. In the case of the real test case, the average Root Mean Square Error was found to be equal to 10.4 cm. The short computational time (e.g., the forecast of 90 maps, representing a lead time of 3 h, takes less than 1 min) makes the FloodSformer model a suitable tool for real-time emergency applications
Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training
The use of self-supervised pre-training has emerged as a promising approach to enhance the performance of many different visual tasks. In this context, recent approaches have employed the Masked Image Modeling paradigm, which pre-trains a backbone by reconstructing visual tokens associated with randomly masked image patches. This masking approach, however, introduces noise into the input data during pre-training, leading to discrepancies that can impair performance during the fine-tuning phase. Furthermore, input masking neglects the dependencies between corrupted patches, increasing the inconsistencies observed in downstream fine-tuning tasks. To overcome these issues, we propose a new self-supervised pre-training approach, named Masked and Permuted Vision Transformer (MaPeT), that employs autoregressive and permuted predictions to capture intra-patch dependencies. In addition, MaPeT employs auxiliary positional information to reduce the disparity between the pre-training and fine-tuning phases. In our experiments, we employ a fair setting to ensure reliable and meaningful comparisons and conduct investigations on multiple visual tokenizers, including our proposed k-CLIP which directly employs discretized CLIP features. Our results demonstrate that MaPeT achieves competitive performance on ImageNet, compared to baselines and competitors under the same model setting. We release an implementation of our code and models at https://github.com/aimagelab/MaPeT
A Transformer-Based Data-Driven Model for Real-Time Spatio-Temporal Flood Prediction
Among the non-structural strategies for mitigating the huge economic losses and casualties caused by floods, the implementation of early-warning systems based on real-time forecasting of flood maps is one of the most effective. The high computational cost associated with two-dimensional (2D) hydrodynamic models, however, prevents their practical application in this context. To overcome this drawback, “data-driven” models are gaining considerable popularity due to their high computational efficiency for predictions. In this work, we introduce a novel surrogate model based on the Transformer architecture, named FloodSformer (FS), that efficiently predicts the temporal evolution of inundation maps, with the aim of providing real-time flood forecasts. The FS model combines an encoder-decoder (2D Convolutional Neural Network) with a Transformer block that handles temporal information. This architecture extracts the spatiotemporal information from a sequence of consecutive water depth maps and predicts the water depth map at one subsequent instant. An autoregressive procedure, based on the trained surrogate model, is employed to forecast tens of future maps.
As a case study, we investigated the hypothetical inundation due to the collapse of the flood-control dam on the Parma River (Italy). Due to the absence of real inundation maps, the training/testing dataset for the FS model was generated from numerical simulations performed through a 2D shallow‐water code (PARFLOOD). Results show that the FS model is able to recursively forecast the next 90 water depth maps (corresponding to 3 hours for this case study, in which maps are sampled at 2-minute intervals) in less than 1 minute. This is achieved while maintaining an accuracy deemed entirely acceptable for real-time applications: the average Root Mean Square Error (RMSE) is about 10 cm, and the differences between ground-truth and predicted maps are generally lower than 25 cm in the floodable area for the first 60 predicted frames. In conclusion, the short computational time and the good accuracy ensured by the autoregressive procedure make the FS model suitable for early-warning systems
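The autoregressive procedure described in this abstract can be illustrated with a minimal sketch. It is not the FloodSformer implementation: `toy_step` is a hypothetical stand-in for the trained one-step surrogate, maps are flattened to short lists of depths, and the names `rollout` and `rmse` are assumptions made for the example. Only the idea of feeding each prediction back into the input window, the RMSE definition, and the lead-time arithmetic (90 maps at 2-minute intervals) come from the text.

```python
from math import sqrt


def rollout(step, history, n_future):
    """Forecast n_future maps by feeding each prediction back as input.

    step      -- hypothetical one-step predictor: list of past maps -> next map
    history   -- the most recent maps, oldest first (the input window)
    """
    window = list(history)
    forecasts = []
    for _ in range(n_future):
        next_map = step(window)
        forecasts.append(next_map)
        # Slide the window: drop the oldest map, append the new prediction.
        window = window[1:] + [next_map]
    return forecasts


def rmse(predicted, truth):
    """Root mean square error between two equally sized, flattened maps."""
    squared = sum((p - t) ** 2 for p, t in zip(predicted, truth))
    return sqrt(squared / len(predicted))


def toy_step(window):
    """Toy stand-in for a trained model: cell-wise mean of the window."""
    return [sum(cells) / len(window) for cells in zip(*window)]


history = [[0.0, 1.0], [2.0, 3.0]]  # two tiny "maps" with two cells each
forecasts = rollout(toy_step, history, n_future=90)

# 90 maps sampled every 2 minutes correspond to a 3-hour lead time.
lead_time_hours = 90 * 2 / 60
```

Because every forecast after the first depends on earlier forecasts, errors can accumulate along the rollout, which is presumably why the abstract reports accuracy per range of predicted frames.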
Unsupervised Adversarial Depth Estimation Using Cycled Generative Networks
While recent deep monocular depth estimation approaches based on supervised regression have achieved remarkable performance, costly ground truth annotations are required during training. To cope with this issue, in this paper we present a novel unsupervised deep learning approach for predicting depth maps and show that the depth estimation task can be effectively tackled within an adversarial learning framework. Specifically, we propose a deep generative network that learns to predict the correspondence field (i.e. the disparity map) between two image views in a calibrated stereo camera setting. The proposed architecture consists of two generative sub-networks jointly trained with adversarial learning for reconstructing the disparity map and organized in a cycle so as to provide mutual constraints and supervision to each other. Extensive experiments on the publicly available datasets KITTI and Cityscapes demonstrate the effectiveness of the proposed model and competitive results with state-of-the-art methods. The code is available at https://github.com/andrea-pilzer/unsup-stereo-depthGAN
DIETA: A Decoder-only Transformer-based Model for Italian-English Machine TrAnslation
In this paper, we present DIETA, a small, decoder-only Transformer model with 0.5 billion parameters, specifically designed and trained for Italian–English machine translation. We collect and curate a large parallel corpus consisting of approximately 207 million Italian–English sentence pairs across diverse domains, including parliamentary proceedings, legal texts, web-crawled content, subtitles, news, and literature, as well as 352 million back-translated examples generated with pretrained models. Additionally, we create and release a new small-scale evaluation set, consisting of 450 sentences, based on 2025 WikiNews articles, enabling assessment of translation quality on contemporary text. Comprehensive evaluations show that DIETA achieves competitive performance on multiple Italian–English benchmarks, consistently ranking in the second quartile of a 32-system leaderboard and outperforming most other sub-3B models on four out of five test suites. The training script, trained models, curated corpus, and newly introduced evaluation set are made publicly available, facilitating further research and development in specialized Italian–English machine translation: https://github.com/pkasela/DIETA-Machine-Translation
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
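The two counting schemes compared in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the study's procedure: the function name `cocitation_counts`, the toy data, and the author labels are hypothetical, and the sketch counts an author pair once per citing paper whenever both names appear among the counted authors of that paper's references. It therefore also pairs co-authors of the same cited work, a design choice the abstract does not specify.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors):
    """Count author pairs that are cited together.

    citing_papers -- one reference list per citing paper; each reference is
                     a list of author names in byline order
    max_authors   -- how many leading authors of each cited work to count
                     (1 = first-author counting, 5 = first five authors)
    """
    counts = Counter()
    for references in citing_papers:
        cited_authors = set()
        for byline in references:
            cited_authors.update(byline[:max_authors])
        # Each unordered pair is counted once per citing paper.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy data: two citing papers, two references each.
papers = [
    [["A", "B"], ["C"]],
    [["A"], ["C", "B"]],
]

first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
```

In this toy example author B never appears under first-author counting but joins two pairs once the first five authors are counted, which shows how the choice of counting changes the co-citation matrix that later analyses build on.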
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourse; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship
Appropriate Similarity Measures for Author Cocitation Analysis
We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
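For readers unfamiliar with the two named measures, the sketch below computes the Pearson correlation and the cosine for a pair of cocitation profiles. The profiles are made-up numbers, and the function names are assumptions for the example. The sketch also shows one property that separates the two measures: appending entries where both profiles are zero changes the Pearson correlation but leaves the cosine unchanged. The abstract does not say whether this is the objection the authors have in mind, and the two further measures it mentions are not named, so they are not covered here.

```python
from math import sqrt


def cosine(x, y):
    """Cosine of the angle between two profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = sqrt(sum(a * a for a in x))
    norm_y = sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation, computed as the cosine of the mean-centered profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


# Made-up cocitation profiles of two authors against four other authors.
x = [3, 1, 0, 2]
y = [1, 3, 2, 0]

# The same profiles after adding two authors who are cocited with neither.
x_padded = x + [0, 0]
y_padded = y + [0, 0]

print(pearson(x, y), cosine(x, y))                # about -0.6 and 0.4286
print(pearson(x_padded, y_padded), cosine(x_padded, y_padded))  # about 0.0 and 0.4286
```

Mean-centering is the only difference between the two functions, which is why padding with zeros (which shifts the means) affects one measure and not the other.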
