    Inscriptions visual recognition. A comparison of state-of-the-art object recognition approaches

    No full text
    In this paper, we consider the task of recognizing inscriptions in images such as photos taken using mobile devices. Given a set of 17,155 photos related to 14,560 inscriptions, we used a k-NearestNeighbor approach in order to perform the recognition. The contribution of this work is in comparing state-of-the-art visual object recognition techniques in this specific context. The experimental results show that the Vector of Locally Aggregated Descriptors, obtained by aggregating Scale Invariant Feature Transform descriptors, is the best choice for this task.

    Using AI to decode the behavioral responses of an insect to chemical stimuli: towards machine-animal computational technologies

    Full text link
    Orthoptera are insects with excellent olfactory sense abilities due to their antennae richly equipped with receptors. This makes them interesting model organisms to be used as biosensors for environmental and agricultural monitoring. Herein, we investigated whether the house cricket Acheta domesticus can be used to detect different chemical cues by examining the movements of its antennae and attempting to identify specific antennal displays associated with the chemical cues presented (e.g., sucrose or ammonia powder). A neural network based on state-of-the-art techniques (i.e., SLEAP) for pose estimation was built to identify the proximal and distal ends of the antennae. The network was optimised via grid search, resulting in a mean Average Precision (mAP) of 83.74%. To classify the stimulus type, another network was employed to take in a series of keypoint sequences and output the stimulus classification. To find the best one-dimensional convolutional and recurrent neural networks, a genetic algorithm-based optimisation method was used. These networks were validated with iterated K-fold validation, obtaining an average accuracy of 45.33% for the former and 44% for the latter. Notably, we introduced and published the first dataset of cricket recordings that relates this animal’s behaviour to chemical stimuli. Overall, this study proposes a novel and simple automated method that can be extended to other animals for the creation of Biohybrid Intelligent Sensing Systems (e.g., automated video-analysis of an organism’s behaviour) to be exploited in various ecological scenarios.

    Aggregating Local Descriptors for Epigraphs Recognition

    Full text link
    In this paper, we consider the task of recognizing epigraphs in images such as photos taken using mobile devices. Given a set of 17,155 photos related to 14,560 epigraphs, we used a k-NearestNeighbor approach in order to perform the recognition. The contribution of this work is in evaluating state-of-the-art visual object recognition techniques in this specific context. The experimental results show that the Vector of Locally Aggregated Descriptors, obtained by aggregating SIFT descriptors, is the best choice for this task. The Fourth International Conference on Digital Presentation and Preservation of Cultural and Scientific Heritage (DiPP2014) is supported by the Ministry of Education and Science and is under the patronage of UNESCO.
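
    As an illustration of the pipeline the abstract names (VLAD aggregation of local descriptors followed by k-nearest-neighbor search), here is a minimal sketch in pure Python. It is not the authors' implementation: the 2-D toy descriptors stand in for 128-D SIFT vectors, the centroids are fixed by hand rather than learned with k-means, and the names `vlad`, `knn`, `centroids`, and `database` are assumptions made for this example.

```python
import math

def nearest(vec, centroids):
    """Index of the centroid closest to vec (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(vec, centroids[c]))

def vlad(descriptors, centroids):
    """Aggregate local descriptors into one L2-normalized VLAD vector."""
    dim = len(centroids[0])
    acc = [[0.0] * dim for _ in centroids]
    for d in descriptors:
        c = nearest(d, centroids)
        for j in range(dim):
            # Accumulate the residual between the descriptor and its centroid.
            acc[c][j] += d[j] - centroids[c][j]
    flat = [v for row in acc for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat

def knn(query_vec, database, k=1):
    """Labels of the k database entries closest to query_vec."""
    ranked = sorted(database, key=lambda item: math.dist(query_vec, item[1]))
    return [label for label, _ in ranked[:k]]

# Toy data (hypothetical): two visual words and two indexed photos.
centroids = [(0.0, 0.0), (10.0, 10.0)]
database = [
    ("inscription_A", vlad([(1.0, 0.0), (10.0, 11.0)], centroids)),
    ("inscription_B", vlad([(0.0, 1.0), (11.0, 10.0)], centroids)),
]
query = [(2.0, 0.0), (10.0, 12.0)]
print(knn(vlad(query, centroids), database, k=1))
```

    Because the VLAD vector is L2-normalized, the query matches the photo whose residuals point in the same directions, regardless of their magnitude.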

    ARTEMIS: animal recognition through enhanced multimodal integration system

    Full text link
    This paper introduces Animal Recognition Through Enhanced Multimodal Integration System (ARTEMIS), a transformer-based framework designed for multilabel animal action recognition by fusing video, image, and textual modalities. ARTEMIS utilizes state-of-the-art captioning and language models, such as BLIP2 and Llama 3, to generate textual descriptions from video frames. These descriptions are input to the model and significantly enhance its performance compared with previous approaches that do not consider this modality. Through comprehensive ablation studies, we explore the contribution of various model components and propose optimization strategies, including genetic algorithms and reinforcement learning, to dynamically adjust ensemble weights. Our feature alignment techniques, which use contrastive and cosine similarity losses, further improve multimodal integration. Evaluations on the Animal Kingdom dataset, which includes 30,100 clips across 140 action classes, demonstrate that ARTEMIS achieves a new state-of-the-art mAP of 79.82, outperforming existing methods. The combination of multimodal fusion and ensemble strategies makes ARTEMIS a robust solution for complex animal action recognition tasks. The code of our fusion method is available at https://github.com/edofazza/ARTEMIS

    Enhancing Author Name Disambiguation Workflows in Big Data Scholarly Knowledge Graphs

    No full text
    Open Science, defined by its commitment to transparency, collaboration, openness, and accessibility, has deeply affected scientific research. Following this new paradigm, scientists produce and publish research data and software alongside research publications to enable reproducibility, monitoring, and assessment of science. In this context, Scholarly Knowledge Graphs (SKGs) are “big data” metadata collections that play a crucial role in research discovery and assessment by aggregating bibliographic metadata records and semantic relationships describing research products, the associations among them (e.g., citations, versions), and their links to other entities such as organizations, authors, and funders. Examples of SKGs are the OpenAIRE Graph, Google Scholar, OpenAlex, Semantic Scholar, OpenCitations, and ResearchGraph.org. However, constructing and maintaining SKGs demands innovative solutions to address the inherent scalability, heterogeneity, duplication, inconsistency, and incompleteness challenges introduced by the metadata sources to be aggregated. Motivated by the urgency of Open Science and the challenges posed by SKG construction, this Ph.D. thesis makes pioneering contributions to the field of Author Name Disambiguation (AND), the perennial problem of identifying and removing duplicate author nodes that represent the same author in the SKG. Acknowledging the pivotal role of AND, the thesis identifies two interwoven challenges in duplicate resolution: the efficiency challenge, arising from the inherent quadratic complexity of comparing hundreds of millions of author nodes; and the effectiveness challenge, introduced by the efficiency optimization strategies, which give up part of the matches, and aggravated by the scarcity of metadata available to compare author nodes, which is often limited to the name string.
    To address the efficiency challenge, the thesis introduces FDup, a framework designed to rethink and enhance the traditional disambiguation workflow. At its core, FDup prioritizes the optimization of the similarity match phase, which is achieved by incorporating a decision tree-based comparison technique. This approach ensures a customizable and efficient disambiguation workflow and enables parallelization, a crucial aspect in handling the substantial datasets inherent in Scholarly Knowledge Graphs. To address the effectiveness challenge, the thesis leverages Graph Neural Networks (GNNs), which have recently been applied successfully to node classification, graph classification, and link prediction. The proposed contributions take the form of two dedicated GNN architectures that enhance the effectiveness of Author Name Disambiguation by evaluating the outputs of a disambiguation algorithm: the first technique evaluates similarity relationships with an attentive neural network integrating GraphSAGE models; the second technique evaluates groups of duplicates with a combination of Graph Attention Network (GAT) and Long Short Term Memory (LSTM) components. In summary, this thesis is a responsive and forward-thinking contribution within the landscape of Open Science and Scholarly Knowledge Graphs. By introducing novel frameworks and harnessing advanced techniques like Graph Neural Networks, the thesis not only addresses the current challenges but also lays the groundwork for the continual evolution of Open Science practices and the optimal utilization of Scholarly Knowledge Graphs in the ever-expanding realm of scientific knowledge.

    Some theoretical and experimental observations on permutation spaces and similarity search

    No full text
    Permutation-based approaches represent data objects as ordered lists of predefined reference objects. Similarity queries are executed by searching for data objects whose permutation representation is similar to that of the query. Various permutation-based indexes have been recently proposed. They typically allow high efficiency with acceptable effectiveness. Moreover, various parameters can be set in order to find an optimal trade-off between quality of results and costs. In this paper, we study the permutation space without referring to any particular index structure, focusing on both theoretical and experimental aspects. We used both synthetic and real-world datasets for our experiments. The results of this work are relevant to both developing permutation-based similarity search approaches and setting their parameters.
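
    The permutation representation described in this abstract can be sketched in a few lines of Python. This is only an illustrative sketch: the abstract does not say which distance between permutations is used, so the Spearman footrule is assumed here, and the reference objects, data points, and function names are hypothetical.

```python
import math

def permutation(obj, refs, dist=math.dist):
    """Reference-object indices ordered by increasing distance from obj."""
    return sorted(range(len(refs)), key=lambda i: (dist(obj, refs[i]), i))

def footrule(p, q):
    """Spearman footrule: sum of the position differences of each reference."""
    pos_q = {ref: i for i, ref in enumerate(q)}
    return sum(abs(i - pos_q[ref]) for i, ref in enumerate(p))

def search(query, data, refs, k=1):
    """Indices of the k data objects whose permutations are closest to the query's."""
    qp = permutation(query, refs)
    ranked = sorted(
        range(len(data)),
        key=lambda j: (footrule(qp, permutation(data[j], refs)), j),
    )
    return ranked[:k]

# Toy example (hypothetical values): three reference objects in the plane.
refs = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
data = [(1.0, 2.0), (9.0, 1.0), (1.0, 9.0)]
query = (8.0, 2.0)

print(permutation(query, refs))  # order in which the query "sees" the references
print(search(query, data, refs))  # data objects ranked by permutation similarity
```

    In a real index the permutations of the data objects would be computed once and stored, so that a query only needs distances to the reference objects rather than to every data object.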

    AIMH at SemEval-2021 Task 6: Multimodal Classification Using an Ensemble of Transformer Models

    No full text
    This paper describes the system used by the AIMH Team to approach SemEval-2021 Task 6. We propose an approach that relies on an architecture based on the transformer model to process multimodal content (text and images) in memes. Our architecture, called DVTT (Double Visual Textual Transformer), approaches Subtasks 1 and 3 of Task 6 as multi-label classification problems, where the text and/or images of the meme are processed, and the probabilities of the presence of each possible persuasion technique are returned as a result. DVTT uses two complete transformer networks that work on text and images and are mutually conditioned. One of the two modalities acts as the main one and the second intervenes to enrich the first, thus yielding two distinct modes of operation. The outputs of the two transformers are merged by averaging the inferred probabilities for each possible label, and the overall network is trained end-to-end with a binary cross-entropy loss.
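
    The final fusion step in this abstract (averaging per-label probabilities from two branches and scoring them with binary cross-entropy) can be sketched as follows. This is a minimal sketch, not the DVTT code: the logits are made-up numbers standing in for the outputs of the two transformer branches, and the function names are assumptions.

```python
import math

def sigmoid(x):
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def average_probabilities(logits_a, logits_b):
    """Average the per-label probabilities predicted by two branches."""
    return [(sigmoid(a) + sigmoid(b)) / 2.0 for a, b in zip(logits_a, logits_b)]

def binary_cross_entropy(probs, targets, eps=1e-12):
    """Mean binary cross-entropy over all labels of one example."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)

# Hypothetical logits for three persuasion-technique labels.
text_main_logits = [2.0, -1.0, 0.5]
image_main_logits = [1.0, -2.0, -0.5]
targets = [1, 0, 0]

probs = average_probabilities(text_main_logits, image_main_logits)
print(probs)
print(binary_cross_entropy(probs, targets))
```

    In multi-label classification each label gets its own independent sigmoid, so several techniques can be predicted for the same meme; this is why binary cross-entropy is used instead of a softmax loss.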

    Selective state models are what you need for animal action recognition

    Full text link
    Recognizing animal actions provides valuable insights into animal welfare, yielding crucial information for agricultural, ethological, and neuroscientific research. While video-based action recognition models have been applied to this task, current approaches often rely on computationally intensive Transformer layers, limiting their practical application in field settings such as farms and wildlife reserves. This study introduces Mamba-MSQNet, a novel architecture family for multilabel Animal Action Recognition using Selective State Models. By transforming the state-of-the-art MSQNet model with Mamba blocks, we achieve significant reductions in computational requirements: up to 90% fewer floating-point operations and 78% fewer parameters compared to MSQNet. These optimizations not only make the model more efficient but also enable it to outperform Transformer-based counterparts on the Animal Kingdom dataset, achieving a mean Average Precision of 74.6, an improvement over previous architectures. This combination of enhanced efficiency and improved performance represents a significant advancement in the field of animal action recognition. The dramatic reduction in computational demands, coupled with a performance boost, opens new possibilities for real-time animal behavior monitoring in resource-constrained environments. This enhanced efficiency could revolutionize how we observe and analyze animal behavior, potentially leading to breakthroughs in animal welfare assessment, behavioral studies, and conservation efforts.

    Mind the Prompt: A Novel Benchmark for Prompt-Based Class-Agnostic Counting

    No full text
    Recently, object counting has shifted towards class-agnostic counting (CAC), which counts instances of arbitrary object classes never seen during model training. With advancements in robust vision-and-language foundation models, there is a growing interest in prompt-based CAC, where object categories are specified using natural language. However, we identify significant limitations in current benchmarks for evaluating this task, which hinder both accurate assessment and the development of more effective solutions. Specifically, we argue that the current evaluation protocols do not measure the ability of the model to understand which object has to be counted. This is due to two main factors: (i) the shortcomings of CAC datasets, which primarily consist of images containing objects from a single class, and (ii) the limitations of current counting performance evaluators, which are based on traditional class-specific counting and focus solely on counting errors. To fill this gap, we introduce the Prompt-Aware Counting (PrACo) benchmark. It comprises two targeted tests coupled with evaluation metrics specifically designed to quantitatively measure the robustness and trustworthiness of existing prompt-based CAC models. We evaluate state-of-the-art methods and demonstrate that, although some achieve impressive results on standard class-specific counting metrics, they exhibit a significant deficiency in understanding the input prompt, indicating the need for more careful training procedures or revised designs. The code for reproducing our results is available at https://github.com/ciampluca/PrACo