
    Montero2K: Hate Speech Against Women Politicians [Dataset]

    If you use this dataset, please cite the following paper: Iranzo-Cabrera, Maria; Castro-Bleda, Maria Jose; Simon-Astudillo, Iris; Hurtado, Lluis-F (2024). Journalists’ Ethical Responsibility: Tackling Hate Speech Against Women Politicians in Social Media Through Natural Language Processing Techniques. Social Science Computer Review 0(0), 1-28. DOI: 10.1177/08944393241269417
    The "Montero2K" dataset is composed of 2,239 tweets, selected on the criterion that each message had garnered more than 50 likes or retweets during the specified period. Human experts manually labeled these tweets with the “hate” and “improper language” categories.
    Iranzo Cabrera, M.; Castro Bleda, MJ.; Simon Astudillo, I.; Hurtado Oliver, LF. (2024). Montero2K: Hate Speech Against Women Politicians [Dataset]. Universitat Politècnica de València. https://doi.org/10.4995/Dataset/10251/21333

    Journalists' Ethical Responsibility: Tackling Hate Speech Against Women Politicians in Social Media Through Natural Language Processing Techniques

    [EN] Social media has led to a redefinition of the journalist's role. Specifically on Twitter, these professionals assume an influential position and their discourse is dominated by personal opinions. Taking into consideration that this platform has proven to be a breeding ground for polarization, digital harassment and hate speech, notably against women politicians, this research aims to analyze journalists' involvement in this complex scenario. The investigation aims to determine whether, immersed in online and gender defamation campaigns, journalists enhance the quality of public debate or, on the contrary, reinforce the visibility of this hostile content. To this end, we examined a sample of 63,926 tweets published from 23 to 25 November 2022 related to a campaign of political violence against the Spanish Minister of Equality, using Natural Language Processing tools and qualitative content analysis. Results show that during those three days, at least half of the tweets contained hate speech and improper language. In this climate of hostility, journalists participating in the debate not only have an ability to attract likes and retweets but also exhibit polarization and use hate speech. Each ideological position (for and against the Minister) is also reflected in their own uncivil strategies. Under the umbrella of free speech and regardless of argumentative discourses, those journalists who lean towards ideological progressivism tend to insult their opponents, and those on the political right use divisive constructions, stereotyping and irony as attack techniques.
    This work was supported by two grants: Ministerio de Ciencia, Innovacion y Universidades (Spain) and ERDF A way of making Europe (PID2021-126061OB-C41), and Ministerio de Ciencia e Innovacion (PID2020-113574RB-I00).
    Iranzo-Cabrera, M.; Castro-Bleda, Maria Jose; Simon-Astudillo, I.; Hurtado Oliver, Lluis Felip (2025). Journalists' Ethical Responsibility: Tackling Hate Speech Against Women Politicians in Social Media Through Natural Language Processing Techniques. Social Science Computer Review. 43(3):475-502. https://doi.org/10.1177/08944393241269417

    Digitization and recognition of Jacquard cards for textile design preservation

    [EN] This paper introduces an advanced, structured approach to preserving textile design, specifically targeting the intricate silk motifs woven using Jacquard machines in the textile industry. Traditional Jacquard machines utilize perforated cards, where specific hole patterns guide needle movements to create complex designs. Due to deterioration over time, these cards need replacement to maintain design integrity, making digitization essential for preserving cultural heritage. To address this need, we developed a multi-step, computer vision-based method that extracts design information from images of these Jacquard cards. The process involves 1) segmenting each card from images, 2) identifying and locating hole patterns, 3) refining hole detection through interpolation and statistical validation, and 4) post-processing for error detection and correction. This results in highly accurate, vectorized digital files, which can be replicated using laser technology to produce new cards. Additionally, an algorithm generates corresponding fabric patterns directly from the digital designs. This systematic approach not only facilitates the preservation and replication of historic silk motifs but also supports ongoing innovation in textile design.
    We are grateful for the financial support from ValgrAI - Valencian Graduate School and Research Network of Artificial Intelligence and the Generalitat Valenciana, the Spanish Ministerio de Ciencia e Innovación and European Union under project BEWORD PID2021-126061OB-C41, and from the Generalitat Valenciana under project PROMETEO/2020/024. Funding for open access charge: CRUE-Universitat Politècnica de València.
    Marc Rodas Lorente; España Boquera, Salvador; Castro-Bleda, Maria Jose; Robes Martin, E. (2025). Digitization and recognition of Jacquard cards for textile design preservation. Multimedia Tools and Applications. 84:39499-39521. https://doi.org/10.1007/s11042-025-20917-9

    UX-comments: Evaluation of User eXperience in E-learning [Dataset]

    If you use this dataset, please cite the following paper: Sanchis-Font, Rosario; Castro-Bleda, Maria Jose; Gonzalez, Jose-Angel; Pla, Ferran; Hurtado, Lluis-F (2021). Cross-Domain Polarity Models to Evaluate User eXperience in E-learning. Neural Processing Letters, 53(5), 3199-3215. DOI: 10.1007/s11063-020-10260-5
    Virtual learning environments are growing in importance as fast as e-learning, which is becoming highly demanded by universities and students worldwide. We have investigated how to automatically evaluate User eXperience in this domain using sentiment analysis techniques. For this purpose, the UX-comments corpus has been built with the opinions of 583 users (107 English speakers and 476 Spanish speakers) about three learning management systems in different courses. All the collected opinions were manually labeled with polarity information (P=positive, N=negative, or NEU=neutral) by three human annotators, at both the whole-opinion and sentence levels.
    Dataset Information     P     NEU   N     Total
    Spanish Observations    338   53    85    476
    Spanish Sentences       404   41    142   587
    English Observations    56    21    30    107
    English Sentences       90    14    80    184
    Special thanks to the following biomedical organizations: Fundación IVI and Medigene Press S.L.; both have provided data from their Master and Postgraduate Courses through the academic stay research of Rosario Sanchis-Font, during 2017 and 2018. Many thanks to Carlos Turró-Ribalta and Ignacio Despujol-Zabala for supporting this research with data from UPV MOOCs.
    Sanchis Font, R.; Castro Bleda, MJ.; González Barba, JÁ.; Pla Santamaría, F.; Hurtado Oliver, LF. (2021). UX-comments: Evaluation of User eXperience in E-learning [Dataset]. Universitat Politècnica de València. https://doi.org/10.4995/Dataset/10251/21333

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first of a cited work's authors, and a non-traditional type, which takes into account the first 5 authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
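The two counting schemes compared in the abstract above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function name `cocitation_counts`, the toy reference lists, and the choice to count each author pair at most once per citing paper (including co-authors of the same cited work). It is not the study's own procedure.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count how often each pair of authors is cited together.

    citing_papers: list of reference lists; each reference is a list of
                   author names in byline order (hypothetical toy format).
    max_authors:   how many leading authors of each cited work are credited
                   (1 = traditional first-author counting).
    """
    counts = Counter()
    for references in citing_papers:
        # Collect the credited authors of this citing paper as a set, so an
        # author pair is counted at most once per citing paper.
        credited = set()
        for authors in references:
            credited.update(authors[:max_authors])
        for pair in combinations(sorted(credited), 2):
            counts[pair] += 1
    return counts


# Toy data: three citing papers, each with two cited works.
papers = [
    [["White", "McCain"], ["Small"]],
    [["White", "McCain"], ["Small", "Griffith"]],
    [["McCain"], ["Griffith"]],
]

first_only = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)

print(first_only[("Small", "White")])   # pair seen under both schemes
print(first_five[("McCain", "White")])  # pair visible only when non-first authors count
```

In this toy data, second authors such as McCain in the first two papers contribute no pairs under first-author counting, whereas the first-5 scheme credits them.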

    A Spanish dataset for reproducible benchmarked offline handwriting recognition

    [EN] In this paper, a public dataset for Offline Handwriting Recognition, along with an appropriate evaluation method to provide benchmark indicators at sentence level, is presented. This dataset, called SPA-Sentences, consists of offline handwritten Spanish sentences extracted from 1617 forms produced by the same number of writers. A total of 13,691 sentences comprising around 100,000 word instances out of a vocabulary of 3288 words occur in the collection. Careful attention has been paid to make the baseline experiments both reproducible and competitive. To this end, experiments are based on state-of-the-art recognition techniques combining convolutional blocks with one-dimensional Bidirectional Long Short Term Memory (LSTM) networks using Connectionist Temporal Classification (CTC) decoding. The scripts with the entire experimental setting have been made available. The SPA-Sentences dataset and its baseline evaluation are freely available for research purposes via the institutional University repository. We expect the research community to include this corpus, as is usually done with English IAM and French RIMES datasets, in their battery of experiments when reporting novel handwriting recognition techniques.
    España Boquera, S.; Castro-Bleda, MJ. (2022). A Spanish dataset for reproducible benchmarked offline handwriting recognition. Language Resources and Evaluation. 56(3):1009-1022. https://doi.org/10.1007/s10579-022-09587-3
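The SPA-Sentences abstract mentions CTC decoding on top of a recurrent network. As a rough illustration of what the simplest form of CTC decoding does, here is a best-path (greedy) sketch in Python: take the highest-scoring label per frame, collapse consecutive repeats, then drop blanks. The names `ctc_greedy_decode` and `scores_for_path`, the toy alphabet, and the scores are all hypothetical; the paper's baseline may use a different decoder (for example beam search with a language model).

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Best-path CTC decoding.

    frame_scores: one list of per-label scores for each time frame.
    alphabet:     label index -> character; alphabet[blank] is a placeholder.
    """
    # 1) Pick the highest-scoring label index in every frame.
    best = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    # 2) Collapse consecutive repeats and 3) remove blanks.
    decoded = []
    previous = None
    for idx in best:
        if idx != previous and idx != blank:
            decoded.append(alphabet[idx])
        previous = idx
    return "".join(decoded)


def scores_for_path(path, num_labels):
    """Build toy frame scores whose per-frame argmax follows `path`."""
    return [[0.9 if i == k else 0.025 for i in range(num_labels)] for k in path]


alphabet = ["-", "h", "o", "l", "a"]  # index 0 is the CTC blank

# Frame-wise best labels: h h - o l l - a
text = ctc_greedy_decode(scores_for_path([1, 1, 0, 2, 3, 3, 0, 4], 5), alphabet)
print(text)

# A genuine double letter needs a blank between the two occurrences.
double_l = ctc_greedy_decode(scores_for_path([3, 0, 3], 5), alphabet)
print(double_l)
```

The second call shows why the blank label exists: without it, repeated characters could not be distinguished from one character spanning several frames.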

    Variations on the Author

    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who propels discourses rather than expressing opinions; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.

    Appropriate Similarity Measures for Author Cocitation Analysis

    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
    Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
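To make the comparison between the Pearson correlation and the cosine concrete, the following Python sketch computes both on toy cocitation profiles. The profiles and the function names are assumptions chosen for illustration, and the sketch does not reproduce the paper's argument or its two newly proposed measures. It only demonstrates one mathematical property: appending entries that are zero in both profiles (authors with whom neither is cocited) changes the Pearson correlation but leaves the cosine unchanged.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two profile vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation, i.e. the cosine of the mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


# Toy cocitation profiles of two authors over three other authors.
x = [3, 1, 0]
y = [1, 3, 0]

# The same profiles after adding three authors cocited with neither of them.
x_padded = x + [0, 0, 0]
y_padded = y + [0, 0, 0]

print(cosine(x, y), cosine(x_padded, y_padded))    # identical values
print(pearson(x, y), pearson(x_padded, y_padded))  # values differ
```

Because the Pearson correlation centers each profile on its mean, the added zeros shift the means and therefore the result, while the cosine depends only on the non-zero overlap.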

    Dispelling the Myths Behind First-author Citation Counts

    We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore just how different author rankings resulting from different citation counting methods actually are, and to demonstrate the capability of emerging data and tools on the Web to support more realistic citation counting methods. Our results contest some common arguments for the continued use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and other citation counting methods, and high costs of using more realistic citation counting methods that are not well supported by the ISI databases. It is argued that increasingly available digital full-text research papers make it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more sophisticated methods.