Towards the exploitation of statistical language models for plagiarism detection with reference
To plagiarise is to rob another person of the credit for their work. In text, plagiarism means including fragments (or even an entire document) from an author without giving that author the corresponding credit. In this work we describe our first attempt to detect plagiarised segments in a text using statistical Language Models (LMs) and perplexity. Preliminary experiments, carried out on two specialised and literary corpora (including original, part-of-speech and stemmed versions), show that the perplexity of a text segment, given a Language Model calculated over an author's text, is a relevant feature in plagiarism detection.
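The role of perplexity described above can be illustrated with a minimal sketch. The abstract does not state the model order or the smoothing used, so the unigram model, the add-one (Laplace) smoothing, the toy sentences, and the function names `train_unigram` and `perplexity` are all assumptions made for illustration only.

```python
import math
from collections import Counter


def train_unigram(tokens):
    """Build an add-one smoothed unigram model from an author's tokens.

    Assumption: unigram order and Laplace smoothing are illustrative
    choices, not the configuration reported in the paper.
    """
    counts = Counter(tokens)
    total = len(tokens)
    vocab_size = len(counts) + 1  # one extra slot for any unseen word

    def prob(word):
        return (counts[word] + 1) / (total + vocab_size)

    return prob


def perplexity(prob, tokens):
    """Perplexity of a segment: exp of the mean negative log-probability."""
    log_sum = sum(math.log(prob(word)) for word in tokens)
    return math.exp(-log_sum / len(tokens))


# Toy data (hypothetical): text known to be written by the author.
author_text = "the cat sat on the mat and the cat slept".split()
lm = train_unigram(author_text)

own = perplexity(lm, "the cat sat".split())
foreign = perplexity(lm, "quantum flux capacitors".split())
print(f"own segment: {own:.2f}, foreign segment: {foreign:.2f}")
```

A segment whose vocabulary matches the author's text receives a lower perplexity than one made of words the model has never seen; a detector could then flag segments with unusually high perplexity as candidates for closer inspection.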
CLEF 2013 and Beyond: Evolution of the CLEF Initiative
The CLEF Initiative is structured in two main parts: a series of Evaluation Labs, which conduct evaluations of information access systems, and a peer-reviewed Conference covering a broad range of evaluation issues. The annual CLEF events have been partially supported by the EU FP7 PROMISE project (contract n. 258191) and by the ELIAS RNP network.
Ferro, N.; Rosso, P. (2013). CLEF 2013 and beyond: evolution of the CLEF initiative. Ercim News. 95:51-52. https://riunet.upv.es/handle/10251/46694
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first of a cited work's authors, and a non-traditional type, which takes into account the first 5 authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
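The two counting schemes compared above can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's procedure: two authors are treated as co-cited once per citing paper whose reference list contains works by both, co-authors of a single cited work are also counted as a pair, and the function name `cocitation_counts`, the `max_authors` parameter, and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author pairs that appear together in a reference list.

    citing_papers: list of reference lists; each reference is a list of
    author names in byline order.
    max_authors:   1 gives first-author counting; 5 considers the first
                   five authors of every cited work.
    """
    counts = Counter()
    for references in citing_papers:
        cited = set()
        for authors in references:
            cited.update(authors[:max_authors])
        # Each unordered pair is counted once per citing paper.
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts


# Toy data (hypothetical): two citing papers and their cited works.
papers = [
    [["Smith", "Lee"], ["Garcia"]],
    [["Smith", "Lee"], ["Chen", "Garcia"]],
]

first_only = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
print(first_only)
print(first_five)
```

In this toy data, Garcia is second author on one cited work, so the Smith and Garcia pair is counted once under first-author counting and twice when the first five authors are considered.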
Author Profiling Tracks at FIRE
Benchmarking activities are vital for fostering research and addressing new challenging problems. During the last 10 years of the FIRE initiative we have been involved in the organization of more than ten tracks, with the aim of creating new resources in several languages that were made available to the research community. This made it possible to compare several new approaches on the same datasets. In this chapter we focus on the description of three author profiling tracks, on their data creation, and on the analysis of their results. The work on the author profiling data in Arabic was made possible by NPRP Grant #9-175-1-033 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.
Rosso, P.; Rangel Pardo, FM. (2020). Author Profiling Tracks at FIRE. SN Computer Science. 1:1-11. https://doi.org/10.1007/s42979-020-0073-1
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who propels discourse rather than expressing opinions; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.
English-Spanish large statistical dictionary of inflectional forms
The paper presents an approach for constructing a weighted bilingual dictionary of inflectional forms that takes a traditional bilingual dictionary, rather than parallel corpora, as its input. An algorithm is developed that generates all possible morphological (inflectional) forms and weights them using information on the distribution of the corresponding grammar sets (grammatical information) in large corpora for each language. The algorithm also takes into account the compatibility of grammar sets across the language pair; for example, a verb in the past tense in one language is normally expected to be translated by a verb in the past tense in the other language. We consider the developed method to be universal, i.e. applicable to any pair of languages. The resulting dictionary is freely available. It can be used in several NLP tasks, for example statistical machine translation.
Appropriate Similarity Measures for Author Cocitation Analysis
We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
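To make the comparison between the Pearson correlation and the cosine concrete, the sketch below computes both for two toy cocitation profiles. It illustrates one well-known property, namely that appending authors with whom neither profile is cocited changes the Pearson correlation but leaves the cosine unchanged; whether this is the argument made in the paper is not stated in the abstract, so it should be read as an assumption. The profiles and function names are hypothetical.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two cocitation profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation between two cocitation profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


# Toy profiles (hypothetical): cocitation counts with three other authors.
a = [4, 0, 2]
b = [0, 4, 2]

# The same profiles after adding ten authors cocited with neither a nor b.
a_padded = a + [0] * 10
b_padded = b + [0] * 10

print(f"cosine:  {cosine(a, b):.3f} -> {cosine(a_padded, b_padded):.3f}")
print(f"pearson: {pearson(a, b):.3f} -> {pearson(a_padded, b_padded):.3f}")
```

Here the cosine stays at 0.2 in both cases, while the Pearson correlation moves from negative to slightly positive simply because unrelated authors were added to the data set.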
