1,721,108 research outputs found
An intelligent decision support system for software plagiarism detection in academia
The act of source code plagiarism is an academic offense that discourages the learning habits of students. Online support is available through which students can hire professional developers to code their regular programming tasks. These facilities make it easier for students to practice plagiarism. First, raw source codes are cleaned from noisy data to extract meaningful codes as the actual logic is more important to the programmers. Second, pre-processing techniques based on tokenization are used to convert filtered codes into meaningful tokens. It breaks the codes into small instances with the number of occurrences known as the frequency. Thirdly, the local and global weighting scheme method is applied to estimate the significance of each feature in an individual or a group of documents. It helps us greatly to zoom in on the importance of each feature of how effective it is for the next phase. Fourth, the single value decomposition method is used to reduce the dimensions of these features by maintaining the actual semantics of the source codes. This technique is used to remove overloaded noise information and collect only those features that are more effective for plagiarism detection. Fifth, the latent semantic analysis (LSA) technique is used to mine the actual semantics of the source codes in the form of latent variables. After that, the LSA features are used as input to cosine similarity to compute the plagiarism among different source codes. To validate the proposed approach, we used the topic modeling approach to group the relevant features into different topics
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation
counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings
are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that
only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into
account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer that rather than express opinions propels discourses; an interviewer that is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent form the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship
Appropriate Similarity Measures for Author Cocitation Analysis
We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.information science;Pearson correlation;cosine;similarity measure;author cocitation analysis
Dispelling the Myths Behind First-author Citation Counts
We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore just how different from each other author rankings resulting from different citation counting methods actually are, and to demonstrate the capability of emerging data and tools on the Web in supporting more realistic citation counting methods. Our results contest some common arguments for the continued
use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and other citation
counting methods, and high costs of using more realistic citation counting methods that are not well-supported by the ISI databases. It is argued that increasingly available digital full text research papers make it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more
sophisticated methods
An efficient framework for text document security and privacy
Nowadays, with the help of advanced technologies, an illegal copy of digital content can be shared easily. Which rise copyright and authentication problems. Digital text documents are generated and shared daily through different internet technologies such as the cloud, etc. The protection of these documents is a challenging task for researchers. In the past, steganography, cryptography, and watermarking techniques have been applied to resolve the copyright problem. However, most of the existing techniques are applicable for only plain text or protecting the document on the local paradigm. In the said perspective, we proposed a new technique to solve the problem of copyright and authentication on local and cloud paradigms. In this paper, we utilize some custom components of MS Word Document for concealing the watermark into a text document. These components are not referred to as the main document and will not modify the content and format. The experimental analysis and results prove that the proposed method improves the watermark capacity, imperceptible, and robust against formatting attacks
RUDA-2025: Depression Severity Detection Using Pre-Trained Transformers on Social Media Data
Depression is a serious mental health disorder affecting cognition, emotions, and behavior. It impacts over 300 million people globally, with mental health care costs exceeding $1 trillion annually. Traditional diagnostic methods are often expensive, time-consuming, stigmatizing, and difficult to access. This study leverages NLP techniques to identify depressive cues in social media posts, focusing on both standard Urdu and code-mixed Roman Urdu, which are often overlooked in existing research. To the best of our knowledge, a script-conversion and combination-based approach for Roman Urdu and Nastaliq Urdu has not been explored earlier. To address this gap, our study makes four key contributions. First, we created a manually annotated dataset named Ruda-2025, containing posts in code-mixed Roman Urdu and Nastaliq Urdu for both binary and multiclass classification. The binary classes are depression” and not depression, with the depression class further divided into fine-grained categories: Mild, Moderate, and Severe depression alongside not depression. Second, we applied first-time two novel techniques to the RUDA-2025 dataset: (1) script-conversion approach that translates between code-mixed Roman Urdu and Standard Urdu and (2) combination-based approach that merges both scripts to make a single dataset to address linguistic challenges in depression assessment. Finally, we employed 60 different experiments using a combination of traditional machine learning and deep learning techniques to find the best-fit model for the detection of mental disorder. Based on our analysis, our proposed model (mBERT) using custom attention mechanism outperformed baseline (XGB) in combination-based, code-mixed Roman and Nastaliq Urdu script conversions
Android-IoT Malware Classification and Detection Approach Using Deep URL Features Analysis
Currently, malware attacks pose a high risk to compromise the security of Android-IoT apps. These threats have the potential to steal critical information, causing economic, social, and financial harm. Because of their constant availability on the network, Android apps are easily attacked by URL-based traffic. In this paper, an Android malware classification and detection approach using deep and broad URL feature mining is proposed. This study entails the development of a novel traffic data preprocessing and transformation method that can detect malicious apps using network traffic analysis. The encrypted URL-based traffic is mined to decrypt the transmitted data. To extract the sequenced features, the N-gram analysis method is used, and afterward, the singular value decomposition (SVD) method is utilized to reduce the features while preserving the actual semantics. The latent features are extracted using the latent semantic analysis tool. Finally, CNN-LSTM, a multi-view deep learning approach, is designed for effective malware classification and detection
- …
