Mason Journals (George Mason Univ.)
    3256 research outputs found

    Evaluating Lightweight Transformer Models for Rhetorical Element Classification in Student Essays

    As education becomes increasingly digital, automated feedback systems are an emerging technology that aims to provide timely, personalized, and high-quality feedback on students' writing at scale. These systems rest on the identification of rhetorical elements in student essays, but existing approaches often use large, computationally expensive transformer models that may be impractical for classroom settings. This project addresses the need for lightweight, effective models capable of identifying argumentative structures in student writing. To that end, several compact transformer models (DistilBERT, TinyBERT, ELECTRA, MiniLM) are evaluated on the PERSUADE Corpus 2.0, a dataset of more than 25,000 annotated argumentative essays written by students in 6th to 12th grade. Each model classifies the rhetorical purpose of a target sentence, using the preceding and following sentences as context. The models were fine-tuned using consistent hyperparameters and early stopping, and all annotations were represented using BIO tagging at the token level. The results show that DistilBERT achieved the highest macro F1 score of 0.841, followed by MiniLM (0.822), ELECTRA (0.772), and TinyBERT (0.700). TinyBERT had the lowest inference time at 2.31 ms per sentence, followed by MiniLM (40.96 ms), ELECTRA (59.06 ms), and DistilBERT (95.56 ms). These results show that MiniLM is particularly well suited for real-time, deployable feedback systems, as it balances performance and efficiency. Future work will explore using these rhetorical classification models to develop a personalized feedback system for real-time use in classrooms.
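
The accuracy versus latency trade-off reported above can be checked with a small Pareto comparison. This is only an illustrative sketch: the numbers are the ones quoted in the abstract, while the `pareto_front` helper and the dominance criterion are assumptions made here, not part of the study.

```python
# Macro F1 and per-sentence inference time (ms) as reported in the abstract.
results = {
    "DistilBERT": (0.841, 95.56),
    "MiniLM": (0.822, 40.96),
    "ELECTRA": (0.772, 59.06),
    "TinyBERT": (0.700, 2.31),
}

def pareto_front(scores):
    """Return the models that no other model beats on both F1 and latency."""
    front = []
    for name, (f1, ms) in scores.items():
        dominated = any(
            o_f1 >= f1 and o_ms <= ms and (o_f1 > f1 or o_ms < ms)
            for other, (o_f1, o_ms) in scores.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)

print(pareto_front(results))
```

Under this criterion ELECTRA is dominated by MiniLM (lower F1 and slower), while DistilBERT, MiniLM, and TinyBERT each remain a defensible choice depending on how much latency a deployment can tolerate.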

    Evaluating Pre-Industrial Simulations of the Hydrological Cycle Over Europe and North Africa Using Reanalysis Data and Miocene Proxy Spatial Sampling

    Understanding how the Earth’s climate responds to elevated CO₂ is crucial for predicting future climate trends. The Middle Miocene (17-14.8 million years ago) saw CO₂ levels of ~400-600 ppm, similar to today, making it a valuable analog. However, Miocene palaeobotanical samples are limited, and how well state-of-the-art climate models represent modern climate at each specific proxy site requires further investigation. To validate the performance of the climate model CESM1.3, we compared a present-day reanalysis product (ERA5: 1979-2009) against Pre-Industrial (PI) simulations with differing spatial resolutions. Focusing on North Africa, Europe, and the Mediterranean, we identified the latitudes and longitudes of the Miocene proxy sites and matched them with the corresponding ERA5 grid points. At each location, we calculated the annual average surface temperature and precipitation for ERA5 and for the high- and low-resolution PI simulations. We then calculated the total difference at each site (PI model value minus ERA5 value), which allowed us to evaluate model bias at specific locations. We hypothesize that the high-resolution PI model will more accurately capture regional climate features, such as orographic precipitation and coastal climate effects, leading to reduced bias as sampled by the proxy site locations. As CO₂ levels and extreme weather events continue to rise, evaluating how well models simulate past and present climates is essential. This analysis of CESM PI simulations and Miocene proxy data can improve future climate predictions, particularly for vulnerable regions.
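
The site-matching and bias step described above can be sketched as follows. This is a minimal illustration under stated assumptions: both fields are taken to share one regular grid, longitude wrap-around is ignored, and the grid values, the site location, and the helper names `nearest_index` and `site_bias` are all invented for the example.

```python
import numpy as np

def nearest_index(coords, value):
    """Index of the grid coordinate closest to value."""
    return int(np.argmin(np.abs(np.asarray(coords) - value)))

def site_bias(model_field, obs_field, lats, lons, site_lat, site_lon):
    """Model-minus-reference difference at the grid cell nearest a site."""
    i = nearest_index(lats, site_lat)
    j = nearest_index(lons, site_lon)
    return float(model_field[i, j] - obs_field[i, j])

# Hypothetical 3x3 grid of annual-mean surface temperature (deg C).
lats = np.array([30.0, 40.0, 50.0])
lons = np.array([0.0, 10.0, 20.0])
era5 = np.array([[20.0, 21.0, 22.0],
                 [14.0, 15.0, 16.0],
                 [8.0, 9.0, 10.0]])
pi_sim = era5 - 1.5  # pretend the PI run is uniformly 1.5 deg C cooler

bias = site_bias(pi_sim, era5, lats, lons, site_lat=41.2, site_lon=9.0)
print(bias)
```

In practice the model and reanalysis grids differ, so one field would likely need regridding or a separate nearest-cell lookup before the subtraction.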

    Optimizing the Simulation of Atmospheric Dust Concentrations and Investigating its Impact on the Asian Monsoon during the Miocene.

    The Middle Miocene Climate Optimum (16.75-14.5 Ma) serves as a valuable comparison for future climate scenarios due to its elevated CO2 levels, but its climate state was also shaped by unique non-CO2 forcings, including dust aerosol concentrations. While dust accounts for considerable regional radiative forcing that mediates monsoon dynamics, its representation in paleoclimate models is poorly understood, creating uncertainty in climate sensitivity estimates and the resulting hydroclimate responses. In this study, we focus on the Miocene dust emission parameterization in a climate model, the Community Earth System Model version 1.2.2 (CESM1.2.2). We conducted sensitivity experiments to investigate how different parameterizations of soil erodibility affect dust concentrations and thereby impact Miocene hydroclimate. We analyzed two sensitivity experiments: one with topography-based soil erodibility (TopoS) and the other with uniform erodibility (UniformS). By comparing annually and seasonally averaged outputs, we quantified the effects of increased dust levels on the Asian Monsoon. Our results show that TopoS produces a higher dust burden on average over a region covering South Asia, the Middle East, and the western Tibetan Plateau. Regions with higher dust burden generally correspond to less precipitation, and regions with lower dust burden generally correspond to more precipitation. Additionally, we observed a relationship between changes in dust burden and the surface energy balance, with cooler temperatures and reduced net radiation for TopoS compared to UniformS. These findings show that more realistic soil erodibility maps are essential for simulating dust's climatic impact, thus improving the accuracy of paleoclimate simulations.

    Performance Evaluation of AI-Themed ETFs Versus Broad Market Benchmarks

    In recent years, AI-powered robo-advisors have emerged as low-cost alternatives to traditional human financial advisors, utilizing algorithmic strategies to optimize portfolio allocation and manage risk. As these platforms gain traction, evaluating their effectiveness in achieving strong, risk-adjusted returns is critical. This study compares performance metrics across 12 ETFs, including diversified holdings typically used by robo-advisors (e.g., VTI, BND, AGG, IEFA, SCHB) and ETFs focused on the AI and technology sector (e.g., AIQ, ARKQ, ROBT, ARTY). Daily price data from January 2024 to July 2025 was analyzed using Google Sheets. We calculated daily returns, cumulative returns, annualized volatility, and Sharpe ratios. For example, ROBT produced a cumulative return of 1.63% with annualized volatility of 25.85%, while AGG had a 0.07% cumulative return with lower volatility at 5.10%. Risk-adjusted performance, measured by the Sharpe ratio (Rf = 0), ranged from -0.11 (SCHF) to +0.23 (ROBT), showing that while AI-themed ETFs may boost returns, they introduce higher risk. Meanwhile, bond-focused ETFs like AGG and SCHZ provided more stable performance with reduced downside risk. Our findings support the hypothesis that robo-advised portfolios, composed of diversified ETF baskets, can achieve comparable or slightly lower returns than actively managed portfolios, but with significantly lower volatility and cost. This reinforces the case for robo-advisors as effective and efficient tools for portfolio management, particularly for passive investors who prioritize stability and fees over human oversight.
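
The four metrics named above can be sketched in a few lines. The study itself used Google Sheets, so this Python version is only an illustration. The 252-day annualization factor, the sample-variance convention, and the price series are assumptions made here, and the abstract does not state whether its Sharpe ratios were annualized.

```python
import math

TRADING_DAYS = 252  # assumed annualization factor; not stated in the abstract

def daily_returns(prices):
    """Simple day-over-day returns from a list of closing prices."""
    return [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]

def cumulative_return(prices):
    """Total return from the first to the last price."""
    return prices[-1] / prices[0] - 1.0

def annualized_volatility(returns):
    """Sample standard deviation of daily returns, scaled to one year."""
    mean = sum(returns) / len(returns)
    var = sum((r - mean) ** 2 for r in returns) / (len(returns) - 1)
    return math.sqrt(var) * math.sqrt(TRADING_DAYS)

def sharpe_ratio(returns, risk_free=0.0):
    """Annualized mean excess return divided by annualized volatility."""
    mean = sum(returns) / len(returns)
    annual_excess = (mean - risk_free) * TRADING_DAYS
    return annual_excess / annualized_volatility(returns)

prices = [100.0, 101.0, 99.5, 100.5, 102.0]  # made-up closing prices
rets = daily_returns(prices)
print(cumulative_return(prices), annualized_volatility(rets), sharpe_ratio(rets))
```

With `risk_free=0.0`, as in the abstract's Rf = 0 convention, the ratio reduces to annualized mean return over annualized volatility.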

    The Impact of Information Availability on Trust Emergence in an Agent-Based Model of Stag Hunt

    Trust is crucial to the success of cooperative behavior within multi-agent systems, particularly in situations requiring coordination under uncertainty. Stag Hunt is a coordination game that presents agents with a choice between cooperating for a riskier, high-reward option or defecting for a guaranteed but modest reward. While other models of the Stag Hunt include agents with mixed strategies, dynamic networks, or differing levels of information availability, this NetLogo model attempts to incorporate all of these elements. In this simulation, agents repeatedly play Stag Hunt with others under three conditions: (1) Amnesia, where agents lack memory of past interactions; (2) Experience, where agents track the trustworthiness of previous partners; and (3) Common Knowledge, where all agents possess knowledge of every other agent's trustworthiness. In the latter two conditions, each agent's trustworthiness dynamically updates based on their cooperation or defection. Agents decide whether or not they want to play Stag Hunt with potential partners based on these reputation-weighted probabilities. This decision is distinct from the choice to cooperate or defect during the Stag Hunt. As information availability increases, unreliable agents perform worse: the bottom 40% of agents earn 28% of total energy in Amnesia, 14.6% in Experience, and just 1.1% in Common Knowledge. Their share of total interactions similarly drops from 45% to 3%. Meanwhile, the proportion of interactions resulting in successful stag hunts increases with more information, indicating higher coordination. These findings suggest that greater information availability punishes unreliability and promotes efficient cooperation, leading to better overall coordination.
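
The two decisions described above (whether to play with a partner, and how trust changes after a round) can be sketched as below. The original model is written in NetLogo and its update rule is not given, so everything here is hypothetical: the payoff values, the exponential-style `update_trust` rule, its `rate` parameter, and the `accepts_partner` helper.

```python
import random

# Hypothetical payoffs: (my move, partner move) -> my reward.
PAYOFF = {
    ("stag", "stag"): 4,
    ("stag", "hare"): 0,
    ("hare", "stag"): 2,
    ("hare", "hare"): 2,
}

def update_trust(trust, cooperated, rate=0.2):
    """Move a trust score in [0, 1] toward 1 after cooperation, toward 0 after defection."""
    target = 1.0 if cooperated else 0.0
    return trust + rate * (target - trust)

def accepts_partner(trust_in_partner, rng):
    """Agree to play with probability equal to the partner's trust score."""
    return rng.random() < trust_in_partner

rng = random.Random(0)
trust = 0.5
for move in ["hare", "hare", "stag"]:
    trust = update_trust(trust, cooperated=(move == "stag"))

print(round(trust, 3))
print(accepts_partner(trust, rng))
```

Keeping the acceptance decision separate from the stag-or-hare move mirrors the distinction the abstract draws between choosing a partner and choosing an action.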

    Enhancing Flaky Test Detection Using iDFlakies and TuscanE

    Flaky tests are software tests that produce inconsistent results, i.e., tests that both pass and fail on the same version of code. Prior work found that order-dependent (OD) tests are one of the most prominent categories of flaky tests. OD tests are tests whose outcomes depend on the order in which they are run, while non-order-dependent (NOD) tests are tests whose outcomes do not depend on that order. Two tools from prior work for detecting flaky tests are iDFlakies and TuscanE. iDFlakies detects flaky tests by rerunning tests in various orders and categorizes each flaky test as OD or NOD. Previous work has shown that iDFlakies can detect some flaky tests by running tests in randomly generated test orders. TuscanE systematically generates test orders and, once all generated orders have been run, guarantees the detection of certain OD tests. To improve the detection of OD tests in iDFlakies, we integrate the two tools so that OD tests can be detected more efficiently. Our evaluation shows that the combined tool, which systematically generates test orders, detects OD tests faster than iDFlakies with random test orders.
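
The definition of an order-dependent test above can be made concrete with a toy example. This sketch brute-forces every permutation, which is neither the random sampling used by iDFlakies nor the systematic order generation used by TuscanE. The three test functions and the `find_order_dependent` helper are invented for illustration.

```python
from itertools import permutations

shared = {"cache": None}  # shared state that makes one test order-dependent

def writes_cache():
    shared["cache"] = "ready"
    return True

def reads_cache():
    # Passes only if writes_cache ran earlier in the same order.
    return shared["cache"] == "ready"

def independent():
    return 1 + 1 == 2

def find_order_dependent(tests):
    """Run every order and report tests that both pass and fail across orders."""
    outcomes = {t.__name__: set() for t in tests}
    for order in permutations(tests):
        shared["cache"] = None  # fresh state per order, like a new test run
        for t in order:
            outcomes[t.__name__].add(t())
    return sorted(name for name, seen in outcomes.items() if seen == {True, False})

print(find_order_dependent([writes_cache, reads_cache, independent]))
```

Exhaustive permutation grows factorially with the number of tests, which is why tools in this space sample or construct a much smaller set of orders.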

    Detecting and Understanding Operating Systems Dependent Flaky Tests

    Developers typically run tests after every code change. Operating system (OS)-dependent flaky tests are tests that can non-deterministically pass and fail on the same version of code, depending on the operating system on which they are executed. These flaky tests can mislead developers about their recent code changes and waste their time. To detect and understand these flaky tests in open-source projects, developers and researchers have to compile and run tests on various operating systems in the cloud through tools such as GitHub Actions, which can be costly in time and money. To reduce the cost of detecting OS-dependent flaky tests, stakeholders can compile and run potential OS-dependent flaky tests locally instead of relying on cloud machines. With a tool that compiles and runs tests locally, repeatedly running potential OS-dependent flaky tests in different environments (e.g., different operating systems) can require drastically less time and fewer resources. To achieve this reduction, we use the nektos/act tool, which simulates GitHub Actions locally, on different open-source projects, such as Apache Dubbo and Dropwizard. We discovered limitations of the tool, such as its inability to run tests consecutively on different operating systems and its inconsistent output in some test scenarios. These insights can help guide revisions to the tool so that it better meets users’ needs.

    Analyzing the Correlation Between NDVI and Rainfall in Various Regions of Kenya

    Kenya is a climatically diverse country in East Africa, known for its expansive grasslands and agricultural reliance. However, frequent and intensifying droughts pose serious challenges to food security, particularly in regions dependent on rain-fed agriculture. With vegetation growth closely tied to rainfall, prolonged droughts have devastating consequences and have led to 26% of the Kenyan population being food insecure. To better estimate and mitigate the effects of drought, it is essential to understand how rainfall variability influences vegetation cover across Kenya’s distinct regions. This study investigates that relationship by analyzing rainfall and vegetation data from four ecologically and climatically diverse counties: Garissa, Kitui, Narok, and Turkana. Focusing on these selected counties allowed for a more in-depth examination of regional differences. For instance, Turkana experiences a subtropical steppe climate with an average temperature of 30°C, while Narok has a cooler marine west coast climate averaging 17°C. This study spans an eight-year period (2016–2024), enabling the identification of both seasonal patterns and long-term trends while minimizing the chance of confounding variables. Satellite remote sensing data products from Google Earth Engine were used, specifically MODIS NDVI data to represent vegetation greenness and CHIRPS data for rainfall estimates. The results revealed a strong correlation between rainfall and NDVI, with an average R² value of 0.53 across the counties studied. These findings suggest that regional climate conditions modulate the strength of rainfall-vegetation interactions and highlight the value of geospatial tools in monitoring drought impacts and informing adaptive responses.
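
The R² statistic quoted above can be sketched for a single county as the squared Pearson correlation between rainfall and NDVI. This is an assumption about how the statistic was computed, since the abstract does not say whether a lag or a different regression form was used. The monthly values below are made up.

```python
def r_squared(x, y):
    """Squared Pearson correlation, equal to R^2 of a simple linear fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

# Made-up monthly values for one county.
rainfall_mm = [5.0, 20.0, 60.0, 110.0, 45.0, 10.0]
ndvi = [0.18, 0.22, 0.35, 0.52, 0.40, 0.21]

print(round(r_squared(rainfall_mm, ndvi), 2))
```

Averaging this value over the four counties would give a single summary figure comparable in form to the 0.53 reported.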

    Modeling the Impact of Air Quality on Animal and Human Health using SHAP and AI through the One Health Lens

    Environmental pollutants have long been observed to harm ecological and human health systems by disrupting the biological mechanisms of organisms. Although poor air quality and its isolated impacts have been researched, a robust unified approach across ecosystems has not been developed. Highly Pathogenic Avian Influenza (HPAI) outbreaks in wild birds and regional disparities in U.S. cancer rates both exhibit spatial differences linked to air quality. However, existing models fail to consider human and wildlife health together and are limited by a lack of explainable feature extraction, temporal misalignment, and class imbalance. After processing quantitative and geographic data on annual AQI, HPAI outbreaks, and cancer incidence rates, we developed a multi-model machine learning framework, using TabNet, ensemble, XGBoost, and Random Forest models to predict disease rates from air quality data points. To account for imbalance in the training data, SMOTE/ADASYN oversampling, polynomial features, and log-transformations were implemented. SHAP added post-hoc explainability to the approach. To account for the delayed effects of pollution exposure, time-lagged features were engineered from the data. Our ensemble and TabNet classifiers were highly predictive for HPAI outbreaks (ROC AUC > 0.85), while SHAP-enhanced regression revealed the key features (e.g., PM2.5, NO2) driving cancer incidence patterns (R² ≈ 0.70). These results show that air quality is a reliable predictor of ecological health risks for both humans and wildlife. This integrated approach offers a scalable, interpretable predictive model for environmental public health forecasting, supporting cross-species environmental policy planning.
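
The time-lagged feature engineering mentioned above can be sketched as follows. The abstract does not give the lag lengths or the data layout, so the record structure, the one- and two-step lags, and the `add_lagged_features` helper are assumptions for illustration only.

```python
def add_lagged_features(rows, column, lags):
    """Return copies of rows with `column` values from earlier time steps added."""
    out = []
    for i, row in enumerate(rows):
        new_row = dict(row)
        for lag in lags:
            key = f"{column}_lag{lag}"
            # Use None where no earlier record exists for this lag.
            new_row[key] = rows[i - lag][column] if i - lag >= 0 else None
        out.append(new_row)
    return out

# Hypothetical annual AQI records for one region, ordered by year.
records = [
    {"year": 2019, "aqi": 52},
    {"year": 2020, "aqi": 47},
    {"year": 2021, "aqi": 58},
]

lagged = add_lagged_features(records, "aqi", lags=[1, 2])
print(lagged[-1])
```

Rows whose lagged values are `None` would need to be dropped or imputed before training, a choice the abstract does not describe.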

    Typing Aptitude and Error Pattern Optimization Through Keystroke Dynamics and Recurrent Neural Networks

    Typing fluency plays a critical role in daily projects, communication, and procedures. In an increasingly digital world, people are increasingly reliant on typing to navigate day-to-day life, making typing speed and fluidity essential. However, typing speeds are difficult to raise given the spacing of common characters on the traditional keyboard, and many disabilities present unresolved challenges to efficient keyboard use. To address this issue, keystroke data was collected and analyzed using recurrent neural networks (RNNs) to identify individual typing proficiencies and deficiencies. To develop the model, sequential timing characteristics were extracted from a timestamped keystroke dataset, standardized, and converted into fixed-length input sequences to train a gated recurrent unit (GRU)-based RNN that can capture temporal relationships in typing behavior. Preliminary results indicate that, based on time regularity and error density, the model successfully separates high- and low-proficiency key transitions. Early visualizations show clustered areas with high mistake rates and frequent hesitation, suggesting that the model could help identify the keyboard elements that slow users down the most and cause the most typing errors. These findings show how keystroke dynamics and RNNs can be combined to find significant patterns in each person's typing habits. This framework provides a potential way to increase speed, accuracy, and comfort, particularly for users with severe typing deficiencies. In particular, physical conditions such as cerebral palsy impose motor constraints that make using a regular keyboard laborious, slow, and prone to mistakes, and the model can highlight the specific regions of the keyboard where errors and delays are concentrated. A custom keyboard layout informed by such data could reposition problematic keys to more accessible locations, both alleviating deficiencies and optimizing efficiency.
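
The preprocessing described above (extract timing features, standardize, convert to fixed-length sequences) can be sketched as below. The abstract does not specify which timing features, window length, or padding scheme were used, so inter-key intervals, a window length of 4, zero padding, and the helper names are all assumptions. The GRU itself is not shown.

```python
def inter_key_intervals(timestamps_ms):
    """Time between consecutive key presses."""
    return [t1 - t0 for t0, t1 in zip(timestamps_ms, timestamps_ms[1:])]

def standardize(values):
    """Rescale to zero mean and unit variance (population standard deviation)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = var ** 0.5 or 1.0  # avoid division by zero for constant input
    return [(v - mean) / std for v in values]

def to_windows(values, length, pad=0.0):
    """Split into consecutive windows of equal length, padding the last one."""
    windows = []
    for start in range(0, len(values), length):
        chunk = values[start:start + length]
        windows.append(chunk + [pad] * (length - len(chunk)))
    return windows

timestamps = [0, 120, 260, 380, 900, 1010, 1130]  # made-up key-press times (ms)
features = standardize(inter_key_intervals(timestamps))
windows = to_windows(features, length=4)
print(windows)
```

Each window would then be one input sequence for a recurrent model, and the long 520 ms gap in the sample data is the kind of hesitation the abstract describes.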
