Mason Journals (George Mason Univ.)
Enhancing Wheeled Robot Autonomy in Off-Road Terrain through Data-Driven AI Models
Wheeled robots have great potential to improve disaster response, military operations, and even the exploration of extraterrestrial bodies through their mobility, durability, and reduced need for direct human interaction. However, they are currently designed for controlled environments with easily traversable terrain. Most wheeled robots are at a significant disadvantage in off-road environments because they cannot traverse parts of the terrain such as rough patches or slopes. This also limits the development of fully autonomous off-road vehicles, since failure on such terrain is very likely. To address this limitation, we developed three small-scale wheeled robots, called “Verti-Wheelers”, to gather data about both their environment and the manual controls used to navigate them. We drove these robots in a controlled off-road ‘arena’ and gathered almost 100 GB of RGB (color) images, depth images, odometry, and manual throttle/steering commands from the controller. We intend to use these data to train AI models that reduce the need for direct human interaction and increase the autonomous capabilities of the Verti-Wheelers. This work lays the foundation for developing more intelligent and adaptable wheeled robots capable of navigating challenging off-road environments with greater autonomy.
A Comparative Analysis of Kriging and Machine Learning-based Spatial Interpolation Models for Chlorophyll-a Estimation in the Chesapeake Bay
Chlorophyll-a (chl-a) is a key indicator of water quality in coastal and estuarine ecosystems. In regions like the Chesapeake Bay, elevated chl-a levels often signal the presence of harmful algal blooms and biomass accumulation. However, satellite-based chl-a observations are frequently obscured by cloud cover, limiting their use in continuous coastal monitoring. Spatial interpolation models, which estimate values at unsampled locations based on existing data, offer one way to address this data scarcity. Existing research has examined the effectiveness of different interpolation models for the bay’s salinity and temperature; however, few studies have investigated the application of these models to chl-a. In this study, we evaluate the performance of three kriging-based models, namely universal kriging (UK), ordinary kriging (OK), and empirical Bayesian kriging (EBK), and three machine learning-based models, namely k-Nearest Neighbor (KNN), Extra Trees (ET), and XGBoost, for interpolating chl-a concentrations. Using over 5 million remotely sensed observations across nine days in early 2025, we find that EBK exhibits the best performance among kriging-based models, while ET outperforms all other machine learning models. Our results also demonstrate that while kriging-based models outperform machine learning models in data-rich conditions, machine learning models are more adaptable and accurate in data-sparse conditions. Since data availability varies significantly from day to day based on cloud cover, these findings suggest that no one model is universally optimal. Integrating both approaches may offer a hybrid framework for improving the continuity and reliability of chl-a monitoring in coastal regions.
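As a rough illustration of what spatial interpolation means in the abstract above, the sketch below estimates a value at an unsampled location from its k nearest observations, weighting each by inverse distance. It is not the authors' implementation: the inverse-distance weighting, the function name `knn_idw`, the parameters `k` and `power`, and the sample values are all assumptions made for this example.

```python
import math

def knn_idw(samples, query, k=3, power=2.0):
    """Estimate a value at `query` from the k nearest samples.

    samples: non-empty list of ((x, y), value) pairs (hypothetical observations).
    query:   (x, y) location with no observation.
    Weights are 1 / distance**power, an assumed weighting scheme.
    """
    # Pair each observation with its distance to the query, keep the k closest.
    nearest = sorted((math.dist(pt, query), val) for pt, val in samples)[:k]

    # If the query coincides with an observation, return that observation.
    if nearest[0][0] == 0.0:
        return nearest[0][1]

    weights = [1.0 / d ** power for d, _ in nearest]
    weighted_sum = sum(w * val for w, (_, val) in zip(weights, nearest))
    return weighted_sum / sum(weights)

# Made-up observations: three close points and one distant outlier.
observations = [((0, 0), 1.0), ((2, 0), 3.0), ((0, 2), 5.0), ((10, 10), 100.0)]
estimate = knn_idw(observations, (1, 1), k=3)
```

With k=3 the distant point is ignored and the three equidistant neighbors contribute equally, so the estimate is their plain average.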
Invisible Threats: Uncovering UI Security Vulnerabilities in Augmented Reality Platforms
Augmented Reality (AR) experiences place users within a user interface that allows interaction with three-dimensional virtual content. Extensive research exists on 2D User Interface (UI) security; however, AR platforms introduce new security conflicts, particularly in how virtual content is handled and user interactions are managed. Using AR properties identified in prior work, such as Same Space, Invisibility, and Synthetic Output, this research investigated potential UI security vulnerabilities in AR platforms. Experiments were run on two leading AR platforms, ARKit (Apple) and Oculus (Meta). ARKit vulnerabilities were tested with an iPhone 12 and an M2 MacBook Air, while Oculus vulnerabilities were tested with the Unity game engine on a Meta Quest 3S. To test each vulnerability, two independent components were tested as one app, in addition to a third library, which simulated multiple distinct entities interacting and potentially interfering with the user’s perception and input. It was found that Apple’s ARKit was susceptible to clickjacking attacks in which two virtual objects (Cube1 and Cube2) are placed at the same coordinates and the hidden object secretly receives the input. In both ARKit and Oculus, entirely transparent objects could still receive input from the user. Further, both ARKit and Oculus allowed fake, invisible user inputs generated by the computer to control virtual objects, with no way to verify the validity of the inputs. These findings are problematic because they demonstrate how malicious AR applications could manipulate user interactions and perceptions in the background, which could lead to unintended actions or a compromised user experience without the user’s awareness.
Identifying Environment-Based Risk Factors for Crashes in Fairfax County
In the construction and operation of roads, one of the most immediate considerations is collision mitigation. The hope is that by documenting and analyzing crashes, measures can be taken to prevent more in the future. However, in Fairfax County alone, there have been close to 100,000 recorded crashes since 2017, with the only decrease in crash frequency being attributed to the Coronavirus pandemic in 2019. Excluding said period between 2019 and 2020, the total number of crashes has increased by an average of 3.6 percent per annum. The traditional method for addressing a high frequency of crashes is to associate multiple individual crashes with a single fault (e.g., a faulty traffic light or an unpaved road). While effective, this approach fails to address crashes that occur in regions where crashes are less frequent but can still be attributed to a similar cause. Data from the Virginia DoT were analyzed in ArcGIS Pro and Stata to compare three data points: the road, nearby buildings, and the crash itself. Using combinations of the available variables, multiple correlation analyses were run in an attempt to identify a common group of characteristics leading to a crash. For instance, in a correlation analysis comparing light conditions and whether a car was speeding, it was found that crashes involving speeding cars were 7.99 percent more common during evening hours when a light source was present than during daylight hours. Data compilation for this study is ongoing; once complete, the study could yield more specific results that build on findings like the example above. Eventually, we hope this research can contribute to designing safer roads and addressing problematic road sections.
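The "average percent increase per annum" figure in the abstract above can be read as the mean of year-over-year percent changes with the pandemic-affected transition left out. The sketch below shows that arithmetic only. The yearly counts are invented placeholders, not Fairfax County data, and the choice of which transition to skip (`skip_into`) is an assumption, since the abstract does not say exactly how the 2019 to 2020 period was excluded.

```python
from statistics import mean

def yearly_percent_changes(counts_by_year, skip_into=()):
    """Return {year: percent change from the previous listed year}.

    Transitions that end in a year listed in `skip_into` are left out,
    e.g. to exclude an anomalous period from the average.
    """
    years = sorted(counts_by_year)
    changes = {}
    for prev, cur in zip(years, years[1:]):
        if cur in skip_into:
            continue
        base = counts_by_year[prev]
        changes[cur] = 100.0 * (counts_by_year[cur] - base) / base
    return changes

# Hypothetical crash counts, for illustration only.
hypothetical_counts = {2017: 1000, 2018: 1100, 2019: 990, 2020: 1089}

changes = yearly_percent_changes(hypothetical_counts, skip_into={2019})
average_change = mean(changes.values())
```

Here the drop into 2019 is skipped, leaving the 2018 and 2020 transitions to be averaged.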
Michelle Gordon, Extreme Violence and the ‘British Way’: Colonial Warfare in Perak, Sierra Leone, and Sudan
Aaron Shatzman, The Old World, the New World and the Creation of the Modern World, 1400–1650: An Interpretative History
Determining the Optimal LLM for a Hybrid-Model, Conversational Feedback Tool in Educational Dialogue Systems
With the growing demand for accessible, personalized education, Intelligent Tutoring Systems (ITSs) have sought to address individual learners’ needs through adaptive feedback. However, evaluating and generating effective, conversational feedback for free-response questions remains a major challenge. Current specialized, individual models have achieved great success in their respective aims (e.g., response grading, student confidence analysis, conversational responses), yet new ITSs struggle to capitalize on these gains, instead relying on outdated all-in-one systems. This research studies a hybrid-model approach, wherein multiple modern, specialized models are leveraged to analyze an open-ended answer, adapt to the student, and deliver conversational feedback effectively. GPT-4o, Gemini 2.0 Flash, DeepSeek R1, TinyLlama, and Claude Sonnet 4 received 10 manually curated prompts covering different school subjects (STEM, Humanities, and the Arts), problems, and student responses, corresponding to the future hybrid-model design. Each LLM’s response was graded independently by two researchers using a rubric with five categories (Conversationality, Relevance, Factual Accuracy, Ease of Understanding, and Helpfulness) of five points each. Claude Sonnet 4 produced the best results, with an average response score of 23.325 (SD = 1.700; t-test p = 0.0135 vs. GPT-4o and p = 0.0005 vs. Gemini 2.0 Flash). GPT-4o (22.275) and Gemini 2.0 Flash (21.575) followed with similar performance, while DeepSeek R1 (19.400) and TinyLlama (11.700) scored significantly lower. Claude produces extremely digestible, insightful feedback and hints while still allowing the student to learn firsthand and develop the final answer, making it the most effective LLM to use. Going forward, Claude will be the center of the hybrid-model ITS, with non-LLM models contributing information such as response grades and student confidence.
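The rubric scoring described above (five categories of five points each, two independent raters, an average and standard deviation per model) could be aggregated along the following lines. This is a sketch under assumptions rather than the authors' procedure: the 0 to 5 point range, the averaging of the two raters per prompt, the function names, and the example totals are all hypothetical.

```python
from statistics import mean, stdev

CATEGORIES = (
    "conversationality",
    "relevance",
    "factual_accuracy",
    "ease_of_understanding",
    "helpfulness",
)

def response_total(ratings):
    """Sum one rater's category points for one response (maximum 25).

    ratings: dict mapping each category to points, assumed to lie in 0..5.
    """
    for category in CATEGORIES:
        if not 0 <= ratings[category] <= 5:
            raise ValueError(f"{category} must be between 0 and 5")
    return sum(ratings[category] for category in CATEGORIES)

def summarize(rater_totals):
    """Average the two raters per prompt, then return (mean, sample SD)."""
    per_prompt = [mean(pair) for pair in rater_totals]
    return mean(per_prompt), stdev(per_prompt)

# Invented totals for three prompts: (rater A, rater B).
example_totals = [(24, 22), (25, 25), (20, 22)]
average_score, score_sd = summarize(example_totals)
```

Averaging the raters before computing the spread is one of several defensible choices; pooling all individual ratings instead would give a different standard deviation.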
Comparative Evaluation of AI Assistants for SQL Education through Prompt Engineering Techniques
As large language models (LLMs) become increasingly integrated into education, their role in supporting students’ understanding of Structured Query Language (SQL) has gained importance. However, the effectiveness of these AI assistants depends heavily on prompt design, particularly in domains like database instruction where precision and structure are essential. This project explores how prompt engineering influences the ability of LLMs to generate accurate, pedagogically valuable SQL responses. To select which models to evaluate, an initial literature review was conducted using multiple benchmark studies comparing LLM performance on SQL-related tasks. Based on metrics such as Execution Accuracy, Exact Match, F1 Score, and Response Quality Score, GPT-4 and Gemini 2.5 were consistently identified as top-performing models across independent evaluations. These findings guided their selection for experimental testing in this study.
The experiment tested six prompt templates of varying complexity, using a standardized set of SQL tasks and consistent database schemas. Two researchers independently scored each model’s outputs using a six-criteria rubric: correctness, schema understanding, query logic, pedagogical clarity, assumption transparency, and reproducibility. Scores ranged from 0–5 per criterion, and final scores were averaged across all tasks. GPT-4 achieved a composite score of 29/30, demonstrating consistently high accuracy, clarity, and reusable query patterns. Gemini 2.5 scored 28/30, closely matching GPT-4 but occasionally producing more complex outputs that could pose challenges for novice learners. Both models performed reliably across prompt formats, though small differences emerged in clarity and formatting consistency.
This study did not evaluate performance in the absence of prompt engineering, which remains an area of interest for future research. Both GPT-4 and Gemini 2.5 consistently performed well across all tested prompt structures, reflecting their status as leading AI assistants with very high benchmark scores. The results highlight the importance of prompt clarity and specificity in guiding AI responses, but ultimately the two models showed only minor differences in output quality. Their scores indicate that either model can effectively support SQL learning, with no significant performance gap between them. In the future, we would like to expand the research by scoring a broader range of AI assistants to gain a more comprehensive understanding of model performance in SQL education. Additionally, involving more researchers in the scoring process would help increase the accuracy and reliability of the evaluation system.