Treasures @ UT Dallas
7697 research outputs found
Error Category Recognition in Procedural Videos With Vision-language Models
This study evaluates vision-language models (VLMs) for error category recognition in procedural videos. We use CaptainCook4D (Peddi et al., 2023), an egocentric 4D dataset,
to enhance AI systems’ understanding of procedural learning and error recognition in instructional videos. The comprehensive dataset, consisting of 384 long-form egocentric videos
recorded in multiple real-world kitchen settings by various participants, is a testament to
our meticulous data collection process. Unique in its inclusion of both error-free and error-
prone videos, the dataset provides a robust resource for evaluating models on zero-shot error
activity recognition. The data was collected using HoloLens2 and GoPro Hero 11 devices,
capturing various sensory inputs, including depth and RGB video, hand and head tracking,
and IMU data. Each recipe is represented as a task graph, detailing step-wise instructions
and dependencies, aiding in the evaluation of AI models’ comprehension capabilities.
Two Vision-Language Models (VLMs), Video-LLaVA (Lin et al., 2023) and TimeChat (Ren
et al., 2024), were employed to assess the dataset’s utility in zero-shot error recognition tasks
(Khattak et al., 2024). Video-LLaVA (Lin et al., 2023), an extension of the LLaVA model,
integrates video processing for enhanced inference, while TimeChat (Ren et al., 2024) focuses
on long video understanding through a time-sensitive multi-modal framework. Experiments
involved prompt engineering using task graphs and a prompt-and-predict paradigm for error
recognition, showcasing the models’ abilities to detect procedural mistakes for zero-shot
evaluation. Evaluation metrics included precision, recall, and F1 scores, highlighting the
models’ performance in classifying errors accurately. The study underscores the models’
potential to detect complex errors in procedural activities in egocentric videos.
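As a minimal sketch of the metrics named above, the snippet below computes precision, recall, and F1 for a binary "error / no error" labeling. The label encoding (1 for an error step) and the toy predictions are assumptions for illustration; they are not results from the study.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels: 1 = step contains an error, 0 = step is correct.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```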
Selecting the Number of Components for Two-table Multivariate Methods
Research in cognition and neuroscience often involves analyzing relationships between two
sets of variables collected on the same set of observations. These “two-table” relationships
are commonly analyzed using three related component-based methods—partial least squares
correlation (PLSC), canonical correlation analysis (CCA), and redundancy analysis (RDA).
However, selecting the appropriate number of components to retain in these methods remains
a challenge. Several stopping rules—rules that determine the number of components to
keep—have been developed for these two-table methods, but their performances have not
been thoroughly evaluated. Further, many stopping rules have only been applied to one of
the two-table methods despite their relevance for all three methods, and there has been little
exploration into modifications that might improve the performance of these stopping rules.
Additionally, many rules do not have easily accessible software implementations.
To address these gaps, this dissertation evaluated four existing stopping rules and several
new modifications to these rules by using simulated data with a known number of true
components to estimate the Type I error rates and the power of the stopping rules. Out of
34 variations of these rules, four or five best rules were identified for each two-table method.
The Type I error and power of these best rules were further examined in terms of various
characteristics of the data, including the number of observations, variables, true components,
and the strength of the relationships between the tables, in order to identify one or two rules
with superior performance that are recommended for future use. Additionally, the most
popular stopping rule—a permutation test using the singular values as test statistics—is not
supported by this study because it showed high Type I error across the simulated data.
As an illustrative example, a PLSC analysis was included for a real dataset (a subset of the
publicly available LEMON dataset). This analysis explored relationships between participants’ cognitive performance and physiological measurements on two components selected
by several of the best stopping rules.
To facilitate future applications, an R package called componentts was developed. The
package implements the stopping rules and data simulation so that researchers can use and
test the stopping rules with additional simulated data beyond the data in this study, or test
new stopping rules and easily compare their results to the stopping rules evaluated here.
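For readers unfamiliar with the permutation test mentioned above, the following is a rough sketch of the idea for the first PLSC component only: the first singular value of the cross-product matrix of the two centered tables is compared with the values obtained after shuffling the rows of one table. It is written in plain Python, not with the dissertation's R package; all function names are hypothetical, and the power-iteration shortcut is an assumption made to keep the example dependency-free.

```python
import math
import random


def center_columns(rows):
    """Subtract each column mean from a table stored as a list of rows."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def cross_product(x_rows, y_rows):
    """Return R = X^T Y for two tables with the same observations (rows)."""
    p, q = len(x_rows[0]), len(y_rows[0])
    r = [[0.0] * q for _ in range(p)]
    for xr, yr in zip(x_rows, y_rows):
        for i in range(p):
            for j in range(q):
                r[i][j] += xr[i] * yr[j]
    return r


def first_singular_value(r, iters=100):
    """Approximate the largest singular value of R by power iteration on R^T R."""
    p, q = len(r), len(r[0])
    v = [1.0 + 0.1 * j for j in range(q)]
    for _ in range(iters):
        u = [sum(r[i][j] * v[j] for j in range(q)) for i in range(p)]
        w = [sum(r[i][j] * u[i] for i in range(p)) for j in range(q)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            return 0.0
        v = [c / norm for c in w]
    u = [sum(r[i][j] * v[j] for j in range(q)) for i in range(p)]
    return math.sqrt(sum(c * c for c in u))


def permutation_p_value(x_rows, y_rows, n_perm=99, seed=0):
    """Permutation p-value for the first singular value of X^T Y."""
    rng = random.Random(seed)
    xc, yc = center_columns(x_rows), center_columns(y_rows)
    observed = first_singular_value(cross_product(xc, yc))
    exceed = 0
    for _ in range(n_perm):
        shuffled = yc[:]
        rng.shuffle(shuffled)  # break the row pairing between the tables
        if first_singular_value(cross_product(xc, shuffled)) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Note that this illustrates the rule the abstract reports as having high Type I error; it is shown to make the procedure concrete, not as a recommendation.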
Shipboard Power Systems Load Monitoring and Energy Management for Resilience Enhancement
With the increasing electrification and complexity of next-generation shipboard power systems
(SPS), ensuring real-time decision-making for operational resilience and fault detection has
become critical. This thesis presents a comprehensive framework that integrates non-intrusive
load monitoring (NILM), fault detection, and autonomous reconfiguration of SPS using
advanced machine learning techniques.
The NILM system employs a discrete wavelet transform for signal processing and a
convolutional neural network (CNN) for real-time load status monitoring in a four-zone medium
voltage direct current (MVDC) SPS. The model achieves over 98% accuracy in monitoring,
including the identification of pulsed loads, and maintains functionality under extreme
conditions such as cyber/physical attacks and noisy inputs.
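To illustrate the kind of signal processing a discrete wavelet transform performs, here is a minimal sketch of a Haar decomposition. The abstract does not say which wavelet family or decomposition depth the thesis uses, so the Haar wavelet, the function names, and the toy current trace are all assumptions.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_wavedec(signal, levels):
    """Repeated Haar DWT; returns the final approximation and per-level details."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)
    return approx, details


# Hypothetical current trace: a load switches on between samples 2 and 3.
trace = [0, 0, 0, 5, 5, 5, 5, 5]
approx, details = haar_wavedec(trace, levels=1)
print(details[0])  # only the pair that straddles the switching event is nonzero
```

The detail coefficients localize abrupt changes such as a load switching on, which is the sort of feature a downstream classifier can consume.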
Furthermore, a Wavelet Graph Neural Network (WGNN) is introduced for non-intrusive fault
detection, demonstrating accuracies above 99% for intrusive faults and 97% for non-intrusive
scenarios. The WGNN model’s robustness to pulse loads and noise is validated through
hardware-in-the-loop testing, ensuring high fidelity and low latency in real-world applications.
Additionally, the framework incorporates an autonomous reconfiguration system using graph-
based reinforcement learning (RL), which models the SPS reconfiguration as a Markov
decision process. A graph convolutional network (GCN) is employed within the RL policy
network to optimize the switching control policies, ensuring maximum power availability to
loads during faults.
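The abstract does not describe the policy network in detail. As a generic illustration of what a graph convolutional layer computes, the sketch below applies the widely used propagation rule H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W) to a small graph. The adjacency matrix, node features, and weights are made-up values, and the thesis's actual architecture may differ.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def normalized_adjacency(adj):
    """Add self-loops, then apply symmetric degree normalization."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(adj, features, weights):
    """One graph convolution: aggregate neighbors, transform, apply ReLU."""
    propagated = matmul(normalized_adjacency(adj), features)
    transformed = matmul(propagated, weights)
    return [[max(0.0, v) for v in row] for row in transformed]


# Hypothetical 3-bus network: bus 0 connected to buses 1 and 2.
adj = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # e.g. per-bus state
weights = [[0.5], [-0.25]]
print(gcn_layer(adj, features, weights))
```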
The proposed approach effectively enhances the operational resilience and autonomy of
shipboard power networks, ensuring real-time performance in both normal and emergency
conditions.
The Disordered Spirit: A Portrait of Francisco Amighetti as Seen by Laura Goldstein
“The Disordered Spirit: Francisco Amighetti as seen by Laura Goldstein” addresses the paradox
of the fragment and the whole inherent in the ideas of “essence” or “truth” through the work of
Costa Rican artist and poet Francisco Amighetti, and offers some alternative perspectives to the
anxiety of loss in literary translation. This investigation emphasizes the overuse of discussions on
loss in translation and argues for the value of the fragment, particularly in Amighetti’s work,
which falls in the Modernist period when poets and artists turned to the fragment through style,
technique, and a confrontation with the past, but also more broadly to argue that the fragment has
value through the choices of creative processes, language, and expression, the multiplicity of
subjective experiences, order and disorder, and through the nature of memory and our universe.
The dissertation also analyzes the creative work of Romanian-Brazilian writer Ștefan Baciu, who
wrote poems responding to fragments of Amighetti’s poems, letters, and prose, and finally
includes creative work by the author of the dissertation in the form of original poetry, poetry in
translation, visual art (prints) and memoir, proposing that a translator can reveal the multiplicity
of subjective experiences through the inclusion of their original creative work, especially when
the translated poet is excluded from the canon as Francisco Amighetti and other Costa Rican
poets have been.
Hexaarylbenzene-based Monomers for Porous Polymers
Porous polymers are a type of porous material in which monomers form 2D or 3D polymers
containing angstrom- to nanometer-scale pores. Key to formation of porous polymers is the use of
monomers with specific symmetries such that they connect in a repeating 2D or 3D pattern
containing pores. The three main types of crystalline porous polymer are covalent organic
frameworks (COFs), metal-organic frameworks (MOFs), and hydrogen-bonded organic
frameworks (HOFs). All three have shown promising applications resulting from their porosity,
including gas storage, gas separation, use in organic electronics, catalysis, water harvesting, and
removal of environmental pollutants. However, large-scale application is limited by scalability of
synthesis, processability, and expense of synthesizing monomers. To address limitations of
scalability and expense of monomers, one goal is to synthesize monomers that start from cheap
building blocks and to use synthetic methods allowing larger scale purification, such as
recrystallization instead of column chromatography. Limitations of processability can be addressed
with HOFs, which are highly reprocessable due to the reversibility of hydrogen bonds. Another
way to improve processability is by investigating covalent organic macrocycles and metal-organic
macrocycles, which possess many porosity properties of COFs and HOFs with the advantage of
easier processability due to solubility of macrocycles in organic solvents.
Chapter 1 of this thesis is a literature review of the three most common types of crystalline porous
polymers: covalent organic frameworks (COFs), metal-organic frameworks (MOFs), and
hydrogen-bonded organic frameworks (HOFs). Then metal-organic macrocycles and covalent
organic macrocycles are introduced, which form materials analogous to MOFs and COFs, with
permanent porosity properties such as gas sorption, but with the added advantage of solution
processability.
Chapter 2 introduces various organic reactions used in the synthesis of monomers for porous
polymers. Since porous polymers can be seen as extensions of the trend from atoms, to molecules,
to porous polymers, extensive knowledge of organic chemistry is needed to assemble atoms into
molecules before advancing to assembling those molecules into porous polymers.
Chapter 3 then brings those organic reactions together to introduce synthetic pathways towards
hexaarylbenzene- and hexa-peri-hexabenzocoronene-based monomers for porous polymers. Our
group has been interested in these types of monomers for their cheap, scalable synthesis, versatility
in adjusting appending functional groups, and excellent CO2 sorption when used in porous
polymers. Chapter 3 also introduces our work into pathways for synthesizing hexaarylbenzene-
based monomers, using sequential Suzuki reactions to a central mixed halobenzene.
Chapter 4 gives synthetic procedures I have improved upon and developed for synthesis of HPB-
1,2-2A, and reports a method I found to grow single crystals of carboxylic acid-functionalized
hexaarylbenzenes by vapor phase diffusion.
Fabrication and Characterization of Metallic Nanowires Integrated Microfluidic Channel
The rapid advancements in nano-manufacturing have accelerated the miniaturization of devices,
giving rise to microfluidics — the science of manipulating fluids within channels at sub-millimeter
scales. While early microfluidic devices focused on lab-on-chip systems, they were quickly
implemented in a wide range of applications such as sensors, drug delivery, organ-on-chip systems
and microreactors. The integration of nanostructures, particularly nanowires, within microfluidic
channels has further improved their functionality by leveraging increased surface area to optimize
processes such as cell sorting, microfluidic mixing and thermal management. However, traditional
fabrication techniques for nanowires such as Chemical Vapor Deposition and Vapor-Liquid-Solid
growth are complex and involve repetitive steps, which limits their broader adoption.
In this study, we present a novel fabrication technique for nanowires using a platinum (Pt) based
metallic glass (Pt57.5Cu14.7Ni5.3P22.5) via thermoplastic drawing and integrating them into a
microfluidic channel. Metallic glasses are amorphous alloys, known for their fluid-like behavior
when heated above their glass transition temperature. A custom-built thermo-mechanical setup was
utilized to produce metallic nanowires from the cavities on a silicon substrate (mold). To
selectively place the metallic nanowires on the mold, two distinctive approaches were employed.
In the first approach, metal masks made from aluminum foil and laser-cut stencils were used to
pattern nanowires in selective regions. In the second approach, maskless lithography was
employed to create molds with cavities only in desired areas. The metallic nanowire-patterned
mold was then bonded with a Polydimethylsiloxane (PDMS) block to form a sealed microfluidic
channel.
Flow characterization was conducted to understand the effect of the metallic nanowires on the fluid
flow within the channel. Our findings revealed that long nanowires offered higher resistance at
increased flowrates, while shorter cone-shaped nanowires offered greater resistance at lower
flowrates. These results emphasize that nanowire length and morphology play a key role in
determining the flow behavior. In addition to the flow studies, the potential for decorating these
metallic nanowires with other materials such as carbon nanotubes and gold-palladium
nanoparticles was explored using spin coating and deposition techniques. This opens new
possibilities for functionalization of nanowires without the need for sophisticated fabrication
processes. To address scalability, new fixtures were designed to extend the usable mold area,
enabling the fabrication of metallic nanowire-integrated microfluidic channels for large-scale
applications.
Toward Generalizable Models for Medical Imaging: Leveraging Topological Data Analysis and Deep Learning
The rapid advancement in designing machine learning and artificial intelligence software
for biomedical research necessitates the development of models that are not only accurate
but also generalizable across various data sets. This dissertation addresses the challenge of
creating such generalizable models by leveraging topological data analysis (TDA) and deep
learning techniques. We propose a novel framework that integrates TDA with deep learning
to enhance the interpretability and performance of biomedical image analysis models.
Traditional histopathological image analysis is labor-intensive and subjective, leading to
variability in diagnoses. To overcome these challenges, we develop a methodology and software
that utilize Persistent Homology (PH) from TDA to capture topological features of digital
biomedical images, providing a detailed representation of tissue morphology that conventional
approaches might overlook. The topological features are used with deep learning models to
improve classification accuracy and robustness.
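To give a concrete feel for persistent homology, the sketch below computes 0-dimensional persistence (births and deaths of connected components) for the sublevel sets of a one-dimensional intensity profile, using a union-find structure and the usual "elder rule". This is a deliberately simplified assumption: the dissertation works with 2D biomedical images and its own software, neither of which is reproduced here, and the function name is hypothetical.

```python
def sublevel_persistence_0d(values):
    """Return sorted (birth, death) pairs of components of a 1D sequence."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    parent = [None] * n  # None means the sample has not entered the filtration
    birth = {}           # birth value, looked up by component root
    pairs = []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in order:
        parent[i] = i
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the component born later dies at this value.
                young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                if birth[young] < values[i]:  # skip zero-persistence pairs
                    pairs.append((birth[young], values[i]))
                parent[young] = old

    # The oldest component never dies.
    pairs.append((birth[find(order[0])], float("inf")))
    return sorted(pairs)


# Hypothetical intensity profile with two local minima (values 1 and 0).
print(sublevel_persistence_0d([3, 1, 4, 0, 2]))
```

Each finite pair records how long a local minimum survives before merging into a deeper one; long-lived pairs are the topological features that can be passed to a classifier.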
In addition to PH, we employ Transfer Learning from pre-trained models, adapting them
to specific biomedical imaging tasks. This approach addresses the scarcity of labeled data
in medical imaging, enhancing the performance and efficiency of our models. Furthermore,
continuous learning techniques enable our models to adapt to new data while retaining
previously learned information, ensuring long-term relevance and effectiveness.
The experimental results demonstrate significant classification accuracy and interpretability
improvements across multiple biomedical imaging tasks, including histopathology and medical
image segmentation. Specifically, our models achieved a classification accuracy of 94.87% in
histopathological image analysis, outperforming traditional methods by a substantial margin.
Notably, in classifying breast cancer and prostate cancer histopathological images, our models
achieved accuracies of 95.3% and 93.8%, respectively, using relatively small models that
leverage PH. The integration of PH and Transfer Learning proved remarkably effective, with
models trained using these techniques achieving a 15% increase in accuracy over baseline
models.
This research contributes to the field by offering a novel methodology for integrating topological
features into deep learning, paving the way for more effective and versatile biomedical imaging
solutions. The developed models provide a scalable and adaptable approach to biomedical
image analysis, with potential applications in cancer diagnosis and treatment planning. By
enhancing the accuracy, efficiency, and scalability of diagnostic tools, this dissertation aims
to improve patient outcomes and advance biomedical research.
Evaluating the Performance of Multi-Hop Wireless Networks Employing Collision-Free Binary Countdown MAC
This thesis presents the development of a collision-free binary countdown MAC protocol
for multi-hop wireless networks designed to ensure reliable communication while making
stochastic performance guarantees. Our performance analysis of this protocol reveals that
the performance of individual wireless nodes can only be meaningfully tuned by modifying
the network topology. To address this challenge, we develop two modified versions of our
binary countdown MAC protocol: node-weighted and flow-weighted binary countdown. In
node-weighted binary countdown, it is possible to weigh the transmission probability of
individual wireless nodes. In flow-weighted binary countdown, it is possible to weigh the
transmission probability of individual flows at individual wireless nodes. These modifications
provide flexible methods for tuning the performance of individual wireless nodes and flows in
multi-hop networks employing our collision-free binary countdown MAC protocol.
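The basic binary countdown arbitration that the protocol builds on can be sketched as follows. The example assumes a single collision domain in which every contender hears every other and each holds a unique priority number; the multi-hop behavior and the node- and flow-weighted variants described in the thesis are not modeled, and the function and node names are hypothetical.

```python
def binary_countdown(contenders, n_bits):
    """Arbitrate among contenders holding unique priority numbers.

    contenders maps a node name to an integer in [0, 2**n_bits).
    Bits are sent most-significant first; the channel acts as a logical OR,
    and a node that sends 0 but hears 1 withdraws from the contention.
    """
    active = dict(contenders)
    if not active:
        return None
    for bit in range(n_bits - 1, -1, -1):
        channel = any((prio >> bit) & 1 for prio in active.values())
        if channel:
            active = {node: prio for node, prio in active.items()
                      if (prio >> bit) & 1}
    if len(active) != 1:
        raise ValueError("priority numbers must be unique")
    return next(iter(active))


# Hypothetical contenders with 3-bit priority numbers.
winner = binary_countdown({"a": 0b101, "b": 0b110, "c": 0b011}, n_bits=3)
print(winner)
```

Because exactly one node survives every contention round when the numbers are unique, the winning transmission proceeds without a collision.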
Depression and Associative Recognition Memory: The Effects of Depression Symptoms
Depression is a leading cause of disability globally, impacting millions from adolescence
through adulthood. Major Depressive Disorder (MDD), characterized by persistent sadness and
anhedonia, also commonly presents with cognitive impairments, notably difficulties in memory
and concentration. While traditional recall-based tasks reveal memory deficits in groups of
depressed individuals, they frequently fail to capture the effects of individual differences in
memory deficits due to depression severity. However, when recognition tasks are sensitive
enough to detect memory impairments in depressed participants, their performance on these
tasks remains high enough to reveal inter-individual differences. These recognition-based tasks
can capture individual differences by detecting when lower effort/automatic cognitive strategies,
such as a familiarity-based one, are being used to complete the task. We hypothesized that low
effort, familiarity-based strategies are utilized more by those who are experiencing higher levels
of depression symptoms. The current study used an associative recognition task to better
characterize memory impairments in individuals with depression symptoms by assessing
differences in performance based on depression symptom severity. Using an associative memory
task, specifically a face-name recognition task, we were able to evaluate both associative and
simple recognition performance. We hypothesized that participants with greater depression
severity would rely more on inefficient familiarity-based strategies, leading to impaired
performance on associative tasks (which require greater effort/more efficient strategies) but preserved
simple recognition abilities (for which low effort/inefficient strategies are sufficient).
Results showed that individuals with severe and moderate depression exhibited deficits in
associative recognition. However, their associative memory performance did not significantly
differ from the non-depressed group, indicating that differences in depression symptom severity
were not detected by performance on the associative recognition task. In contrast, participants’
simple recognition performance was higher than associative recognition performance for all
three groups, indicating that simple recognition was not impaired by depression symptom
severity as we predicted. We were also able to replicate previous research on depression and
comorbid anxiety, depression and decreased attentional control, and depression and self-reported
episodic memory impairments. These findings underscore the potential value of associative
recognition tasks in identifying cognitive deficits linked to depression severity and highlight the
intricate relationship between mood disorders and memory functioning.
Making Active Learning Work in the Real World
Over the last decade, deep learning methods have achieved remarkable feats
within the space of machine learning. Modern deep learning methods, however, require the
use of immense quantities of data for training, which presents an immediate data acquisition
cost. Of particular interest is active learning – a label-efficient paradigm – as a large barrier to
entry for data acquisition is data annotation, which requires a tremendous amount of human
effort. In active learning, one seeks to select a budget-constrained number of worthwhile
unlabeled instances from a large source of unlabeled data that, when annotated, produces
the largest gain in some performance metric after subsequent supervised training. To date,
deep active learning methods have made tremendous progress in reducing annotation costs;
however, an extremely large collection of active learning methods has only been proven on
standard academic datasets, which tend to be relatively simplistic. Indeed, there are nuances
in real-world data and real-world labeling that change how effective certain active learning
strategies are. Hence, there is a need to better understand facets of how to apply active
learning for realistic scenarios.
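The budget-constrained selection step described above can be sketched with one of the simplest acquisition strategies, entropy-based uncertainty sampling. This particular strategy and the toy predicted probabilities are assumptions for illustration; the abstract does not tie the dissertation to this specific method.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(pool_probs, budget):
    """Return indices of the `budget` unlabeled instances with highest entropy."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: entropy(pool_probs[i]),
                    reverse=True)
    return ranked[:budget]


# Hypothetical model outputs for four unlabeled instances (two classes).
pool = [[0.9, 0.1], [0.5, 0.5], [0.6, 0.4], [0.99, 0.01]]
print(select_most_uncertain(pool, budget=2))
```

The selected instances would then be sent for annotation and added to the training set before the model is retrained.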
In this work, we seek to develop label-efficient labeling strategies for real-world complications
in the data environment. We first perform an evaluation of deep active learning on image
classification tasks to elicit realistic facets that affect existing active learning methods.
Motivated by the surprising finding that state-of-the-art active learning methods tend to
no longer dominate long-lasting and very simple active learning methods when applying a
number of common training techniques, we turn our focus towards methods that mitigate
various complications in real-world data such as rare classes, data redundancy, streaming
environments, and so forth, frequently utilizing the recently proposed submodular information
measures. To better understand when utilizing submodular information measures will be
effective, we derive bounds on multiple selection characteristics of submodular information
measures to theoretically validate the use of submodular information measures as mechanisms
for performing active learning selection. With these theoretical connections, we then utilize
them to handle rare classes, data redundancy, out-of-distribution classes, and non-i.i.d.
streaming environments. Additionally, the class of submodular mutual information functions
provides useful weak labeling capabilities, serving as a powerful component in cold-start
settings and in real-world labeling pipelines wherein the cost of labeling can be reduced with
weak label suggestions. We conclude with an active learning toolkit that can be and has been
used in real-world active learning pipelines.
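To show why submodular objectives help with data redundancy, the sketch below greedily maximizes the classic facility-location function f(S) = sum over i of max over j in S of sim(i, j). Facility location is used here as a stand-in example of a submodular function; it is not the submodular information measures of the dissertation, and the similarity matrix and function name are assumptions.

```python
def facility_location_greedy(similarity, budget):
    """Greedy maximization of the facility-location function.

    similarity[i][j] is a non-negative similarity between items i and j.
    Returns the selected indices in the order they were chosen.
    """
    n = len(similarity)
    best_cover = [0.0] * n  # how well each item is represented so far
    selected = []
    for _ in range(min(budget, n)):
        best_gain, best_j = -1.0, None
        for j in range(n):
            if j in selected:
                continue
            gain = sum(max(similarity[i][j] - best_cover[i], 0.0)
                       for i in range(n))
            if gain > best_gain:
                best_gain, best_j = gain, j
        selected.append(best_j)
        best_cover = [max(best_cover[i], similarity[i][best_j])
                      for i in range(n)]
    return selected


# Hypothetical pool: items 0 and 1 are near-duplicates, item 2 is distinct.
sim = [[1.0, 0.9, 0.2],
       [0.9, 1.0, 0.1],
       [0.2, 0.1, 1.0]]
print(facility_location_greedy(sim, budget=2))
```

Once item 0 is chosen, its near-duplicate adds almost no marginal gain, so the second pick goes to the distinct item instead.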