6D Rigid Object Pose Estimation Using Deep Learning
6D object pose estimation is the task of determining an object’s 3D rotation and translation with respect to a camera, and plays a critical role in applications such as robotic manipulation, autonomous navigation, and augmented reality. While recent advances in deep learning have substantially improved performance, many existing methods still face limitations in learning robust and generalizable representations. Factors such as variations in object appearance, occlusion, sensor noise, and domain shifts can degrade model accuracy, highlighting the need for more effective representation learning strategies that capture rich geometric and semantic cues for reliable pose estimation across diverse conditions.
This dissertation investigates 6D pose estimation from geometric information, emphasizing the development of robust and generalizable representation learning techniques to address both instance-level and category-level settings. Instance-level pose estimation refers to the task of predicting the 6D pose of specific, known objects for which exact 3D models are available during both training and testing. For instance-level pose estimation, this dissertation presents a depth-only fusion framework that converts depth images into normal vector angle maps to explicitly embed geometric cues, and combines them with point cloud features for accurate 3D keypoint localization and semantic segmentation. This approach achieves state-of-the-art performance on the LineMod and Occlusion-LineMod datasets, and delivers competitive results on YCB-Video without post-processing.
Category-level pose estimation, on the other hand, aims to estimate 6D poses for previously unseen object instances that belong to a predefined category. For category-level pose estimation, this dissertation introduces a contrastive learning framework that learns pose-aware point cloud representations while preserving the intrinsic continuity of 6D poses. Specifically, we present two frameworks: the first is a two-phase one that combines pose-aware and geometry-aware representations to estimate target object poses, and the second is an end-to-end hierarchical ranking contrastive learning architecture, eliminating the need for a separate geometric encoder and enhancing the pose estimation modules. The resulting model achieves state-of-the-art accuracy among depth-only methods on the REAL275 and CAMERA25 datasets, while maintaining real-time inference speed.
In addition, we conduct an exploratory study on applying diffusion-based generative modeling to category-level pose estimation. The method generates canonical partial-view point clouds from observed depth-based point clouds before estimating poses via the Umeyama algorithm. While preliminary results reveal limitations in generation fidelity and pose consistency, the study highlights key challenges and opportunities for integrating generative models into pose estimation pipelines.
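The final alignment step mentioned above can be illustrated with a minimal sketch of the Umeyama least-squares similarity fit. It assumes point correspondences between the generated canonical cloud and the observed cloud are already known. The function name `umeyama`, the variable names, and the toy data are hypothetical and are not taken from the dissertation.

```python
import numpy as np

def umeyama(src, dst):
    """Least-squares similarity transform such that dst ≈ s * R @ src + t.

    src, dst: (n, dim) arrays of corresponding points (hypothetical inputs).
    Returns scale s, rotation matrix R, and translation vector t.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n, dim = src.shape

    # Center both point sets.
    mu_src = src.mean(axis=0)
    mu_dst = dst.mean(axis=0)
    src_c = src - mu_src
    dst_c = dst - mu_dst

    # Cross-covariance between target and source, then its SVD.
    cov = dst_c.T @ src_c / n
    U, D, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    S = np.eye(dim)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[-1, -1] = -1.0

    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / n
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t

# Toy check: recover a known scale, rotation about z, and translation.
angle = np.pi / 6
R_true = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle),  np.cos(angle), 0.0],
    [0.0,            0.0,           1.0],
])
t_true = np.array([1.0, 2.0, 3.0])
src = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], dtype=float)
dst = 2.0 * src @ R_true.T + t_true
s, R, t = umeyama(src, dst)
```

In a category-level setting the recovered scale would correspond to object size and `R`, `t` to the 6D pose. How the correspondences are obtained from the generated point cloud is outside the scope of this sketch.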
Overall, this dissertation contributes novel geometric representation learning frameworks for both instance-level and category-level 6D pose estimation, supported by extensive experiments on widely used benchmarks. The findings not only advance geometry-based pose estimation methods but also open pathways toward unified, generative–discriminative approaches for robust object pose estimation in real-world environments. Such capabilities are critical for enabling reliable robotic manipulation in cluttered or unstructured settings, enhancing perception for autonomous navigation in dynamic scenes, and improving interaction in augmented and mixed reality systems. By bridging fundamental representation learning with practical deployment, this work moves closer to making 6D pose estimation an integral component of real-world intelligent systems.
Hammer of Eden: Visualization Software and Cultures of Executive Decision-making
Inventor and engineer Andy Hildebrand created the Landmark Graphics software workstation for processing seismic data used in oil exploration, as well as AutoTune, the pitch-correcting software used to adjust a vocalist's notes both in live performance and in post-production. Using these two software products as case studies, research was conducted through the making of a non-fiction film, titled Hammer of Eden, to ask how software designed to visualize unseen realities on a screen affects the perception and decision-making of the professional class that uses it.
This paper theorizes concepts present in Hammer of Eden, including how visualization software interacts with perception on the planes of politics, aesthetics, and labor relations.
New Fast Polynomial Root-Finders
Univariate polynomial root-finding has been studied for four millennia and very intensively in the last decades. Our black box root-finder involves no coefficients and works for a black box polynomial, defined by an oracle (that is, a black box subroutine) for its evaluation. Such root-finders have various benefits; for example, they are particularly efficient when a polynomial can be evaluated fast, say, when it is a sum of a small number of shifted monomials (x-c)^a.
Our root-finder approximates all d complex zeros of a dth degree polynomial p(x) (that is, the roots of the equation p(x)=0) using a Las Vegas expected number of bit operations within a factor of b of the theoretical lower bound. When extended with disc compression techniques, which accelerate it by a factor of b/log(b), the algorithm achieves near-optimal complexity. That is, the resulting root-finder is expected to run almost as fast as one accesses the coefficients with a precision required for the solution within a prescribed error bound.
We also introduce an algorithm for DLG root-squaring that circumvents the classical numerical instability of the recursive iterations. By applying root-squaring to Newton's Inverse Ratios rather than coefficients, we enable the fast estimation of extremal root radii for black box polynomials. This technique provides a highly efficient exclusion test for determining whether a fixed disc on the complex plane contains any roots.
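For orientation, the classical coefficient-based DLG (Dandelin-Lobachevsky-Graeffe) iteration can be sketched as follows. This is the textbook version that the abstract contrasts itself with, not the authors' black box variant based on Newton's Inverse Ratios. The function names are hypothetical, and exact integer arithmetic is used in the example to sidestep the coefficient growth that makes the floating-point recursion unstable.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (index = power of x)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def graeffe_step(coeffs):
    """One root-squaring step: return q with q(x^2) = (-1)^d * p(x) * p(-x).

    If p has roots z_1..z_d, then q has roots z_1^2..z_d^2.
    """
    d = len(coeffs) - 1
    p_neg = [c * (-1) ** i for i, c in enumerate(coeffs)]  # coefficients of p(-x)
    prod = poly_mul(coeffs, p_neg)                         # even polynomial of degree 2d
    sign = (-1) ** d
    return [sign * prod[2 * i] for i in range(d + 1)]

def largest_root_radius(coeffs, steps):
    """Estimate max |z_j| after `steps` squarings, assuming one dominant root radius.

    After k steps the roots are z_j^(2^k), so the sum of the roots, read off from
    the two leading coefficients, is dominated by the largest one. The estimate
    degenerates if that sum cancels to zero.
    """
    q = list(coeffs)
    for _ in range(steps):
        q = graeffe_step(q)
    ratio = abs(q[-2]) / abs(q[-1])
    return ratio ** (1.0 / 2 ** steps)

# p(x) = (x - 1)(x - 2)(x - 3) = x^3 - 6x^2 + 11x - 6
p = [-6, 11, -6, 1]
squared = graeffe_step(p)            # coefficients of (y - 1)(y - 4)(y - 9)
radius = largest_root_radius(p, 5)   # approaches 3, the largest root modulus
```

The coefficients grow doubly exponentially in the number of steps, which is why the classical recursion is numerically fragile in fixed precision. According to the abstract, the new algorithm avoids this by not operating on coefficients at all.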
Historically, the only known near-optimal polynomial root-finder was presented by V. Pan at ACM STOC 1995. It is quite involved and has never been implemented, whereas our new root-finder, already in its initial implementation, competed with MPSolve, the root-finding package of user's choice, according to extensive numerical experiments with standard test polynomials. Furthermore, we readily extend our black box root-finder to approximating the eigenvalues of a matrix in record expected bit-operation time, an extension not supported by the root-finder of STOC 1995. More recently, the framework for reaching the theoretical lower bound for polynomial root-finding was further advanced by V. Pan at SODA 2024, providing the basis for the near-optimal extension of our work via disc compression.
Development of Novel Anti-Cancer Colchicine Analogs: Synthesis and Configurational Studies
Colchicine, a naturally occurring alkaloid, has long been known as a potent therapeutic agent. Despite its efficacy in treating gout and other inflammatory disorders, its severe toxicity, limited selectivity, and poor pharmacological profile have restricted its broader clinical application. Reported advantages of colchicine-site ligands as vascular-disrupting agents, including reduced susceptibility to multidrug resistance, have revived interest in the colchicine site on tubulin as a validated therapeutic target for cancer. This dissertation focuses on the development of new colchicine analogs, detailing the synthetic route to functionalized AC-ring derivatives, as well as the evaluation of their configurational stability and biological activity.
In Chapter 1, we provide a literature review that frames atropisomerism within a broader medicinal chemistry context, highlighting classical approaches, synthetic strategies, emerging directions and potential for troponoids as atropisomeric frameworks due to their electronic versatility and capacity for restricted bond rotation.
Chapter 2 describes our synthetic studies on α-methoxytropones. Building upon 3-hydroxy-4-pyrone-based oxidopyrylium [5+2] cycloaddition chemistry, we developed a samarium(II) iodide-mediated reductive ring-opening that efficiently converts oxidopyrylium cycloadducts into methoxytropones. This method provides a modular entry to a family of functionalized tropone scaffolds that had previously been difficult to access. By systematically varying substitution patterns, we demonstrate how methoxytropones can be tuned for stereochemical stability, positioning them as versatile platforms for atropisomerism.
In Chapter 3, we extend these findings to the synthesis of colchicine AC-ring analogs. Guided by computational dihedral-energy profiling and experimental analyses, we identified analogs that exhibit rotational barriers greater than 30 kcal/mol, consistent with class 3 atropisomerism. We describe the discovery of a stable colchicine-based atropisomer, marking a de novo synthesis of a new class 3 colchicine atropisomer. Biological evaluation confirmed that one atropisomer retained tubulin inhibitory activity, while its enantiomer was markedly less active. DFT calculations predicted racemization barriers, while chiral HPLC and VCD provided resolution and experimental confirmation of configurational assignments. We also detail the subsequent structural modification strategies toward improving anti-cancer activity. Overall, the study demonstrates the value of integrating predictive computation with experimental validation.
No Tech for Apartheid Memory: Oral Histories of Google Worker Organizing
The NOTA Google NYC Digital Memory Archive is a digital memory project centering the worker narrative in tech labor struggles in the wake of the ongoing genocide of the Palestinian people. Rooted in my dual experiences as a Google Software Engineer and a No Tech for Apartheid organizer, this capstone project, initially created in 2024, seeks to combat the silencing around the nature of labor organizing through expanding upon the existing archive to incorporate oral history testimonies of NOTA organizers (notechforapartheidmemory.com/interviews). After conducting 4 oral history interviews with former Google workers currently based in NYC, the videos were edited and captioned using Final Cut Pro, compressed using HandBrake, uploaded to a private DigitalOcean Spaces bucket, and linked in newly added HTML sub-pages to the archive site. Given the sensitivity of the interviews I was entrusted with, I took precautions to ensure that the tools I used for this capstone project did not compromise the ownership and confidentiality of the data. This project explores the stewardship of oral history, production of long-form resistance media, and development of technological infrastructures of labor cyber-resistance through digital memory archives. Throughout this white paper, I discuss the challenges, successes, and learnings from undergoing the tedious, yet rewarding, editing and captioning process manually, while inviting varied labor perspectives in testimonies from other NOTA Google NYC organizers.
Although NOTA has a presence in Google and Amazon offices throughout the country and internationally, the scope of this project is limited to NOTA Google organizers currently based in NYC, given the pre-existing nature of my relationships with workers here. It is important to note that the experiences of workers in NYC, while being representative of NOTA at large, do not cover all possible experiences. Nonetheless, the testimonies yield much-needed insights into organizing labor resistance within the tech industry during a time in which technofascism takes new oppressive forms.
Postscript: A Digital Inquiry Into Inherited Memory
This Capstone project explores the intimate mechanics of family memory through digital interfaces and media, centering on a set of more than fifty letters my late grandmother wrote in 1947 during a four-month journey across Europe. Through a series of digital interventions, including mapping, text analysis, and interactive storytelling, Postscript captures the process of my own memory formation as I connect to my grandmother's legacy. The project specifically engages with digital technologies' unique affordances (like interactivity and nonlinearity) to interrogate how digital methods of analysis and expression can develop, as well as distort, modes of relationality and identity. Moving viewers through a three-part narrative arc from distant to close reading methods, the project demonstrates that digital tools not only represent memory but also perform, enact, and transform it. Postscript contributes to the digital humanities by offering a new formal approach to memory-based storytelling that treats the technical apparatus itself as part of the memory work. It demonstrates that the intimate scale of family memory, approached with computational creativity, can reveal insights about memory's operations that larger-scale collective memory projects might miss.
A Low-Stakes Assignment that Introduces AI Technology into the Analysis and Writing Process
In this ENG 225 assignment, students use AI (ChatGPT) to perform tasks related to summarizing, paraphrasing, quoting, and responding to a text, then compare the AI-generated work to their own. Using Bitzer’s article “The Rhetorical Situation,” students create APA citations, summaries, paraphrases, key quotes, and personal responses with AI guidance. The assignment emphasizes critical evaluation by having students analyze the accuracy, effectiveness, and differences between human and AI-generated work. This exercise develops skills in rhetorical analysis, source engagement, and reflective assessment of AI as a tool for writing and research.
Livestreaming a doitocracy: Platform-jumping participatory practices in modular synthesis gear cultures
Earth Modular Society (EMS) is a cross-platform community dedicated to live hardware modular synthesis, with their primary efforts going into a 24/7 modular “radio station” that livestreams on Twitch and YouTube, and a supporting Discord chat server. Although part of the broader modular synthesis gear culture (a large-scale social formation that coalesces around specific classes of fetishized technical objects), EMS represents a new development in gear cultures as it foregrounds live performance and minimizes the accrual of status via conspicuous consumption. The normative governance structures of platform-specific gear culture communities preclude all but one or a handful of users from having any meaningful input into the rules, governance, or participatory modes of the platform. In contrast, EMS is structured as a doitocracy, where all are encouraged to bring things to do to the community, and to support each other in doing. The doitocracy concept contributes to understanding the wide range of DIY (do-it-yourself) and DIT (do-it-together) community practices. This doitocracy case study also demonstrates the importance of analyzing the performance of crafted objects, and how these, alongside platform-jumping practices, can constitute primary organizational forces in online/offline community sociality.
Building Blocks of Institutional Resilience: Governance, Crisis Preparedness, and Community Support in Non-Profit Schools
In the post-COVID-19 era, K-12 independent schools in New York City have faced unprecedented volatility, as evidenced by multiple school closures. This thesis presents the building blocks of an integrated framework for building institutional resilience, specifically designed for Heads of School and board members. Central to this work is the “double bottom line” model, which views mission fulfillment and financial sustainability as dual, inextricably linked priorities.
The research synthesizes literature, reports, and industry best practices across four key areas: Governance, Risk Management, Emergency Planning, Program Adaptation and Community Support. By analyzing case studies of institutional failure and success, this thesis advocates for a shift from reactive compliance to a proactive, professionalized organizational culture. This includes implementing Enterprise Risk Management (ERM), Policy Governance, and evidence-based psychological support frameworks like PBIS and PREPaRE.
Ultimately, this work argues that resilient schools treat risk awareness as an essential mechanism for mission preservation rather than an administrative burden. While focused on the New York City context, the framework offers a transferable model for any mission-driven organization operating with high public trust.