DR-NTU (Data) (Nanyang Technological University)
1955 research outputs found
Replication Data for: Forced and Free Waves of simulated Volcanic Meteo-Tsunamis in the South China Sea
This work assesses and quantifies the forced and free wave components of volcanic meteo-tsunamis for simulated scenarios around the South China Sea. Free waves have demonstrated the potential to arrive much later than their forced, leading counterparts, which has important implications for tsunami hazard assessments and early warning system advancements
Replication Data for: Do Not DeepFake Me: Privacy-Preserving Neural 3D Head Reconstruction Without Sensitive Images
While 3D head reconstruction is widely used for modeling, existing neural reconstruction approaches rely on high-resolution multi-view images, posing notable privacy issues. Individuals are particularly sensitive to facial features, and facial image leakage can lead to many malicious activities, such as unauthorized tracking and deepfakes. In contrast, geometric data is less susceptible to misuse due to its complex processing requirements and the absence of facial texture features. In this paper, we propose a novel two-stage 3D facial reconstruction method aimed at avoiding exposure to sensitive facial information while preserving detailed geometric accuracy. Our approach first uses non-sensitive rear-head images for initial geometry and then refines this geometry using processed privacy-removed gradient images. Extensive experiments show that the resulting geometry is comparable to methods using full images, while the process is resistant to DeepFake applications and facial recognition (FR) systems, thereby proving its effectiveness in privacy protection
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion
The increasing demand for high-quality 3D assets across various industries necessitates efficient and automated 3D content creation. Despite recent advancements in 3D generative models, existing methods still face challenges with optimization speed, geometric fidelity, and the lack of assets for physically based rendering (PBR). In this paper, we introduce 3DTopia-XL, a scalable native 3D generative model designed to overcome these limitations. 3DTopia-XL leverages a novel primitive-based 3D representation, PrimX, which encodes detailed shape, albedo, and material field into a compact tensorial format, facilitating the modeling of high-resolution geometry with PBR assets. On top of the novel representation, we propose a generative framework based on Diffusion Transformer (DiT), which comprises 1) Primitive Patch Compression and 2) Latent Primitive Diffusion. 3DTopia-XL learns to generate high-quality 3D assets from textual or visual inputs. We conduct extensive qualitative and quantitative experiments to demonstrate that 3DTopia-XL significantly outperforms existing methods in generating high-quality 3D assets with fine-grained textures and materials, efficiently bridging the quality gap between generative models and real-world applications
Disco4D: Disentangled 4D Human Generation and Animation from a Single Image
We present Disco4D, a novel Gaussian Splatting framework for 4D human generation and animation from a single image. Different from existing methods, Disco4D distinctively disentangles clothing (with Gaussian models) from the human body (with SMPL-X model), significantly enhancing the generation details and flexibility. It has the following technical innovations. 1) Disco4D learns to efficiently fit the clothing Gaussians over the SMPL-X Gaussians. 2) It adopts diffusion models to enhance the 3D generation process, e.g., modeling occluded parts not visible in the input image. 3) It learns an identity encoding for each clothing Gaussian to facilitate the separation and extraction of clothing assets. Furthermore, Disco4D naturally supports 4D human animation with vivid dynamics. Extensive experiments demonstrate the superiority of Disco4D on 4D human generation and animation tasks
Replication Data for: Quantifying interactive photochemical and microbial removal of terrestrial dissolved organic carbon: from experiments to modelling
This dataset contains replication data for the manuscript "Quantifying interactive photochemical and microbial removal of terrestrial dissolved organic carbon: from experiments to modelling", revised and resubmitted to Limnology & Oceanography Letters. The dataset contains data on degradation rates of dissolved organic matter during simultaneous photochemical and microbial degradation
Real-world clinical practice of Diabetic Foot Ulcer prevention and care management in Singapore: A qualitative comparative inquiry with healthcare professionals
Research Data for Project: Preventing limb losses in Singapore through Health Literacy and Healthcare Improvements, Work Package 1 Focus Group Discussion Transcripts and Participant Socio-demographic Summary
[In Internal Review]
Preregistration Document for: How do Singaporean young adults view disordered speech of children?
Research has shown that speech and language disorders affect many around the world, with Developmental Language Disorder (DLD) affecting approximately 7% (Norbury et al., 2016) and stuttering having an incidence rate of approximately 8% (Yairi & Ambrose, 2013). However, public awareness of these disorders and what they entail is often limited – Kim et al.’s 2023 study of Australian public awareness towards DLD and the overlapping Specific Language Impairment label demonstrates limited awareness levels compared to other developmental disorders. Societal perception of individuals based on their speech, which can be affected by the speech and language disorders they experience, is significantly worse compared to unaffected peers (Allard and Williams, 2008), with practical effects on their access to opportunities. A better understanding of local perception towards different presentations of speech, especially as early as in childhood, is therefore crucial to designing better responses that address the concerns involved.
Therefore, the present study focuses on the following question: “How do Singaporean young adults view the speech of children who have speech and language disorders, versus those without?” Project initiated as part of a Final Year Project in Psychology at NTU
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
Large Language Models (LLMs) demonstrate enhanced capabilities and reliability by reasoning more, evolving from Chain-of-Thought prompting to product-level solutions like OpenAI o1. Despite various efforts to improve LLM reasoning, high-quality long-chain reasoning data and optimized training pipelines still remain inadequately explored in vision-language tasks. In this paper, we present Insight-V, an early effort to 1) scalably produce long and robust reasoning data for complex multi-modal tasks, and 2) develop an effective training pipeline to enhance the reasoning capabilities of multi-modal large language models (MLLMs). Specifically, to create long and structured reasoning data without human labor, we design a two-step pipeline with a progressive strategy to generate sufficiently long and diverse reasoning paths and a multi-granularity assessment method to ensure data quality. We observe that directly supervising MLLMs with such long and complex reasoning data will not yield ideal reasoning ability. To tackle this problem, we design a multi-agent system consisting of a reasoning agent dedicated to performing long-chain reasoning and a summary agent trained to judge and summarize reasoning results. We further incorporate an iterative DPO algorithm to enhance the reasoning agent's generation stability and quality. Based on the popular LLaVA-NeXT model and our stronger base MLLM, we demonstrate significant performance gains across challenging multi-modal benchmarks requiring visual reasoning. Benefiting from our multi-agent system, Insight-V can also easily maintain or improve performance on perception-focused multi-modal tasks
Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space
Latent Diffusion Models (LDMs) are known to have an unstable generation process, where even small perturbations or shifts in the input noise can lead to significantly different outputs. This hinders their applicability in applications requiring consistent results. In this work, we redesign LDMs to enhance consistency by making them shift-equivariant. While introducing anti-aliasing operations can partially improve shift-equivariance, significant aliasing and inconsistency persist due to the unique challenges in LDMs, including 1) aliasing amplification during VAE training and multiple U-Net inferences, and 2) self-attention modules that inherently lack shift-equivariance.
To address these issues, we redesign the attention modules to be shift-equivariant and propose an equivariance loss that effectively suppresses the frequency bandwidth of the features in the continuous domain. The resulting alias-free LDM (AF-LDM) achieves strong shift-equivariance and is also robust to irregular warping. Extensive experiments demonstrate that AF-LDM produces significantly more consistent results than vanilla LDM across various applications, including video editing and image-to-image translation. Code is available at: https://github.com/SingleZombie/AFLD
Replication Data for: Dual Downsample Vision Transformer for Handwritten Text Recognition (ICDAR2025)
To uncompress, concatenate the archive parts and pipe the result to tar (the trailing "-" tells tar to read the archive from standard input):
cat lines_recognition_part_* | tar --zstd -xvf -
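Where a shell pipeline is not convenient, the joining step can be sketched in Python. This is a minimal illustration, not part of the dataset's own tooling: the function name `join_parts` and the output filename are hypothetical, and it assumes the part suffixes sort correctly by plain name order, as the shell glob above also relies on.

```python
from pathlib import Path
import shutil


def join_parts(directory, prefix, output):
    """Concatenate all files in `directory` whose names start with `prefix`.

    Parts are joined in sorted name order (an assumption that mirrors the
    ordering of the shell glob `prefix*`). Returns the list of parts used.
    """
    parts = sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.name.startswith(prefix)
    )
    if not parts:
        raise FileNotFoundError(f"no files starting with {prefix!r} in {directory}")
    with open(output, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream each part in chunks so large archives are not
                # loaded into memory all at once.
                shutil.copyfileobj(src, out)
    return parts
```

Calling `join_parts(".", "lines_recognition_part_", "lines_recognition.tar.zst")` would produce a single archive (the output name is an arbitrary choice), which can then be extracted with `tar --zstd -xvf lines_recognition.tar.zst` on a tar build with zstd support.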