DR-NTU (Data) (Nanyang Technological University)

1955 research outputs found

Replication Data for: Forced and Free Waves of simulated Volcanic Meteo-Tsunamis in the South China Sea

Author: Verolino Andrea
Watanabe Masashi
Felix Raquel
Tan Elaine
Yang Jie
Weiss Robert
Lynett Patrick
Switzer D. Adam
Publication venue: DR-NTU (Data)
Publication date: 01/03/2025

This work assesses and quantifies the forced and free wave components of volcanic meteo-tsunamis for simulated scenarios around the South China Sea. Free waves have been shown to have the potential to arrive much later than their forced leading counterparts, which has important implications for tsunami hazard assessment and for advancing early warning systems.

Replication Data for: Do Not DeepFake Me: Privacy-Preserving Neural 3D Head Reconstruction Without Sensitive Images

Author: Kong Jiayi
Song Xurui
Huai Shuo
Xu Baixin
Luo Jun
He Ying
Publication venue: DR-NTU (Data)
Publication date: 10/01/2025

While 3D head reconstruction is widely used for modeling, existing neural reconstruction approaches rely on high-resolution multi-view images, posing notable privacy issues. Individuals are particularly sensitive to facial features, and facial image leakage can lead to many malicious activities, such as unauthorized tracking and deepfakes. In contrast, geometric data is less susceptible to misuse due to its complex processing requirements and absence of facial texture features. In this paper, we propose a novel two-stage 3D facial reconstruction method aimed at avoiding exposure to sensitive facial information while preserving detailed geometric accuracy. Our approach first uses non-sensitive rear-head images for initial geometry and then refines this geometry using processed privacy-removed gradient images. Extensive experiments show that the resulting geometry is comparable to methods using full images, while the process is resistant to DeepFake applications and facial recognition (FR) systems, thereby proving its effectiveness in privacy protection.

3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion

Author: Chen Zhaoxi
Tang Jiaxiang
Dong Yuhao
Cao Ziang
Hong Fangzhou
Lan Yushi
Wang Tengfei
Xie Haozhe
Wu Tong
Saito Shunsuke
Pan Liang
Lin Dahua
Liu Ziwei
Publication venue: DR-NTU (Data)
Publication date: 10/03/2025

The increasing demand for high-quality 3D assets across various industries necessitates efficient and automated 3D content creation. Despite recent advancements in 3D generative models, existing methods still face challenges with optimization speed, geometric fidelity, and the lack of assets for physically based rendering (PBR). In this paper, we introduce 3DTopia-XL, a scalable native 3D generative model designed to overcome these limitations. 3DTopia-XL leverages a novel primitive-based 3D representation, PrimX, which encodes detailed shape, albedo, and material field into a compact tensorial format, facilitating the modeling of high-resolution geometry with PBR assets. On top of the novel representation, we propose a generative framework based on Diffusion Transformer (DiT), which comprises 1) Primitive Patch Compression and 2) Latent Primitive Diffusion. 3DTopia-XL learns to generate high-quality 3D assets from textual or visual inputs. We conduct extensive qualitative and quantitative experiments to demonstrate that 3DTopia-XL significantly outperforms existing methods in generating high-quality 3D assets with fine-grained textures and materials, efficiently bridging the quality gap between generative models and real-world applications.

Disco4D: Disentangled 4D Human Generation and Animation from a Single Image

Author: Pang Hui En
Liu Shuai
Cai Zhongang
Lei Yang
Zhang Tianwei
Liu Ziwei
Publication venue: DR-NTU (Data)
Publication date: 19/03/2025

We present Disco4D, a novel Gaussian Splatting framework for 4D human generation and animation from a single image. Unlike existing methods, Disco4D disentangles clothing (with Gaussian models) from the human body (with the SMPL-X model), significantly enhancing generation detail and flexibility. It has the following technical innovations. 1) Disco4D learns to efficiently fit the clothing Gaussians over the SMPL-X Gaussians. 2) It adopts diffusion models to enhance the 3D generation process, e.g., modeling occluded parts not visible in the input image. 3) It learns an identity encoding for each clothing Gaussian to facilitate the separation and extraction of clothing assets. Furthermore, Disco4D naturally supports 4D human animation with vivid dynamics. Extensive experiments demonstrate the superiority of Disco4D on 4D human generation and animation tasks.

Replication Data for: Quantifying interactive photochemical and microbial removal of terrestrial dissolved organic carbon: from experiments to modelling

Author: Martin Patrick
Publication venue: DR-NTU (Data)
Publication date: 07/02/2025

This dataset contains replication data for the manuscript "Quantifying interactive photochemical and microbial removal of terrestrial dissolved organic carbon: from experiments to modelling", revised and resubmitted to Limnology & Oceanography Letters. The dataset contains data on degradation rates of dissolved organic matter during simultaneous photochemical and microbial degradation.

Real-world clinical practice of Diabetic Foot Ulcer prevention and care management in Singapore: A qualitative comparative inquiry with healthcare professionals

Author: Pienkowska Anita
Ho Andy Hau Yan
Publication venue: DR-NTU (Data)
Publication date: 12/06/2025

Research Data for Project: Preventing limb losses in Singapore through Health Literacy and Healthcare Improvements, Work Package 1 Focus Group Discussion Transcripts and Participant Socio-demographic Summary [In Internal Review]

Preregistration Document for: How do Singaporean young adults view disordered speech of children?

Author: Lim Melissa J. Y.
Goh Kok Yew Shaun
Styles Suzy J.
Publication venue: DR-NTU (Data)
Publication date: 14/04/2025

Research has shown that speech and language disorders affect many around the world, with Developmental Language Disorder (DLD) affecting approximately 7% (Norbury et al., 2016) and stuttering having an incidence rate of approximately 8% (Yairi & Ambrose, 2013). However, public awareness of these disorders and what they entail is often limited – Kim et al.’s 2023 study of Australian public awareness towards DLD and the overlapping Specific Language Impairment label demonstrates limited awareness levels compared to other developmental disorders. Societal perception of individuals based on their speech, which can be affected by the speech and language disorders they experience, is significantly worse compared to unaffected peers (Allard and Williams, 2008), with practical effects on their access to opportunities. A better understanding of local perception towards different presentations of speech, especially as early as in childhood, is therefore crucial to designing better responses that address the concerns involved. The present study focuses on the following question: “How do Singaporean young adults view the speech of children who have speech and language disorders, versus those without?” Project initiated as part of a Final Year Project in Psychology at NTU.

Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Author: Dong Yuhao
Liu Zuyan
Sun Hai-Long
Yang Jingkang
Hu Winston
Rao Yongming
Liu Ziwei
Publication venue: DR-NTU (Data)
Publication date: 09/05/2025

Large Language Models (LLMs) demonstrate enhanced capabilities and reliability by reasoning more, evolving from Chain-of-Thought prompting to product-level solutions like OpenAI o1. Despite various efforts to improve LLM reasoning, high-quality long-chain reasoning data and optimized training pipelines remain inadequately explored in vision-language tasks. In this paper, we present Insight-V, an early effort to 1) scalably produce long and robust reasoning data for complex multi-modal tasks, and 2) provide an effective training pipeline to enhance the reasoning capabilities of multi-modal large language models (MLLMs). Specifically, to create long and structured reasoning data without human labor, we design a two-step pipeline with a progressive strategy to generate sufficiently long and diverse reasoning paths and a multi-granularity assessment method to ensure data quality. We observe that directly supervising MLLMs with such long and complex reasoning data will not yield ideal reasoning ability. To tackle this problem, we design a multi-agent system consisting of a reasoning agent dedicated to performing long-chain reasoning and a summary agent trained to judge and summarize reasoning results. We further incorporate an iterative DPO algorithm to enhance the reasoning agent's generation stability and quality. Based on the popular LLaVA-NeXT model and our stronger base MLLM, we demonstrate significant performance gains across challenging multi-modal benchmarks requiring visual reasoning. Benefiting from our multi-agent system, Insight-V can also easily maintain or improve performance on perception-focused multi-modal tasks.

Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space

Author: Zhou Yifan
Xiao Zeqi
Yang Shuai
Pan Xingang
Publication venue: DR-NTU (Data)
Publication date: 03/06/2025

Latent Diffusion Models (LDMs) are known to have an unstable generation process, where even small perturbations or shifts in the input noise can lead to significantly different outputs. This hinders their use in applications requiring consistent results. In this work, we redesign LDMs to enhance consistency by making them shift-equivariant. While introducing anti-aliasing operations can partially improve shift-equivariance, significant aliasing and inconsistency persist due to the unique challenges in LDMs, including 1) aliasing amplification during VAE training and multiple U-Net inferences, and 2) self-attention modules that inherently lack shift-equivariance. To address these issues, we redesign the attention modules to be shift-equivariant and propose an equivariance loss that effectively suppresses the frequency bandwidth of the features in the continuous domain. The resulting alias-free LDM (AF-LDM) achieves strong shift-equivariance and is also robust to irregular warping. Extensive experiments demonstrate that AF-LDM produces significantly more consistent results than vanilla LDM across various applications, including video editing and image-to-image translation. Code is available at: https://github.com/SingleZombie/AFLD

Replication Data for: Dual Downsample Vision Transformer for Handwritten Text Recognition (ICDAR2025)

Author: Tan Yew Lee
Publication venue: DR-NTU (Data)
Publication date: 26/06/2025

Replication Data for: Dual Downsample Vision Transformer for Handwritten Text Recognition (ICDAR2025). To uncompress: cat lines_recognition_part_* | tar --zstd -xvf -

Full texts: 0

Metadata records: 1,955

Updated in the last 30 days.

DR-NTU (Data) (Nanyang Technological University) is based in Singapore
