A probabilistic approach to mining mobile phone data sequences
We present a new approach to the problem of mining large sequences from big data. The particular problem of interest is mining long sequences from large-scale location data effectively enough to be practical for Reality Mining applications, which suffer from large amounts of noise and a lack of ground truth. To address this complex data, we propose an unsupervised probabilistic topic model called the distant n-gram topic model (DNTM). The DNTM is based on latent Dirichlet allocation (LDA), which is extended to integrate sequential information. We define the generative process for the model, derive the inference procedure, and evaluate our model on both synthetic data and real mobile phone data. We consider two different mobile phone datasets containing natural human mobility patterns obtained by location sensing, the first considering GPS/Wi-Fi locations and the second considering cell tower connections. The DNTM discovers meaningful topics on the synthetic data as well as the two mobile phone datasets. Finally, the DNTM is compared to LDA by considering log-likelihood performance on unseen data, showing the predictive power of the model. The results show that the DNTM consistently outperforms LDA as the sequence length increases.
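The log-likelihood comparison on unseen data can be pictured with a minimal sketch for a plain LDA-style mixture. This is an illustration only, not the DNTM inference procedure: the function name `heldout_log_likelihood`, the toy vocabulary, and the fixed topic proportions `theta` are all assumptions (in practice `theta` would be estimated for each held-out document).

```python
import math

def heldout_log_likelihood(doc, theta, phi):
    """Return log p(doc) = sum over words of log(sum_k theta[k] * phi[k][word]).

    doc:   list of word tokens from an unseen document
    theta: topic proportions for that document (assumed known here)
    phi:   one word-probability dictionary per topic
    """
    total = 0.0
    for word in doc:
        # Marginalize over topics to get the probability of this word.
        p_word = sum(t * topic[word] for t, topic in zip(theta, phi))
        total += math.log(p_word)
    return total

# Hypothetical two-topic model over a two-word location vocabulary.
phi = [{"home": 0.8, "work": 0.2}, {"home": 0.1, "work": 0.9}]
theta = [0.5, 0.5]
doc = ["home", "work", "work"]
ll = heldout_log_likelihood(doc, theta, phi)
```

A higher (less negative) value on held-out documents indicates that a model assigns more probability to data it has not seen.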
Probabilistic mining of socio-geographic routines from mobile phone data
There is relatively little work investigating large-scale human data in terms of multimodality for human activity discovery. In this paper, we suggest that human interaction data, or human proximity, obtained from mobile phone Bluetooth sensor data, can be integrated with human location data, obtained from mobile cell tower connections, to mine meaningful details about human activities from large and noisy datasets. We propose a model, called bag of multimodal behavior, that integrates the modeling of variations of location over multiple time-scales, and the modeling of interaction types from proximity. Our representation is simple yet robust enough to characterize real-life human behavior sensed from mobile phones, which are devices capable of capturing large-scale data known to be noisy and incomplete. We use an unsupervised approach, based on probabilistic topic models, to discover latent human activities in terms of the joint interaction and location behaviors of 97 individuals over a period of approximately 10 months using data from MIT's Reality Mining project. Some of the human activities discovered with our multimodal data representation include “going out from 7 pm-midnight alone” and “working from 11 am-5 pm with 3-5 other people,” further finding that this activity dominantly occurs on specific days of the week. Our methodology also finds dominant work patterns occurring on other days of the week. We further demonstrate the feasibility of the topic modeling framework for human routine discovery by predicting missing multimodal phone data at specific times of the day.
Discovering routines from large-scale human locations using probabilistic topic models
In this work, we discover the daily location-driven routines that are contained in a massive real-life human dataset collected by mobile phones. Our goal is the discovery and analysis of human routines that characterize both individual and group behaviors in terms of location patterns. We develop an unsupervised methodology based on two differing probabilistic topic models and apply them to the daily life of 97 mobile phone users over a 16-month period to achieve these goals. Topic models are probabilistic generative models for documents that identify the latent structure that underlies a set of words. Routines dominating the entire group's activities, identified with a methodology based on the Latent Dirichlet Allocation topic model, include “going to work late”, “going home early”, “working nonstop” and “having no reception (phone off)” at different times over varying time intervals. We also detect routines that are characteristic of users, with a methodology based on the Author-Topic model. With the routines discovered, and the two methods of characterizing days and users, we can then perform various tasks. We use the routines discovered to determine behavioral patterns of users and groups of users. For example, we can find individuals that display specific daily routines, such as “going to work early” or “turning off the mobile (or having no reception) in the evenings”. We are also able to characterize daily patterns by determining the topic structure of days in addition to determining whether certain routines occur dominantly on weekends or weekdays. Furthermore, the routines discovered can be used to rank users or find subgroups of users who display certain routines. We can also characterize users based on their entropy. We compare our method to one based on clustering using K-means. Finally, we analyze an individual's routines over time to determine regions with high variations, which may correspond to specific events.
What did you do today?: discovering daily routines from large-scale mobile data
We present a framework built from two Hierarchical Bayesian topic models to discover human location-driven routines from mobile phones. The framework uses location-driven bag representations of people's daily activities obtained from cell tower connections. Using 68 000+ hours of real-life human data from the Reality Mining dataset, we successfully discover various types of routines. The first studied model, Latent Dirichlet Allocation (LDA), automatically discovers characteristic routines for all individuals in the study, including "going to work at 10am", "leaving work at night", or "staying home for the entire evening". In contrast, the second methodology with the Author Topic model (ATM) finds routines characteristic of selected groups of users, such as "being at home in the mornings and evenings while being out in the afternoon", and ranks users by their probability of conforming to certain daily routines.
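As a rough illustration of what a location-driven bag representation might look like, the sketch below turns one day of coarse location labels into a bag of time-stamped words. The label set (H, W, O), the 30-minute slots, the three-label window, the eight-slot time blocks, and the function name `day_to_bag` are assumptions made for illustration, not details taken from the abstract.

```python
from collections import Counter

def day_to_bag(labels, window=3, slots_per_block=8):
    """Turn one day of location labels into a bag of 'block:ngram' words.

    labels:          one coarse location label per fixed time slot
                     (hypothetical: 'H' home, 'W' work, 'O' other)
    window:          number of consecutive slots joined into one word (assumed)
    slots_per_block: slots grouped into one coarse time-of-day block (assumed)
    """
    bag = Counter()
    for start in range(len(labels) - window + 1):
        ngram = "".join(labels[start:start + window])
        block = start // slots_per_block  # coarse time-of-day index
        bag[f"{block}:{ngram}"] += 1
    return bag

# Hypothetical day sampled every 30 minutes (48 slots).
day = ["H"] * 18 + ["W"] * 18 + ["O"] * 4 + ["H"] * 8
bag = day_to_bag(day)
```

Each day's `Counter` could then serve as one document for a topic model such as LDA, with the users' days forming the corpus.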
Mining Human Location-Routines Using a Multi-Level Approach to Topic Modeling
In this work we address the problem of modeling varying time duration sequences for large-scale human routine discovery from cellphone sensor data using a multi-level approach to probabilistic topic models. We use an unsupervised learning approach that discovers human routines of varying durations ranging from half-hourly to several hours. Our methodology can handle large sequence lengths based on a principled procedure to deal with potentially large routine-vocabulary sizes, and can be applied to rather naive initial vocabularies to discover meaningful location-routines. We successfully apply the model to a large, real-life dataset, consisting of 97 cellphone users and 16 months of their location patterns, to discover routines with varying time durations.
Learning and Predicting Multimodal Daily Life Patterns from Cell Phones
In this paper, we investigate the multimodal nature of cell phone data in terms of discovering recurrent and rich patterns in people's lives. We present a method that can discover routines from multiple modalities (location and proximity) jointly modeled, and that uses these informative routines to predict unlabeled or missing data. Using a joint representation of location and proximity data over approximately 10 months of 97 individuals' lives, Latent Dirichlet Allocation is applied for the unsupervised learning of topics describing people's most common locations jointly with the most common types of interactions at these locations. We further successfully predict where and with how many other individuals users will be, for people with both high- and low-variation lifestyles.
Discovering Human Routines from Cell Phone Data with Topic Models
We present a framework to automatically discover people's routines from information extracted by cell phones. The framework is built from a probabilistic topic model learned on novel bag-type representations of activity-related cues (location, proximity and their temporal variations over a day) of people's daily routines. Using real-life data from the Reality Mining dataset, covering 68 000+ hours of human activities, we can successfully discover location-driven (from cell tower connections) and proximity-driven (from Bluetooth information) routines in an unsupervised manner. The resulting topics meaningfully characterize some of the underlying co-occurrence structure of the activities in the dataset, including "going to work early/late", "being home all day", "working constantly", "working sporadically" and "meeting at lunch time". IDIAP-RR 08-3
Extracting Mobile Behavioral Patterns with the Distant N-Gram Topic Model
Mining patterns of human behavior from large-scale mobile phone data has the potential to help explain certain phenomena in society. The study of such human-centric massive datasets requires new mathematical models. In this paper, we propose a probabilistic topic model that we call the distant n-gram topic model (DNTM) to address the problem of learning long duration human location sequences. The DNTM is based on Latent Dirichlet Allocation (LDA). We define the generative process for the model, derive the inference procedure and evaluate our model on real mobile data. We consider two different real-life human datasets, collected by mobile phone locations, the first considering GPS locations and the second considering cell tower connections. The DNTM successfully discovers topics on the two datasets. Finally, the DNTM is compared to LDA by considering log-likelihood performance on unseen data, showing the predictive power of the model. We find that the DNTM consistently outperforms LDA as the sequence length increases.
Daily Routine Classification from Mobile Phone Data
The automatic analysis of real-life, long-term behavior and dynamics of individuals and groups from mobile sensor data constitutes an emerging and challenging domain. We present a framework to classify people's daily routines (defined by day type, and by group affiliation type) from real-life data collected with mobile phones, which include physical location information (derived from cell tower connectivity), and social context (given by person proximity information derived from Bluetooth). We propose and compare single- and multi-modal routine representations at multiple time scales, each capable of highlighting different features from the data, to determine which best characterizes the underlying structure of the daily routines. Using a massive data set of 87 000+ hours spanning four months of the life of 30 university students, we show that the integration of location and social context and the use of multiple time-scales in our method is effective, producing accuracies of over 80% for the two daily routine classification tasks investigated, with significant performance differences with respect to the single-modal cues. IDIAP-RR 07-6
Maya Codex
The Maya Codex Dataset contains high-quality representations of Maya hieroglyph data, extracted from the three surviving ancient Maya codices (the Dresden, Madrid and Paris codices). A statistical glyph co-occurrence model, which is extracted from the Thompson catalog (J. E. S. Thompson. A catalog of Maya Hieroglyphs. University of Oklahoma Press, 1962.), is also included.
The dataset is generated by epigraphers in our team. The current dataset contains 174 reconstructed high-quality glyphs segmented from 72 blocks, together with the corresponding annotation for each individual glyph.
In order to encode the context information, glyphs segmented from each block are arranged in the form of a string according to the reading order.
This dataset can be used not only as a shape analysis benchmark but also to study the ancient Maya writing system.
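To make the string encoding concrete, here is a minimal sketch of how glyph strings in reading order could be turned into first-order co-occurrence statistics. It is not the dataset's own model, which is extracted from the Thompson catalog; the function name `cooccurrence_probs` and the glyph labels are hypothetical placeholders, not real catalog codes.

```python
from collections import Counter, defaultdict

def cooccurrence_probs(blocks):
    """Estimate P(next glyph | current glyph) from glyph strings in reading order."""
    pair_counts = defaultdict(Counter)
    for glyphs in blocks:
        # Count each adjacent pair of glyphs within a block.
        for current, nxt in zip(glyphs, glyphs[1:]):
            pair_counts[current][nxt] += 1
    probs = {}
    for current, counts in pair_counts.items():
        total = sum(counts.values())
        probs[current] = {glyph: n / total for glyph, n in counts.items()}
    return probs

# Hypothetical blocks, each a list of glyph labels in reading order.
blocks = [["A", "B", "C"], ["A", "B"], ["A", "C", "B"]]
probs = cooccurrence_probs(blocks)
```

Statistics of this kind could serve as context information alongside shape descriptors, for example to re-rank visually similar glyph candidates.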
Citation
If you use this dataset, please cite the following publication:
@inproceedings{Hu2014,
author = {Hu, Rui and Gayol, Carlos Pallan and Krempel, Guido and Odobez, Jean-Marc and Gatica-Perez, Daniel},
title = {Automatic Maya Hieroglyph Retrieval Using Shape and Context Information},
booktitle = {Proceedings of the ACM International Conference on Multimedia},
series = {MM '14},
year = {2014},
isbn = {978-1-4503-3063-3},
location = {Orlando, Florida, USA},
pages = {1037--1040},
numpages = {4},
url = {http://doi.acm.org/10.1145/2647868.2655044},
doi = {10.1145/2647868.2655044},
acmid = {2655044},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {glyph co-occurrence, image retrieval, markov model, maya hieroglyph, shape descriptors},
- …
