Infoscience - École polytechnique fédérale de Lausanne

Probabilistic mining of socio-geographic routines from mobile phone data

Authors: Daniel Gatica-Perez, Katayoun Farrahi
Publication date: 01/01/2010

There is relatively little work on large-scale human data that exploits multimodality for human activity discovery. In this paper, we suggest that human interaction data, or human proximity, obtained from mobile phone Bluetooth sensor data, can be integrated with human location data, obtained from mobile cell tower connections, to mine meaningful details about human activities from large and noisy datasets. We propose a model, called bag of multimodal behavior, that integrates the modeling of variations of location over multiple time-scales and the modeling of interaction types from proximity. Our representation is simple yet robust enough to characterize real-life human behavior sensed from mobile phones, which capture large-scale data known to be noisy and incomplete. We use an unsupervised approach, based on probabilistic topic models, to discover latent human activities in terms of the joint interaction and location behaviors of 97 individuals over approximately 10 months, using data from MIT's Reality Mining project. Some of the human activities discovered with our multimodal data representation include “going out from 7 pm-midnight alone” and “working from 11 am-5 pm with 3-5 other people,” and we further find that this activity dominantly occurs on specific days of the week. Our methodology also finds dominant work patterns occurring on other days of the week. We further demonstrate the feasibility of the topic modeling framework for human routine discovery by predicting missing multimodal phone data at specific times of the day.

Infoscience - École polytechnique fédérale de Lausanne

Discovering routines from large-scale human locations using probabilistic topic models

Authors: Daniel Gatica-Perez, Katayoun Farrahi
Publication date: 2011

In this work, we discover the daily location-driven routines that are contained in a massive real-life human dataset collected by mobile phones. Our goal is the discovery and analysis of human routines that characterize both individual and group behaviors in terms of location patterns. To achieve these goals, we develop an unsupervised methodology based on two different probabilistic topic models and apply them to the daily life of 97 mobile phone users over a 16-month period. Topic models are probabilistic generative models for documents that identify the latent structure that underlies a set of words. Routines dominating the entire group's activities, identified with a methodology based on the Latent Dirichlet Allocation topic model, include “going to work late”, “going home early”, “working nonstop” and “having no reception (phone off)” at different times over varying time intervals. We also detect routines that are characteristic of users, with a methodology based on the Author-Topic model. With the routines discovered, and the two methods of characterizing days and users, we can then perform various tasks. We use the routines discovered to determine behavioral patterns of users and groups of users. For example, we can find individuals that display specific daily routines, such as “going to work early” or “turning off the mobile (or having no reception) in the evenings”. We are also able to characterize daily patterns by determining the topic structure of days, in addition to determining whether certain routines occur dominantly on weekends or weekdays. Furthermore, the routines discovered can be used to rank users or find subgroups of users who display certain routines. We can also characterize users based on their entropy. We compare our method to one based on clustering using K-means. Finally, we analyze an individual's routines over time to determine regions with high variations, which may correspond to specific events.
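The entropy-based characterization of users mentioned in this abstract can be illustrated with a minimal sketch. It assumes each user is summarized by a probability distribution over the discovered routines (topics). The function name `topic_entropy` and the example distributions are hypothetical, and the abstract does not say exactly which distribution the authors compute entropy over.

```python
import math

def topic_entropy(topic_probs):
    """Shannon entropy (in bits) of a distribution over routines (topics)."""
    total = sum(topic_probs)
    if total <= 0:
        raise ValueError("topic_probs must have a positive sum")
    entropy = 0.0
    for p in topic_probs:
        q = p / total  # normalize so the values sum to 1
        if q > 0:      # zero-probability topics contribute nothing
            entropy -= q * math.log2(q)
    return entropy

# Hypothetical users (assumed values, not data from the paper):
regular_user = [0.97, 0.01, 0.01, 0.01]  # dominated by a single routine
varied_user = [0.25, 0.25, 0.25, 0.25]   # spread evenly over four routines
```

Under this assumption, a user whose days are dominated by one routine gets a low entropy, while a user whose days are spread over many routines gets a high one, so sorting users by `topic_entropy` gives one possible ranking from most to least predictable.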

What did you do today?: discovering daily routines from large-scale mobile data

Authors: Daniel Gatica-Perez, Katayoun Farrahi
Publication date: 01/01/2008

We present a framework built from two Hierarchical Bayesian topic models to discover human location-driven routines from mobile phones. The framework uses location-driven bag representations of people's daily activities obtained from cell tower connections. Using 68 000+ hours of real-life human data from the Reality Mining dataset, we successfully discover various types of routines. The first studied model, Latent Dirichlet Allocation (LDA), automatically discovers characteristic routines for all individuals in the study, including "going to work at 10am", "leaving work at night", or "staying home for the entire evening". In contrast, the second methodology, based on the Author-Topic model (ATM), finds routines characteristic of selected groups of users, such as "being at home in the mornings and evenings while being out in the afternoon", and ranks users by their probability of conforming to certain daily routines.
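A location-driven bag representation of a day could be sketched as follows. This is only an illustration under stated assumptions, not the authors' exact construction: the day is assumed to be a list of half-hour location labels (`H` home, `W` work, `O` other), and each "word" joins three consecutive labels with a coarse time-block index. The function `day_to_bag`, the label set, and the parameters `window` and `slots_per_block` are all hypothetical.

```python
from collections import Counter

def day_to_bag(labels, window=3, slots_per_block=6):
    """Turn one day of per-slot location labels into a bag of 'location words'.

    Each word joins `window` consecutive labels with a coarse time-block
    index, so the bag keeps short-term transitions and rough time of day
    while discarding the exact ordering of the words.
    """
    bag = Counter()
    for start in range(len(labels) - window + 1):
        block = start // slots_per_block  # coarse time-of-day index
        word = "".join(labels[start:start + window]) + "@" + str(block)
        bag[word] += 1
    return bag

# Hypothetical day of 48 half-hour slots: home, work, out, home again.
day = ["H"] * 18 + ["W"] * 18 + ["O"] * 4 + ["H"] * 8
bag = day_to_bag(day)
```

Each day then plays the role of a document and each location word the role of a term, which is the form of input that topic models such as LDA expect.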

Infoscience - École polytechnique fédérale de Lausanne

Mining Human Location-Routines Using a Multi-Level Approach to Topic Modeling

Authors: Katayoun Farrahi, Daniel Gatica-Perez
Publication date: 01/08/2010

In this work, we address the problem of modeling sequences of varying time duration for large-scale human routine discovery from cellphone sensor data, using a multi-level approach to probabilistic topic models. We use an unsupervised learning approach that discovers human routines of varying durations, ranging from half an hour to several hours. Our methodology can handle large sequence lengths, based on a principled procedure to deal with potentially large routine-vocabulary sizes, and can be applied to rather naive initial vocabularies to discover meaningful location-routines. We successfully apply the model to a large, real-life dataset, consisting of 97 cellphone users and 16 months of their location patterns, to discover routines with varying time durations.

Infoscience - École polytechnique fédérale de Lausanne

Learning and Predicting Multimodal Daily Life Patterns from Cell Phones

Authors: Katayoun Farrahi, Daniel Gatica-Perez
Publication date: 01/11/2009

In this paper, we investigate the multimodal nature of cell phone data in terms of discovering recurrent and rich patterns in people's lives. We present a method that can discover routines from multiple modalities (location and proximity) jointly modeled, and that uses these informative routines to predict unlabeled or missing data. Using a joint representation of location and proximity data over approximately 10 months of 97 individuals' lives, Latent Dirichlet Allocation is applied for the unsupervised learning of topics describing people's most common locations jointly with the most common types of interactions at these locations. We further successfully predict where and with how many other individuals users will be, both for people whose lifestyles vary greatly and for those whose lifestyles vary little.

Infoscience - École polytechnique fédérale de Lausanne

Discovering Human Routines from Cell Phone Data with Topic Models

Authors: Katayoun Farrahi, Daniel Gatica-Perez
Publication date: 01/01/2008

We present a framework to automatically discover people's routines from information extracted by cell phones. The framework is built from a probabilistic topic model learned on novel bag-type representations of activity-related cues (location, proximity and their temporal variations over a day) of people's daily routines. Using real-life data from the Reality Mining dataset, covering 68 000+ hours of human activities, we can successfully discover location-driven (from cell tower connections) and proximity-driven (from Bluetooth information) routines in an unsupervised manner. The resulting topics meaningfully characterize some of the underlying co-occurrence structure of the activities in the dataset, including "going to work early/late", "being home all day", "working constantly", "working sporadically" and "meeting at lunch time".
IDIAP-RR 08-3

Infoscience - École polytechnique fédérale de Lausanne

Extracting Mobile Behavioral Patterns with the Distant N-Gram Topic Model

Authors: Katayoun Farrahi, Daniel Gatica-Perez
Publication date: 2012

Mining patterns of human behavior from large-scale mobile phone data has the potential to help us understand certain phenomena in society. The study of such human-centric massive datasets requires new mathematical models. In this paper, we propose a probabilistic topic model that we call the distant n-gram topic model (DNTM) to address the problem of learning long-duration human location sequences. The DNTM is based on Latent Dirichlet Allocation (LDA). We define the generative process for the model, derive the inference procedure and evaluate our model on real mobile data. We consider two different real-life human datasets collected from mobile phone locations, the first based on GPS locations and the second on cell tower connections. The DNTM successfully discovers topics on the two datasets. Finally, the DNTM is compared to LDA by considering log-likelihood performance on unseen data, showing the predictive power of the model. We find that the DNTM consistently outperforms LDA as the sequence length increases.
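A comparison by log-likelihood on unseen data can be sketched in a simplified form. The snippet below scores held-out words under point estimates of a generic topic mixture, as is commonly done for LDA-style models. It is not the DNTM inference procedure, and the names `heldout_log_likelihood`, `theta` and `phi`, along with all numeric values, are assumptions made for illustration.

```python
import math

def heldout_log_likelihood(word_ids, theta, phi):
    """Log-likelihood of unseen words under a topic mixture.

    theta[k]  : probability of topic k for this document (e.g. one day)
    phi[k][w] : probability of word w under topic k
    """
    total = 0.0
    for w in word_ids:
        # Marginalize over topics: p(w) = sum_k theta[k] * phi[k][w]
        p_w = sum(theta[k] * phi[k][w] for k in range(len(theta)))
        total += math.log(p_w)
    return total

# Hypothetical two-topic, two-word model and a held-out document.
theta = [0.5, 0.5]
phi = [[0.9, 0.1],
       [0.1, 0.9]]
ll = heldout_log_likelihood([0, 1], theta, phi)
```

When two models are scored on the same held-out words, the one with the higher (less negative) log-likelihood explains the unseen data better.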

Infoscience - École polytechnique fédérale de Lausanne

Daily Routine Classification from Mobile Phone Data

Authors: Katayoun Farrahi, Daniel Gatica-Perez
Publication date: 25/12/2008

The automatic analysis of real-life, long-term behavior and dynamics of individuals and groups from mobile sensor data constitutes an emerging and challenging domain. We present a framework to classify people's daily routines (defined by day type, and by group affiliation type) from real-life data collected with mobile phones, which include physical location information (derived from cell tower connectivity) and social context (given by person proximity information derived from Bluetooth). We propose and compare single- and multi-modal routine representations at multiple time scales, each capable of highlighting different features from the data, to determine which best characterizes the underlying structure of the daily routines. Using a massive data set of 87 000+ hours spanning four months of the life of 30 university students, we show that the integration of location and social context and the use of multiple time scales in our method is effective, producing accuracies of over 80% for the two daily routine classification tasks investigated, with significant performance differences with respect to the single-modal cues.
IDIAP-RR 07-6