"Discovering Hidden Semantics from Spatio-temporal Sensed Data"

Funded by the National Science Foundation

PI: Vassilis J. Tsotras
Co-PI: Chinya Ravishankar
Award Number: IIS-1527984

Duration:  09/01/2015 through 08/31/2018

Pritom Ahmed
Payas Rajan
Elena Strzheletska


Project Summary:

Spatiotemporal data are rich in semantics and embed a great deal of high-level information on the behavior of moving objects as well their interactions with objects in the environment and with each other. To adequately reveal such semantics, this project has set three research goals in the spatiotemporal domain.

The first goal is to study queries that elicit the semantics of interactions between moving objects and their environment; an example of such a novel query is the `dwell region' query. A region R is a dwell region for a moving object O if, given a threshold distance d and duration t, every point of R remains within distance d from O for at least time t. Clearly, points within R are likely to be of interest to O, so identification of O and R has applications in areas such as monitoring and surveillance (i.e., identify the region that is under surveillance by a moving object).

The second goal is to address queries eliciting the semantics of interactions, between moving objects themselves. One example is the `assembly' query which identifies potential meeting points between moving objects when their trajectories are not known fully or precisely. Another example is addressing complex `reachability' queries, which elicit possible interactions between objects via intermediaries.

third goal of this project is to identify efficient solutions that are also practical. In that sense, our solutions should scale. An important related question is then how to efficiently index trajectory data. Another question is whether parallelism can be used to expedite the above queries.


Research Activities: Year 1

During the first year of this project we worked on the following:

(1) Dwell Region Queries. A Dwell Region query enables users identify new regions of interest, based on a moving object's spatio-temporal behavior. In [1] we first proposed an online algorithm to solve this problem, which can handle dynamic additions and deletions of moving object locations in logarithmic time. In this environment we assume an incoming stream of object positions, and maintain the upper and lower bounds for the radius of the smallest circle enclosing these positions, as points are added and deleted. These bounds allow us to greatly reduce the number of trajectory points we need to consider in the query, as well as to defer query evaluation. Our method can approximate the radius of the smallest circle enclosing a given subtrajectory within an arbitrarily small user-defined factor. As a byproduct, we solve efficiently a decision query of whether a dwell region exists or not. This query has applications to the trajectory simplification problem. We also examine the offline version of the problem and propose a preprocessing method that enables fast dwell region queries on archived trajectories. Using the preprocessed information dwell regions can be identified by only looking into a very small part of the archived data. Our experiments using real trajectory datasets show that the online approach can scale up to hundreds of thousands of moving objects for streaming data. For archived trajectories, the preprocessing approach gives orders of magnitudes of speedup in query evaluation time.

(2) Assembly Queries. Consider objects moving in a road network (e.g., groups of friends), who may be free to choose routes, yet be required to arrive at certain locations at certain times (preexisting requirements). Such objects may need to assemble in groups within the network (friends meet while all visiting the same city) without violating their preexisting constraints. Planning for such assemblies is hard when the network or the number of objects is large. Conversely, discovering actual or potential assemblies of such objects is important in many surveillance, security, and law-enforcement applications. This can be hard when object arrival observations are sparse due to inadequate sensor coverage or object countermeasures. In [2] we proposed the novel class of Assembly Queries to model these scenarios, and presented a unified scheme that addresses both of these complementary challenges. Given a set of objects and arrival constraints, we show how to first obtain the set of all possible locations visited by each moving object (the travel corridor), and then determine all possible assemblies, including the participants, locations, and durations. We present a formal model for various tracking strategies and several algorithms for using these strategies. We achieve excellent performance on these queries by preprocessing the network, using Contraction Hierarchies. Experimental results on real-world road networks show that we can efficiently and rapidly infer assembly information for very large networks and object groups. When compared with the traditional Dijkstra-based approaches our method runs 1-2 orders of magnitude faster.

(3) Spatiotemporal Reachability Queries. These queries arise naturally when determining how diseases, information, malware, physical items, etc. can propagate through a collection of moving objects; hence such queries are significant for many important domains like epidemiology, public health, security monitoring, surveillance, and social networks. While traditional reachability queries have been studied in graphs extensively, what makes spatiotemporal reachability queries different and challenging is that the associated graph is dynamic and space-time dependent. Moreover, previous spatiotemporal reachability work assumes an `instant transfer' scenario (where a node is considered to be instantly reachable from another node as long as there is a directed path to it, independently of the length and the number of intermediate nodes on that path), which may not be the case in many real world applications. Further, in [3] we assume that the collection of objects is large, and their behavior is considered over an extensive period of time; hence the dataset becomes very large itself, and does not fit in main memory. As a result an efficient solution needs to be disk I/O efficient. To solve this problem we proposed the RICC (Reachability Index Construction by Contraction) approach. We tested our algorithm on two types of realistic datasets using queries of various temporal length and different types (with single and multiple sources and targets). The results of our experiments show that RICC can be efficiently used for answering a wide range of spatiotemporal reachability queries on disk-resident datasets. When compared with the previous best known approach, RICC offers drastic reduction on the I/O (up to 80%).

(4) Investigative Search over Social Posts. The amount of user generated data increases every year, as more social interaction tools like Twitter, Instagram and Facebook are being created and more users use them to share their everyday experiences. Most research on analyzing social data has focused on detecting trends and patterns such as bursty topics and popular spatiotemporal paths, event extraction, studying how information is spread, or analyzing properties of the social graph. In contrast, and given the spatiotemporal nature of many of these social posts, we are interested in how to search social network data items, posts and images, based on spatiotemporal keyword queries. That is, in [4] we created methods to perform investigative search, in contrast to the trending queries studied by previous work. Investigative search can also be viewed as exploring the currently untapped long tail of the distribution of topics in social networks. This work led to the creation of OSNI (for Online Social Network Investigator) a scalable distributed system to search social network data, based on a spatiotemporal window and a list of keywords.


Research Activities: Year 2

During the second year of this project we worked on the following:

(1) Assembly Queries. We continued and revised our work and formally defined three assembly queries. We first defined the more general [γ, τ]-assembly query, which returns the set of assembly nodes at which at least γ objects could have met for at least τ time units. Another important query is the [top-k(γ), τ] query, that finds the k assembly nodes with the largest possible meeting sizes. Finally we defined the [γ, top-k(τ )] query, that reports the k assembly nodes with the longest-possible meeting durations. In the original work we had discussed the concepts of availability intervals, corridors, and assemblies with respect to nodes only, i.e., not along edges. However, it is likely that objects would actually meet at a location somewhere along the midspan of a road (edge), e.g., at a restaurant, mall, gas station, etc. We presented a way to address this case as well. The revised paper [2] was accepted at the ACM GIS 2017 Conference.

(2) Top-k Frequent Terms over Spatial Ranges. The wide availability of tracking devices has drastically increased the role of geolocation in social networks, resulting in new commercial applications; using geolocation, marketers can identify current trending topics within some region of interest and focus their products accordingly. In [5] we studied a novel analytics query on geotagged data, where given a spatial area we find the most frequent terms among the social posts in that area (kFST query). While there has been prior work on keyword search on spatial data (finding the objects located nearest to the query region and containing the given keywords), and on group keyword search on spatial data (retrieving groups of objects), our problem is different in that it finds the most frequent terms without any given set of query keywords. We proposed a novel index structure and algorithms to efficiently answer such spatial aggregation queries. Our index structure employs an R-tree augmented by top-k sorted term lists (STLs), where a key challenge is to balance the size of the index to achieve faster execution and smaller space requirements. We theoretically studied and experimentally validated the ideal length of the stored term lists, and performed detailed experiments to evaluate the performance of the proposed methods compared to baselines on real datasets. We further extended our algorithms to also cover the multi-region kFST problem and showed that our multi-region approach is more efficient than simply running the single-region algorithm multiple times. The full paper was presented in ACM SIGMOD 2017.

(3) Reachability with Meetings. Previous works on spatiotemporal reachability queries assume that information or physical items can be passed from one object to another either instantaneously or after some processing delay. There are applications however, where we also need to account for the transfer of information; doing so implies that the interacting objects would be forced to stay in contact for some time. This requirement makes the query processing even more challenging. In this work, we introduced a novel problem, namely, spatiotemporal reachability queries with transfer delays (`meetings'). For answering reachability queries with meetings, we proposed two algorithms, namely, RICCmeetMin and RICCmeetMax. To prune the search space during query time, these algorithms precompute some reachability events: the shortest valid (RICCmeetMin), and the longest possible (RICCmeetMax) meetings respectively. An extended experimental evaluation examines the efficiency and pruning characteristics of both RICCmeet algorithms for answering a variety of spatiotemporal reachability queries with meetings on large disk-resident datasets. The presented algorithms were about 3 times faster than the previous approach. Our work [6] was accepted at the ACM GIS 2017 Conference.

(4) Indexing Trajectories with Space Filling Curves. A classic problem for large trajectory repositories is how to efficiently index and query these trajectories. We consider a trajectory as a `polyline' whose endpoints correspond to a time-ordered sequence of locations (points) visited by a moving object (typically provided by sensor readings, user check-ins etc.) Common queries on trajectories are variations of spatial ranges and nearest neighbors. Traditionally, spatial objects are first abstracted by minimum bounding rectangles (MBRs) which are then inserted in R-trees (or their variations). However, using MBRs to index trajectories introduces large dead space and thus a high number of false positives for queries. Space filling curves (SFC) have been used extensively for indexing points in two and higher dimensional spaces. Their advantage arises from their locality preservation and dimensionality reduction. Extending SFCs to index segments instead of points is not trivial as it may lead to false negatives and/or higher query times. In our work we examine how to use SFCs to index trajectory polylines. Our initial experimental results show that our proposed method gives 2 to 15 times improvement for range queries over the state of the art methods.

(5) Massively Parallel Skyline Queries. Processing-In-Memory (PIM) is an increasingly popular architecture aimed at addressing the `memory wall' crisis by prioritizing the integration of processors within DRAM. The skyline operator is a known primitive used to identify those multi-dimensional points offering optimal trade-offs within a given dataset. For large scale data, calculating the skyline set is extensively compute and data intensive. PIM architectures offer many opportunities to mitigate this cost. However, their execution model relies on all processors operating in isolation with minimal data exchange. This prohibits direct application of known skyline optimizations which are inherently sequential, creating dependencies that limit the maximum parallelism and throughput. It is thus challenging to introduce a well design parallel skyline algorithm for prominent PIM architectures. In this work, we address these challenges by introducing a new parallel skyline algorithm called DSky. It is designed to be massively parallel and throughput efficient by leveraging a novel work assignment strategy that emphasizes on load balancing. Our experiments demonstrate that DSky outperforms in most cases the state-of-the-art algorithms developed for the CPU and GPU. For roughly the same work, our solution [7] achieves an order of magnitude higher throughput and 75% of the peak system throughput.



[1] M Reaz Uddin, Chinya V. Ravishankar, Vassilis J. Tsotras, "Dwell Regions: a Novel Query for Identifying Regions of Interest in Trajectory Data", submitted.

[2] M Reaz Uddin, Michael N. Rice, Chinya V. Ravishankar, Vassilis J. Tsotras, "Assembly Queries: Planning and Discovering Assemblies of Moving Objects Using Partial Information", ACM SIGSPATIAL GIS Conference, Redondo Beach, November 2017, to appear.

[3] Elena V. Strzheletska, Vassilis J. Tsotras, ``RICC: Fast Reachability Query Processing on Large Spatiotemporal Datasets", SSTD 2015, pp 3-21, link.

[4] Shiwen Cheng, James Fang, Vagelis Hristidis, Harsha V. Madhyastha, Niluthpol Chowdhury Mithun, Dorian Perkins, Amit K. Roy-Chowdhury, Moloud Shahbazi, Vassilis J. Tsotras, ``OSNI: Searching for Needles in a Haystack of Social Network Data", EDBT 2016, pp: 616-619, link.

[5] Mahbub Hasan, Pritom Ahmed, Abhijith Kashyap, Vagelis Hristidis, Vassilis J. Tsotras, ``Efficient Computation of Top-k Frequent Terms over Spatial Ranges", Proceedings ACM SIGMOD Conference, Chicago, May 2017, pages 1227-1241, link.

[6] Elena V. Strzheletska, Vassilis J. Tsotras, ``RICCmeet: Efficient Processing of Novel Reachability Queries on Large Spatiotemporal Datasets", ACM SIGSPATIAL GIS Conference, Redondo Beach, November 2017, to appear.

[7] Vasileios Zois, Divya Gupta, Vassilis J. Tsotras, Walid A. Najjar, Jean Francois Roy, ``Massively Parallel Skyline Computation For Processing-In-Memory Architectures", submitted.


Acknowledgement: This material is based upon work supported by the National Science Foundation under Grant No. IIS-1527984

Disclaimer: Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.