Spatiotemporal data are rich in semantics and embed a
great deal of high-level information on the behavior of moving objects as well
their interactions with objects in the environment and with each other. To
adequately reveal such semantics, this project has set three research goals in
the spatiotemporal domain.
The first goal is to study queries that
elicit the semantics of interactions between moving objects and their
environment; an example of such a novel query is the `dwell region' query. A region R is a
dwell region for a moving object O if, given a threshold distance d and
every point of R remains within distance d from O for at least time t. Clearly,
points within R are likely to be of interest to O, so identification of O and R has
applications in areas such as monitoring and surveillance (i.e., identify the
region that is under surveillance by a moving object).
The second goal is to address queries
eliciting the semantics of interactions, between moving objects themselves. One
example is the `assembly'
query which identifies potential meeting points between moving objects
when their trajectories are not known fully or precisely. Another example is
addressing complex `reachability' queries, which elicit possible interactions between
objects via intermediaries.
A third goal of this project is to identify
efficient solutions that are also practical. In that sense, our solutions
should scale. An important related question is then how to efficiently index
trajectory data. Another question is whether parallelism can be used to
expedite the above queries.
During the first year of this project we worked on the
(1) Dwell Region Queries. A Dwell Region query enables users identify new regions of
interest, based on a moving object's spatio-temporal
behavior. In  we first proposed an online algorithm to solve this problem,
which can handle dynamic additions and deletions of moving object locations in
logarithmic time. In this environment we assume an incoming stream of object
positions, and maintain the upper and lower bounds for the radius of the
smallest circle enclosing these positions, as points are added and deleted.
These bounds allow us to greatly reduce the number of trajectory points we need
to consider in the query, as well as to defer query evaluation. Our method can
approximate the radius of the smallest circle enclosing a given subtrajectory within an arbitrarily small user-defined
factor. As a byproduct, we solve efficiently a decision query of whether a
dwell region exists or not. This query has applications to the trajectory
simplification problem. We also examine the offline version of the problem and
propose a preprocessing method that enables fast dwell region queries on
archived trajectories. Using the preprocessed information dwell regions can be
identified by only looking into a very small part of the archived data. Our
experiments using real trajectory datasets show that the online approach can
scale up to hundreds of thousands of moving objects for streaming data. For
archived trajectories, the preprocessing approach gives orders of magnitudes of
speedup in query evaluation time.
(2) Assembly Queries. Consider objects moving in a road network (e.g., groups of friends), who
may be free to choose routes, yet be required to arrive at certain locations at
certain times (preexisting requirements). Such objects may need to assemble in
groups within the network (friends meet while all visiting the same city) without
violating their preexisting constraints. Planning for such assemblies is hard
when the network or the number of objects is large. Conversely, discovering
actual or potential assemblies of such objects is important in many
surveillance, security, and law-enforcement applications. This can be hard when
object arrival observations are sparse due to inadequate sensor coverage or
object countermeasures. In  we proposed the novel class of Assembly Queries
to model these scenarios, and presented a unified scheme that addresses both of
these complementary challenges. Given a set of objects and arrival constraints,
we show how to first obtain the set of all possible locations visited by each
moving object (the travel corridor), and then determine all possible
assemblies, including the participants, locations, and durations. We present a
formal model for various tracking strategies and several algorithms for using
these strategies. We achieve excellent performance on these queries by
preprocessing the network, using Contraction Hierarchies. Experimental results
on real-world road networks show that we can efficiently and rapidly infer
assembly information for very large networks and object groups. When compared
with the traditional Dijkstra-based approaches our method runs 1-2 orders of
(3) Spatiotemporal Reachability Queries. These queries arise naturally
when determining how diseases, information, malware, physical items, etc. can
propagate through a collection of moving objects; hence such queries are
significant for many important domains like epidemiology, public health,
security monitoring, surveillance, and social networks. While traditional
reachability queries have been studied in graphs extensively, what makes
spatiotemporal reachability queries different and challenging is that the
associated graph is dynamic and space-time dependent. Moreover, previous
spatiotemporal reachability work assumes an `instant transfer' scenario (where
a node is considered to be instantly reachable from another node as long as
there is a directed path to it, independently of the length and the number of
intermediate nodes on that path), which may not be the case in many real world
applications. Further, in  we assume that the collection of objects is large,
and their behavior is considered over an extensive period of time; hence the
dataset becomes very large itself, and does not fit in main memory. As a result
an efficient solution needs to be disk I/O efficient. To solve this problem we
proposed the RICC (Reachability Index Construction by Contraction) approach. We
tested our algorithm on two types of realistic datasets using queries of
various temporal length and different types (with single and multiple sources
and targets). The results of our experiments show that RICC can be efficiently
used for answering a wide range of spatiotemporal reachability queries on
disk-resident datasets. When compared with the previous best known approach,
RICC offers drastic reduction on the I/O (up to 80%).
(4) Investigative Search over Social Posts. The amount of user generated data
increases every year, as more social interaction tools like Twitter, Instagram
and Facebook are being created and more users use them to share their everyday
experiences. Most research on analyzing social data has focused on detecting
trends and patterns such as bursty topics and popular
spatiotemporal paths, event extraction, studying how information is spread, or
analyzing properties of the social graph. In contrast, and given the
spatiotemporal nature of many of these social posts, we are interested in how
to search social network data items, posts and images, based on spatiotemporal
keyword queries. That is, in  we created methods to perform investigative
search, in contrast to the trending queries studied by previous work.
Investigative search can also be viewed as exploring the currently untapped
long tail of the distribution of topics in social networks. This work led to
the creation of OSNI (for Online Social Network Investigator) a scalable
distributed system to search social network data, based on a spatiotemporal
window and a list of keywords.
During the second year of this project we worked on
(1) Assembly Queries. We continued and revised our work and formally defined three assembly
queries. We first defined the more general [γ, τ]-assembly query,
which returns the set of assembly nodes at which at least γ objects could
have met for at least τ time units. Another important query is the [top-k(γ), τ] query, that finds the k assembly nodes
with the largest possible meeting sizes. Finally we defined the [γ, top-k(τ )] query,
that reports the k assembly nodes with the longest-possible meeting durations.
In the original work we had discussed the concepts of availability intervals,
corridors, and assemblies with respect to nodes only, i.e., not along edges.
However, it is likely that objects would actually meet at a location somewhere
along the midspan of a road (edge), e.g., at a
restaurant, mall, gas station, etc. We presented a way to address this case as
well. The revised paper  was accepted at the ACM GIS 2017 Conference.
(2) Top-k Frequent Terms over Spatial Ranges. The wide availability of tracking
devices has drastically increased the role of geolocation
in social networks, resulting in new commercial applications; using geolocation, marketers can identify current trending topics
within some region of interest and focus their products accordingly. In  we
studied a novel analytics query on geotagged data,
where given a spatial area we find the most frequent terms among the social
posts in that area (kFST query). While there has been prior work on
keyword search on spatial data (finding the objects located nearest to the
query region and containing the given keywords), and on group keyword search on
spatial data (retrieving groups of objects), our problem is different in that
it finds the most frequent terms without any given set of query keywords. We
proposed a novel index structure and algorithms to efficiently answer such
spatial aggregation queries. Our index structure employs an R-tree augmented by
top-k sorted term lists (STLs), where a key challenge
is to balance the size of the index to achieve faster execution and
smaller space requirements. We theoretically studied and experimentally
validated the ideal length of the stored term lists, and performed detailed
experiments to evaluate the performance of the proposed methods compared to
baselines on real datasets. We further extended our algorithms to also cover
the multi-region kFST problem and showed that our
multi-region approach is more efficient than simply running the single-region
algorithm multiple times. The full paper was presented in ACM SIGMOD 2017.
(3) Reachability with Meetings.
Previous works on spatiotemporal reachability queries
assume that information or physical items can be passed from one object to
another either instantaneously or after some processing delay. There are
applications however, where we also need to account for the transfer of
information; doing so implies that the interacting objects would be forced to
stay in contact for some time. This requirement makes the query processing even
more challenging. In this work, we introduced a novel problem, namely,
spatiotemporal reachability queries with transfer
delays (`meetings'). For answering reachability
queries with meetings, we proposed two algorithms, namely, RICCmeetMin
and RICCmeetMax. To prune the search space during
query time, these algorithms precompute some reachability events: the shortest valid (RICCmeetMin), and the longest possible (RICCmeetMax)
meetings respectively. An extended experimental evaluation examines the
efficiency and pruning characteristics of both RICCmeet
algorithms for answering a variety of spatiotemporal reachability
queries with meetings on large disk-resident datasets. The presented algorithms
were about 3 times faster than the previous approach. Our work  was accepted
at the ACM GIS 2017 Conference.
Trajectories with Space Filling Curves. A classic problem for large
trajectory repositories is how to efficiently index and query these
trajectories. We consider a trajectory as a `polyline'
whose endpoints correspond to a time-ordered sequence of locations (points)
visited by a moving object (typically provided by sensor readings, user
check-ins etc.) Common queries on trajectories are variations of spatial ranges
and nearest neighbors. Traditionally, spatial objects are first abstracted by minimum
bounding rectangles (MBRs) which
are then inserted in R-trees (or their variations). However, using MBRs to index trajectories introduces large dead space and
thus a high number of false positives for queries. Space filling curves (SFC)
have been used extensively for indexing points in two and higher dimensional
spaces. Their advantage arises from their locality preservation and
dimensionality reduction. Extending SFCs to index
segments instead of points is not trivial as it may lead to false negatives
and/or higher query times. In our work we examine how to use SFCs to index trajectory polylines.
Our initial experimental results show that our proposed method gives 2 to 15
times improvement for range queries over the state of the art methods.
(5) Massively Parallel Skyline Queries. Processing-In-Memory (PIM) is an increasingly popular
architecture aimed at addressing the `memory wall' crisis by prioritizing the
integration of processors within DRAM. The skyline operator is a known
primitive used to identify those multi-dimensional points offering optimal
trade-offs within a given dataset. For large scale
data, calculating the skyline set is extensively compute and data intensive.
PIM architectures offer many opportunities to mitigate this cost. However,
their execution model relies on all processors operating in isolation with
minimal data exchange. This prohibits direct application of known skyline optimizations which are inherently sequential, creating
dependencies that limit the maximum parallelism and throughput. It is thus
challenging to introduce a well design parallel skyline algorithm for prominent
PIM architectures. In this work, we address these challenges by introducing a
new parallel skyline algorithm called DSky. It is
designed to be massively parallel and throughput efficient by leveraging a
novel work assignment strategy that emphasizes on load balancing. Our
experiments demonstrate that DSky outperforms in most
cases the state-of-the-art algorithms developed for the CPU and GPU. For roughly
the same work, our solution  achieves an order of magnitude higher
throughput and 75% of the peak system throughput.
 M Reaz Uddin, Chinya V. Ravishankar, Vassilis J. Tsotras, "Dwell
Regions: a Novel Query for Identifying Regions of Interest in Trajectory
 M Reaz Uddin, Michael N.
Rice, Chinya V. Ravishankar,
Vassilis J. Tsotras,
"Assembly Queries: Planning and Discovering Assemblies of Moving Objects
Using Partial Information", ACM SIGSPATIAL GIS Conference, Redondo Beach,
November 2017, to appear.
 Elena V. Strzheletska, Vassilis J. Tsotras, ``RICC: Fast
Reachability Query Processing on Large Spatiotemporal
Datasets", SSTD 2015, pp 3-21, link.
 Shiwen Cheng, James
Fang, Vagelis Hristidis,
Harsha V. Madhyastha, Niluthpol
Chowdhury Mithun, Dorian
Perkins, Amit K. Roy-Chowdhury, Moloud Shahbazi, Vassilis J. Tsotras, ``OSNI: Searching for Needles in a Haystack of
Social Network Data", EDBT 2016, pp: 616-619, link.
 Mahbub Hasan, Pritom Ahmed, Abhijith Kashyap, Vagelis Hristidis, Vassilis J. Tsotras, ``Efficient
Computation of Top-k Frequent Terms over Spatial Ranges", Proceedings ACM
SIGMOD Conference, Chicago, May 2017, pages 1227-1241, link.
 Elena V. Strzheletska, Vassilis J. Tsotras, ``RICCmeet: Efficient Processing of Novel Reachability
Queries on Large Spatiotemporal Datasets", ACM SIGSPATIAL GIS Conference,
Redondo Beach, November 2017, to appear.
 Vasileios Zois, Divya Gupta, Vassilis J. Tsotras, Walid A. Najjar, Jean Francois Roy,
``Massively Parallel Skyline Computation For Processing-In-Memory
Acknowledgement: This material is based upon work supported by the
National Science Foundation under Grant No. IIS-1527984
Disclaimer: Any opinions, findings, and conclusions or
recommendations expressed in this material are those of the author(s) and do
not necessarily reflect the views of the National Science Foundation.