Publications | Umar Farooq

2023

SIGIR’23
MobileRec: A Large Scale Dataset for Mobile Apps Recommendation

MH Maqbool, Umar Farooq, Adib Mosharrof, AB Siddique, and Hassan Foroosh

In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

Recommender systems have become ubiquitous in our digital lives, from recommending products on e-commerce websites to suggesting movies and music on streaming platforms. Existing recommendation datasets, such as Amazon Product Reviews and MovieLens, greatly facilitated the research and development of recommender systems in their respective domains. While the number of mobile users and applications (aka apps) has increased exponentially over the past decade, research in mobile app recommender systems has been significantly constrained, primarily due to the lack of high-quality benchmark datasets, as opposed to recommendations for products, movies, and news. To facilitate research for app recommendation systems, we introduce a large-scale dataset, called MobileRec. We constructed MobileRec from users’ activity on the Google Play store. MobileRec contains 19.3 million user interactions (i.e., user reviews on apps) with over 10K unique apps across 48 categories. MobileRec records the sequential activity of a total of 0.7 million distinct users. Each of these users has interacted with no fewer than five distinct apps, which stands in contrast to previous datasets on mobile apps that recorded only a single interaction per user. Furthermore, MobileRec presents users’ ratings as well as sentiments on installed apps, and each app contains rich metadata such as app name, category, description, and overall rating, among others. We demonstrate that MobileRec can serve as an excellent testbed for app recommendation through a comparative study of several state-of-the-art recommendation approaches. The quantitative results can act as a baseline for other researchers to compare their results against. The MobileRec dataset is available at https://huggingface.co/datasets/recmeapp/mobilerec.
@inproceedings{sigir2023,
  author = {Maqbool, MH and Farooq, Umar and Mosharrof, Adib and Siddique, AB and Foroosh, Hassan},
  title = {MobileRec: A Large Scale Dataset for Mobile Apps Recommendation},
  year = {2023},
  isbn = {9781450394086},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  booktitle = {Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  pages = {3007--3016},
  numpages = {10},
  keywords = {googleplay dataset, recommendation, app recommendation},
  location = {Taipei, Taiwan},
  series = {SIGIR '23},
  doi = {10.1145/3539618.3591906}
}
AST’23
Detecting Potential User-data Save & Export Losses due to Android App Termination

Sydur Rahaman, Umar Farooq, Zhijia Zhao, and Iulian Neamtiu

In 2023 IEEE/ACM International Conference on Automation of Software Test (AST), May 2023

A common feature in Android apps is saving, or exporting, user’s work (e.g., a drawing) as well as data (e.g., a spreadsheet) onto local storage, as a file. Due to the volatile nature of the OS and the mobile environment in general, the system can terminate apps without notice, which prevents the execution of file write operations; consequently, user data that was supposed to be saved/exported is instead lost. Testing apps for such potential losses raises several challenges: how to identify data originating from user input or resulting from user action (then check whether it is saved), and how to reproduce a potential error by terminating the app at the exact moment when unsaved changes are pending. We address these challenges via an approach that finds potential “lost writes”, i.e., user data supposed to be written to a file, but the file write does not take place due to system-initiated termination. Our approach consists of two phases: a static analysis that finds potential losses and a dynamic loss verification phase where we compare lossy and lossless system-level file write traces to confirm errors. We ran our analysis on 2,182 apps from Google Play and 38 apps from F-Droid. Our approach found 163 apps where termination caused losses, including losing user’s app-specific data, notes, photos, user’s work and settings. In contrast, two state-of-the-art tools aimed at finding volatility errors in Android apps failed to discover the issues we found.
@inproceedings{ast2023,
  author = {Rahaman, Sydur and Farooq, Umar and Zhao, Zhijia and Neamtiu, Iulian},
  title = {Detecting Potential User-data Save \& Export Losses due to Android App Termination},
  booktitle = {2023 IEEE/ACM International Conference on Automation of Software Test (AST)},
  series = {AST 2023},
  doi = {10.1109/AST58925.2023.00019},
  issn = {2833-9061},
  month = may,
  year = {2023},
  pages = {152--162}
}
CC’23
Linker Code Size Optimization for Native Mobile Applications

Gai Liu, Umar Farooq, Chengyan Zhao, Xia Liu, and Nian Sun

In Proceedings of the 32nd ACM SIGPLAN International Conference on Compiler Construction, May 2023

Modern mobile applications have grown rapidly in binary size, which restricts user growth and hinders updates for existing users. Thus, reducing the binary size is important for application developers. Recent studies have shown the possibility of using link-time code size optimizations by re-invoking certain compiler optimizations on the linked intermediate representation of the program. However, such methods often incur significant build time overhead and require intrusive changes to the existing build pipeline. In this paper, we propose several novel optimization techniques that do not require significant customization to the build pipeline and reduce binary size with low build time overhead. As opposed to re-invoking the compiler during link time, we perform true linker optimization directly as optimization passes within the linker. This enables more optimization opportunities such as pre-compiled libraries that prior work often could not optimize. We evaluate our techniques on several commercial iOS applications including NewsFeedApp, ShortVideoApp, and CollaborationSuiteApp, each with hundreds of millions of daily active users. Our techniques on average achieve 18.4% binary size reduction across the three commercial applications without any user-perceivable performance degradations.
@inproceedings{10.1145/3578360.3580256,
  author = {Liu, Gai and Farooq, Umar and Zhao, Chengyan and Liu, Xia and Sun, Nian},
  title = {Linker Code Size Optimization for Native Mobile Applications},
  year = {2023},
  isbn = {9798400700880},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3578360.3580256},
  doi = {10.1145/3578360.3580256},
  booktitle = {Proceedings of the 32nd ACM SIGPLAN International Conference on Compiler Construction},
  pages = {168--179},
  numpages = {12},
  keywords = {Code Size Optimization, Static Analysis, iOS},
  location = {Montr\'{e}al, QC, Canada},
  series = {CC 2023}
}

2022

BigData’22
Proactive Prioritization of App Issues via Contrastive Learning

Moghis Fereidouni, Adib Mosharrof, Umar Farooq, and A.B. Siddique

In 2022 IEEE International Conference on Big Data (Big Data), Dec 2022

Mobile app stores produce a tremendous amount of data in the form of user reviews, which is a huge source of user requirements and sentiments; such reviews allow app developers to proactively address issues in their apps. However, only a small number of reviews capture common issues and sentiments, which creates a need for automatically identifying prominent reviews. Unfortunately, most existing work in text ranking and popularity prediction focuses on social contexts where other signals are available, which renders such works ineffective in the context of app reviews. In this work, we propose a new framework, PPrior, that enables proactive prioritization of app issues through identifying prominent reviews (ones predicted to receive a large number of votes in a given time window). Predicting highly-voted reviews is challenging given that, unlike social posts, social network features of users are not available. Moreover, there is an issue of class imbalance, since a large number of user reviews receive little to no votes. PPrior employs a pre-trained T5 model and works in three phases. Phase one adapts the pre-trained T5 model to the user reviews data in a self-supervised fashion. In phase two, we leverage contrastive training to learn a generic and task-independent representation of user reviews. Phase three uses a radius neighbors classifier to make the final predictions. This phase also uses a FAISS index for scalability and efficient search. To conduct extensive experiments, we acquired a large dataset of over 2.1 million user reviews from Google Play. Our experimental results demonstrate the effectiveness of the proposed framework when compared against several state-of-the-art approaches. Moreover, the accuracy of PPrior in predicting prominent reviews is comparable to that of experienced app developers.
@inproceedings{10020586,
  author = {Fereidouni, Moghis and Mosharrof, Adib and Farooq, Umar and Siddique, A.B.},
  booktitle = {2022 IEEE International Conference on Big Data (Big Data)},
  title = {Proactive Prioritization of App Issues via Contrastive Learning},
  year = {2022},
  month = dec,
  pages = {535--544},
  doi = {10.1109/BigData55660.2022.10020586}
}

2020

BigData’20
App-Aware Response Synthesis for User Reviews

Umar Farooq, A. B. Siddique, Fuad Jamour, Zhijia Zhao, and Vagelis Hristidis

In 2020 IEEE International Conference on Big Data (Big Data), Dec 2020

Hundreds of thousands of mobile app users post their reviews online. Responding to user reviews promptly and satisfactorily improves application ratings, which is key to application popularity and success. The proliferation of such reviews makes it virtually impossible for developers to keep up with responding manually. To address this challenge, recent work has shown the possibility of automatic response generation by training a seq2seq model with a large collection of review-response pairs. However, because the training review-response pairs are aggregated from many different apps, it remains challenging for such models to generate app-specific responses, which, on the other hand, are often desirable as apps have different features and concerns. Solving the challenge by simply building an app-specific generative model per app (i.e., training the model with review-response pairs of a single app) may be insufficient because individual apps have limited review-response pairs, and such pairs typically lack the relevant information needed to respond to a new review. To enable app-specific response generation, this work proposes AARSYNTH: an app-aware response synthesis system. The key idea behind AARSYNTH is to augment the seq2seq model with information specific to a given app. Given a new user review, AARSYNTH first retrieves the top-K most relevant app reviews and the most relevant snippet from the app description. The retrieved information and the new user review are then fed into a fused machine learning model that integrates the seq2seq model with a machine reading comprehension model. The latter helps digest the retrieved reviews and app description. Finally, the fused model generates a response that is customized to the given app. We evaluated AARSYNTH using a large corpus of reviews and responses from Google Play. The results show that AARSYNTH outperforms the state-of-the-art system by 22.2% on BLEU-4 score. Furthermore, our human study shows that AARSYNTH produces a statistically significant improvement in response quality compared to the state-of-the-art system.
@inproceedings{9377983,
  author = {Farooq, Umar and Siddique, A. B. and Jamour, Fuad and Zhao, Zhijia and Hristidis, Vagelis},
  booktitle = {2020 IEEE International Conference on Big Data (Big Data)},
  title = {App-Aware Response Synthesis for User Reviews},
  year = {2020},
  month = dec,
  pages = {699--708},
  doi = {10.1109/BigData50022.2020.9377983}
}
OOPSLA’20
LiveDroid: Identifying and Preserving Mobile App State in Volatile Runtime Environments

Umar Farooq, Zhijia Zhao, Manu Sridharan, and Iulian Neamtiu

In Proc. ACM Program. Lang., Nov 2020

Mobile operating systems, especially Android, expose apps to a volatile runtime environment. The app state that reflects past user interaction and system environment updates (e.g., battery status changes) can be destroyed implicitly, in response to runtime configuration changes (e.g., screen rotations) or memory pressure. Developers are therefore responsible for identifying app state affected by volatility and preserving it across app lifecycles. When handled inappropriately, the app may lose state or end up in an inconsistent state after a runtime configuration change or when users return to the app. To free developers from this tedious and error-prone task, we propose a systematic solution, LiveDroid, which precisely identifies the necessary part of the app state that needs to be preserved across app lifecycles, and automatically saves and restores it. LiveDroid consists of: (i) a static analyzer that reasons about app source code and resource files to pinpoint the program variables and GUI properties that represent the necessary app state, and (ii) a runtime system that manages the state saving and recovering. We implemented LiveDroid as a plugin in Android Studio and a patching tool for APKs. Our evaluation shows that LiveDroid can be successfully applied to 966 Android apps. A focused study with 36 Android apps shows that LiveDroid identifies app state much more precisely than an existing solution that includes all mutable program variables but ignores GUI properties. As a result, on average, LiveDroid is able to reduce the costs of state saving and restoring by 16.6X (1.7X - 141.1X) and 9.5X (1.1X - 43.8X), respectively. Furthermore, compared with the manual state handling performed by developers, our analysis reveals a set of 46 issues due to incomplete state saving/restoring, all of which can be successfully eliminated by LiveDroid.
@inproceedings{10.1145/3428228,
  author = {Farooq, Umar and Zhao, Zhijia and Sridharan, Manu and Neamtiu, Iulian},
  title = {LiveDroid: Identifying and Preserving Mobile App State in Volatile Runtime Environments},
  year = {2020},
  issue_date = {November 2020},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  volume = {4},
  number = {OOPSLA},
  url = {https://doi.org/10.1145/3428228},
  doi = {10.1145/3428228},
  booktitle = {Proc. ACM Program. Lang.},
  month = nov,
  articleno = {160},
  numpages = {30}
}

2019

ASPLOS’19
Scalable Processing of Contemporary Semi-Structured Data on Commodity Parallel Processors - A Compilation-Based Approach

Lin Jiang, Xiaofan Sun, Umar Farooq, and Zhijia Zhao

In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, Apr 2019

JSON (JavaScript Object Notation) and its derivatives are essential in the modern computing infrastructure. However, existing software often fails to process such types of data in a scalable way, mainly for two reasons: (i) the processing often requires building a memory-consuming parse tree; (ii) there exist inherent dependences in processing the data stream, preventing any data-level parallelization. Facing these challenges, developers often have to construct ad-hoc pre-parsers to split the data stream in order to reduce the memory consumption and increase the data parallelism. However, this strategy requires more programming effort. Moreover, the pre-parsing itself is non-trivial to parallelize, thus introducing a new serial bottleneck. To solve the dilemma, this work introduces a scalable yet fully automatic solution - a compilation system, namely JPStream, that compiles standard JSONPath queries into parallel executables with bounded memory footprints. First, JPStream adopts a stream processing design that combines the querying and parsing into one pass, without generating any in-memory parse tree. To achieve this, JPStream uses a novel joint compilation technique that compiles the queries and the JSON syntax together into a single automaton. Furthermore, JPStream leverages the “enumerability” of automaton to break the dependences and reason about the transition rules to prune infeasible states. It also features a runtime that learns structural constraints from the input to enhance the pruning. Evaluation on real-world JSON datasets with standard JSONPath queries shows that JPStream can reduce the memory consumption significantly, by up to 95%, meanwhile achieving near-linear speedup on multicore and manycore processors.
@inproceedings{10.1145/3297858.3304008,
  author = {Jiang, Lin and Sun, Xiaofan and Farooq, Umar and Zhao, Zhijia},
  title = {Scalable Processing of Contemporary Semi-Structured Data on Commodity Parallel Processors - A Compilation-Based Approach},
  year = {2019},
  isbn = {9781450362405},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3297858.3304008},
  doi = {10.1145/3297858.3304008},
  booktitle = {Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems},
  pages = {79--92},
  numpages = {14},
  location = {Providence, RI, USA},
  series = {ASPLOS '19}
}
GetMobile
RuntimeDroid: Restarting-Free Runtime Change Handling for Android Apps

Umar Farooq and Zhijia Zhao

GetMobile: Mobile Comp. and Comm., May 2019

Portable devices, like smartphones and tablets, are often subject to higher frequency of configuration changes, such as screen orientation changes, screen resizing, keyboard attachments, and language switching. Since the changes can happen at runtime while users interact with the devices, they are referred to as runtime changes. Recent studies have shown that runtime changes happen regularly as users operate their apps. For example, on average, users change the orientation of their devices every five minutes accumulatively over sessions of the same app [1]. For multilingual or tablet users, changing the language setting or attaching an external keyboard is often desired [2,3]. As newer versions of Android systems with multiwindow supports are adopted, it is projected that runtime changes will happen more frequently. Each time a user drags the boundary between two split windows, a runtime change would be triggered [4].
@article{10.1145/3325867.3325879,
  author = {Farooq, Umar and Zhao, Zhijia},
  title = {RuntimeDroid: Restarting-Free Runtime Change Handling for Android Apps},
  year = {2019},
  issue_date = {December 2018},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  volume = {22},
  number = {4},
  issn = {2375-0529},
  url = {https://doi.org/10.1145/3325867.3325879},
  doi = {10.1145/3325867.3325879},
  journal = {GetMobile: Mobile Comp. and Comm.},
  month = may,
  pages = {25--29},
  numpages = {5}
}

2018

MobiSys’18
RuntimeDroid: Restarting-Free Runtime Change Handling for Android Apps

Umar Farooq and Zhijia Zhao

In Proceedings of the 16th Annual International Conference on Mobile Systems, Applications, and Services, May 2018

Portable devices, like smartphones and tablets, are often subject to runtime configuration changes, such as screen orientation changes, screen resizing, keyboard attachments, and language switching. When handled improperly, such simple changes can cause serious runtime issues, from data loss to app crashes. This work presents, to our best knowledge, the first formative study on runtime change handling with 3,567 Android apps. The study not only reveals the current landscape of runtime change handling, but also points out a common cause of various runtime change issues – activity restarting. On one hand, the restarting facilitates the resource reloading for the new configuration. On the other hand, it may slow down the app, and more critically, it requires developers to manually preserve a set of data in order to recover the user interaction state after restarting. Based on the findings of this study, this work further introduces a restarting-free runtime change handling solution – RuntimeDroid. RuntimeDroid can completely avoid the activity restarting and, at the same time, ensure proper resource updating with user input data preserved. These are achieved with two key components: an online resource loading module, called HotR, and a novel UI components migration technique. The former enables proper resource loading while the activity is still live. The latter ensures that prior user changes are carefully preserved during runtime changes. For practical use, this work proposes two implementations of RuntimeDroid: an IDE plugin and an auto-patching tool. The former allows developers to easily adopt restarting-free runtime change handling during app development; the latter can patch released app packages without source code. Finally, evaluation with a set of 72 apps shows that RuntimeDroid successfully fixed all the 197 reported runtime change issues, meanwhile reducing the runtime change handling delays by 9.5X on average.
@inproceedings{10.1145/3210240.3210327,
  author = {Farooq, Umar and Zhao, Zhijia},
  title = {RuntimeDroid: Restarting-Free Runtime Change Handling for Android Apps},
  year = {2018},
  isbn = {9781450357203},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3210240.3210327},
  doi = {10.1145/3210240.3210327},
  booktitle = {Proceedings of the 16th Annual International Conference on Mobile Systems, Applications, and Services},
  pages = {110--122},
  numpages = {13},
  location = {Munich, Germany},
  series = {MobiSys '18}
}