Mentions and Travel Grants
The following proposals are among the top third of all proposals and were offered a platform to share their ideas and work with other members of the DTL community. They were offered a presentation slot at the forthcoming DTL workshop in November 2015 (location and details will be announced in due course) as well as a travel grant to attend it.
A Deep learning platform for the reverse-engineering of Behavioral Targeting procedures in online ad networks (DeepBET)
Sotirios Chatzis (Cyprus University of Technology), Aristodemos Paphitis (Cyprus University of Technology)
Online ad networks are a characteristic example of online services that massively leverage user data for the purposes of behavioral targeting. A significant problem of these technologies is their lack of transparency. For this reason, the problem of reverse-engineering the behavioral targeting mechanisms of ad networks has recently attracted significant research interest. Existing approaches query ad networks using artificial user profiles, each of which pertains to a single user category. Nevertheless, well-designed ad services may not rely on such simple user categorizations: A user assigned to multiple categories may be presented with a set of ads quite different from the union of the set of ads pertaining to each one of their individual interests. Even more importantly, user interests may change or vary over time. Nevertheless, none of the existing reverse-engineering systems are capable of determining whether and how ad network targeting mechanisms adapt to such temporal dynamics.
The goal of this proposal is to develop a platform addressing these inadequacies by leveraging advanced machine learning methods. The proposed platform is capable of: (i) Intelligently creating a diverse set of (interest-based) user profiles to query ad networks with. It ensures that the (artificial) user profiles used to query the analyzed ad networks correspond to as diverse a set of combinations of user interests (characteristics) as possible. (ii) Obviating the need to rely on some publicly available tree of categories/user interests, as this can be restrictive to the analysis or even misleading. Instead, our platform is capable of reliably producing a tree-like content-based grouping (clustering) of websites into interest groups, in a completely unsupervised manner. (iii) Performing inference of the correlations between user characteristics and ad network outputs in a way that allows for large scale generalization. (iv) Determining whether and how temporal dynamics affect these correlations, and on how long temporal horizons.
Alibi: Turning User Tracking Into a User Benefit
Marcel Flores, Andrew Kahn, Marc Warrior, Aleksandar Kuzmanovic (PI) (Northwestern University)
We propose Alibi, a system that enables users to take direct advantage of the work online trackers do to record and interpret their behavior. The key idea is to use the readily available personalized content, generated by online trackers in real-time, as a means to verify an online user in a seamless and privacy-preserving manner. We propose to utilize such tracker-generated personalized content, submitted directly by the user, to construct a multi-tracker user-vector representation and use it in various online verification scenarios. The main research objectives of this project are to explore the fundamental properties of such user-vector representations, i.e., their construction, uniqueness, persistency, resilience, utility in online verification, etc. The key goal of this project is to design, implement, and evaluate the Alibi service, and make it publicly available.
Towards Making Systems Forget
Yinzhi Cao (Lehigh University and Columbia University)
Today’s systems produce a rapidly exploding amount of data, and the data further derives more data, forming a complex data propagation network that we call the data’s lineage. There are many reasons that users want systems to forget certain data including its lineage. From a privacy perspective, users who become concerned with new privacy risks of a system often want the system to forget their data and lineage. From a security perspective, if an attacker pollutes an anomaly detector by injecting manually crafted data into the training data set, the detector must forget the injected data to regain security. From a usability perspective, a user can remove noise and incorrect entries so that a recommendation engine gives useful recommendations. Therefore, we envision forgetting systems, capable of forgetting certain data and their lineages, completely and quickly.
In this proposal, we focus on making learning systems forget, the process of which we call machine unlearning, or simply unlearning. We present a general, efficient unlearning approach by transforming learning algorithms used by a system into a summation form. To forget a training data sample, our approach simply updates a small number of summations, which is asymptotically faster than retraining from scratch. Our approach is general, because the summation form comes from statistical query learning, in which many machine learning algorithms can be implemented. Our approach also applies to all stages of machine learning, including feature selection and modeling.
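The summation-form idea can be illustrated with a minimal sketch. It assumes a toy naive-Bayes-style learner whose entire state is a set of counts; the class name `CountingNaiveBayes` and its methods are hypothetical and are not taken from the proposal. Forgetting a sample subtracts its contribution from the few counts it touched, and the result matches a model retrained without that sample.

```python
from collections import defaultdict


class CountingNaiveBayes:
    """Toy learner whose state is only counts, i.e. summations over samples.

    Hypothetical illustration; not the proposal's implementation.
    """

    def __init__(self):
        self.label_counts = defaultdict(int)    # number of samples per label
        self.feature_counts = defaultdict(int)  # number of (feature, label) pairs

    def learn(self, features, label):
        # Add this sample's contribution to each summation it touches.
        self.label_counts[label] += 1
        for f in features:
            self.feature_counts[(f, label)] += 1

    def unlearn(self, features, label):
        # Subtract exactly what learn() added; cost depends on the sample
        # size, not on the size of the training set.
        self.label_counts[label] -= 1
        for f in features:
            self.feature_counts[(f, label)] -= 1

    def state(self):
        # Drop zero entries so that models can be compared directly.
        return (
            {k: v for k, v in self.label_counts.items() if v},
            {k: v for k, v in self.feature_counts.items() if v},
        )


samples = [
    ({"cheap", "pills"}, "spam"),
    ({"meeting", "notes"}, "ham"),
    ({"cheap", "flights"}, "spam"),
]

# Train on everything, then forget the first sample.
model = CountingNaiveBayes()
for features, label in samples:
    model.learn(features, label)
model.unlearn(*samples[0])

# Reference model retrained from scratch without the first sample.
retrained = CountingNaiveBayes()
for features, label in samples[1:]:
    retrained.learn(features, label)
```

In this sketch, unlearning touches only the counts for the forgotten sample's features, whereas retraining revisits every remaining sample.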
Bringing Fairness and Transparency to Mobile On-Demand Services
Christo Wilson (Northeastern University), Dave Choffnes (Northeastern University), Alan Mislove (Northeastern University)
In this project, we aim to bring greater transparency to algorithmic pricing implemented by mobile, on-demand services. Algorithmic pricing was pioneered in this space by Uber in the form of “surge pricing”. While we applaud mobile, on-demand services for disrupting incumbents and stimulating moribund sectors of the economy, we also believe that the data and algorithms leveraged by these services should be transparent. Fundamentally, consumers and providers cannot make informed choices when marketplaces are opaque. Furthermore, black-box services are vulnerable to exploitation once their algorithms are understood, which creates opportunities for customers and providers to manipulate these services in ways that are not possible in transparent markets.
Providing Users With Feedback on Search Personalised Learning
Douglas Leith (Trinity College Dublin), Alessandro Checco (Trinity College Dublin)
Users are currently given only very limited feedback from search providers as to what learning and inference of personal preferences is taking place. When a search engine infers that a particular advertising category is likely to be of interest to a user, and so more likely to generate click through and sales, it will tend to use this information when selecting which adverts to display. This can be used to detect search engine learning via analysis of changes in the choice of displayed adverts and to inform the user of this learning. In this project we will develop a browser plugin that provides such feedback, essentially by empowering the user via the kind of data analytic techniques used by the search engines themselves.
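As a rough sketch of how a change in displayed adverts might be quantified, the function below compares the advert categories seen in two observation windows using total variation distance. The choice of statistic, the function name `category_shift` and the category labels are assumptions made for illustration; the proposal does not specify the analysis used.

```python
from collections import Counter


def category_shift(before, after):
    """Total variation distance between two lists of advert categories.

    Returns 0.0 when the category mix is identical and 1.0 when the two
    windows share no category. Hypothetical helper, for illustration only.
    """
    p, q = Counter(before), Counter(after)
    n_p, n_q = sum(p.values()), sum(q.values())
    if n_p == 0 or n_q == 0:
        raise ValueError("both observation windows need at least one advert")
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p[c] / n_p - q[c] / n_q) for c in categories)


# Advert categories observed before and after a series of health-related searches.
before = ["travel", "travel", "cars", "finance"]
after = ["travel", "health", "health", "health"]
shift = category_shift(before, after)
```

A large shift following a particular set of queries would suggest that the search engine has inferred a new interest category, which a plugin could then report to the user.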
Zero-Knowledge Transparency: Safe Audit Tools for End Users
Maksym Gabielkov (INRIA, Columbia University), Larissa Navarro Passos de Araujo (Columbia University), Max Tucker Da Silva (Columbia University), Augustin Chaintreau (Columbia University)
In principle, data transparency tools follow strict privacy guidelines to protect customers’ data while revealing how this data is being used by others. But those objectives are often at odds. To take a simple example, answering a question like “which of my emails caused this ad to appear?” confronts the user with the following dilemma: she can either (blindly) enjoy the (relative) privacy offered by a service like Gmail or, if she decides to voice her concern, offer her data to a data-transparency experiment run with various tools (e.g., Xray, AdFisher, Sunlight and other more specific ones). The latter involves either running the experiment entirely herself or providing the data in clear form to one of those tools run by a third party. Both options increase privacy risks, because sensitive data are now being manipulated by other pieces of code, sometimes under someone else’s control. This explains why all the tools mentioned above, and in fact almost all transparency research so far, are run and validated on synthetic datasets that are by nature not sensitive.
Here, our goal is to formally define zero-knowledge transparency, to reconcile the two needs of being informed and being safe when it comes to our data usage, and to experiment with tools that provide this dual protection. As in our prior research, we aim at generic tools that address a broad range of scenarios with the same underlying concepts. The first architecture we propose leverages differential correlation, as used in Xray for multiple services, to show that this tool can be made privacy-preserving with a simple additional architectural layer. The second architecture we envision is much broader: it leverages a data bank with interactive queries, such as air-cloak, to solve privacy and transparency separately. We believe that most data transparency tools will require a similar complement, and we will experiment with the robustness of this solution in the face of scale and other challenges.
Privacy-aware ecosystem for data sharing
Anna Monreale (Department of Computer Science, University of Pisa)
Human and social data are an important source of knowledge useful for understanding human behaviour and for developing a wide range of user services. Unfortunately, this kind of data is sensitive, because people’s activities described by these data may allow re-identification of individuals in a de-identified database and thus can potentially reveal intimate personal traits, such as religious or sexual preferences. Therefore, Data Providers, before sharing those data, must apply some form of anonymization to lower the privacy risks, but they must also be aware of and able to control the data quality, since these two factors often trade off against each other. This project proposes a framework to support the Data Provider in the privacy risk assessment of data to be shared. This framework measures both the empirical (not theoretical) privacy risk associated with users represented in the data and the data quality that can be guaranteed using only the users not at risk. It provides a mechanism allowing the exploration of a repertoire of possible data transformations with the aim of selecting one specific transformation that yields an adequate trade-off between data quality and privacy risk. The project will focus on mobility data, studying the practical effectiveness of the framework over forms of mobility data required by specific knowledge-based services.
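One way to picture an empirical risk measure on mobility data is sketched below. It assumes a simplified attack model in which an adversary knows up to `k` locations a user has visited, and defines that user's risk as one over the number of users whose records match the known locations, taken in the worst case. The function name, the attack model and the toy data are assumptions for illustration and are not taken from the proposal.

```python
from itertools import combinations


def reidentification_risk(trajectories, k):
    """Worst-case re-identification risk per user.

    trajectories maps a user id to the set of locations that user visited.
    The attacker is assumed to know min(k, len(visited)) of those locations.
    Hypothetical helper, for illustration only.
    """
    risks = {}
    for user, visited in trajectories.items():
        worst = 0.0
        size = min(k, len(visited))
        for known in combinations(sorted(visited), size):
            known = set(known)
            # Users whose records are consistent with the attacker's knowledge;
            # always at least 1, because the user matches their own record.
            matches = sum(1 for other in trajectories.values() if known <= other)
            worst = max(worst, 1.0 / matches)
        risks[user] = worst
    return risks


trajectories = {
    "u1": {"A", "B"},
    "u2": {"A", "B"},
    "u3": {"A", "C"},
}
risks = reidentification_risk(trajectories, k=2)

# Data quality could then be evaluated on the users below a chosen risk threshold.
safe_users = sorted(u for u, r in risks.items() if r <= 0.5)
```

Under these assumptions, a Data Provider could compare candidate transformations by recomputing the risks and the size of the safe subset after each one.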
Exposing and Overcoming Privacy Leakage in Mobile Apps using Dynamic Profiles
Z. Morley Mao (University of Michigan)
Detecting Accidental and Intentional PII Leakage from Modern Web Applications
Nick Nikiforakis (Stony Brook University)
Despite the magnitude and the severity of the PII-leakage problem, there is currently a dearth of usable, privacy-enhancing technologies that detect and prevent PII leakage. To restore users’ control over their own personally identifiable information, we propose to design, implement, and evaluate LeakSentry, a browser extension that can identify leakage as it happens and give users contextual information about the leakage as well as the power to allow or block it. In addition to LeakSentry’s stand-alone mode, users of LeakSentry will be able to opt in to a crowd-wisdom program where they can learn from each other’s choices. LeakSentry will also be able to report the location of PII leakage, enabling us to create a PII-leaking page observatory, which can both apply pressure to the websites that were caught red-handed and steer other users away from them.
Towards Transparent Privacy Practices: Facilitating Comparisons of Privacy Policies
Ali Sunyaev (Department of Information Systems, University of Cologne), Tobias Dehling (Department of Information Systems, University of Cologne)
Improving the Comprehension of Browser Privacy Modes
Sascha Fahl (DCSec, Leibniz Universität Hannover), Yasemin Acar (DCSec, Leibniz Universität Hannover), Matthew Smith (Rheinische Friedrich-Wilhelms-Universität Bonn)
Online privacy is an important, hotly researched and demanded topic that has recently gained even more relevance. However, existing mechanisms that protect users’ privacy online, such as TOR and VPN connections, are complex, bring performance issues with them and, in the case of the latter, add costs. Therefore, they are not suitable for widespread public use. Browser vendors have recently established so-called private browsing modes that are largely misunderstood by users: they overrate the level of protection these modes offer, which can lead to insecure behaviour. We aim to study user misconceptions, enhance their comprehension and scientifically evaluate the usability and applicability of more privacy-enhancing services such as TOR.
PRIVA-SEE: PRIVacy Aware visual SEnsitivity Evaluator
Bruno Lepri (Fondazione Bruno Kessler), Elisa Ricci (Fondazione Bruno Kessler), Lorenzo Porzi (Fondazione Bruno Kessler)
Digitally sharing our lives with others is a captivating and often addictive activity. Nowadays 1.8 billion photos are shared daily on social media. These images hold a wealth of personal information, ripe for exploitation by tailored advertising business models, but placed in the wrong hands this data can lead to disaster. In this project, we want to see how increasing a person’s awareness of potential personal data sensitivity issues influences their decisions about what and how to share and, moreover, how valuable they perceive their personal data to be. To achieve this ambitious goal we aim to:
(i) develop a novel methodology, applied within a mobile app, to inform users about the potential sensitivity of their images. Sensitivity will be modeled by exploiting automatic inferences coming from advanced computer vision and deep learning algorithms applied to personal photos and associated metadata;
(ii) perform user-centric studies within a living-lab environment to assess how users’ posting behaviours and monetary valuation of mobile personal data are influenced by user awareness about content sharing risks.
Bringing Transparency to Targeted Advertising
Patrick Loiseau (EURECOM), Oana Goga (MPI-SWS)
Targeted advertising largely contributes to the support of free web services. However, it is also increasingly raising concerns from users, mainly due to its lack of transparency. The objective of this proposal is to increase the transparency of targeted advertising from the user’s point of view by providing users with a tool to understand why they are targeted with a particular ad and to infer what information the ad engines possibly have about them. Concretely, we propose to build a browser plugin that collects the ads shown to a user and provides her with analytics about these ads.
Exploring Personal Data on the Databox
Hamed Haddadi (QMUL)
We are in a personal data gold rush driven by advertising being the primary revenue source for most online companies. These companies accumulate extensive personal data about individuals with minimal concern for us, the subjects of this process. This can cause many harms: privacy infringement, personal and professional embarrassment, restricted access to labour markets, restricted access to best value pricing, and many others. There is a critical need to provide technologies that enable alternative practices, so that individuals can participate in the collection, management and consumption of their personal data. We are developing the Databox, a personal networked device (and associated services) that collates and mediates access to personal data, allowing us to recover control of our online lives. We hope the Databox is a first step to re-balancing power between us, the data subjects, and the corporations that collect and use our data.