DTL Grants support research into tools, data, platforms, and methodologies that shed light on how online services use personal data and that empower users to control their personal data online.
The winners of the first DTL research grants are listed below. Each project received a lump sum of €50,000. Click here for more information on the call for proposals.
The remaining proposals that ranked in the top third of all submissions were awarded a platform to present their work at the DTL2015 Conference, along with a corresponding travel grant.
Providing Users Data-Driven Privacy Awareness
Lorrie Faith Cranor (Carnegie Mellon University), Blase Ur (Carnegie Mellon University)
Online behavioral advertising (OBA), the targeting of advertisements based on a user’s web browsing, remains a major source of privacy invasion. Although a number of privacy tools (e.g., Ghostery, Lightbeam, and Privacy Badger) can help users control OBA, average users are left utterly confused about OBA even after using such tools. We propose moving beyond existing tools, which alert users to tracking occurring at the current moment, by designing and testing a tool that takes a data-driven, personalized approach to privacy awareness. We hypothesize that users can better understand OBA and the resulting privacy threats if equipped with a tool that visualizes how they have been tracked over time.
We will build and test such a data-driven privacy tool that enables users to explore precisely which webpages different companies have tracked them on, as well as what those companies may have inferred about their interests. Studies have shown benefits in notifying users about the collection of data by smartphone apps. Our proposal translates these insights to the OBA domain, yet makes further intellectual contributions by exploring the impact of presenting different abstractions and granularities of the information tracked (e.g., showing “Doubleclick knows you visited the following 82 pages” versus “Doubleclick has likely concluded that you like ‘European travel’ based on your visits to these 82 pages”). In addition to releasing our privacy tool as a fully functional, open-source project, we will conduct a 75-participant, 2-week field trial comparing visualizations of personalized tracking data.
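The two abstraction levels described above (page-level versus inferred-interest) can be sketched as two views over the same tracking log. This is a minimal illustration, not the project's implementation; the tuple format and example data are assumptions for demonstration.

```python
from collections import defaultdict

# Hypothetical tracking log: (tracker, page_url, inferred_interest) tuples.
# In a real tool these would come from instrumenting the browser.
OBSERVATIONS = [
    ("doubleclick.net", "example.com/paris-hotels", "European travel"),
    ("doubleclick.net", "example.com/rome-flights", "European travel"),
    ("tracker.example", "news.example/politics", "News"),
]

def page_level_view(observations):
    """Low-level abstraction: which pages each tracker observed."""
    pages = defaultdict(set)
    for tracker, url, _ in observations:
        pages[tracker].add(url)
    return {t: sorted(p) for t, p in pages.items()}

def interest_level_view(observations):
    """Higher-level abstraction: interests a tracker may have inferred."""
    interests = defaultdict(lambda: defaultdict(int))
    for tracker, _, interest in observations:
        interests[tracker][interest] += 1
    return {t: dict(i) for t, i in interests.items()}
```

The field trial's comparison would then amount to showing different users one view or the other for the same underlying data.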
Revealing and Controlling Mobile Privacy Leaks
David Choffnes (Northeastern University), Christo Wilson (Northeastern University), Alan Mislove (Northeastern University)
The combination of rich sensors and ubiquitous connectivity makes mobile devices perfect vectors for invading the privacy of end users. We argue that improving privacy in this environment requires trusted third-party systems that enable auditing and control over PII leaks. However, previous attempts to address PII leaks fall short of enabling such auditing and control because they lack visibility into the network traffic generated by mobile devices and the ability to control that traffic.
The proposed research will enable the auditing and control of PII leaks in mobile network traffic by using indirection. Specifically, we use natively supported mobile OS features to redirect all of a device’s Internet traffic to a trusted server that identifies and controls privacy leaks in network traffic.
We will address the key challenge of identifying and controlling PII leaks when neither the users’ PII nor the set of apps that leak it is known a priori. First, to enable auditing through improved transparency, we will investigate how to use machine learning to reliably identify PII in network flows, and identify algorithms that incorporate user feedback to adapt to the changing landscape of privacy leaks. Second, we will build tools that allow users to control how their information is (or is not) shared with second and third parties. These tools will be deployed as free, open-source applications that can run in a number of deployment scenarios, including on a device in a user’s home network or in a shared cloud-based VM environment.
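Before any learned classifier is involved, the simplest form of this auditing is flagging suspicious key/value pairs in intercepted traffic. The sketch below is an illustrative baseline under assumed inputs (an HTTP query string), not the project's machine-learning approach; the key pattern is a hypothetical example.

```python
import re
from urllib.parse import unquote

# Illustrative heuristic only: flag query-string parameters whose keys look
# PII-related. A real system would learn such patterns from labeled flows.
PII_KEY_PATTERN = re.compile(
    r"(email|imei|android_id|mac|location|lat|lon|username)", re.I)

def candidate_pii_leaks(query_string):
    """Return (key, value) pairs in a query string that may carry PII."""
    leaks = []
    for pair in query_string.split("&"):
        if "=" not in pair:
            continue
        key, value = pair.split("=", 1)
        if PII_KEY_PATTERN.search(key):
            leaks.append((key, unquote(value)))
    return leaks
```

User feedback on false positives and negatives from a heuristic like this is exactly the signal the proposed adaptive algorithms would consume.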
FDVT: Personal Data Valuation Tool for Facebook Users
Angel Cuevas (Universidad Carlos III de Madrid, co-PI), Ruben Cuevas (Universidad Carlos III de Madrid, co-PI), Raquel Aparicio (Universidad Carlos III de Madrid)
A recent report by the Interactive Advertising Bureau revealed that online advertising generated $49.5B in revenue in the US alone in 2014, a 16% increase over 2013, which in turn exceeded 2012 revenue by 17%. A great advantage of online advertising over more traditional print and TV advertising is its capability to target individuals with specialized advertisements tailored to their personal information. For instance, the ad campaign planner from Facebook (FB) allows advertisers to define an audience using more than 13 different attributes related to the end user’s personal information. An online advertiser can therefore launch a campaign targeting a well-defined audience based on personal information attributes; thus, an important part of the FB business model is built on top of the personal information of its subscribers. Although there is no doubt about the legality of the business model implemented by FB and other major players on the Internet, some actors are calling for tools that let end users know the actual value of their personal information; in other words, how much money FB, Google, and other companies in the online advertising market make out of their personal information. Providing Internet users with simple, transparent tools that inform them of the value their personal data generates is not only a civil-society request but also a demand from governments.
The goal of this project is to develop a tool that informs Internet end users, in real time, of the economic value generated by the personal information associated with their browsing activity. Due to the complexity of the problem, we narrow the scope of this tool to FB in this project; i.e., we inform FB users in real time of the value they are generating for FB. We refer to this tool as the FB Data Valuation Tool (FDVT).
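At its core, a valuation of this kind prices the ad events a user's session produces. The sketch below is a hypothetical illustration of such an estimate; the CPM and CPC figures are assumed placeholders, not actual Facebook rates, and the real FDVT methodology is not described here.

```python
# Hypothetical session-value estimate: price ad impressions at an assumed
# CPM (cost per thousand impressions) and clicks at an assumed CPC
# (cost per click). These default rates are illustrative only.
def session_value(impressions, clicks, cpm_usd=2.50, cpc_usd=0.35):
    """Estimate the revenue a user's session generates for the platform."""
    return impressions / 1000 * cpm_usd + clicks * cpc_usd
```

A browser-side tool could count impressions and clicks as the user browses FB and update this running total in real time.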
Digital Halo: Browsing History Awareness
Arkadiusz Stopczynski (Department of Applied Mathematics and Computer Science, Technical University of Denmark), Mieszko Piotr Manijak (Department of Applied Mathematics and Computer Science, Technical University of Denmark), Piotr Sapiezynski (Department of Applied Mathematics and Computer Science, Technical University of Denmark), Sune Lehmann (Department of Applied Mathematics and Computer Science, Technical University of Denmark)
Our online browsing history is intensely personal. Our search terms and the web pages we visit reveal our fears, interests, illnesses, and secret ambitions. While many people are familiar with the concept of behavior tracking and cookies, there is significantly less public awareness of just how personal our online behavior is.
A few years ago, the Immersion project originating at the MIT Media Lab received worldwide press coverage by visualizing the latent social information contained in our email header information. We aim to do something similar for web browsing. Using topic models, we aim to design a simple dashboard that allows individuals to visualize the content of their browsing and observe how these topics change over time. Crucially, we will combine this visualization with information on data trackers (how many tracking parties, how much outgoing information), thus allowing users to directly observe what the data tracking means for them.
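The dashboard's aggregation step, topic weights per day of browsing, can be sketched as follows. This is a minimal sketch under assumptions: a real system would use a trained topic model such as LDA, whereas here hand-picked keyword lists stand in for topics, and the input format is hypothetical.

```python
from collections import Counter

# Stand-in "topics": in practice these would come from a trained topic model.
TOPIC_KEYWORDS = {
    "travel": {"flight", "hotel", "visa"},
    "health": {"symptom", "doctor", "clinic"},
}

def daily_topic_weights(history):
    """history: list of (day, page_title) pairs -> {day: Counter of topics}."""
    weights = {}
    for day, title in history:
        tokens = set(title.lower().split())
        counts = weights.setdefault(day, Counter())
        for topic, keywords in TOPIC_KEYWORDS.items():
            counts[topic] += len(tokens & keywords)
    return weights
```

Plotting these per-day counters over time yields exactly the "how topics change over time" view the dashboard aims for.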
Collected and computed data will be stored in safe, individualized ‘vaults’ in a storage system following the OpenPDS framework specification, thereby ensuring strict sovereignty of users over their data.
Characterizing the Tradeoff between Privacy and Function
Yogesh Mundada (Princeton University), Nick Feamster (Princeton University), Sarthak Grover (Princeton University)
Personal information loss has been a worrisome issue for researchers and regular users alike. Although much research has been done in both the security and privacy communities, a personalized solution addressing problems in both areas that is useful to end users is missing. In this work, we present Appu, a browser extension that automatically detects i) the user’s sensitive information, ii) whether it is sufficiently secured, and iii) whether it is being leaked to third-party domains.
To automatically detect the user’s sensitive information, we developed a scripting language to scrape this information from the user’s existing accounts. Once the personal information store is populated, Appu passively monitors the user’s interactions with various accounts to detect further information spread. Appu also monitors whether any personal information is leaked to third parties. Over time, Appu presents the user with a complete picture of their personal information spread across the web. Appu also nudges the user to secure important but inadequately protected accounts.
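The "complete picture of information spread" reduces to a per-field map of which sites hold which PII. The following is a minimal sketch of that aggregation; the observation format and site names are assumptions for illustration, not Appu's actual data model.

```python
# Hypothetical sketch: given observations of which site exposes which PII
# field, build the spread map Appu-style tools could present to the user.
def information_spread(observations):
    """observations: (site, field) pairs -> {field: sorted list of sites}."""
    spread = {}
    for site, field in observations:
        spread.setdefault(field, set()).add(site)
    return {f: sorted(s) for f, s in spread.items()}
```

A field that appears on many sites is a natural candidate for the tool's "inadequately protected" nudges.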
Reverse-engineering online tracking: From niche research field to easy-to-use tool
Arvind Narayanan (Princeton University), Steven Englehardt (Princeton University)
The third-party online tracking ecosystem lacks transparency about (1) which companies track users, (2) what user data is being collected, (3) what technologies are being used for tracking, and (4) data flows between trackers. Automated measurement can enable transparency and has already resulted in greater privacy awareness, improved privacy tools, and, at times, regulatory enforcement actions.
At Princeton we have built OpenWPM, a platform for online tracking transparency. We have used it in several published studies to detect and reverse-engineer online tracking. We now aim to democratize web privacy measurement: to transform it from a niche research field into a widely available tool. We will do this in two steps. First, we will use OpenWPM to publish a “web privacy census” — a monthly web-scale measurement of tracking and privacy covering 1 million sites. The census will detect and measure many or most of the types of known privacy violations reported by researchers so far: circumvention of cookie blocking, leakage of PII to third parties, canvas fingerprinting, and more. Second, we will build an analysis platform that allows anyone to analyze the census data with minimal expertise. The platform will have “1-click reproducibility,” allowing study data, scripts, and results to be packaged and distributed in a format that is easy to replicate and extend.
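The census's core aggregation is measuring tracker prevalence: ranking third parties by the number of distinct first-party sites on which they appear. The sketch below assumes crawl output flattened to (first-party site, third-party domain) request pairs, a simplification of what a crawler like OpenWPM actually records.

```python
from collections import Counter

# Rank third-party domains by the number of distinct sites they appear on.
# Input: iterable of (first_party_site, third_party_domain) request pairs.
def tracker_prevalence(requests):
    seen = set(requests)  # dedupe repeat requests on the same site
    counts = Counter(third for _site, third in seen)
    return counts.most_common()
```

Running this over a monthly 1-million-site crawl would yield the prevalence tables a census report is built around.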