Greater New England Research Interest Group Meeting [June 2021]
Date: June 01, 2021
Project Update #1
Are Adversarial Attacks a Viable Solution to Individual Privacy? – James Kunstle (firstname.lastname@example.org), Boston University undergraduate and Research Assistant, BU Hariri Institute. Lance Galletti, BU Advisor.
Users of online services today must trust platforms with their personal data. Platforms can choose to enable privacy by default through methods such as differential privacy, but the incentives seem to be lacking and trust is still required of the end user. Is there a way for individuals to modify their data so as to obfuscate it and prevent platforms from gleaning personal information they would like to keep private, all while changing the data itself only minimally?
One intriguing class of techniques comes from adversarial machine learning, which offers ways of minimally modifying data so as to fool classification models.
Adversarial methods based on Generative Adversarial Networks are effective against even state-of-the-art models under controlled conditions, but they are fundamentally impractical for the average user: to be effective, they require massive amounts of data and intimate knowledge of the architecture of the targeted models.
Our work has focused on closing this gap – we want to provide a method for users to obfuscate information from their data while: 1) making minimal assumptions about the target model, 2) requiring minimal data, and 3) being robust to changes in the targeted model. What can we learn about the receptive fields of classification models from the small changes that fool them? Are we able to learn pragmatic rules about these models and make model-evasive guarantees with respect to what information can be learned? How well do these adversarial perturbations generalize across the wide variety of deep neural network architectures? To gain intuition about the problem, our focus has been on image data.
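The abstract does not specify which perturbation method the project studies; as a point of reference, the classic fast gradient sign method (FGSM) illustrates how a small, bounded change to an input can raise a model's loss. The logistic-regression "model", weights, and data below are synthetic stand-ins, not the project's actual targets.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(w, x, y):
    # Loss of a linear classifier w on input x with label y in {-1, +1}.
    return np.log1p(np.exp(-y * np.dot(w, x)))

def fgsm_perturb(w, x, y, eps):
    # Gradient of the loss with respect to the *input*, not the weights:
    # the attacker nudges every input coordinate by eps in the direction
    # that increases the loss.
    grad = -y * sigmoid(-y * np.dot(w, x)) * w
    return x + eps * np.sign(grad)

rng = np.random.default_rng(0)
w = rng.normal(size=8)   # stand-in "model" weights
x = rng.normal(size=8)   # clean input
y = 1.0
x_adv = fgsm_perturb(w, x, y, eps=0.1)
```

Each coordinate of `x_adv` differs from `x` by at most `eps`, yet the model's loss on `x_adv` is strictly higher; the project's questions concern when such bounded perturbations remain effective across unknown architectures.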
Project Update #2
Does Efficient, Private, Agnostic Learning Imply Efficient, Agnostic Online Learning? – Lucas Ou (email@example.com), Boston University undergraduate. Lance Galletti, BU Advisor.
Users of online services today must trust platforms with their personal data. Platforms can choose to enable privacy by default through methods such as differential privacy, but the correctness and efficiency incentives are lacking.
Traditionally, models are trained offline on batches of data and re-deployed, but with the growth of IoT and the need to compute at the edge, the online model, which incorporates new data points as soon as they are observed, is becoming increasingly appealing.
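As a minimal illustration of the online model just described, the classic perceptron folds in each labeled example the moment it arrives, instead of retraining on a batch. The data stream below is a toy, linearly separable example chosen purely for illustration.

```python
import numpy as np

class OnlinePerceptron:
    def __init__(self, dim):
        self.w = np.zeros(dim)
        self.mistakes = 0

    def predict(self, x):
        return 1 if self.w @ x > 0 else -1

    def update(self, x, y):
        # Observe one (x, y) pair and update immediately on a mistake;
        # no stored batch, no retraining pass.
        if self.predict(x) != y:
            self.w += y * x
            self.mistakes += 1

stream = [(np.array([1.0, 0.0]),  1),
          (np.array([-1.0, 0.0]), -1),
          (np.array([2.0, 1.0]),  1),
          (np.array([-2.0, 1.0]), -1)]

model = OnlinePerceptron(dim=2)
for x, y in stream:
    model.update(x, y)
```

In the online-learning framework the quantity of interest is the number of mistakes made over the stream, which for the perceptron is bounded whenever the stream is separable with a margin.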
The long-term goal for this research is to either show there is an efficient equivalence between PAC learning and private online learning or to create a framework for privacy that ensures this equivalence – thus motivating privacy by default.
This project examines recent work in the field of Computational Learning Theory. In 2019 it was shown that the problem of privately learning a class of hypotheses is equivalent to the problem of online learning it. It has also been shown that the equivalence is efficient if one assumes the pure private learner is highly sample-efficient.
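Informally, and glossing over the quantitative bounds of the papers in question, the cited equivalence has the following shape, where $\mathrm{Ldim}(\mathcal{H})$ denotes the Littlestone dimension of a hypothesis class $\mathcal{H}$ (this is a sketch of the statement, not the papers' exact theorems):

\[
  \mathcal{H}\ \text{privately PAC learnable}
  \;\Longleftrightarrow\;
  \mathrm{Ldim}(\mathcal{H}) < \infty
  \;\Longleftrightarrow\;
  \mathcal{H}\ \text{online learnable}
\]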
This means, at least in the pure learning case, that there is no reason not to privately learn a hypothesis if you can online-learn it.
However, since there may not exist a hypothesis that perfectly labels the data, the agnostic setting is a much more realistic framework to operate in. It remains an open problem whether the same process, by which an efficient, private PAC learner can be reduced to an efficient online learner, works in the agnostic setting. Further study of this qualitative relationship lays the groundwork for using knowledge from the field of online learning to design better differentially private learning algorithms.
We made headway into resolving this open question by attempting to derive an efficient black-box reduction from agnostic, private PAC learning to online learning.
Project Update #3
Near-Memory Data Reorganization Engine for Data Table Access. – Shahin Roozkhosh, PhD student at Boston University; Prof. Renato Mancuso, BU Advisor.
Data organization is often a crucial choice when designing memory-intensive applications. At the same time, it is often the case that not enough information is available at design time to decide how large-footprint data objects should be stored in memory. A typical example is modern database engines. A relational data table is a 2D array of data structured in rows and columns. One can store the table by rows (a row-store), which is beneficial when transactional queries access entire rows. Unfortunately, this is far from ideal in the presence of queries that perform analytics on a few columns across all the rows. In the latter case, a column-store is preferable.
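The trade-off can be sketched with Python lists standing in for raw memory and an illustrative 3×4 table: in a row-store a full-row read is one contiguous slice but a column scan is strided, and in a column-store the situation is reversed.

```python
R, C = 3, 4
table = [[r * C + c for c in range(C)] for r in range(R)]

# Row-store: rows are contiguous, so a transactional full-row read is one
# contiguous slice of memory.
row_store = [table[r][c] for r in range(R) for c in range(C)]

# Column-store: columns are contiguous, so an analytic scan of a single
# column is one contiguous slice.
col_store = [table[r][c] for c in range(C) for r in range(R)]

row_1 = row_store[1 * C:(1 + 1) * C]   # all of row 1, contiguous
col_2 = col_store[2 * R:(2 + 1) * R]   # all of column 2, contiguous
col_2_from_rows = row_store[2::C]      # same column, but strided access
```

The strided access pattern in the last line is what wastes memory bandwidth: every cache line fetched carries mostly unwanted row data.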
Can we have it both ways? In other words, can the exact same memory object appear to the processor as simultaneously organized as a row-store and a column-store? In this project, we are working towards answering “yes” to this question. We do so by proposing a new paradigm of hardware/software co-design. We rely on the ability to intercept main memory requests originating from a traditional processor, which is possible by interposing a block of programmable logic (FPGA) between the CPUs and the memory subsystem. Doing so enables redefining the very semantics of memory accesses: memory organized as a row-store can be made addressable as if it were a column-store. Intuitively, this capability has profound implications beyond database systems, for any data-intensive workload that manipulates complex multi-dimensional objects.
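A software model of the remapping idea may help: the function below translates an address issued against an imaginary column-store view into the physical row-store address, at element granularity and ignoring caching. The function name and parameters are illustrative assumptions, not the prototype's interface.

```python
def remap_col_view_to_row_store(addr, base, R, C, elem_size):
    # Position (r, c) the CPU *thinks* it is addressing, assuming the
    # object were laid out column-major (the column-store view).
    offset = (addr - base) // elem_size
    r, c = offset % R, offset // R
    # Physical address of the same element in the actual row-major layout.
    return base + (r * C + c) * elem_size

# Toy "DRAM": a 3x4 table stored row-major, element value r*C + c.
R, C = 3, 4
physical = [r * C + c for r in range(R) for c in range(C)]

# Linearly sweeping the column-store view walks column 0, then column 1, ...
view_walk = [physical[remap_col_view_to_row_store(k, 0, R, C, 1)]
             for k in range(R * C)]
```

In hardware, such a translation applied to intercepted requests is what lets the same row-major object answer column-major accesses without ever being copied.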
In this talk, we will present the first prototype of a data reorganization engine designed to operate according to the aforementioned principle. Once the engine is provided with application-level knowledge of the geometry of the table of interest and of the subset of data items of interest, it (1) intercepts CPU-originated memory requests in the form of cache-line refills and (2) orchestrates DRAM-side data fetching so that (3) only compacted data items are propagated up the memory hierarchy. We have completed a full preliminary implementation and an initial evaluation on a CPU+FPGA embedded system.
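The three steps can be modeled in software as a gather: given the table geometry and the columns of interest, only the selected items are copied into a dense buffer, as the engine would when serving a refill. The function name and element granularity below are assumptions for illustration, not the prototype's actual interface.

```python
def compact_columns(row_store, R, C, wanted_cols):
    # Gather only the requested columns, row by row, into a dense buffer;
    # everything else stays DRAM-side and never travels up the hierarchy.
    return [row_store[r * C + c] for r in range(R) for c in wanted_cols]

R, C = 3, 4
row_store = list(range(R * C))   # toy row-major table
compacted = compact_columns(row_store, R, C, wanted_cols=[1, 3])
```

Every cache line filled from `compacted` is fully useful data, instead of carrying the unwanted columns that a strided scan of `row_store` would drag along.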