Are Adversarial Attacks a Viable Solution to Individual Privacy?
Users of online services today must trust platforms with their personal data. Platforms can choose to enable privacy by default through methods such as differential privacy, but the incentives seem to be lacking, and the end user must still trust the platform. Is there a way for individuals to modify their data, while changing it as little as possible, so that platforms cannot glean personal information those individuals would like to keep private?
One intriguing class of techniques comes from adversarial machine learning, which provides methods for minimally modifying data in ways that can fool classification models.
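To make the idea of a small, model-fooling modification concrete, here is a minimal sketch of the fast gradient sign method (FGSM) applied to a toy logistic-regression classifier. It is an illustration only, not the method developed in this work: the weights, bias, input, and epsilon are hypothetical values, and the attack assumes full (white-box) knowledge of the model, which is exactly the assumption this work tries to avoid.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability of class 1 under a logistic-regression classifier."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)


def sign(g):
    """Return -1, 0, or 1 depending on the sign of g."""
    return (g > 0) - (g < 0)


def fgsm_perturb(weights, bias, x, label, epsilon):
    """Shift each feature by at most epsilon in the direction that
    increases the cross-entropy loss for the true label."""
    p = predict_proba(weights, bias, x)
    # For logistic regression, d(loss)/dx = (p - label) * w.
    grad = [(p - label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


# Hypothetical toy model and input (assumptions for illustration only).
weights = [2.0, -1.0, 0.5]
bias = 0.0
x = [0.2, 0.1, 0.2]
label = 1  # the class the model currently (and correctly) predicts

x_adv = fgsm_perturb(weights, bias, x, label, epsilon=0.2)

print("original probability of class 1: ", predict_proba(weights, bias, x))
print("perturbed probability of class 1:", predict_proba(weights, bias, x_adv))
```

In this toy setting no feature moves by more than epsilon, yet the predicted probability of the true class drops below 0.5, so the predicted label flips. Real image classifiers are far higher-dimensional, but the principle of a bounded perturbation chosen to push the model across a decision boundary is the same.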
State-of-the-art adversarial methods, including those that use Generative Adversarial Networks to craft perturbations, are effective against even state-of-the-art models in controlled conditions, but they are fundamentally impractical for the average user. To be effective, they require massive amounts of data and intimate knowledge of the architecture of the targeted models.
Our work has focused on closing this gap: we want to provide a method for users to obfuscate information in their data while:
1) making minimal assumptions about the target model,
2) requiring minimal data, and
3) being robust to changes in the targeted model.
What can we learn about the receptive fields of classification models from the small changes that fool them? Are we able to learn pragmatic rules about these models and make model-evasive guarantees with respect to what information can be learned? How well do these adversarial perturbations generalize across the wide variety of deep neural network architectures? To gain intuition about the problem, our focus has been on image data.