Workshop at AAAI 2022
February 28th, 2022
Though machine learning (ML) approaches have demonstrated impressive performance on various applications and driven significant progress in artificial intelligence (AI), the potential vulnerabilities of ML models to malicious attacks (e.g., adversarial/poisoning attacks) have raised severe concerns in safety-critical applications. For example, by adding small noise to an image, an attacker can cause an ML model to misclassify it as another category.
Such adversarial examples can be generated in numerous domains, including image classification, object detection, speech recognition, natural language processing, graph representation learning, and self-driving cars in the physical world. Without being alarmist, researchers in ML and AI have a responsibility to preempt attacks and build safeguards, especially when the task is critical for information security and human lives.
Adversarial ML techniques can also raise data privacy and ethical issues when ML is deployed in real-world applications. Counter-intuitive behaviors of ML models substantially undermine public trust in AI, and a rethinking of machine learning/deep learning methods may be urgently needed. This workshop aims to discuss important topics in adversarial ML, bridging academia with industry, algorithm design with policy making, and short-term approaches with long-term solutions, to deepen our understanding of ML models in adversarial environments and build reliable ML systems in the real world.
Topics
We welcome submissions on different aspects of adversarial ML, including but not limited to:

Data-Centric Robust Learning on ML Models
Introduction
Current machine learning competitions mostly seek a high-performance model given a fixed dataset, while the recent Data-Centric AI Competition (https://https-deeplearning-ai.github.io/data-centric-comp/) changes the traditional format and aims to improve a dataset given a fixed model. Similarly, in robust learning, many defensive methods have been proposed for deep learning models to mitigate the threat of adversarial examples, but most of them strive for a high-performance model under fixed constraints and datasets. Thus, how to construct a dataset that is universally effective for training robust models has not been extensively explored. To accelerate research on data-centric techniques for adversarial robustness in image classification, we organize this competition with the purpose of developing novel data-centric algorithms, such as data augmentation, label refinement, crafting adversarial data, or even designing knowledge fusion algorithms that draw on other datasets. Participants are encouraged to freely develop novel ideas and find effective data-centric techniques for training robust ML models.
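As a concrete illustration of one such technique, the sketch below crafts adversarial training data with a single FGSM step. It is a minimal example, not an official baseline; the model interface, pixel range, and perturbation budget `eps` are all assumptions.

```python
import torch
import torch.nn.functional as F

def fgsm_augment(model, images, labels, eps=8 / 255):
    """Craft one-step FGSM adversarial copies of a batch of images.

    Assumes `model` returns logits and pixel values lie in [0, 1];
    both are illustrative choices, not competition requirements.
    """
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    # Take one signed-gradient step, then clip back to the valid pixel range.
    adv = (images + eps * images.grad.sign()).clamp(0.0, 1.0)
    return adv.detach()
```

A submission could, for instance, replace part of the clean CIFAR-10 images with such adversarial copies while keeping the total at or below the 50,000-point budget.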
Models and Datasets
This competition consists of two stages.
Stage I: we choose two baseline networks trained on CIFAR-10. These models are:
1) ResNet-50 (https://arxiv.org/abs/1512.03385) on CIFAR-10
2) DenseNet-121 (https://arxiv.org/abs/1608.06993) on CIFAR-10
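For reference, these baselines are commonly instantiated for CIFAR-10 via torchvision as sketched below; the organizers' exact configuration is unspecified, so the 10-class heads here are an assumption.

```python
from torchvision.models import resnet50, densenet121

def build_baselines():
    # 10-class output heads for CIFAR-10; all other architectural
    # details follow torchvision defaults (assumed, not official).
    return {
        "resnet50": resnet50(num_classes=10),
        "densenet121": densenet121(num_classes=10),
    }
```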
We will use the data points and corresponding label vectors from each submission to train the models; participants may also specify certain training settings, including the optimizer, weight decay, learning rate, and number of training epochs. After the training phase, we evaluate each submission on a private test set based on CIFAR-10 and report scores on the public leaderboard.
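The official schema for these settings has not yet been announced; purely as an illustration, a submission's training settings might be captured in a small config like the following (all field names hypothetical):

```python
# Hypothetical training-settings config; the field names are
# illustrative, not the official submission schema.
training_config = {
    "optimizer": "sgd",       # choice of optimizer
    "learning_rate": 0.1,     # initial learning rate
    "weight_decay": 5e-4,     # L2 regularization strength
    "epochs": 100,            # number of training epochs
}
```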
Stage II: the top-50 participants from Stage I will enter Stage II. In this stage, we will evaluate submissions on another private test set based on CIFAR-10, and the chosen models also differ from those in Stage I. Moreover, in Stage II participants may only adjust the training parameters; the data points and corresponding label vectors are fixed after Stage I.
Note that in Stage I, a portion of the data points in our private test set will come from the basic CIFAR-10 test set, but we do not recommend that participants try to incorporate the basic CIFAR-10 test set or probe the contents of the private test set. We will change the test set in Stage II, and we will check the winners' final programs. Participants are encouraged to design general and effective data-centric techniques to improve the models' performance.
We train the models on each submission and obtain the classification rate (higher is better), which is computed by the following formula:

$$\text{Score} = \frac{1}{|M|} \sum_{m \in M} \frac{1}{|X|} \sum_{(x, y) \in X} \mathbb{1}\left[m(x) = y\right]$$

where $M$ is the set of all trained models and $X$ is the evaluation dataset.
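A minimal sketch of this metric, assuming each trained model exposes a `predict` method returning class labels (the interface is hypothetical):

```python
import numpy as np

def classification_rate(models, X, y):
    """Average per-model accuracy over the evaluation set (higher is better)."""
    per_model_acc = [np.mean(m.predict(X) == y) for m in models]
    return float(np.mean(per_model_acc))
```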
Whenever multiple submissions obtain the same score, they will be ranked by the number of data points used (fewer is better).
Submission Format
Each submission is a zip archive of a dataset, including no more than 50,000 data points (the same number as in the CIFAR-10 training set) with their corresponding label vectors, plus the training settings for every model, including the optimizer, weight decay, learning rate, and number of training epochs. More details will be announced soon.
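Since the exact archive layout is still to be announced, the sketch below only illustrates one plausible way to bundle such a submission; the file names and array formats are assumptions.

```python
import json
import zipfile
import numpy as np

def pack_submission(images, labels, config, path="submission.zip"):
    """Bundle a dataset and training settings into one zip archive.

    File names and layout are assumptions pending the official format.
    """
    assert len(images) <= 50_000, "at most 50,000 data points allowed"
    np.save("data.npy", images)    # e.g., uint8 array of shape (N, 32, 32, 3)
    np.save("labels.npy", labels)  # one label vector per data point
    with open("config.json", "w") as f:
        json.dump(config, f)
    with zipfile.ZipFile(path, "w") as zf:
        for name in ("data.npy", "labels.npy", "config.json"):
            zf.write(name)
    return path
```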
Competition Site

Related Workshops
A Blessing in Disguise: The Prospects and Perils of Adversarial Machine Learning, ICML 2020
Adversarial Machine Learning in Real-World Computer Vision Systems and Online Challenges (AML-CV), CVPR 2021
Adversarial Robustness in the Real World, ICCV 2021