Projects:2019s1-117 Adversarial Machine Learning

Project Team

Students

Samuel Henderson

Brian Du

Supervisors

Dr Matthew Sorell

David Hubczenko

Tamas Abraham

Project

Introduction

Social media has profoundly changed the way we acquire and process information. It has been reported that eight in ten Australians use social media [1], that 52% of social media users rely on it to keep up to date with the news, and that 17% report it as their primary source of information [2]. Twitter is a popular social media platform that is particularly effective at distributing information.

Twitter’s Application Programming Interface (API) enables external software to integrate with the platform and makes it straightforward for users to build bots. Researchers have found that as many as 15% of active Twitter accounts are bots, with bot activity accounting for 50% of the site's traffic [3]. Over the past 10 years there has been an explosion of malicious social bots [4]. Social bots are social media accounts that automatically produce content and interact with humans, often trying to emulate or alter their behaviour [5]. Social bots can attempt to influence people by spreading and amplifying misinformation; one example is the spread of misinformation online during the 2016 US election [6]. A recent study by Shao et al. found that a mere 6% of Twitter accounts identified as bots were enough to spread 31% of the low-credibility information on the network [7].

Given the continued growth in social media uptake and usage, the ability of social bots to spread and amplify misinformation is a serious concern. Researchers have sought to address this by using machine learning algorithms to detect social bots on social media. On Twitter, the current state-of-the-art classifier is Botometer (formerly known as BotOrNot) [3]. Current classification algorithms follow a reactive approach, in which detection techniques are built from collected evidence of existing bots. Adversaries therefore only need to modify the characteristics of their bots to evade detection, leaving researchers perpetually one step behind in a virtual arms race.

There has been growing interest in the machine learning community in the vulnerabilities of machine learning models, a field referred to as adversarial machine learning [8,9]. In this study we propose to employ adversarial machine learning techniques to investigate how an adversary may evade Twitter bot detection classifiers. Specifically, we will attack Botometer using a black-box approach, in which the internal details of the target machine learning model are not available to the attacker. The purpose of our research is to highlight vulnerabilities in existing Twitter bot detection tools and to encourage their further development with adversarial machine learning concepts taken into account.

Objectives

The main objectives of this research project are to:

1. Test the limits and vulnerabilities of a current, state-of-the-art Twitter bot classifier in an adversarial setting.

2. Engineer adversarial examples and perform a practical black-box attack against the Twitter bot machine learning algorithm.

3. Suggest a defensive framework to improve the robustness of these classifier models.

Background

Botometer

Botometer is the state of the art in Twitter bot detection research. The tool generates more than 1,000 features from Twitter accounts using meta-data and information extracted from interaction patterns and content [3]. These features are grouped and used to train several classifiers (one for each feature group and one for the overall score) with a Random Forest algorithm, and each classifier outputs a score. Rather than relying on the raw score alone, the Botometer team developed a Complete Automation Probability (CAP) score to provide a better indication of whether an account is a bot. A higher CAP score indicates a higher likelihood that an account is automated. Researchers using Botometer are directed to choose their own threshold score, allowing them to trade off false positive and false negative rates.
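To make this architecture concrete, the following Python sketch (using scikit-learn) trains one Random Forest per feature group and one over the combined features, then applies a researcher-chosen threshold to the overall score. The feature group names, data, labels and threshold are placeholders assumed purely for illustration; Botometer's actual features and training pipeline are not public in this form.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical feature groups standing in for Botometer's >1,000 grouped features.
feature_groups = {
    "user_metadata": rng.normal(size=(200, 20)),
    "content": rng.normal(size=(200, 30)),
    "network": rng.normal(size=(200, 25)),
}
labels = rng.integers(0, 2, size=200)  # synthetic labels: 1 = bot, 0 = human

# One Random Forest per feature group, plus one over the combined features.
group_classifiers = {
    name: RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
    for name, X in feature_groups.items()
}
X_all = np.hstack(list(feature_groups.values()))
overall_classifier = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_all, labels)

# Each classifier emits a bot-likelihood score; a researcher-chosen threshold
# turns the overall score into a bot/human decision.
scores = overall_classifier.predict_proba(X_all)[:, 1]
threshold = 0.5  # placeholder; in practice chosen to trade off false positives and negatives
decisions = scores >= threshold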

Adversarial Examples

Machine learning models are vulnerable to adversarial examples: malicious inputs designed to yield erroneous model outputs while appearing unmodified to human observers [9]. Adversarial examples exploit the imperfections and approximations made by the learning algorithm during the training phase; the phenomenon is analogous to optical illusions for humans. Recent research has demonstrated that adversarial examples can be easily crafted with knowledge of either the machine learning model or its training data [8]. A concerning property of adversarial examples from a cybersecurity perspective is that one can be generated for any known machine learning model [8]. Another alarming property is that an adversarial example that succeeds against one machine learning model will likely succeed against others [10]. This transferability has been exploited to perform black-box attacks on machine learning models [9]. In a black-box attack, the adversary constructs a substitute for the target model and generates adversarial instances against the substitute, which can then be used to attack the target [11].
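A minimal sketch of the idea, assuming a simple differentiable classifier (logistic regression on synthetic account features) rather than a real bot detector: the Fast Gradient Sign Method of [9] perturbs an input along the sign of the loss gradient, drawing the prediction towards the wrong class while keeping the change to each feature small.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # synthetic bot/human labels

clf = LogisticRegression().fit(X, y)

x, y_true = X[0], y[0]
p = clf.predict_proba([x])[0, 1]  # current P(bot) for this account
w = clf.coef_[0]

# Fast Gradient Sign Method [9]: for the logistic loss the input gradient is
# (p - y_true) * w, and stepping along its sign increases the loss.
eps = 0.5
x_adv = x + eps * np.sign((p - y_true) * w)

# The perturbation draws the predicted probability towards the wrong class;
# a large enough eps pushes the prediction over the decision boundary.
print("P(bot) original   :", clf.predict_proba([x])[0, 1])
print("P(bot) adversarial:", clf.predict_proba([x_adv])[0, 1])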

Preliminary and Expected Results

We use self-generated and publicly available data (such as Varol-2017 [3]) for our experiments, with a balanced representation of human and bot examples. The Varol-2017 dataset consists of Twitter account IDs along with a corresponding classification (bot or human). We have used these Twitter IDs as inputs to the Botometer Python API [13] to obtain a set of output scores. We will choose the threshold score for Botometer so as to best match the labels of our dataset.
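The sketch below illustrates one way this threshold could be chosen, assuming the CAP scores for the labelled accounts have already been collected via the API; the scores and labels here are synthetic placeholders, and maximising agreement with the labels is only one possible criterion (F1 or a fixed false-positive rate are alternatives).

import numpy as np

rng = np.random.default_rng(1)

# Placeholder labels (1 = bot) and CAP scores; in our experiments these come
# from the Varol-2017 labels and the Botometer Python API respectively.
labels = rng.integers(0, 2, size=300)
cap_scores = np.clip(0.6 * labels + 0.2 * rng.random(300) + 0.1 * rng.normal(size=300), 0, 1)

# Sweep candidate thresholds and keep the one whose decisions best match the labels.
thresholds = np.linspace(0, 1, 101)
accuracies = [((cap_scores >= t).astype(int) == labels).mean() for t in thresholds]
best_threshold = thresholds[int(np.argmax(accuracies))]
print(f"chosen threshold: {best_threshold:.2f}, label agreement: {max(accuracies):.3f}")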

We have also identified machine learning models that can be used to effectively mimic Botometer’s classification model, and we expect to train these using the previously obtained input-output pairs. To obtain an acceptable substitute model, we will apply augmentation techniques to the initial training data, adding synthetic training points to enlarge the training set. We will then evaluate the similarity between the substitute model and Botometer.
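The following sketch illustrates this process under several assumptions: the oracle function stands in for Botometer's thresholded decision, the substitute is a small neural network of our own choosing, and the augmentation step uses simple random perturbations labelled by the oracle rather than the Jacobian-based heuristic of [11].

import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(2)

def oracle(X):
    """Placeholder for Botometer's thresholded bot/human decision (1 = bot)."""
    return (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Small initial set of queried accounts: feature vectors plus oracle labels.
X_train = rng.normal(size=(50, 10))
y_train = oracle(X_train)

substitute = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)

# Augmentation rounds: perturb existing points, label the new points by querying
# the oracle, and retrain so the substitute's boundary approaches the oracle's.
for _ in range(3):
    X_new = X_train + 0.3 * rng.normal(size=X_train.shape)  # synthetic neighbours
    X_train = np.vstack([X_train, X_new])
    y_train = np.concatenate([y_train, oracle(X_new)])
    substitute.fit(X_train, y_train)

# Similarity: agreement rate between substitute and oracle on held-out points.
X_test = rng.normal(size=(500, 10))
agreement = (substitute.predict(X_test) == oracle(X_test)).mean()
print(f"substitute/oracle agreement: {agreement:.2%}")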

With our substitute models we expect to be able to craft adversarial examples using existing frameworks [12]. Once the adversarial examples are created, we will conduct a black-box attack against Botometer’s classifier. We will evaluate our results to determine which features can realistically be manipulated, and hence the feasibility of this type of attack in the wild. We hope to use our findings to suggest improvements to current and future defensive frameworks for machine learning algorithms.
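As a rough sketch of this final step (not the framework of [12], which targets discrete domains with optimality guarantees), the code below crafts perturbations against a linear substitute, restricts them to a hypothetical mask of features an adversary could realistically change, and measures how many of the crafted examples also evade a stand-in oracle.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

def oracle(X):
    """Stand-in for Botometer's thresholded decision (1 = bot)."""
    return (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Substitute trained on queried input-output pairs (as in the previous step).
X_queries = rng.normal(size=(400, 10))
substitute = LogisticRegression().fit(X_queries, oracle(X_queries))

# Bot accounts that the adversary wants to disguise as human.
X_bots = X_queries[oracle(X_queries) == 1]

# Hypothetical mask of features an adversary could realistically change
# (e.g. posting rate, follower/friend ratio).
manipulable = np.zeros(10, dtype=bool)
manipulable[[0, 1, 3]] = True

# Gradient-sign step against the substitute, restricted to manipulable features,
# pushing the substitute's bot score down.
eps = 1.0
w = substitute.coef_[0]
X_adv = X_bots - eps * np.sign(w) * manipulable

# Transfer: how many crafted examples also evade the oracle?
evaded_substitute = (substitute.predict(X_adv) == 0).mean()
evaded_oracle = (oracle(X_adv) == 0).mean()
print(f"evade substitute: {evaded_substitute:.1%}, transfer to oracle: {evaded_oracle:.1%}")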

References

[1] Yellow. (2018). Yellow Social Media Report 2018. Yellow.

[2] Park, S., Fisher, C., Fuller, G. & Lee, J.Y. (2018). Digital news report: Australia 2018. Canberra: News and Media Research Centre.

[3] Varol, O., Ferrara, E., Davis, C. A., Menczer, F., & Flammini, A. (2017, May). Online human-bot interactions: Detection, estimation, and characterization. In Eleventh International AAAI Conference on Web and Social Media.

[4] Boshmaf, Y., Muslukhov, I., Beznosov, K., & Ripeanu, M. (2011). The socialbot network: When bots socialize for fame and money. In Proceedings of the 27th Annual Computer Security Applications Conference (ACSAC) (pp. 93-102). ACM.

[5] Ferrara, E., Varol, O., Davis, C., Menczer, F., & Flammini, A. (2016). The rise of social bots. Communications of the ACM, 59(7), 96-104.

[6] Allcott, H., & Gentzkow, M. (2017). Social media and fake news in the 2016 election. Journal of economic perspectives, 31(2), 211-36.

[7] Shao, C., Ciampaglia, G. L., Varol, O., Yang, K. C., Flammini, A., & Menczer, F. (2018). The spread of low-credibility content by social bots. Nature communications, 9(1), 4787.

[8] Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2013). Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199.

[9] Goodfellow, I. J., Shlens, J., & Szegedy, C. (2014). Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.

[10] Papernot, N., McDaniel, P., & Goodfellow, I. (2016). Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277.

[11] Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z. B., & Swami, A. (2017, April). Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security (pp. 506-519). ACM.

[12] Kulynych, B., Hayes, J., Samarin, N., & Troncoso, C. (2018). Evading classifiers in discrete domains with provable optimality guarantees. arXiv preprint arXiv:1810.10939.

[13] Davis, C. A., Varol, O., Ferrara, E., Flammini, A., & Menczer, F. (2016, April). BotOrNot: A system to evaluate social bots. In Proceedings of the 25th International Conference Companion on World Wide Web (pp. 273-274). International World Wide Web Conferences Steering Committee.