Projects:2019s1-117 Adversarial Machine Learning


== Project Team ==

'''Students'''

Samuel Henderson

Brian Du

'''Supervisors'''

Dr Matthew Sorell

David Hubczenko

Tamas Abraham

== Project ==

=== Introduction ===

Social media has profoundly affected the way we acquire and process information. It has been reported that eight in ten Australians use social media [1] and that 52% of social media users utilise it to keep up to date with the news [2]. Furthermore, 17% report that it is their primary source of information [2]. A popular social media platform that is particularly effective at distributing information is Twitter, which will be the focus of this study.

Twitter’s Application Programming Interface (API) enables external software to integrate with the site and supports users in building bots. Social bots are social media accounts that automatically produce content and interact with humans [3]. Researchers have found that as many as 15% of active Twitter accounts are bots, with bot activity accounting for 50% of the site’s traffic [5]. Over the past ten years, there has been an explosion of social bots [3]. While not all are malicious, some social bots attempt to influence people by spreading and amplifying misinformation. An example of this was the spread of misinformation online during the 2016 US election [6]. A recent study by Shao et al. found that a mere 6% of Twitter accounts identified as bots were enough to spread 31% of the low-credibility information on the network [7].

With the continued increase in social media uptake and usage, the ability of these social bots to spread and amplify misinformation is concerning. Researchers have sought to address this by using machine learning algorithms to detect social bots on social media. For Twitter, the current state-of-the-art classifier is Botometer (formerly known as BotOrNot) [4]. Current classification algorithms have followed a reactive schema, where detection techniques are based on collected evidence of existing bots. Adversaries therefore only have to modify the characteristics of their bots to evade detection, leaving researchers always one step behind in a virtual arms race.

There has been increasing interest in the artificial intelligence community in the vulnerabilities of machine learning models, a field referred to as adversarial machine learning [8,9]. In this study, adversarial machine learning techniques will be employed to examine how an adversary may evade Twitter bot detection classifiers. Real-world adversaries often have no knowledge of the machine learning models they are trying to attack. Since Botometer is accessed through a public API and the underlying model has not been made available, the most practical way to attack it is via a black-box approach [11]: constructing substitute machine learning models to mimic Botometer, from which adversarial examples can be crafted. The purpose of this research is to highlight vulnerabilities in existing Twitter bot detection tools and to encourage their further development with adversarial machine learning concepts taken into account.


=== Objectives ===

The main objectives of this research project are to:

* Test the limits and vulnerabilities of a current, state-of-the-art Twitter bot classifier in an adversarial setting.
* Engineer adversarial examples and perform a practical black-box attack against the Twitter bot machine learning algorithm.
* Suggest a defensive framework to improve the robustness of these classifier models.

=== Background ===

'''Botometer'''

Botometer is the state of the art in Twitter bot detection research. The tool generates more than 1,000 features for a Twitter account from its metadata and from information extracted from its interaction patterns and content [4]. These features are grouped and leveraged to train several different classifiers (one for each group and one for the overall score) using a Random Forest algorithm, and each classifier outputs a score. Rather than use the raw score, the Botometer team developed a Complete Automation Probability (CAP) score to provide a better indication of whether an account is a bot; a higher CAP score indicates a higher likelihood that the account is automated. Since the framework provides a continuous bot score, as opposed to a discrete bot/human judgement, an appropriate threshold must be determined to label accounts. Recent research showed that a threshold of 0.43 maximised accuracy and enabled the classifier to correctly identify more modern and sophisticated automated accounts [4].
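
As a concrete illustration, the sketch below shows how a single account might be scored through Botometer’s public API and labelled with the 0.43 threshold. It is a minimal sketch only: the credentials and account name are placeholders, and the exact constructor parameters (e.g. rapidapi_key) depend on the version of the botometer Python package.

<syntaxhighlight lang="python">
# Hypothetical sketch: score one account via the Botometer API and apply the
# 0.43 CAP threshold reported in [4]. Credentials are placeholders and the
# constructor argument names may differ between package versions.
import botometer

twitter_app_auth = {
    "consumer_key": "...",
    "consumer_secret": "...",
    "access_token": "...",
    "access_token_secret": "...",
}

bom = botometer.Botometer(
    wait_on_ratelimit=True,
    rapidapi_key="...",      # older versions use mashape_key instead
    **twitter_app_auth,
)

BOT_THRESHOLD = 0.43         # threshold that maximised accuracy in [4]

result = bom.check_account("@example_account")
cap = result["cap"]["english"]          # Complete Automation Probability
label = "bot" if cap >= BOT_THRESHOLD else "human"
print(f"CAP = {cap:.2f} -> {label}")
</syntaxhighlight>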

'''Adversarial Examples'''

Machine learning models are vulnerable to adversarial examples: malicious inputs designed to yield erroneous model outputs while appearing unmodified to human observers [9]. These adversarial examples exploit the imperfections and approximations made by the learning algorithm during the training phase; the phenomenon is analogous to optical illusions in humans. Recent research has demonstrated that adversarial examples can be easily crafted with knowledge of either the machine learning model or its training data [8].

A concerning property of adversarial examples from a cybersecurity perspective is that it is possible to generate an adversarial example for any known machine learning model [8]. Another alarming property is that an adversarial example that is effective against one machine learning model will likely be effective against others [10]. This transferability has been exploited to perform black-box attacks on machine learning models [9]. In a black-box attack, the adversary constructs a substitute for the target model and generates adversarial instances against the substitute, which can then be used to attack the target [11].
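
To make the transferability idea concrete, the sketch below (on synthetic data, not the project’s models) crafts a fast-gradient-sign-style perturbation [9] against a simple logistic-regression substitute and then checks whether the perturbed points also evade an independently trained Random Forest acting as the target. All models, data and the perturbation size are illustrative assumptions.

<syntaxhighlight lang="python">
# Illustrative sketch of transferability on synthetic data: an adversarial
# perturbation crafted against a logistic-regression substitute is tested
# against a separately trained Random Forest "target".
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

substitute = LogisticRegression(max_iter=1000).fit(X, y)
target = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# For a linear substitute the input gradient is simply the coefficient
# vector, so a fast-gradient-sign step towards the "human" class (0) is
# x_adv = x - eps * sign(w).
eps = 0.5
w = substitute.coef_[0]
bots = X[substitute.predict(X) == 1]
bots_adv = bots - eps * np.sign(w)

print("evade substitute:", (substitute.predict(bots_adv) == 0).mean())
print("evade target:    ", (target.predict(bots_adv) == 0).mean())
</syntaxhighlight>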


=== Preliminary and Expected Results ===

The first phase of this study involves constructing substitute models to mimic Botometer’s algorithm. To construct substitute models, a labelled dataset is required. This was obtained using the Twitter streaming API, which provides a real-time sample of public tweets. A small random sample of tweets produced in English, as specified in each tweet’s language field, was acquired. The screen names of the users responsible for these tweets were extracted and passed to the Botometer Python API as input. Botometer returned a series of scores for each user, and the 0.43 threshold was used to label each account [4]. This labelling method was chosen because the substitute only needs to mimic Botometer’s decision boundaries, rather than achieve optimal accuracy in its own right. The final dataset comprised a balanced spread of 5,000 human and 5,000 bot examples.
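
A minimal sketch of this collection step is shown below. It assumes a tweepy 3.x-style streaming interface, with placeholder credentials and an arbitrary stopping rule; in practice each collected screen name would then be scored with Botometer and labelled at a CAP of 0.43, as in the earlier sketch.

<syntaxhighlight lang="python">
# Hypothetical sketch of the tweet-collection step, assuming a tweepy 3.x-style
# streaming interface. Credentials are placeholders; collected screen names
# would subsequently be scored with Botometer and labelled at CAP >= 0.43.
import tweepy

TARGET_PER_CLASS = 5000          # final dataset: 5,000 bots and 5,000 humans

class ScreenNameCollector(tweepy.StreamListener):
    def __init__(self):
        super().__init__()
        self.screen_names = set()

    def on_status(self, status):
        # Keep only tweets whose language field is English.
        if status.lang == "en":
            self.screen_names.add(status.user.screen_name)
        # Returning False disconnects the stream once enough candidates exist.
        return len(self.screen_names) < 20 * TARGET_PER_CLASS

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")

collector = ScreenNameCollector()
stream = tweepy.Stream(auth=auth, listener=collector)
stream.sample(languages=["en"])  # random sample of public tweets, English only

# collector.screen_names now holds candidate accounts to pass to Botometer.
</syntaxhighlight>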

A set of raw features was obtained by mining the metadata of each user account, and statistical, sentiment and temporal analyses were performed on this metadata to engineer a larger set of features. The labelled accounts and their corresponding features were used as the training dataset for the substitute models, with a subset reserved for testing. The algorithms identified as most suitable for this type of supervised learning were Random Forest, Gradient Boosting and Support Vector Machine; on the test data they achieved accuracies of 88%, 87% and 80% respectively. Here, accuracy describes how closely each substitute model reproduces Botometer’s labels.
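
A minimal sketch of the substitute-training step is shown below, using scikit-learn. The synthetic X and y stand in for the real engineered-feature matrix and the Botometer-derived labels, and the hyperparameters are illustrative rather than the tuned values used in the project.

<syntaxhighlight lang="python">
# Illustrative sketch of training the three substitute models with scikit-learn.
# The synthetic X, y below stand in for the engineered features and the
# Botometer-derived labels (1 = bot, 0 = human).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in for the real 10,000-account dataset and its engineered features.
X, y = make_classification(n_samples=10000, n_features=50, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

substitutes = {
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    # SVMs are scale-sensitive, so standardise the features first.
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

for name, model in substitutes.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: agreement with Botometer labels = {acc:.1%}")
</syntaxhighlight>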

Having obtained substitute models that effectively mimic Botometer, the weighting (importance) of each feature can be determined, and this information can be used to craft adversarial examples with existing frameworks [12]. Once the adversarial examples are created, a black-box attack will be conducted against Botometer’s classifier. The results will be evaluated to determine which features can realistically be manipulated, and hence whether this type of attack is feasible in the wild. The findings of this research will be used to suggest how current and future defences for machine learning classifiers can be improved.
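
As a rough illustration of this next phase, the sketch below ranks features by a Random Forest substitute’s importances and greedily perturbs only a hypothetical subset of "manipulable" features until the substitute’s label flips. The data, the manipulable subset and the step size are assumptions; a real attack would use the framework in [12] and re-score the modified account with Botometer to measure transfer.

<syntaxhighlight lang="python">
# Rough, self-contained illustration (synthetic data) of evading a substitute:
# rank features by the Random Forest's importances, then greedily perturb only
# an assumed "manipulable" subset until the substitute stops predicting "bot".
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

manipulable = set(range(10))     # assumption: only half the features can be changed
ranked = [i for i in np.argsort(rf.feature_importances_)[::-1] if i in manipulable]

x = X[rf.predict(X) == 1][0].copy()   # one account the substitute calls a bot
step = 0.5

for i in ranked:
    if rf.predict([x])[0] == 0:       # stop once the label has flipped
        break
    for delta in (step, -step):
        candidate = x.copy()
        candidate[i] += delta
        # Keep a change only if it lowers the substitute's bot probability.
        if rf.predict_proba([candidate])[0, 1] < rf.predict_proba([x])[0, 1]:
            x = candidate
            break

print("substitute label after perturbation:", rf.predict([x])[0])
</syntaxhighlight>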


=== References ===

[1] Yellow™, "Yellow Social Media Report 2018", Yellow, 2018.

[2] Park, S., Fisher, C., Fuller, G. & Lee, J.Y. (2018). Digital news report: Australia 2018. Canberra: News and Media Research Centre.

[3] Varol, O., Ferrara, E., Davis, C. A., Menczer, F., & Flammini, A. (2017, May). Online human-bot interactions: Detection, estimation, and characterization. In Eleventh International AAAI Conference on Web and Social Media.

[4] Boshmaf, Y., Muslukhov, I., Beznosov, K., & Ripeanu, M. (2011). The socialbot network: When bots socialize for fame and money. In ACSAC: 27th Annual Computer Security Applications Conference (pp. 93–102). ACM.

[5] Ferrara, E., Varol, O., Davis, C., Menczer, F., & Flammini, A. (2016). The rise of social bots. Communications of the ACM, 59(7), 96-104.

[6] Allcott, H., & Gentzkow, M. (2017). Social media and fake news in the 2016 election. Journal of economic perspectives, 31(2), 211-36.

[7] Shao, C., Ciampaglia, G. L., Varol, O., Yang, K. C., Flammini, A., & Menczer, F. (2018). The spread of low-credibility content by social bots. Nature communications, 9(1), 4787.

[8] Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2013). Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199.

[9] Goodfellow, I. J., Shlens, J., & Szegedy, C. (2014). Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.

[10] Papernot, N., McDaniel, P., & Goodfellow, I. (2016). Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277.

[11] Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z. B., & Swami, A. (2017, April). Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security (pp. 506-519). ACM.

[12] Kulynych, B., Hayes, J., Samarin, N., & Troncoso, C. (2018). Evading classifiers in discrete domains with provable optimality guarantees. arXiv preprint arXiv:1810.10939.

[13] Davis, C. A., Varol, O., Ferrara, E., Flammini, A., & Menczer, F. (2016, April). Botornot: A system to evaluate social bots. In Proceedings of the 25th International Conference Companion on World Wide Web (pp. 273-274). International World Wide Web Conferences Steering Committee.