Projects:2019s1-117 Adversarial Machine Learning
Project Team
Students
Samuel Henderson
Brian Du
Supervisors
Dr Matthew Sorell
David Hubczenko
Tamas Abraham
Project
Introduction
Social media has profoundly affected the way we acquire and process information. Eight in ten Australians are reported to use social media [1], 52% of social media users rely on it to keep up to date with the news [2], and 17% report it as their primary source of information [2]. Twitter, a platform that is particularly effective at distributing information, is the focus of this study.
Twitter’s Application Programming Interface (API) enables external software to integrate with the platform and makes it straightforward for users to build bots. Social bots are social media accounts that automatically produce content and interact with humans [3]. Researchers have found that as many as 15% of active Twitter accounts are bots, with bot activity accounting for 50% of the site’s traffic [5]. Over the past ten years there has been an explosion of social bots [3]. While not all are malicious, some social bots attempt to influence people by spreading and amplifying misinformation; a prominent example was the spread of misinformation online during the 2016 US election [6]. A recent study by Shao et al. found that a mere 6% of Twitter accounts identified as bots were enough to spread 31% of the low-credibility information on the network [7].
Given the growing uptake and usage of social media, the ability of these social bots to spread and amplify misinformation is concerning. Researchers have sought to address this by using machine learning algorithms to detect social bots. For Twitter, the current state-of-the-art classifier is Botometer (formerly known as BotOrNot) [4]. Classification algorithms to date have followed a reactive approach, with detection techniques built on collected evidence of existing bots. Adversaries therefore only have to modify the characteristics of their bots to evade detection, leaving researchers perpetually one step behind in a virtual arms race.
The artificial intelligence community has shown increasing interest in the vulnerabilities of machine learning models, a field referred to as adversarial machine learning [8,9]. In this study, adversarial machine learning techniques will be employed to study how an adversary may evade Twitter bot detection classifiers. Real-world adversaries often have no knowledge of the machine learning models they are trying to attack. Since Botometer is accessed through a public API and its model has not been made available, the most practical attack is a black-box approach [11]. This involves constructing substitute machine learning models that mimic Botometer, from which adversarial examples can be crafted. The purpose of this research is to highlight vulnerabilities in existing Twitter bot detection tools and to encourage their further development with adversarial machine learning concepts taken into account.
Objectives
The main objectives of this research project are to:

1. Test the limits and vulnerabilities of a current, state-of-the-art Twitter bot classifier in an adversarial setting.
2. Engineer adversarial examples and perform a practical black-box attack against the Twitter bot detection algorithm.
3. Suggest a defensive framework to improve the robustness of these classifier models.
Background
Botometer
Botometer is the state of the art in Twitter bot detection research. The tool generates more than 1,000 features from Twitter accounts using meta-data and information extracted from interaction patterns and content [4]. These features are grouped and used to train several classifiers (one for each group and one for the overall score) with a Random Forest algorithm, each of which outputs a score. Rather than use the raw score, the Botometer team developed a Complete Automation Probability (CAP) score to give a better indication of whether an account is a bot: a higher CAP score indicates a higher likelihood that the account is automated. Since the framework provides a continuous bot score, as opposed to a discrete bot/human judgement, an appropriate threshold must be chosen to label accounts. Recent research showed that a threshold of 0.43 maximised accuracy and enabled the classifier to correctly identify more modern and sophisticated automated accounts [4].
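The grouped-classifier idea can be illustrated with a short sketch using scikit-learn. The feature groups, column indices and hyperparameters below are placeholders chosen for illustration; this is not Botometer's actual implementation.

```python
# Sketch of Botometer-style grouped classifiers: one Random Forest per feature
# group plus an overall model. Groups and column indices are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

feature_groups = {
    "user":     [0, 1, 2],   # e.g. user meta-data features (column indices)
    "temporal": [3, 4],      # e.g. temporal features
    "content":  [5, 6, 7],   # e.g. content and sentiment features
}

def train_grouped_classifiers(X, y):
    """Fit one Random Forest per feature group plus one on all features."""
    models = {name: RandomForestClassifier(n_estimators=100).fit(X[:, cols], y)
              for name, cols in feature_groups.items()}
    models["overall"] = RandomForestClassifier(n_estimators=100).fit(X, y)
    return models

def score_account(models, x):
    """Return a continuous bot score per group and overall for one account
    (class 1 is taken to be 'bot')."""
    x = np.asarray(x).reshape(1, -1)
    scores = {name: models[name].predict_proba(x[:, cols])[0, 1]
              for name, cols in feature_groups.items()}
    scores["overall"] = models["overall"].predict_proba(x)[0, 1]
    return scores

# An account is then labelled a bot when its overall score exceeds a chosen
# threshold, e.g. the 0.43 reported in [4].
```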
Adversarial Examples
Machine learning models are vulnerable to adversarial examples: malicious inputs designed to yield erroneous model outputs while appearing unmodified to human observers [9]. These adversarial examples exploit the imperfections and approximations made by the learning algorithm during the training phase; the phenomenon is analogous to optical illusions in humans. Recent research has demonstrated that adversarial examples can be crafted easily with knowledge of either the machine learning model or its training data [8].
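As a concrete illustration, the fast gradient sign method introduced in [9] crafts an adversarial example for a differentiable model by stepping the input a small amount in the direction that most increases the training loss:

```latex
x_{\mathrm{adv}} = x + \epsilon \cdot \mathrm{sign}\left( \nabla_{x} J(\theta, x, y) \right)
```

where x is the original input, y its true label, J the loss of the model with parameters θ, and ε a small constant chosen so that the perturbation remains imperceptible.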
A concerning property of adversarial examples from a cybersecurity perspective is that it is possible to generate one for any known machine learning model [8]. Another alarming property is transferability: if an adversarial example is effective against one machine learning model, it will likely be effective against others [10]. This property has been exploited to perform black-box attacks on machine learning models [9]. In a black-box attack, the adversary constructs a substitute for the target model and generates adversarial instances against the substitute, which can then be used to attack the target [11].
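A simplified outline of such a substitute-based black-box attack, following the structure described in [11], is sketched below. The functions query_target and perturb are caller-supplied placeholders standing in for the oracle (e.g. Botometer) and for an adversarial-example crafting routine, and the random data augmentation is a simplification of the Jacobian-based augmentation used in [11].

```python
# Simplified substitute-model black-box attack loop (illustrative sketch only).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def black_box_attack(X_seed, query_target, perturb, rounds=3):
    """Train a substitute on labels queried from the target, then attack it.

    query_target(X) -> labels assigned by the target classifier (the oracle).
    perturb(model, x) -> an adversarial version of x crafted against `model`.
    """
    X = np.asarray(X_seed, dtype=float)
    substitute = RandomForestClassifier(n_estimators=100)
    for _ in range(rounds):
        y = query_target(X)      # 1. label the current data by querying the target
        substitute.fit(X, y)     # 2. fit the substitute to mimic the target
        # 3. grow the dataset around existing points so the substitute learns the
        #    target's decision boundary more closely (simplified augmentation)
        X = np.vstack([X, X + np.random.normal(scale=0.01, size=X.shape)])
    # 4. craft adversarial examples against the substitute; by transferability [10]
    #    they are expected to also mislead the target
    return [perturb(substitute, x) for x in np.asarray(X_seed, dtype=float)]
```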
Preliminary and Expected Results
The first phase of this study involves constructing substitute models to mimic Botometer’s algorithm. To construct the substitutes, a labelled dataset is required. This was obtained through the Twitter Streaming API, which allows real-time tweets to be sampled. A small random sample of public tweets produced in English, as specified in each tweet’s language setting, was acquired. The screen names of the users responsible for the tweets were extracted and passed to the Botometer Python API as input. Botometer returned a series of scores for each user, and the threshold of 0.43 was used to label each account [4]. This labelling method was used because it is only necessary to train a substitute capable of mimicking Botometer’s decision boundaries, rather than a substitute with optimal stand-alone accuracy. The final dataset comprised a balanced set of 5,000 human and 5,000 bot examples.
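A minimal sketch of this labelling step is shown below, assuming the botometer-python client and a list of screen names already collected from the Streaming API. The credential parameter names and the result fields follow that client's documentation at the time of the project and may differ between versions.

```python
# Labelling collected accounts with Botometer and a 0.43 threshold (sketch).
import botometer

twitter_app_auth = {
    "consumer_key": "...",
    "consumer_secret": "...",
    "access_token": "...",
    "access_token_secret": "...",
}
bom = botometer.Botometer(wait_on_ratelimit=True,
                          mashape_key="...",  # Botometer API key (name may vary by version)
                          **twitter_app_auth)

THRESHOLD = 0.43  # threshold reported to maximise accuracy [4]

def label_accounts(screen_names):
    """Return (screen_name, label) pairs, with 1 = bot and 0 = human."""
    labels = []
    for name in screen_names:
        result = bom.check_account("@" + name)
        score = result["cap"]["english"]  # assumed field; adjust to the API version used
        labels.append((name, int(score >= THRESHOLD)))
    return labels
```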
A set of raw features was obtained by mining the meta-data of each user account. Statistical, sentiment and temporal analyses were performed on the meta-data to engineer a larger number of features. This large sample of labelled accounts and corresponding features was used as a training dataset for the substitute models, with a subset reserved for testing. The algorithms identified as most suitable for this type of supervised learning were Random Forest, Gradient Boosting and Support Vector Machine. When evaluated on the test data, they obtained accuracies of 88%, 87% and 80%, respectively. Because the labels come from Botometer, these accuracies measure how closely each substitute mimics Botometer rather than true bot-detection performance.
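The training and comparison of the three substitutes can be sketched with scikit-learn as follows, assuming a feature matrix X and the Botometer-derived labels y from the previous step. The hyperparameters and split are illustrative rather than the project's tuned settings.

```python
# Train and compare the three substitute models against a held-out test set.
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def evaluate_substitutes(X, y, test_size=0.2, seed=0):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y)
    models = {
        "Random Forest": RandomForestClassifier(n_estimators=200, random_state=seed),
        "Gradient Boosting": GradientBoostingClassifier(random_state=seed),
        "SVM": SVC(kernel="rbf", probability=True, random_state=seed),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # Accuracy here measures agreement with Botometer's labels, i.e. how well
        # the substitute mimics the target's decision boundary.
        results[name] = accuracy_score(y_test, model.predict(X_test))
    return models, results
```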
Having obtained substitute models that effectively mimic Botometer, the weighting of each feature can be determined, and this information can be used to craft adversarial examples with existing frameworks [12]. Once the adversarial examples are created, a black-box attack will be conducted against Botometer’s classifier. The results will be evaluated to determine which features can realistically be manipulated and, hence, the feasibility of this type of attack in the wild. The findings of this research will be used to suggest how current and future defensive frameworks for machine learning algorithms can be improved.
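For the tree-based substitutes, the feature weightings mentioned above can be read directly from the fitted models. The sketch below, with hypothetical feature names, shows one way to rank them in order to prioritise which account attributes an adversary might perturb.

```python
# Rank the most influential features of a fitted tree ensemble (sketch).
import numpy as np

def rank_features(model, feature_names, top_k=10):
    """Return the top_k (name, importance) pairs for a fitted tree-based model."""
    importances = model.feature_importances_
    order = np.argsort(importances)[::-1][:top_k]
    return [(feature_names[i], float(importances[i])) for i in order]

# Example with hypothetical names:
# rank_features(models["Random Forest"],
#               ["followers_count", "friends_count", "tweet_rate", "retweet_ratio"])
```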
References
[1] Yellow. (2018). Yellow Social Media Report 2018. Yellow.
[2] Park, S., Fisher, C., Fuller, G. & Lee, J.Y. (2018). Digital news report: Australia 2018. Canberra: News and Media Research Centre.
[3] Varol, O., Ferrara, E., Davis, C. A., Menczer, F., & Flammini, A. (2017, May). Online human-bot interactions: Detection, estimation, and characterization. In Eleventh International AAAI Conference on Web and Social Media.
[4] Boshmaf, Y., Muslukhov, I., Beznosov, K., & Ripeanu, M. (2011). The socialbot network: When bots socialize for fame and money. In Proceedings of the 27th Annual Computer Security Applications Conference (ACSAC) (pp. 93-102). ACM.
[5] Ferrara, E., Varol, O., Davis, C., Menczer, F., & Flammini, A. (2016). The rise of social bots. Communications of the ACM, 59(7), 96-104.
[6] Allcott, H., & Gentzkow, M. (2017). Social media and fake news in the 2016 election. Journal of Economic Perspectives, 31(2), 211-236.
[7] Shao, C., Ciampaglia, G. L., Varol, O., Yang, K. C., Flammini, A., & Menczer, F. (2018). The spread of low-credibility content by social bots. Nature communications, 9(1), 4787.
[8] Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2013). Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199.
[9] Goodfellow, I. J., Shlens, J., & Szegedy, C. (2014). Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.
[10] Papernot, N., McDaniel, P., & Goodfellow, I. (2016). Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277.
[11] Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z. B., & Swami, A. (2017, April). Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security (pp. 506-519). ACM.
[12] Kulynych, B., Hayes, J., Samarin, N., & Troncoso, C. (2018). Evading classifiers in discrete domains with provable optimality guarantees. arXiv preprint arXiv:1810.10939.
[13] Davis, C. A., Varol, O., Ferrara, E., Flammini, A., & Menczer, F. (2016, April). BotOrNot: A system to evaluate social bots. In Proceedings of the 25th International Conference Companion on World Wide Web (pp. 273-274). International World Wide Web Conferences Steering Committee.