Publications - INESC TEC

Publications

Publications by Nuno Ricardo Guimarães

2020

Analysis and Detection of Unreliable Users in Twitter: Two Case Studies

Authors
Guimaraes, N; Figueira, A; Torgo, L;

Publication
Communications in Computer and Information Science

Abstract
The emergence of online social networks provided users with an easy way to publish and disseminate content, reaching broader audiences than previous platforms (such as blogs or personal websites) allowed. However, malicious users started to take advantage of these features to disseminate unreliable content through the network, such as false information, extremely biased opinions, or hate speech. Consequently, it becomes crucial to detect these users at an early stage to avoid the propagation of unreliable content in social networks’ ecosystems. In this work, we introduce a methodology to extract a large corpus of unreliable posts using Twitter and two databases of unreliable websites (OpenSources and Media Bias Fact Check). In addition, we present an analysis of the content and users that publish and share several types of unreliable content. Finally, we develop supervised models to classify a Twitter account according to its reliability. The experiments conducted using two different data sets show performance above 94% using Decision Trees as the learning algorithm. These experiments, although with some limitations, provide encouraging results for future research on detecting unreliable accounts on social networks. © 2020, Springer Nature Switzerland AG.


2021

Profiling Accounts Political Bias on Twitter

Authors
Guimaraes, N; Figueira, A; Torgo, L;

Publication
PROCEEDINGS OF 2021 16TH IBERIAN CONFERENCE ON INFORMATION SYSTEMS AND TECHNOLOGIES (CISTI'2021)

Abstract
Twitter has become a major platform to share ideas and promote discussion on relevant topics. However, with a large number of users resorting to it as their primary source of information and with an increasing number of accounts spreading newsworthy content, a characterization of the political bias associated with the social network ecosystem becomes necessary. In this work, we aim at analyzing accounts spreading or publishing content from five different classes of the political spectrum. We also look further and study accounts that spread content from both the right and left sides. Conclusions show that there is a large presence of accounts that disseminate right-biased content, although it is the more central classes that have a higher influence on the network. In addition, users who spread content from both sides more actively spread right content, with the opposite content being associated with criticism of left political parties or with promoting right political decisions.


2021

Towards a pragmatic detection of unreliable accounts on social networks

Authors
Guimarães, N; Figueira, A; Torgo, L;

Publication
Online Social Networks and Media

Abstract
In recent years, the problem of unreliable content in social networks has become a major threat, with a proven real-world impact in events like elections and pandemics, undermining democracy and trust in science, respectively. Research in this domain has focused not only on the content but also on the accounts that propagate it, with the bot detection task having been thoroughly studied. However, not all bot accounts work as unreliable content spreaders (e.g., bots for news aggregation), and not all human accounts are necessarily reliable. In this study, we try to distinguish unreliable from reliable accounts, independently of how they are operated. In addition, we work towards providing a methodology capable of coping with real-world situations by introducing the content available (restricting it by volume- and time-based batches) as a parameter of the methodology. Experiments conducted on a validation set with a different number of tweets per account provide evidence that our proposed solution produces an increase of up to 20% in performance when compared with traditional (individual) models and with cross-batch models (which perform better with different batches of tweets).


2020

Knowledge-based Reliability Metrics for Social Media Accounts

Authors
Guimaraes, N; Figueira, A; Torgo, L;

Publication
PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON WEB INFORMATION SYSTEMS AND TECHNOLOGIES (WEBIST)

Abstract
The growth of social media as an information medium without restrictive measures on the creation of new accounts led to the rise of malicious agents with the intent to diffuse unreliable information in the network, ultimately affecting the perception of users on important topics such as political and health issues. Although the problem is being tackled within the domain of bot detection, the impact of studies in this area is still limited because 1) not all accounts that spread unreliable content are bots, 2) human-operated accounts are also responsible for the diffusion of unreliable information, and 3) bot accounts are not always malicious (e.g., news aggregators). Also, most of these methods are based on supervised models that require annotated data and updates to maintain their performance through time. In this work, we build a framework and develop knowledge-based metrics to complement the current research in bot detection and characterize the impact and behavior of a Twitter account, independently of the way it is operated (human or bot). We proceed to analyze a sample of the accounts using the metrics proposed and evaluate the necessity of these metrics by comparing them with the scores from a bot detection system. The results show that the metrics can characterize different degrees of unreliable accounts, from unreliable bot accounts with a high number of followers to human-operated accounts that also spread unreliable content (but with less impact on the network). Furthermore, evaluating a sample of the accounts with a bot detection system showed that bots compose around 11% of the sample of unreliable accounts extracted and that the bot score is not correlated with the proposed metrics. In addition, the accounts that achieve the highest values in our metrics present different characteristics than the ones that achieve the highest bot score. This provides evidence of the usefulness of our metrics in the evaluation of unreliable accounts in social networks.


2021

Can Fake News Detection Models Maintain the Performance through Time? A Longitudinal Evaluation of Twitter Publications

Authors
Guimaraes, N; Figueira, A; Torgo, L;

Publication
MATHEMATICS

Abstract
The negative impact of false information on social networks is rapidly growing. Current research on the topic has focused on the detection of fake news in a particular context or event (such as elections) or using data from a short period of time. Therefore, an evaluation of the current proposals in a long-term scenario where the topics discussed may change is lacking. In this work, we deviate from current approaches to the problem and instead focus on a longitudinal evaluation using social network publications spanning an 18-month period. We evaluate different combinations of features and supervised models in a long-term scenario where the training and testing data are ordered chronologically, and thus the robustness and stability of the models can be evaluated through time. We experimented with three different scenarios where the models are trained with 15-, 30-, and 60-day data periods. The results show that detection models trained with word-embedding features are the ones that perform better and are less likely to be affected by the change of topics (for example, the rise of COVID-19 conspiracy theories). Furthermore, the additional days of training data also increase the performance of the best feature/model combinations, although not very significantly (around 2%). The results presented in this paper build the foundations towards a more pragmatic approach to the evaluation of fake news detection models in social networks.


2021

The landscape of schizophrenia on twitter

Authors
Rodrigues, T; Guimaraes, N; Monteiro, J;

Publication
EUROPEAN PSYCHIATRY

Abstract
Introduction: People with schizophrenia experience higher levels of stigma compared with other diseases. The analysis of social media content is a tool of great importance to understand the public opinion toward a particular topic. Objectives: The aim of this study is to analyse the content of social media on schizophrenia and the most prevalent sentiments towards this disorder. Methods: Tweets were retrieved using Twitter’s Application Programming Interface and the keyword “schizophrenia”. Parameters were set to allow the retrieval of recent and popular tweets on the topic and no restrictions were made in terms of geolocation. Analysis of 8 basic emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, and trust) was conducted automatically using a lexicon-based approach and the NRC Word-Emotion Association Lexicon. Results: Tweets on schizophrenia were heterogeneous. The most prevalent sentiments on the topic were mainly negative, namely anger, fear, sadness and disgust. Qualitative analyses of the most retweeted posts added insight into the nature of the public dialogue on schizophrenia. Conclusions: Analyses of social media content can add value to the research on stigma toward psychiatric disorders. This tool is of growing importance in many fields and further research in mental health can help the development of public health strategies in order to decrease the stigma towards psychiatric disorders.
