2021
Authors
Coelho, T; Figueira, A;
Publication
2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA)
Abstract
In recent years we have seen widespread adoption of social media by various Higher Education Institutions (HEIs) with the intent of reaching their target audiences and strengthening their brand recognition. It is important for organizations to discover the true audience-aggregating themes resulting from their communication strategies, as this gives institutions the ability to monitor their organizational positioning and identify opportunities and threats. In this work we create an automatic system capable of identifying HEI Twitter communication strategies. We gathered and analyzed more than 18k Twitter publications from 12 of the top HEIs according to the 2019 Center for World University Rankings (CWUR). Results show that there are different strategies, and that most HEIs had to adapt them to the COVID-19 situation. The analysis also shows that the prediction of topics and retweets for an HEI cannot be based solely on recent historical data.
2021
Authors
Guimaraes, N; Figueira, A; Torgo, L;
Publication
MATHEMATICS
Abstract
The negative impact of false information on social networks is rapidly growing. Current research on the topic has focused on the detection of fake news in a particular context or event (such as elections) or on data from a short period of time. Therefore, an evaluation of the current proposals in a long-term scenario where the topics discussed may change is lacking. In this work, we deviate from current approaches to the problem and instead focus on a longitudinal evaluation using social network publications spanning an 18-month period. We evaluate different combinations of features and supervised models in a long-term scenario where the training and testing data are ordered chronologically, and thus the robustness and stability of the models can be evaluated through time. We experimented with 3 different scenarios where the models are trained with 15-, 30-, and 60-day data periods. The results show that detection models trained with word-embedding features are the ones that perform better and are less likely to be affected by the change of topics (for example, the rise of COVID-19 conspiracy theories). Furthermore, the additional days of training data also increase the performance of the best feature/model combinations, although not very significantly (around 2%). The results presented in this paper build the foundations towards a more pragmatic approach to the evaluation of fake news detection models in social networks.
2022
Authors
Paiva, JC; Leal, JP; Figueira, A;
Publication
ACM TRANSACTIONS ON COMPUTING EDUCATION
Abstract
Practical programming competencies are critical to success in computer science (CS) education and to the go-to-market of fresh graduates. Acquiring the required level of skills is a long journey of discovery, trial and error, and optimization seeking through a broad range of programming activities that learners must perform themselves. It is not reasonable to expect that teachers could evaluate all the attempts that the average learner should develop, multiplied by the number of students enrolled in a course, much less in a timely, deep, and fair fashion. Unsurprisingly, exploring the formal structure of programs to automate the assessment of certain features has long been a hot topic among CS education practitioners. Assessing a program is considerably more complex than asserting its functional correctness, as the proliferation of tools and techniques in the literature over the past decades indicates. Program efficiency, behavior, and readability, among many other features, assessed either statically or dynamically, are now also relevant for automatic evaluation. The outcome of an evaluation has evolved from the primordial Boolean values to information about errors and tips on how to advance, possibly taking into account similar solutions. This work surveys the state of the art in the automated assessment of CS assignments, focusing on the supported types of exercises, security measures adopted, testing techniques used, type of feedback produced, and the information they offer the teacher to understand and optimize learning. A new era of automated assessment, capitalizing on static analysis techniques and containerization, has been identified. Furthermore, this review presents several other findings, discusses the current challenges of the field, and proposes some future research directions.
2022
Authors
Vaz, B; Barros, MD; Lavoura, MJ; Figueira, A;
Publication
MARKETING AND SMART TECHNOLOGIES, VOL 1
Abstract
It is common for people to choose their next movie or show based on other viewers' experience statements, such as those presented by the Internet Movie Database (IMDb). In this paper, we inspect the IMDb public datasets, process them, and use a visual analytics approach to understand how a movie can be successful among its fans. The exploration focuses on the regions into which titles are translated and on how the success of a title relates to its cast, crew, and award nominations/wins. Our methodology formulates hypotheses from exploratory data analysis (EDA) and then tests them through visual analytics confirmation.
2022
Authors
Vaz, B; Bernardes, V; Figueira, A;
Publication
INFORMATION SYSTEMS AND TECHNOLOGIES, WORLDCIST 2022, VOL 3
Abstract
The use of Generative Adversarial Networks (GANs) is almost traditional in creating synthetic images for medical purposes. This is probably the best use of GANs so far, as the results can easily be checked by the eye of specialists. In fake news detection, we have seen lately that neural models (and deep learning) can provide a considerable improvement over standard classifiers. Yet, the main problem is still the lack of data, especially fake news data, to feed these models. In this paper, we address that by proposing the use of a GAN. Results show a better capacity to generalize when the models are trained on a dataset extended with synthetic samples created by this GAN.
2022
Authors
Figueira, A; Vaz, B;
Publication
MATHEMATICS
Abstract
Synthetic data consists of artificially generated data. When data are scarce, or of poor quality, synthetic data can be used, for example, to improve the performance of machine learning models. Generative adversarial networks (GANs) are state-of-the-art deep generative models that can generate novel synthetic samples that follow the underlying data distribution of the original dataset. Reviews on synthetic data generation and on GANs have already been written. However, none in the relevant literature, to the best of our knowledge, has explicitly combined these two topics. This survey aims to fill this gap and provide useful material to new researchers in this field. That is, we aim to provide a survey that combines synthetic data generation and GANs, and that can act as a strong starting point for new researchers in the field, so that they have a general overview of the key contributions and useful references. We have conducted a review of the state of the art by querying four major databases: Web of Science (WoS), Scopus, IEEE Xplore, and ACM Digital Library. This allowed us to gain insights into the most relevant authors, the most relevant scientific journals in the area, the most cited papers, the most significant research areas, the most important institutions, and the most relevant GAN architectures. GANs were thoroughly reviewed, as well as their most common training problems and their most important breakthroughs, with a focus on GAN architectures for tabular data. Further, the main algorithms for generating synthetic data, their applications, and our thoughts on these methods are also presented. Finally, we reviewed the main techniques for evaluating the quality of synthetic data (especially tabular data) and provided a schematic overview of the information presented in this paper.