2023
Authors
Teixeira, S; Veloso, B; Rodrigues, JC; Gama, J;
Publication
MACHINE LEARNING AND PRINCIPLES AND PRACTICE OF KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT I
Abstract
The growing use of data-driven decision systems based on Artificial Intelligence (AI) by governments, companies and social organizations has drawn more attention to the challenges they pose to society. Over the last few years, news on social media about discrimination and privacy, among other issues, has highlighted their vulnerabilities. Despite all the research around these issues, the definition of concepts inherent to the risks and/or vulnerabilities of data-driven decision systems is not consensual. Categorizing the dangers and vulnerabilities of data-driven decision systems will facilitate ethics by design, ethics in design and ethics for designers to contribute to responsible AI. The main goal of this work is to understand which types of AI risks/vulnerabilities are Ethical and/or Technological and the differences between human vs machine classification. We analyze two types of problems: (i) the risks/vulnerabilities classification task by humans; and (ii) the risks/vulnerabilities classification task by machines. To carry out the analysis, we used a survey for human classification and the BERT algorithm for machine classification. The results show that even with different levels of detail, the classification of vulnerabilities is in agreement in most cases.
2023
Authors
Aguilar-Ruiz, JS; Bifet, A; Gama, J;
Publication
Analytics
Abstract
2023
Authors
Andrade, T; Gama, J;
Publication
Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing, SAC 2023, Tallinn, Estonia, March 27-31, 2023
Abstract
2023
Authors
Silva, PR; Vinagre, J; Gama, J;
Publication
38TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2023
Abstract
Dynamic Time Warping (DTW) is a robust method to measure the similarity between two sequences. This paper proposes a method based on DTW to analyse high-speed data streams. The central idea is to decompose the network traffic into sequences of histograms of packet sizes and then calculate the distance between pairs of such sequences using DTW with Kullback-Leibler (KL) distance. As a baseline, we also compute the Euclidean Distance between the sequences of histograms. Since our preliminary experiments indicate that the distance between two sequences falls within a different range of values for distinct types of streams, we then exploit this distance information for stream classification using a Random Forest. The approach was investigated using recent internet traffic data from a telecommunications company. To illustrate the application of our approach, we conducted a case study with encrypted Internet Protocol Television (IPTV) network traffic data. The goal was to use our DTW-based approach to detect the video codec used in the streams, as well as the IPTV channel. Results strongly suggest that the DTW distance value between the data streams is highly informative for such classification tasks.
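The core idea in this abstract, DTW over sequences of packet-size histograms with a KL-based local cost, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the additive smoothing constant and the choice to symmetrise the KL divergence are all assumptions made here for illustration, and the histograms in the usage lines are made-up toy values.

```python
import math


def kl_divergence(p, q, eps=1e-9):
    """KL divergence D(p || q) between two histograms of equal length.

    Both histograms are smoothed with eps (an assumption, to avoid log(0))
    and normalised to sum to 1 before the divergence is computed.
    """
    p_s = [x + eps for x in p]
    q_s = [x + eps for x in q]
    p_total, q_total = sum(p_s), sum(q_s)
    return sum(
        (a / p_total) * math.log((a / p_total) / (b / q_total))
        for a, b in zip(p_s, q_s)
    )


def symmetric_kl(p, q):
    """Symmetrised KL (assumption: the abstract only says 'KL distance')."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def dtw_distance(seq_a, seq_b, local_cost=symmetric_kl):
    """Classic O(n*m) DTW between two sequences of histograms."""
    n, m = len(seq_a), len(seq_b)
    # acc[i][j] = cheapest cost of aligning seq_a[:i] with seq_b[:j]
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = local_cost(seq_a[i - 1], seq_b[j - 1])
            acc[i][j] = cost + min(
                acc[i - 1][j],      # seq_b[j-1] is matched again
                acc[i][j - 1],      # seq_a[i-1] is matched again
                acc[i - 1][j - 1],  # both sequences advance
            )
    return acc[n][m]


# Toy packet-size histograms (hypothetical counts per size bin).
stream_a = [[5, 3, 2], [1, 1, 8]]
stream_b = [[5, 3, 2], [5, 3, 2], [1, 1, 8]]  # same shape, stretched in time
stream_c = [[1, 1, 8], [5, 3, 2]]             # same bins, reversed order

print(dtw_distance(stream_a, stream_b))
print(dtw_distance(stream_a, stream_c))
```

In a pipeline like the one described, distances of this kind would be computed for pairs of streams and passed as features to a classifier such as a Random Forest; that step is omitted here.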
2023
Authors
Meira, J; Veloso, B; Bolon Canedo, V; Marreiros, G; Alonso Betanzos, A; Gama, J;
Publication
INTELLIGENT DATA ANALYSIS
Abstract
The emergence of the Industry 4.0 trend brings automation and data exchange to industrial manufacturing. Using computational systems and IoT devices allows businesses to collect and process vast volumes of sensor and business process data. The growth and proliferation of big data and machine learning technologies enable strategic decisions based on the analyzed data. This study proposes a data-driven predictive maintenance framework for the air production unit (APU) system of a Metro do Porto train. The proposed method assists in detecting failures and errors in machinery before they reach critical stages. We present an anomaly detection model following an unsupervised approach, combining the Half-Space-Trees method with One Class K Nearest Neighbor, adapted to deal with data streams. We evaluate and compare our approach with the Half-Space-Trees method applied without the One Class K Nearest Neighbor combination. Our model produced few type-I errors, significantly increasing precision compared to the Half-Space-Trees model. Our proposal achieved high anomaly detection performance, predicting most of the catastrophic failures of the APU train system.
2023
Authors
Liguori, A; Caroprese, L; Minici, M; Veloso, B; Spinnato, F; Nanni, M; Manco, G; Gama, J;
Publication
CoRR
Abstract