About

Filipe Oliveira completed his Bachelor's Degree in Computer Science in 2021 and is currently finishing his Master's Degree at the School of Management and Technology of the Polytechnic of Porto. His bachelor's final project was in the field of Machine Learning (ML) and centered on the concept of Distributed Machine Learning. He is currently an Invited Assistant Professor at the same institution. A research enthusiast with a passion for Artificial Intelligence, he finds real satisfaction in exploring the advances and challenges of this ever-evolving area. Over the past few years, he has contributed to knowledge in this domain, having written several scientific articles that address crucial issues and innovative solutions in Machine Learning.

Details

  • Name

    Filipe Vamonde Oliveira
  • Role

    Research Assistant
  • Since

    15th February 2023
Publications

2024

Supervised and unsupervised techniques in textile quality inspections

Authors
Ferreira, HM; Carneiro, DR; Guimaraes, MA; Oliveira, FV;

Publication
5TH INTERNATIONAL CONFERENCE ON INDUSTRY 4.0 AND SMART MANUFACTURING, ISM 2023

Abstract
Quality inspection is a critical step in ensuring the quality and efficiency of textile production processes. With the increasing complexity and scale of modern textile manufacturing systems, the need for accurate and efficient quality inspection and defect detection techniques has become paramount. This paper compares supervised and unsupervised Machine Learning techniques for defect detection in the context of industrial textile production, in terms of their respective advantages and disadvantages, and their implementation and computational costs. We explore the use of an autoencoder for the detection of defects in textiles. The goal of this preliminary work is to find out whether unsupervised methods can successfully train models with good performance without the need for defect-labelled data. © 2023 The Authors. Published by Elsevier B.V.

2023

The Impact of Data Selection Strategies on Distributed Model Performance

Authors
Guimarães, M; Oliveira, F; Carneiro, D; Novais, P;

Publication
Ambient Intelligence - Software and Applications - 14th International Symposium on Ambient Intelligence, ISAmI 2023, Guimarães, Portugal, July 12-14, 2023

Abstract
Distributed Machine Learning, in which data and learning tasks are scattered across a cluster of computers, is one of the answers of the field to the challenges posed by Big Data. Yet, in an era in which data abounds, decisions must still be made regarding which specific data to use in the training of the model, either because the amount of available data is simply too large, or because the training time or complexity of the model must be kept low. Typical approaches include, for example, selection based on data freshness. However, old data are not necessarily outdated and might still contain relevant patterns. Likewise, relying only on recent data may significantly decrease data diversity and representativity, and decrease model quality. The goal of this paper is to compare different heuristics for selecting data in a distributed Machine Learning scenario. Specifically, we ascertain whether selecting data based on their characteristics (meta-features), and optimizing for maximum diversity, improves model quality while eventually allowing model complexity to be reduced. This will allow the development of more informed data selection strategies in distributed settings, in which the criteria are not only the location of the data or the state of each node in the cluster, but also include intrinsic and relevant characteristics of the data. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023.
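
As an illustration only, one common way to "optimize for maximum diversity" is greedy farthest-point (max-min) selection over per-block meta-feature vectors. The abstract does not say which heuristics the paper compares, so the function name `select_diverse_blocks`, the Euclidean distance, and the toy meta-features below are all assumptions made for this sketch.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_diverse_blocks(meta_features, k):
    """Greedy max-min selection (hypothetical heuristic, not the paper's).

    Starts from block 0 and repeatedly adds the block whose distance to its
    nearest already-selected block is largest. Returns the selected indices.
    """
    n = len(meta_features)
    if k <= 0 or n == 0:
        return []
    k = min(k, n)
    selected = [0]
    # Distance from every block to its nearest selected block so far.
    nearest = [euclidean(m, meta_features[0]) for m in meta_features]
    while len(selected) < k:
        remaining = [i for i in range(n) if i not in selected]
        candidate = max(remaining, key=lambda i: nearest[i])
        selected.append(candidate)
        for i in range(n):
            d = euclidean(meta_features[i], meta_features[candidate])
            nearest[i] = min(nearest[i], d)
    return selected


# Toy meta-features (assumed): e.g. (mean, variance) of one column per block.
blocks = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (0.0, 0.2), (5.0, 0.0)]
chosen = select_diverse_blocks(blocks, 3)  # -> [0, 2, 4]
```

The three near-identical blocks around the origin contribute only one representative, which is the intuition behind preferring diversity over freshness or volume.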

2023

Block size, parallelism and predictive performance: finding the sweet spot in distributed learning

Authors
Oliveira, F; Carneiro, D; Guimaraes, M; Oliveira, O; Novais, P;

Publication
INTERNATIONAL JOURNAL OF PARALLEL EMERGENT AND DISTRIBUTED SYSTEMS

Abstract
As distributed and multi-organization Machine Learning emerges, new challenges must be solved, such as diverse and low-quality data or real-time delivery. In this paper, we use a distributed learning environment to analyze the relationship between block size, parallelism, and predictor quality. Specifically, the goal is to find the optimum block size and the best heuristic to create distributed Ensembles. We evaluated three different heuristics and five block sizes on four publicly available datasets. Results show that using fewer but better base models matches or outperforms a standard Random Forest, and that 32 MB is the best block size.
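
The idea of "fewer but better base models" can be sketched as ranking the base models by a validation score, keeping only the top few, and combining them by majority vote. The abstract does not describe the three heuristics evaluated, so this is a generic stand-in; `select_best_models`, `majority_vote`, the scores, and the threshold classifiers are hypothetical.

```python
from collections import Counter


def select_best_models(models, validation_scores, keep):
    """Return the `keep` models with the highest validation scores."""
    ranked = sorted(
        range(len(models)), key=lambda i: validation_scores[i], reverse=True
    )
    return [models[i] for i in ranked[:keep]]


def majority_vote(models, x):
    """Predict the label that most of the given models agree on."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Toy base models (assumed): each one stands for a model trained on one block.
base_models = [
    lambda x: int(x > 0.5),
    lambda x: int(x > 0.4),
    lambda x: 1,  # degenerate model that always predicts the positive class
    lambda x: 0,  # degenerate model that always predicts the negative class
    lambda x: int(x > 0.6),
]
scores = [0.90, 0.85, 0.50, 0.50, 0.80]  # made-up validation accuracies

ensemble = select_best_models(base_models, scores, keep=3)
prediction_high = majority_vote(ensemble, 0.55)  # votes 1, 1, 0 -> 1
prediction_low = majority_vote(ensemble, 0.30)   # votes 0, 0, 0 -> 0
```

Dropping the two degenerate models leaves a smaller ensemble whose votes all carry signal, which is the effect the abstract reports against a standard Random Forest.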

2023

Predicting Model Training Time to Optimize Distributed Machine Learning Applications

Authors
Guimaraes, M; Carneiro, D; Palumbo, G; Oliveira, F; Oliveira, O; Alves, V; Novais, P;

Publication
ELECTRONICS

Abstract
Despite major advances in recent years, the field of Machine Learning continues to face research and technical challenges. Mostly, these stem from big data and streaming data, which require models to be frequently updated or re-trained, at the expense of significant computational resources. One solution is the use of distributed learning algorithms, which can learn in a distributed manner, from distributed datasets. In this paper, we describe CEDEs, a distributed learning system in which models are heterogeneous distributed Ensembles, i.e., complex models constituted by different base models, trained with different and distributed subsets of data. Specifically, we address the issue of predicting the training time of a given model, given its characteristics and the characteristics of the data. Given that the creation of an Ensemble may imply the training of hundreds of base models, information about the predicted duration of each of these individual tasks is paramount for an efficient management of the cluster's computational resources and for minimizing makespan, i.e., the time it takes to train the whole Ensemble. Results show that the proposed approach is able to predict the training time of Decision Trees with an average error of 0.103 s, and the training time of Neural Networks with an average error of 21.263 s. We also show how results depend significantly on the hyperparameters of the model and on the characteristics of the input data.
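
To show why predicted training times help reduce makespan, the sketch below feeds them to a classic longest-processing-time-first (LPT) greedy scheduler. This is not the scheduler used in CEDEs, which the abstract does not describe; the function `schedule_lpt`, the task names, and the durations are assumptions for illustration.

```python
import heapq


def schedule_lpt(predicted_seconds, n_workers):
    """Assign tasks to workers, longest predicted task first.

    Each task goes to the worker with the smallest accumulated load.
    Returns (assignment, makespan), where assignment maps a worker index to
    its task names and makespan is the largest total load of any worker.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    heap = [(0.0, w) for w in range(n_workers)]  # (load, worker index)
    heapq.heapify(heap)
    assignment = {w: [] for w in range(n_workers)}
    by_duration = sorted(
        predicted_seconds.items(), key=lambda kv: kv[1], reverse=True
    )
    for task, duration in by_duration:
        load, worker = heapq.heappop(heap)
        assignment[worker].append(task)
        heapq.heappush(heap, (load + duration, worker))
    makespan = max(load for load, _ in heap)
    return assignment, makespan


# Made-up predicted training times in seconds for five base models.
predicted = {
    "nn_1": 120.0,
    "nn_2": 90.0,
    "tree_1": 2.0,
    "tree_2": 1.5,
    "tree_3": 1.0,
}
assignment, makespan = schedule_lpt(predicted, n_workers=2)
# assignment[0] == ["nn_1"]; the second worker takes everything else.
# makespan == 120.0
```

With accurate predictions the slow Neural Network tasks are placed first and the fast Decision Tree tasks fill the gaps; without them, a scheduler could stack both long tasks on one worker and roughly double the makespan.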

2023

Dynamic Management of Distributed Machine Learning Projects

Authors
Oliveira, F; Alves, A; Moço, H; Monteiro, J; Oliveira, O; Carneiro, D; Novais, P;

Publication
INTELLIGENT DISTRIBUTED COMPUTING XV, IDC 2022

Abstract
Given the new requirements of Machine Learning problems in recent years, especially concerning the volume, diversity and speed of data, new approaches are needed to deal with the associated challenges. In this paper we describe CEDEs, a distributed learning system that runs on top of a Hadoop cluster and takes advantage of blocks, replication and balancing. CEDEs trains models in a distributed manner following the principle of data locality, and is able to change parts of the model through an optimization module, thus allowing a model to evolve over time as the data changes. This paper describes its generic architecture, details the implementation of the first modules, and provides a first validation.