Publications

Publications by Carlos Ferreira

2008

RUSE-WARMR: Rule Selection for Classifier Induction in Multi-Relational Data-Sets

Authors
Ferreira, CA; Gama, J; Costa, VS;

Publication
20TH IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, VOL 1, PROCEEDINGS

Abstract
One of the major challenges in knowledge discovery is how to extract meaningful and useful knowledge from the complex structured data that one finds in Scientific and Technological applications. One approach is to explore the logic relations in the database and, using, say, an Inductive Logic Programming (ILP) algorithm, find descriptive and expressive patterns. These patterns can then be used as features to characterize the target concept. The effectiveness of these algorithms depends both upon the algorithm we use to generate the patterns and upon the classifier. Rule mining provides an excellent framework for efficiently mining the interesting patterns that are relevant. We propose a novel method to select discriminative patterns and evaluate the effectiveness of this method on a complex discovery application of practical interest.

2012

Predictive sequence miner in ILP learning

Authors
Ferreira, CA; Gama, J; Santos Costa, V;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
This work presents an optimized version of XMuSer, an ILP-based framework suitable for exploring temporal patterns available in multi-relational databases. XMuSer's main idea consists of exploiting frequent sequence mining, an efficient method to learn temporal patterns in the form of sequences. The efficiency of the XMuSer framework is grounded on a new coding methodology for temporal data and on the use of a predictive sequence miner. The framework selects and maps the most interesting sequential patterns into a new table, the sequence relation. In the last step of our framework, we use an ILP algorithm to learn a classification theory on the enlarged relational database that consists of the original multi-relational database and the new sequence relation. We evaluate our framework by addressing three classification problems and mapping each one of three different types of sequential patterns: frequent, closed or maximal. The experiments show that our ILP-based framework gains both from the descriptive power of the ILP algorithms and the efficiency of the sequential miners. © 2012 Springer-Verlag Berlin Heidelberg.

2010

Sequential Pattern Mining in Multi-relational Datasets

Authors
Ferreira, CA; Gama, J; Costa, VS;

Publication
CURRENT TOPICS IN ARTIFICIAL INTELLIGENCE

Abstract
We present a framework designed to mine sequential temporal patterns from multi-relational databases. In order to exploit logic-relational information without using aggregation methodologies, we convert the multi-relational dataset into what we name a multi-sequence database. Each example in a multi-relational target table is coded into a sequence that combines intra-table and inter-table relational temporal information. This allows us to find heterogeneous temporal patterns through standard sequence miners. Our framework is grounded in the excellent results achieved by previous propositionalization strategies. We follow a pipelined approach, where we first use a sequence miner to find frequent sequences in the multi-sequence database. Next, we select the most interesting findings to augment the representational space of the examples. The most interesting sequence patterns are discriminative and class correlated. In the final step we build a classifier model by taking an enlarged target table as input to a classifier algorithm. We evaluate the performance of this work through a motivating application, the hepatitis multi-relational dataset. We prove the effectiveness of our methodology by addressing two problems of the hepatitis dataset.

2012

Predicting Ramp Events with a Stream-Based HMM Framework

Authors
Ferreira, CA; Gama, J; Costa, VS; Miranda, V; Botterud, A;

Publication
Discovery Science - 15th International Conference, DS 2012, Lyon, France, October 29-31, 2012. Proceedings

Abstract
The motivation for this work is the study and prediction of wind ramp events occurring in a large-scale wind farm located in the US Midwest. In this paper we introduce the SHRED framework, a stream-based model that continuously learns a discrete HMM from wind power and wind speed measurements. We use a supervised learning algorithm to learn HMM parameters from discretized data, where ramp events are HMM states and discretized wind speed data are HMM observations. The discretization of the historical data is obtained by running the SAX algorithm over the first-order variations in the original signal. SHRED updates the HMM using the most recent historical data and includes a forgetting mechanism to model natural time dependence in wind patterns. To forecast ramp events we use recent wind speed forecasts and the Viterbi algorithm, which incrementally finds the most probable ramp event to occur. We compare the SHRED framework against a Persistence baseline in predicting ramp events occurring in short time horizons, ranging from 30 minutes to 90 minutes. SHRED consistently exhibits more accurate and cost-effective results than the baseline. © 2012 Springer-Verlag Berlin Heidelberg.
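
The supervised HMM estimation and Viterbi decoding mentioned in this abstract can be illustrated with a minimal sketch. It is not the SHRED implementation: the function names, the add-one smoothing, the uniform start distribution, and the toy labels are all assumptions made for illustration, and the SAX discretization and forgetting mechanism are left out.

```python
import math
from collections import Counter, defaultdict


def fit_supervised_hmm(states, observations, smoothing=1.0):
    """Estimate HMM parameters by counting over a labeled sequence.

    states[i] is the (hypothetical) ramp label at step i and
    observations[i] is the discretized symbol seen at that step.
    Additive smoothing keeps every probability strictly positive.
    """
    state_set = sorted(set(states))
    obs_set = sorted(set(observations))
    # Uniform start distribution: an assumption, not taken from the paper.
    start = {s: 1.0 / len(state_set) for s in state_set}

    trans_counts = defaultdict(Counter)
    emit_counts = defaultdict(Counter)
    for prev, cur in zip(states, states[1:]):
        trans_counts[prev][cur] += 1
    for s, o in zip(states, observations):
        emit_counts[s][o] += 1

    def normalize(counts, support):
        total = sum(counts.values()) + smoothing * len(support)
        return {k: (counts[k] + smoothing) / total for k in support}

    trans = {s: normalize(trans_counts[s], state_set) for s in state_set}
    emit = {s: normalize(emit_counts[s], obs_set) for s in state_set}
    return start, trans, emit


def viterbi(observations, start, trans, emit):
    """Return the most probable state sequence, computed in log space.

    Every symbol in `observations` must have appeared during fitting.
    """
    states = list(start)
    score = {s: math.log(start[s]) + math.log(emit[s][observations[0]])
             for s in states}
    backpointers = []
    for o in observations[1:]:
        new_score, pointers = {}, {}
        for s in states:
            best = max(states, key=lambda p: score[p] + math.log(trans[p][s]))
            new_score[s] = (score[best] + math.log(trans[best][s])
                            + math.log(emit[s][o]))
            pointers[s] = best
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy labeled history: ramp labels and discretized symbols are made up.
train_states = ["none", "none", "up", "up", "none", "down", "down", "none"]
train_obs = ["a", "a", "c", "c", "a", "b", "b", "a"]
start, trans, emit = fit_supervised_hmm(train_states, train_obs)
forecast_path = viterbi(["a", "c", "b"], start, trans, emit)
```

In this sketch the states play the role of ramp events and the symbols stand in for discretized wind speed, mirroring the roles described in the abstract; a streaming variant would refit the counts as new data arrives.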

2023

Modeling the Ink Tuning Process Using Machine Learning

Authors
Costa, C; Ferreira, CA;

Publication
Intelligent Data Engineering and Automated Learning - IDEAL 2023 - 24th International Conference, Évora, Portugal, November 22-24, 2023, Proceedings

Abstract
Paint bases are the essence of the color palette, allowing for the creation of a wide range of tones by combining them in different proportions. In this paper, an Artificial Neural Network is developed incorporating a pre-trained Decoder to predict the proportion of each paint base in an ink mixture in order to achieve the desired color. Color coordinates in the CIELAB space and the final finish are considered as input parameters. The proposed model is compared with commonly used models such as Linear Regression, Random Forest and Artificial Neural Network. It is important to note that the Artificial Neural Network was implemented with the same architecture as the proposed model but without incorporating the pre-trained Decoder. Experimental results demonstrate that the Artificial Neural Network with a pre-trained Decoder consistently outperforms the other models in predicting the proportions of paint bases for color tuning. This model exhibits lower Mean Absolute Error and Root Mean Square Error values across multiple objectives, indicating its superior accuracy in capturing the complexities of color relationships. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023.
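
For readers unfamiliar with the two error measures named in this abstract, the following sketch shows how Mean Absolute Error and Root Mean Square Error are computed. The function names and the example proportions are hypothetical and are not results from the paper.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_square_error(y_true, y_pred):
    """Square root of the average squared difference."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


# Made-up proportions of three paint bases in one mixture.
true_proportions = [0.5, 0.3, 0.2]
predicted_proportions = [0.4, 0.4, 0.2]
mae = mean_absolute_error(true_proportions, predicted_proportions)
rmse = root_mean_square_error(true_proportions, predicted_proportions)
```

RMSE penalizes large individual errors more heavily than MAE, which is one reason the two are often reported together.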

2024

Map-matching methods in agriculture

Authors
Silva, A; Mendes-Moreira, J; Ferreira, C; Costa, N; Dias, D;

Publication
COMPUTERS AND ELECTRONICS IN AGRICULTURE

Abstract
In this paper, a solution to monitor the location of humans during their activity in the agriculture sector, with the aim of boosting productivity and efficiency, is provided. Our solution is based on map-matching methods, which are used to track the path spanned by a worker along a specific activity in an agriculture culture. Two different cultures are taken into consideration in this study: olives and vines. We leverage the symmetry of the geometry of these cultures into our solution and divide the problem three-fold: initially, we estimate a path of a worker along the fields, then we apply the map-matching to such path and finally, a post-processing method is applied to ensure local continuity of the sequence obtained from map-matching. The proposed methods are experimentally evaluated using synthetic and real data in the region of Mirandela, Portugal. Evaluation metrics show that results for synthetic data are robust under several sampling periods, while for real-world data, results for the vine culture are on par with synthetic, and for the olive culture performance is reduced.
