2024
Authors
Mazarei, A; Sousa, R; Mendes-Moreira, J; Molchanov, S; Ferreira, HM;
Publication
INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS
Abstract
Outlier detection is a widely used technique for identifying anomalous or exceptional events across various contexts. It has proven to be valuable in applications like fault detection, fraud detection, and real-time monitoring systems. Detecting outliers in real time is crucial in several industries, such as financial fraud detection and quality control in manufacturing processes. In the context of big data, the amount of data generated is enormous, and traditional batch mode methods are not practical since the entire dataset is not available. The limited computational resources further compound this issue. Boxplot is a widely used batch mode algorithm for outlier detection that involves several derivations. However, the lack of an incremental closed form for statistical calculations during boxplot construction poses considerable challenges for its application within the realm of big data. We propose an incremental/online version of the boxplot algorithm to address these challenges. Our proposed algorithm is based on an approximation approach that involves numerical integration of the histogram and calculation of the cumulative distribution function. This approach is independent of the dataset's distribution, making it effective for all types of distributions, whether skewed or not. To assess the efficacy of the proposed algorithm, we conducted tests using simulated datasets featuring varying degrees of skewness. Additionally, we applied the algorithm to a real-world dataset concerning software fault detection, which posed a considerable challenge. The experimental results underscored the robust performance of our proposed algorithm, highlighting its efficacy comparable to batch mode methods that access the entire dataset. Our online boxplot method, leveraging dataset distribution to define whiskers, consistently achieved exceptional outlier detection results. 
Notably, our algorithm demonstrated computational efficiency, maintaining constant memory usage with minimal hyperparameter tuning.
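The histogram-and-CDF idea described in this abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a fixed value range known in advance (`low`, `high`), a fixed number of bins, and the classic 1.5 × IQR whisker rule, and the class and method names are hypothetical. It only shows how quartiles can be approximated from a constant-memory histogram by accumulating bin counts into an empirical CDF.

```python
import numpy as np


class HistogramBoxplot:
    """Illustrative online boxplot backed by a fixed-range histogram.

    Assumptions (not from the paper): the value range [low, high] is known
    up front, bins are equal-width, and whiskers use the 1.5 * IQR rule.
    Memory use is constant: only the bin counts are stored.
    """

    def __init__(self, low, high, n_bins=100):
        self.edges = np.linspace(low, high, n_bins + 1)
        self.counts = np.zeros(n_bins, dtype=np.int64)

    def update(self, x):
        # Locate the bin for x; values outside the range go to the end bins.
        idx = int(np.searchsorted(self.edges, x, side="right")) - 1
        idx = min(max(idx, 0), len(self.counts) - 1)
        self.counts[idx] += 1

    def quantile(self, q):
        total = int(self.counts.sum())
        if total == 0:
            raise ValueError("no observations yet")
        cum = np.cumsum(self.counts)  # unnormalised CDF at right bin edges
        target = q * total
        # First bin whose cumulative count reaches the target mass.
        i = min(int(np.searchsorted(cum, target, side="left")), len(cum) - 1)
        below = cum[i - 1] if i > 0 else 0
        # Linear interpolation inside the bin (uniform-within-bin assumption).
        frac = (target - below) / self.counts[i] if self.counts[i] > 0 else 0.0
        return float(self.edges[i] + frac * (self.edges[i + 1] - self.edges[i]))

    def fences(self, k=1.5):
        q1, q3 = self.quantile(0.25), self.quantile(0.75)
        iqr = q3 - q1
        return q1 - k * iqr, q3 + k * iqr

    def is_outlier(self, x):
        lower, upper = self.fences()
        return x < lower or x > upper


# Usage: stream values one at a time, then query the current fences.
bp = HistogramBoxplot(low=0.0, high=1000.0, n_bins=100)
for value in range(1000):
    bp.update(float(value))
lower_fence, upper_fence = bp.fences()
```

The accuracy of the approximated quartiles depends on the bin width, so the bin count acts as the main tuning knob in this sketch.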
2024
Authors
Baldo, A; Ferreira, PJS; Mendes-Moreira, J;
Publication
EXPERT SYSTEMS
Abstract
With technological advancements, much data is being captured by sensors, smartphones, wearable devices, and so forth. These vast datasets are stored in data centres and utilized to forge data-driven models for the condition monitoring of infrastructures and systems through future data mining tasks. However, these datasets often surpass the processing capabilities of traditional information systems and methodologies due to their significant size. Additionally, not all samples within these datasets contribute valuable information during the model training phase, leading to inefficiencies. The processing and training of Machine Learning algorithms become time-consuming, and storing all the data demands excessive space, contributing to the Big Data challenge. In this paper, we propose two novel techniques to reduce large time-series datasets into more compact versions without undermining the predictive performance of the resulting models. These methods also aim to decrease the time required for training the models and the storage space needed for the condensed datasets. We evaluated our techniques on five public datasets, employing three Machine Learning algorithms: Holt-Winters, SARIMA, and LSTM. The outcomes indicate that for most of the datasets examined, our techniques maintain, and in several instances enhance, the forecasting accuracy of the models. Moreover, we significantly reduced the time required to train the Machine Learning algorithms employed.
2024
Authors
---, MP; Mendes-Moreira, J;
Publication
Abstract
2024
Authors
Kumar, R; Bhanu, M; Mendes Moreira, J; Chandra, J;
Publication
ACM Computing Surveys
Abstract
2024
Authors
Mendes-Neves, T; Meireles, L; Mendes-Moreira, J;
Publication
MACHINE LEARNING
Abstract
This paper introduces the Large Events Model (LEM) for soccer, a novel deep learning framework for generating and analyzing soccer matches. The framework can simulate games from a given game state, with its primary output being the ensuing probabilities and events from multiple simulations. These can provide insights into match dynamics and underlying mechanisms. We discuss the framework's design, features, and methodologies, including model optimization, data processing, and evaluation techniques. The models within this framework are developed to predict specific aspects of soccer events, such as event type, success likelihood, and further details. In an applied context, we showcase the estimation of xP+, a metric estimating a player's contribution to the team's points earned. This work ultimately enhances the field of sports event prediction and practical applications and emphasizes the potential for this kind of method.
2024
Authors
Fontes, DBMM; Homayouni, SM; Fernandes, JC;
Publication
INTERNATIONAL JOURNAL OF PRODUCTION RESEARCH
Abstract
This work extends the energy-efficient job shop scheduling problem with transport resources by considering speed-adjustable resources of two types, namely: the machines on which the jobs are processed and the vehicles that transport the jobs around the shop floor. Therefore, the problem being considered involves determining, simultaneously, the processing speed of each production operation, the sequence of the production operations for each machine, the allocation of the transport tasks to vehicles, the travelling speed of each task for the empty and for the loaded legs, and the sequence of the transport tasks for each vehicle. Among the possible solutions, we are interested in those providing trade-offs between makespan and total energy consumption (Pareto solutions). To that end, we develop and solve a bi-objective mixed-integer linear programming model. In addition, due to the problem's complexity, we also propose a multi-objective biased random key genetic algorithm that simultaneously evolves several populations. The computational experiments performed have shown it to be effective and efficient, even in the presence of larger problem instances. Finally, we provide an extensive time and energy trade-off analysis (Pareto front) to infer the advantages of considering speed-adjustable machines and speed-adjustable vehicles and provide general insights for the managers dealing with such a complex problem.
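As a small illustration of what "Pareto solutions" means for the two objectives named in this abstract, the sketch below filters a list of candidate schedules down to the non-dominated ones. It is not the paper's MILP model or genetic algorithm: the function name and the sample `(makespan, energy)` pairs are hypothetical, and both objectives are assumed to be minimised.

```python
def pareto_front(points):
    """Return the non-dominated (makespan, energy) pairs, minimising both.

    A point is dominated if another point is no worse in both objectives
    and strictly better in at least one.
    """
    front = []
    best_energy = float("inf")
    # Sorting by makespan (then energy) means a point is non-dominated
    # exactly when its energy beats every point already seen.
    for makespan, energy in sorted(set(points)):
        if energy < best_energy:
            front.append((makespan, energy))
            best_energy = energy
    return front


# Hypothetical candidate schedules: (makespan, total energy consumption).
candidates = [(10, 50), (12, 40), (11, 55), (15, 40), (9, 70)]
trade_offs = pareto_front(candidates)
```

Each pair left in `trade_offs` represents a schedule for which shortening the makespan further would require spending more energy, and vice versa.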