Publications

Publications by HASLab

2022

Which Technologies are Most Frequently Used by Data Scientists?

Authors
Pereira, P; Fernandes, JP; Cunha, J;

Publication
2022 IEEE Symposium on Visual Languages and Human-Centric Computing, VL/HCC 2022, Rome, Italy, September 12-16, 2022

Abstract
Data collection is pervasively bound to our digital lifestyle. A recent study reports that the growth of the data created and replicated in 2020 was even higher than in the previous years, reaching an astonishing global amount of 64.2 zettabytes of data. There are numerous companies whose services/products rely heavily on data analysis, and mining the produced data has already revealed great value for businesses in different sectors. In order to be able to support the professionals that do this job, typically known as data scientists, we first need to characterize them. To contribute towards this characterization, we conducted a public survey and in this work we present the results about a particular aspect of their life: the tools they use and need. © 2022 IEEE Computer Society. All rights reserved.

2022

Scalable transcriptomics analysis with Dask: applications in data science and machine learning

Authors
Moreno, M; Vilaca, R; Ferreira, PG;

Publication
BMC BIOINFORMATICS

Abstract
Background: Gene expression studies are an important tool in biological and biomedical research. The signal carried in expression profiles helps derive signatures for the prediction, diagnosis and prognosis of different diseases. Data science and specifically machine learning have many applications in gene expression analysis. However, as the dimensionality of genomics datasets grows, scalable solutions become necessary. Methods: In this paper we review the main steps and bottlenecks in machine learning pipelines, as well as the main concepts behind scalable data science including those of concurrent and parallel programming. We discuss the benefits of the Dask framework and how it can be integrated with the Python scientific environment to perform data analysis in computational biology and bioinformatics. Results: This review illustrates the role of Dask for boosting data science applications in different case studies. Detailed documentation and code on these procedures are made available at https://github.com/martaccmoreno/gexp-ml-dask. Conclusion: By showing when and how Dask can be used in transcriptomics analysis, this review will serve as an entry point to help genomic data scientists develop more scalable data analysis procedures.

2022

Cloud-Based Privacy-Preserving Medical Imaging System Using Machine Learning Tools

Authors
Alves, J; Soares, B; Brito, C; Sousa, A;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2022

Abstract
Healthcare environments are generating a deluge of sensitive data. Nonetheless, dealing with large amounts of data is an expensive task, and current solutions resort to the cloud environment. Additionally, the intersection of the cloud environment and healthcare data opens new challenges regarding data privacy. With this in mind, we propose MEDCLOUDCARE (MCC), a healthcare application offering medical image viewing and processing tools while integrating cloud computing and AI. Moreover, MCC provides security and privacy features, scalability and high availability. The system is intended for two user groups: health professionals and researchers. The former can remotely view, process and share medical imaging information in the DICOM format. They can also use pre-trained Machine Learning (ML) models to aid the analysis of medical images. The latter can remotely add, share, and deploy ML models to perform inference on DICOM images. MCC incorporates a DICOM web viewer enabling users to view and process DICOM studies, which they can also upload and store. Regarding the security and privacy of the data, all sensitive information is encrypted at rest and in transit. Furthermore, MCC is intended for cloud environments. Thus, the system is deployed using Kubernetes, increasing the efficiency, availability and scalability of the ML inference process.

2022

Performance Evaluation of Microservices Featuring Different Implementation Patterns

Authors
Costa, L; Ribeiro, AN;

Publication
INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS, ISDA 2021

Abstract
The process of migrating from a monolithic to a microservices-based architecture is currently described as a form of modernizing applications. The core principles of microservices, which mostly reside in achieving loose coupling between the services, highly depend on the implementation approaches used. Because microservices are a complete change of paradigm that contrasts with the traditional way of developing software, the current lack of established principles often results in implementations that conflict with their alleged benefits. Given their distributed nature, performance is affected, but specific implementation patterns can further impact it. This paper aims to address the impact that microservices-based solutions, featuring different implementation patterns, have on performance and how it compares with monolithic applications. To do so, benchmarks are conducted over one application developed following a traditional monolithic approach, and two equivalent microservices-based implementations featuring distinct inter-service communication mechanisms and data management methodologies.

2022

A Blockchain-based Data Market for Renewable Energy Forecasts

Authors
Coelho, F; Silva, F; Goncalves, C; Bessa, R; Alonso, A;

Publication
2022 FOURTH INTERNATIONAL CONFERENCE ON BLOCKCHAIN COMPUTING AND APPLICATIONS (BCCA)

Abstract
This paper presents a data market aimed at trading energy forecast data. The system architecture is built using blockchain as a service, allowing access to data streams and establishing a distributed settlement between stakeholders. Energy forecast data is presented as the commodity traded in the market, whose settlement is provided through the blockchain on the basis of the extracted value provided by market stakeholders. Our proposal allows market stakeholders to acquire energy forecasts and pay according to the data accuracy, solving the confidentiality problem of freely sharing data. A data quality reward is introduced, steering the compensation sent to market participants. The data market design is presented and an evaluation campaign is performed, showing that the data market produced functionally valid results in comparison with the results achieved with a central simulated approach. Moreover, results show that the data market architecture is able to scale.

2022

Flexible Fine-grained Data Access Management for Hyperledger Fabric

Authors
Parente, J; Alonso, AN; Coelho, F; Vinagre, J; Bastos, P;

Publication
2022 FOURTH INTERNATIONAL CONFERENCE ON BLOCKCHAIN COMPUTING AND APPLICATIONS (BCCA)

Abstract
As blockchains go beyond cryptocurrencies into applications in multiple industries such as Insurance, Healthcare and Banking, handling personal or sensitive data, data access control becomes increasingly relevant. Access control mechanisms proposed so far are mostly based on requester identity, particularly for permissioned blockchain platforms, and are limited to binary, all-or-nothing access decisions. This is the case with Hyperledger Fabric's native access control mechanisms and, as permission updates require consensus, these fall short regarding the flexibility required to address GDPR-derived policies and client consent management. We propose SDAM, a novel access control mechanism for Fabric that enables fine-grained and dynamic control policies, using both contextual and resource attributes for decisions. Instead of binary results, decisions may also include mandatory data transformations so as to conform with the expressed policy, all without modifications to Fabric. Results show that SDAM's overhead w.r.t. baseline Fabric is acceptable. The scalability of the approach w.r.t. the number of concurrent clients is also evaluated and found to follow Fabric's.
