Publications

Publications by Fernando Silva

2019

Feature-enriched author ranking in incomplete networks

Authors
Silva, J; Aparicio, D; Silva, F;

Publication
APPLIED NETWORK SCIENCE

Abstract
Evaluating scientists based on their scientific production is a controversial topic. Nevertheless, bibliometrics and algorithmic approaches can assist traditional peer review in numerous tasks, such as attributing research grants, deciding scientific committees, or choosing faculty promotions. Traditional bibliometrics rank individual entities (e.g., researchers, journals, faculties) without looking at the whole data (i.e., the whole network). Network algorithms, such as PageRank, have been used to measure node importance in a network, and have been applied to author ranking. However, traditional PageRank only uses network topology and ignores relevant features of scientific collaborations. Multiple extensions of PageRank have been proposed, more suited for author ranking. These methods enrich the network with information about the author’s productivity or the venue and year of the publication/citation. Most state-of-the-art (STOA) feature-enriched methods either ignore or do not combine effectively this information. Furthermore, STOA algorithms typically disregard that the full network is not known for most real-world cases. Here we describe OTARIOS, an author ranking method recently developed by us, which combines multiple publication/citation criteria (i.e., features) to evaluate authors. OTARIOS divides the original network into two subnetworks, insiders and outsiders, which is an adequate representation of citation networks with missing information. We evaluate OTARIOS on a set of five real networks, each with publications in distinct areas of Computer Science, and compare it against STOA methods. When matching OTARIOS’ produced ranking with a ground-truth ranking (comprised of best paper award nominations), we observe that OTARIOS is >30% more accurate than traditional PageRank (i.e., topology based method) and >20% more accurate than STOA (i.e., competing feature enriched methods).
We obtain the best results when OTARIOS considers (i) the author’s publication volume and publication recency, (ii) how recently the author’s work is being cited by outsiders, and (iii) how recently the author’s work is being cited by insiders and how individual he is. Our results showcase (a) the importance of efficiently combining relevant features and (b) how to adequately perform author ranking in incomplete networks.
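
Illustrative note: the abstract above uses traditional, topology-only PageRank as its baseline. The sketch below shows only that baseline on a toy citation network; it is not OTARIOS and uses none of its features. The function name `pagerank`, the author names, and the damping value of 0.85 are assumptions chosen for illustration.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank over (citing, cited) pairs. Hypothetical helper."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for citing, cited in edges:
        out_links[citing].append(cited)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by authors who cite nobody is spread evenly over everyone.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        # Every author passes an equal share of their rank to each author they cite.
        for node in nodes:
            if out_links[node]:
                share = damping * rank[node] / len(out_links[node])
                for target in out_links[node]:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy data (made up): three authors cite "carla", who cites "ana".
toy_citations = [
    ("ana", "carla"),
    ("bruno", "carla"),
    ("carla", "ana"),
    ("duarte", "carla"),
]
scores = pagerank(toy_citations)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Only the link structure enters the score here, which is the limitation that feature-enriched methods aim to address.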


2019

Finding Dominant Nodes Using Graphlets

Authors
Aparício, D; Ribeiro, P; Silva, F; Silva, JMB;

Publication
Complex Networks and Their Applications VIII - Volume 1 Proceedings of the Eighth International Conference on Complex Networks and Their Applications COMPLEX NETWORKS 2019, Lisbon, Portugal, December 10-12, 2019.

Abstract
Finding important nodes is a classic task in network science. Nodes are important depending on the context; e.g., they can be (i) nodes that, when removed, cause the network to collapse or (ii) influential spreaders (e.g., of information, or of diseases). Typically, central nodes are assumed to be important, and numerous network centrality measures have been proposed such as the degree centrality, the betweenness centrality, and the subgraph centrality. However, centrality measures are not tailored to capture one particular kind of important nodes: dominant nodes. We define dominant nodes as nodes that dominate many others and are not dominated by many others. We then propose a general graphlet-based measure of node dominance called graphlet-dominance (GD). We analyze how GD differs from traditional network centrality measures. We also study how certain parameters (namely the importance of dominating versus not being dominated and indirect versus direct dominances) influence GD. Finally, we apply GD to author ranking and verify that GD is superior to PageRank in four of the five citation networks tested. © 2020, Springer Nature Switzerland AG.


2020

FOCAS: Penalising friendly citations to improve author ranking

Authors
Silva, J; Aparicio, D; Ribeiro, P; Silva, F;

Publication
PROCEEDINGS OF THE 35TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING (SAC'20)

Abstract
Scientific impact is commonly associated with the number of citations received. However, an author can easily boost his own citation count by (i) publishing articles that cite his own previous work (self-citations), (ii) having co-authors citing his work (co-author citations), or (iii) exchanging citations with authors from other research groups (reciprocated citations). Even though these friendly citations inflate an author's perceived scientific impact, author ranking algorithms do not normally address them. They, at most, remove self-citations. Here we present Friends-Only Citations AnalySer (FOCAS), a method that identifies friendly citations and reduces their negative effect in author ranking algorithms. FOCAS combines the author citation network with the co-authorship network in order to measure author proximity and penalises citations between friendly authors. FOCAS is general and can be regarded as an independent module applied while running (any) PageRank-like author ranking algorithm. FOCAS can be tuned to use three different criteria, namely authors' distance, citation frequency, and citation recency, or combinations of these. We evaluate and compare FOCAS against eight state-of-the-art author ranking algorithms. We compare their rankings with a ground-truth of best paper awards. We test our hypothesis on a citation and co-authorship network comprised of seven Information Retrieval top-conferences. We observed that FOCAS improved author rankings by 25% on average and, in one case, led to a gain of 46%.


2020

JAY: Adaptive Computation Offloading for Hybrid Cloud Environments

Authors
Silva, J; Marques, ERB; Lopes, LMB; Silva, F;

Publication
2020 FIFTH INTERNATIONAL CONFERENCE ON FOG AND MOBILE EDGE COMPUTING (FMEC)

Abstract
Edge computing is a hot research topic given the ever-increasing requirements of mobile applications in terms of computation and communication and the emerging Internet-of-Things with billions of devices. While ubiquitous and with considerable computational resources, devices at the edge may not be able to handle processing tasks on their own and thus resort to offloading to cloudlets, when available, or traditional cloud infrastructures. In this paper, we present JAY, a modular and extensible platform for mobile devices, cloudlets, and clouds that can manage computational tasks spawned by devices and make informed decisions about offloading to neighboring devices, cloudlets, or traditional clouds. JAY is parametric on the scheduling strategy and metrics used to make offloading decisions, providing a useful tool to study the impact of distinct offloading strategies. We illustrate the use of JAY with an evaluation of several offloading strategies in distinct cloud configurations using a real-world machine learning application whose tasks can be dynamically executed on or offloaded to Android devices, cloudlet servers, or Google Cloud servers. The results obtained show that edge-clouds form competent computing platforms on their own and that they can effectively be meshed with cloudlets and traditional clouds when more demanding processing tasks are considered. In particular, edge computing is competitive with infrastructure clouds in scenarios where data is generated at the edge, high bandwidth is required, and a pool of computationally competent devices or an edge-server is available. The results also highlight JAY's ability to expose the performance compromises in applications when they are deployed over distinct hybrid cloud configurations using distinct offloading strategies.


2021

Time series analysis via network science: Concepts and algorithms

Authors
Silva, VF; Silva, ME; Ribeiro, P; Silva, F;

Publication
WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
There is nowadays a constant flux of data being generated and collected in all types of real world systems. These data sets are often indexed by time, space, or both, requiring appropriate approaches to analyze the data. In univariate settings, time series analysis is a mature field. However, in multivariate contexts, time series analysis still presents many limitations. In order to address these issues, the last decade has brought approaches based on network science. These methods involve transforming an initial time series data set into one or more networks, which can be analyzed in depth to provide insight into the original time series. This review provides a comprehensive overview of existing mapping methods for transforming time series into networks for a wide audience of researchers and practitioners in machine learning, data mining, and time series. Our main contribution is a structured review of existing methodologies, identifying their main characteristics, and their differences. We describe the main conceptual approaches, provide authoritative references and give insight into their advantages and limitations in a unified way and language. We first describe the case of univariate time series, which can be mapped to single layer networks, and we divide the current mappings based on the underlying concept: visibility, transition, and proximity. We then proceed with multivariate time series discussing both single layer and multiple layer approaches. Although still very recent, this research area has much potential and with this survey we intend to pave the way for future research on the topic. This article is categorized under: Fundamental Concepts of Data and Knowledge > Data Concepts; Fundamental Concepts of Data and Knowledge > Knowledge Representation
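
Illustrative note: of the three mapping concepts named in the abstract above, visibility is perhaps the easiest to show in a few lines. The sketch below builds a natural visibility graph, a standard mapping in which two observations are linked when the straight line between them passes above every observation in between. The function name and the toy series are assumptions for illustration and are not taken from the review.

```python
def natural_visibility_graph(series):
    """Return the set of edges (a, b), a < b, of the natural visibility graph.

    Time points are the integer indices of `series`. Hypothetical helper,
    written as a simple O(n^3) check rather than an optimised algorithm.
    """
    edges = set()
    n = len(series)
    for a in range(n):
        for b in range(a + 1, n):
            # a and b see each other if every intermediate point lies strictly
            # below the line joining (a, series[a]) and (b, series[b]).
            visible = all(
                series[c] < series[b] + (series[a] - series[b]) * (b - c) / (b - a)
                for c in range(a + 1, b)
            )
            if visible:
                edges.add((a, b))
    return edges


# Toy series (made up): the peak at index 2 blocks the view across it.
toy_series = [2.0, 1.0, 3.0, 1.0, 2.0]
graph = natural_visibility_graph(toy_series)
```

In this toy series the peak at index 2 is linked to every other point, so it becomes the hub of the resulting network.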


2021

A Survey on Subgraph Counting: Concepts, Algorithms, and Applications to Network Motifs and Graphlets

Authors
Ribeiro, P; Paredes, P; Silva, MEP; Aparicio, D; Silva, F;

Publication
ACM COMPUTING SURVEYS

Abstract
Computing subgraph frequencies is a fundamental task that lies at the core of several network analysis methodologies, such as network motifs and graphlet-based metrics, which have been widely used to categorize and compare networks from multiple domains. Counting subgraphs is, however, computationally very expensive, and there has been a large body of work on efficient algorithms and strategies to make subgraph counting feasible for larger subgraphs and networks. This survey aims precisely to provide a comprehensive overview of the existing methods for subgraph counting. Our main contribution is a general and structured review of existing algorithms, classifying them on a set of key characteristics, highlighting their main similarities and differences. We identify and describe the main conceptual approaches, giving insight on their advantages and limitations, and we provide pointers to existing implementations. We initially focus on exact sequential algorithms, but we also do a thorough survey on approximate methodologies (with a trade-off between accuracy and execution time) and parallel strategies (that need to deal with an unbalanced search space).
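
Illustrative note: to make the task in the abstract above concrete, the sketch below counts the two connected 3-node induced subgraphs (triangles and open paths) of an undirected graph by brute-force enumeration. It is a naive exact baseline, not one of the surveyed algorithms, and its cost over all node triples hints at why more efficient strategies are needed. The function name and the toy graph are assumptions for illustration.

```python
from itertools import combinations


def count_three_node_subgraphs(edges):
    """Count induced triangles and 3-node paths in an undirected graph.

    Hypothetical helper: enumerates every node triple, so it runs in O(n^3).
    """
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    counts = {"triangle": 0, "path": 0}
    for a, b, c in combinations(sorted(adjacency), 3):
        # Number of edges present among the three nodes.
        present = (b in adjacency[a]) + (c in adjacency[a]) + (c in adjacency[b])
        if present == 3:
            counts["triangle"] += 1
        elif present == 2:
            counts["path"] += 1
        # Triples with fewer than 2 edges are disconnected and are not counted.
    return counts


# Toy graph (made up): a triangle 1-2-3 with a pendant node 4 attached to 3.
toy_edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
counts = count_three_node_subgraphs(toy_edges)
```

For the toy graph this gives one triangle (nodes 1, 2, 3) and two open paths (1-3-4 and 2-3-4).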
