Associate Professor
Email: alvabre@unizar.es
Address:
Department of Computer Science and Systems Engineering
Universidad de Zaragoza
Calle María de Luna, 1
Ada Byron Building
50018 Zaragoza, Spain
BIOGRAPHY
Alejandro Valero received the BS, MS, and PhD degrees in Computer Engineering from the Universitat Politècnica de València, Spain, in 2009, 2011, and 2013, respectively. From 2013 to 2015 he was a Visiting Researcher with Northeastern University, Boston (MA), USA, and the University of Cambridge, UK. From 2016 to 2021 he was an Assistant Professor with the Department of Computer Science and Systems Engineering, Universidad de Zaragoza, Spain. Since 2021 he has been an Associate Professor with the same department and institution. Prof. Valero has taught several courses on computer organization, including digital design, computer organization and design, heterogeneous systems programming and design, data center design, and operating systems. His PhD research contributions to the design of high-performance, energy-efficient CPU memory subsystems were recognized by multiple entities. He received the Intel Doctoral Student Honor Program Award in 2012 and the Gold Medal in the ACM Student Research Competition (SRC) held at the 27th International Conference on Supercomputing (ICS 2013). His research interests mainly focus on the design of memory hierarchies in terms of performance, energy efficiency, and reliability for different microprocessors: CPU systems, general-purpose GPUs, and accelerators for computer vision algorithms. Prof. Valero has participated in more than 20 nationally and locally funded research projects and has published more than 30 papers in the main venues of the computer architecture area, such as the IEEE/ACM International Symposium on Microarchitecture (MICRO), the International Conference on Parallel Architectures and Compilation Techniques (PACT), IEEE Transactions on Computers, and IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
He has served as a Technical Program Committee member for numerous conferences, workshops, and research competitions, such as the Design, Automation and Test in Europe (DATE) conference, the IEEE International Conference on Computer Design (ICCD), the Performance Modeling, Benchmarking, and Simulation of High Performance Computer Systems (PMBS) workshop, and the ACM SRC Grand Finals. He is also a frequent reviewer for top journals in his area, such as IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on Dependable and Secure Computing, and ACM Transactions on Design Automation of Electronic Systems. He received the Outstanding Reviewer Award in the Design Methods and Tools track at the DATE 2024 conference. Prof. Valero is a member of the ACM, the Sociedad de Arquitectura y Tecnología de Computadores (SARTECO), and the Aragon Institute of Engineering Research (I3A), and an affiliated member of the High Performance, Edge, And Cloud Computing (HiPEAC) European Network of Excellence.
PUBLICATIONS
2019
Journal Articles
Candel, Francisco; Valero, Alejandro; Petit, Salvador; Sahuquillo, Julio
Efficient Management of Cache Accesses to Boost GPGPU Memory Subsystem Performance Journal Article
In: IEEE Transactions on Computers, vol. 68, no. 10, pp. 1442-1454, 2019, ISSN: 0018-9340.
@article{candel2019efficient,
title = {Efficient Management of Cache Accesses to Boost GPGPU Memory Subsystem Performance},
author = {Francisco Candel and Alejandro Valero and Salvador Petit and Julio Sahuquillo},
url = {https://ieeexplore.ieee.org/document/8681093},
doi = {https://doi.org/10.1109/TC.2019.2907591},
issn = {0018-9340},
year = {2019},
date = {2019-10-01},
urldate = {2019-01-01},
journal = {IEEE Transactions on Computers},
volume = {68},
number = {10},
pages = {1442-1454},
publisher = {IEEE},
abstract = {To support the massive amount of memory accesses that GPGPU applications generate, GPU memory hierarchies are becoming more and more complex, and the Last Level Cache (LLC) size considerably increases each GPU generation. This paper shows that counter-intuitively, enlarging the LLC brings marginal performance gains in most applications. In other words, increasing the LLC size does not scale neither in performance nor energy consumption. We examine how LLC misses are managed in typical GPUs, and we find that in most cases the way LLC misses are managed are precisely the main performance limiter. This paper proposes a novel approach that addresses this shortcoming by leveraging a tiny additional Fetch and Replacement Cache-like structure (FRC) that stores control and coherence information of the incoming blocks until they are fetched from main memory. Then, the fetched blocks are swapped with the victim blocks (i.e., selected to be replaced) in the LLC, and the eviction of such victim blocks is performed from the FRC. This approach improves performance due to three main reasons: i) the lifetime of blocks being replaced is enlarged, ii) the main memory path is unclogged on long bursts of LLC misses, and iii) the average LLC miss latency is reduced. The proposal improves the LLC hit ratio, memory-level parallelism, and reduces the miss latency compared to much larger conventional caches. Moreover, this is achieved with reduced energy consumption and with much less area requirements. Experimental results show that the proposed FRC cache scales in performance with the number of GPU compute units and the LLC size, since, depending on the FRC size, performance improves ranging from 30 to 67 percent for a modern baseline GPU card, and from 32 to 118 percent for a larger GPU. In addition, energy consumption is reduced on average from 49 to 57 percent for the larger GPU. These benefits come with a small area increase (by 7.3 percent) over the LLC baseline.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Valero, Alejandro; Candel, Francisco; Gracia, Darío Suárez; Petit, Salvador; Sahuquillo, Julio
An Aging-Aware GPU Register File Design Based on Data Redundancy Journal Article
In: IEEE Transactions on Computers, vol. 68, iss. 1, pp. 4-20, 2019, ISSN: 0018-9340.
@article{Valero2019,
title = {An Aging-Aware GPU Register File Design Based on Data Redundancy},
author = {Alejandro Valero and Francisco Candel and Darío Suárez Gracia and Salvador Petit and Julio Sahuquillo},
url = {https://ieeexplore.ieee.org/document/8395355},
doi = {https://doi.org/10.1109/TC.2018.2849376},
issn = {0018-9340},
year = {2019},
date = {2019-01-01},
urldate = {2019-01-01},
journal = {IEEE Transactions on Computers},
volume = {68},
issue = {1},
pages = {4-20},
abstract = {Nowadays, GPUs sit at the forefront of high-performance computing thanks to their massive computational capabilities. Internally, thousands of functional units, architected to be fed by large register files, fuel such a performance. At deep nanometer technologies, the SRAM memory cells that implement GPU register files are very sensitive to the Negative Bias Temperature Instability (NBTI) effect. NBTI ages cell transistors by degrading their threshold voltage Vth over the lifetime of the GPU. This degradation, which manifests when a cell keeps the same logic value for a relatively long period of time, compromises the cell read stability and increases the transistor switching delay, which can lead to wrong read values and eventually exceed the processor cycle time, respectively, so resulting in faulty operation. This work proposes architectural mechanisms leveraging the redundancy of the data stored in GPU register files to attack NBTI aging. The proposed mechanisms are based on data compression, power gating, and register address rotation techniques. All these mechanisms working together balance the distribution of logic values stored in the cells along the execution time, reducing both the overall Vth degradation and the increase in the transistor switching delays. Experimental results show that a conventional GPU register file suffers the worst case for NBTI, since a significant fraction of the cells maintain the same logic value during the entire application execution (i.e., a 100 percent ‘0’ and ‘1’ duty cycle distributions). On average, the proposal reduces these distributions by 58 and 68 percent, respectively, which translates into Vth degradation savings by 54 and 62 percent, respectively.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Workshops
Valero, Alejandro; Gracia, Darío Suárez; Tejero, Ruben Gran; Ramos, Luis M.; Navarro-Torres, Agustín; Muñoz, Adolfo; Ezpeleta, Joaquín; Briz, José Luis; Murillo, Ana C.; Montijano, Eduardo; Resano, Javier; Villarroya-Gaudó, María; Alastruey-Benedé, Jesús; Torres, Enrique F.; Álvarez, Pedro; Ibáñez, Pablo; Viñals, Víctor
Exposing Abstraction-Level Interactions with a Parallel Ray Tracer Workshop
Proceedings of the Workshop on Computer Architecture Education, WCAE@ISCA 2019, Phoenix, AZ, USA, June 22, 2019, ACM, 2019.
@workshop{DBLP:conf/wcae/ValeroGTRNMEBMM19,
title = {Exposing Abstraction-Level Interactions with a Parallel Ray Tracer},
author = {Alejandro Valero and Darío Suárez Gracia and Ruben Gran Tejero and Luis M. Ramos and Agustín Navarro-Torres and Adolfo Muñoz and Joaquín Ezpeleta and José Luis Briz and Ana C. Murillo and Eduardo Montijano and Javier Resano and María Villarroya-Gaudó and Jesús Alastruey-Benedé and Enrique F. Torres and Pedro Álvarez and Pablo Ibáñez and Víctor Viñals},
url = {https://doi.org/10.1145/3338698.3338886},
doi = {10.1145/3338698.3338886},
year = {2019},
date = {2019-01-01},
urldate = {2019-01-01},
booktitle = {Proceedings of the Workshop on Computer Architecture Education, WCAE@ISCA 2019, Phoenix, AZ, USA, June 22, 2019},
pages = {5:1--5:8},
publisher = {ACM},
keywords = {},
pubstate = {published},
tppubtype = {workshop}
}
2018
Proceedings Articles
Candel, Francisco; Petit, Salvador; Valero, Alejandro; Sahuquillo, Julio
Improving GPU Cache Hierarchy Performance with a Fetch and Replacement Cache Proceedings Article
In: European Conference on Parallel Processing, pp. 235-248, Springer, 2018, ISBN: 978-3-319-96983-1.
@inproceedings{candel2018improving,
title = {Improving GPU Cache Hierarchy Performance with a Fetch and Replacement Cache},
author = {Francisco Candel and Salvador Petit and Alejandro Valero and Julio Sahuquillo},
url = {https://link.springer.com/chapter/10.1007/978-3-319-96983-1_17},
doi = {https://doi.org/10.1007/978-3-319-96983-1_17},
isbn = {978-3-319-96983-1},
year = {2018},
date = {2018-08-01},
urldate = {2018-01-01},
booktitle = {European Conference on Parallel Processing},
pages = {235-248},
publisher = {Springer},
organization = {Springer},
abstract = {In the last few years, GPGPU computing has become one of the most popular computing paradigms in high-performance computers due to its excellent performance to power ratio. The memory requirements of GPGPU applications widely differ from the requirements of CPU counterparts. The amount of memory accesses is several orders of magnitude higher in GPU applications than in CPU applications, and they present disparate access patterns. Because of this fact, large and highly associative Last-Level Caches (LLCs) bring much lower performance gains in GPUs than in CPUs. This paper presents a novel approach to manage LLC misses that efficiently improves LLC hit ratio, memory-level parallelism, and miss latencies in GPU systems. The proposed approach leverages a small additional Fetch and Replacement Cache (FRC) that stores control and coherence information of incoming blocks until they are fetched from main memory. Then, fetched blocks are swapped with victim blocks to be replaced in the LLC. After that, the eviction of victim blocks is performed from the FRC. This management approach improves performance due to three main reasons: (i) the lifetime of blocks being replaced is increased, (ii) the main memory path is unclogged on long bursts of LLC misses, and (iii) the average L2 miss delaying latency is reduced. Experimental results show that our proposal increases the performance (OPC) over 25% in most of the studied applications, reaching improvements up to 150% in some applications.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Conference Proceedings
Valero, Alejandro; Gracia, Darío Suárez; Gran, Rubén; Muñoz, Adolfo; Ezpeleta, Joaquín; Briz, José Luis; Ramos, Luis M; Murillo, Ana C; Montijano, Eduardo; Resano, Javier; others
Atomicidad, Consistencia, Paralelismo y Concurrencia en un Trazador de Rayos elaborado a lo largo del Grado en Ingeniería Informática Conference Proceedings
Actas de las Jornadas SARTECO 2018, 2018.
@proceedings{valero2018atomicidad,
title = {Atomicidad, Consistencia, Paralelismo y Concurrencia en un Trazador de Rayos elaborado a lo largo del Grado en Ingeniería Informática},
author = {Alejandro Valero and Darío Suárez Gracia and Rubén Gran and Adolfo Muñoz and Joaquín Ezpeleta and José Luis Briz and Luis M Ramos and Ana C Murillo and Eduardo Montijano and Javier Resano and others},
url = {https://zenodo.org/records/1303185},
doi = {https://doi.org/10.5281/zenodo.1303185},
year = {2018},
date = {2018-09-18},
urldate = {2018-09-18},
booktitle = {Actas de las XXIX Jornadas de Paralelismo},
pages = {201-207},
publisher = {Actas de las Jornadas SARTECO 2018},
abstract = {For Computer Engineering students, it is of great interest to gain a global view of the different abstraction levels that make it possible to understand and exploit a computer system. However, the usual organization of the Computer Engineering degree into courses tends to create watertight compartments, where a single abstraction level is usually addressed, which leads to isolated concepts and specialized platforms. With the aim of giving a set of courses a more cross-cutting character, this article describes a project consisting of a parallel ray tracer that allows students to experience the properties of atomicity, consistency, parallelism, and concurrency of a computer system from the algorithmic level of an application down to machine code instructions, including the interaction among the different abstraction levels of the system and the relationship with the courses involved. The development of the project is supported by a set of lab assignments that address the different abstraction levels. Finally, the hardware and software requirements needed to carry out the lab assignments are described, together with the rationale for choosing the Raspberry Pi as the single development platform.},
keywords = {},
pubstate = {published},
tppubtype = {proceedings}
}