Associate Professor
Email: alvabre@unizar.es
Address:
Department of Computer Science and Systems Engineering
Universidad de Zaragoza
Calle María de Luna, 1
Ada Byron Building
50018 Zaragoza, Spain
BIOGRAPHY
Alejandro Valero received the BS, MS, and PhD degrees in Computer Engineering from the Universitat Politècnica de València, Spain, in 2009, 2011, and 2013, respectively. From 2013 to 2015 he was a Visiting Researcher with Northeastern University, Boston (MA), USA, and the University of Cambridge, UK. From 2016 to 2021 he was an Assistant Professor with the Department of Computer Science and Systems Engineering, Universidad de Zaragoza, Spain. Since 2021 he has been an Associate Professor with the same department and institution. Prof. Valero has taught several courses on computer organization, including digital design, computer organization and design, heterogeneous systems programming and design, data center design, and operating systems. His PhD research contributions to the design of high-performance, energy-efficient CPU memory subsystems were recognized by multiple entities. He received the Intel Doctoral Student Honor Program Award in 2012 and the Gold Medal in the ACM Student Research Competition (SRC) held at the 27th International Conference on Supercomputing (ICS 2013). His research interests mainly focus on the design of memory hierarchies in terms of performance, energy efficiency, and reliability for different microprocessors: CPU systems, general-purpose GPUs, and accelerators for computer vision algorithms. Prof. Valero has participated in more than 20 nationally and locally funded research projects and has published more than 30 papers in the main venues of the computer architecture area, such as the IEEE/ACM International Symposium on Microarchitecture (MICRO), the International Conference on Parallel Architectures and Compilation Techniques (PACT), IEEE Transactions on Computers, and IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
He has served as a Technical Program Committee member for numerous conferences, workshops, and research competitions, such as the Design, Automation and Test in Europe (DATE) conference, the IEEE International Conference on Computer Design (ICCD), the Performance Modeling, Benchmarking, and Simulation of High Performance Computer Systems (PMBS) workshop, and the ACM SRC Grand Finals. He is also a frequent reviewer for top journals in his area, such as IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on Dependable and Secure Computing, and ACM Transactions on Design Automation of Electronic Systems. He received the Outstanding Reviewer Award in the Design Methods and Tools track at the DATE 2024 conference. Prof. Valero is a member of the ACM, the Sociedad de Arquitectura y Tecnología de Computadores (SARTECO), and the Aragon Institute of Engineering Research (I3A), and an affiliated member of the High Performance, Edge, And Cloud Computing (HiPEAC) European Network of Excellence.
PUBLICATIONS
2011
Proceedings Articles
Valero, Alejandro; Sahuquillo, Julio; Petit, Salvador; Lopez, Pedro; Duato, Jose
Improving Last-Level Cache Performance by Exploiting the Concept of MRU-Tour Proceedings Article
In: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT 2011), pp. 214-214, IEEE 2011, ISBN: 978-1-4577-1794-9.
@inproceedings{valero2011improving,
title = {Improving Last-Level Cache Performance by Exploiting the Concept of MRU-Tour},
author = {Alejandro Valero and Julio Sahuquillo and Salvador Petit and Pedro Lopez and Jose Duato},
url = {https://ieeexplore.ieee.org/document/6113824},
doi = {https://doi.org/10.1109/PACT.2011.47},
isbn = {978-1-4577-1794-9},
year = {2011},
date = {2011-10-10},
urldate = {2011-01-01},
booktitle = {Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT 2011)},
pages = {214--214},
organization = {IEEE},
abstract = {Last-Level Caches (LLCs) implement the LRU algorithm to exploit temporal locality, but its performance falls well short of Belady's optimal algorithm as the number of ways increases. One of the main reasons why LRU does not perform well in LLCs is that this policy forces a block to descend to the bottom of the stack before eviction. Nevertheless, most of the blocks that leave the MRU position are not referenced again before eviction. This work aims to select candidate blocks for eviction before they reach the bottom of the stack. To this end, this work defines the number of MRU-Tours (MRUTs) of a block as the number of times that the block enters the MRU position during its lifetime. Based on the fact that most blocks exhibit a single MRUT, this work presents a family of MRUT-based algorithms aimed at exploiting this block behavior to improve performance.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Masters Theses
Valero, Alejandro
Disseny de caches de dades L1 mitjançant tecnologia SRAM i eDRAM Masters Thesis
Universitat Politècnica de València, 2011.
@mastersthesis{valero2012disseny,
title = {Disseny de caches de dades L1 mitjançant tecnologia SRAM i eDRAM},
author = {Alejandro Valero},
url = {https://riunet.upv.es/handle/10251/15676},
year = {2011},
date = {2011-06-01},
school = {Universitat Politècnica de València},
keywords = {},
pubstate = {published},
tppubtype = {mastersthesis}
}
Proceedings
Valero, Alejandro; Sahuquillo, Julio; Petit, Salvador; López, Pedro; Duato, José
Algoritmo de reemplazo para cache de último nivel basado en periodos MRU Proceedings
2011.
@proceedings{valero2011algoritmo,
title = {Algoritmo de reemplazo para cache de último nivel basado en periodos MRU},
author = {Alejandro Valero and Julio Sahuquillo and Salvador Petit and Pedro López and José Duato},
url = {https://www.researchgate.net/publication/266480897_Algoritmo_de_reemplazo_para_cache_de_ultimo_nivel_basado_en_periodos_MRU},
doi = {https://doi.org/10.13140/RG.2.1.1627.9525},
year = {2011},
date = {2011-09-01},
urldate = {2011-09-01},
journal = {Actas de las XXII Jornadas de Paralelismo},
pages = {557--562},
abstract = {The design of the memory hierarchy is an important aspect of current microprocessors. Much research focuses on the last-level cache, which is designed to hide the long access latency of main memory. To reduce capacity and conflict misses, these caches are large memory structures with a high number of ways. To exploit temporal locality, the replacement algorithm typically implemented in caches is LRU. However, for caches with a high number of ways, its implementation is costly in terms of area and power consumption. In fact, LRU is not well suited to last-level caches because they cannot cope with temporal locality. This is because last-level caches do not see all memory accesses. In addition, blocks must descend to the last position of the LRU stack before being replaced. This work shows that most blocks are not referenced again once they have left the MRU position. Moreover, the probability of being referenced again does not always depend on the position they occupy in the stack. Based on these observations, the number of MRU periods (pMRU) of a block is defined as the number of times the block occupies the MRU position while it remains in the cache, and the pMRU replacement algorithm is proposed, which selects the victim among those blocks that have a single pMRU. Variations of this algorithm are also proposed to exploit recency information. Experimental results show that, on average, the best version of the pMRU algorithm achieves a 19% reduction in MPKI compared with LRU. In addition, the simplest version needs only 2 status bits per block regardless of cache associativity. Consequently, the hardware complexity and the cost of updating these bits are significantly reduced compared with LRU.},
keywords = {},
pubstate = {published},
tppubtype = {proceedings}
}
2009
Proceedings Articles
Valero, Alejandro; Sahuquillo, Julio; Petit, Salvador; Lorente, Vicente; Canal, Ramon; López, Pedro; Duato, José
An hybrid eDRAM/SRAM macrocell to implement first-level data caches Proceedings Article
In: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2009), pp. 213-221, ACM 2009, ISBN: 978-1-60558-798-1.
@inproceedings{valero2009hybrid,
title = {An hybrid eDRAM/SRAM macrocell to implement first-level data caches},
author = {Alejandro Valero and Julio Sahuquillo and Salvador Petit and Vicente Lorente and Ramon Canal and Pedro López and José Duato},
url = {https://ieeexplore.ieee.org/document/5375366},
doi = {https://doi.org/10.1145/1669112.1669140},
isbn = {978-1-60558-798-1},
year = {2009},
date = {2009-12-16},
urldate = {2009-01-01},
booktitle = {Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2009)},
pages = {213--221},
organization = {ACM},
abstract = {SRAM and DRAM cells have been the predominant technologies used to implement memory cells in computer systems, each one having its advantages and shortcomings. SRAM cells are faster and require no refresh since reads are not destructive. In contrast, DRAM cells provide higher density and minimal leakage energy since there are no paths within the cell from Vdd to ground. Recently, DRAM cells have been embedded in logic-based technology, thus overcoming the speed limit of typical DRAM cells. In this paper we propose an n-bit macrocell that implements one static cell and n-1 dynamic cells. This cell is aimed at being used in an n-way set-associative first-level data cache. Our study shows that, compared to an SRAM-based cache with the same capacity, a four-way set-associative cache built with this macrocell reduces leakage by about 75% and area by more than half, with a minimal impact on performance. Architectural mechanisms have also been devised to avoid refresh logic. Experimental results show that no performance is lost when the retention time is larger than 50 K processor cycles. In addition, the proposed delayed writeback policy that avoids refreshing performs a similar number of writebacks to a conventional cache with the same organization, so no power is wasted.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
0000
Journal Articles
Valero, Alejandro; Gracia, Darío Suárez; Tejero, Rubén Gran; Ramos, Luis M; Navarro-Torres, Agustín; Munoz, Adolfo; Ezpeleta, Joaquín; Briz, José Luis; Murillo, Ana C; Montijano, Eduardo; others,
Experimentación Preliminar con un Trazador de Rayos para Relacionar Niveles de Abstracción Journal Article
In: 0000.
@article{valeroexperimentacion,
title = {Experimentación Preliminar con un Trazador de Rayos para Relacionar Niveles de Abstracción},
author = {Alejandro Valero and Darío Suárez Gracia and Rubén Gran Tejero and Luis M Ramos and Agustín Navarro-Torres and Adolfo Munoz and Joaquín Ezpeleta and José Luis Briz and Ana C Murillo and Eduardo Montijano and others},
keywords = {},
pubstate = {published},
tppubtype = {article}
}