Associate Professor
Email: alvabre@unizar.es
Address:
Department of Computer Science and Systems Engineering
Universidad de Zaragoza
Calle María de Luna, 1
Ada Byron Building
50018 Zaragoza, Spain
BIOGRAPHY
Alejandro Valero received the BS, MS, and PhD degrees in Computer Engineering from the Universitat Politècnica de València, Spain, in 2009, 2011, and 2013, respectively. From 2013 to 2015 he was a Visiting Researcher with Northeastern University, Boston (MA), USA, and the University of Cambridge, UK. From 2016 to 2021 he was an Assistant Professor with the Department of Computer Science and Systems Engineering, Universidad de Zaragoza, Spain. Since 2021 he has been an Associate Professor with the same department and institution. Prof. Valero has taught several courses on computer organization, including digital design, computer organization and design, heterogeneous systems programming and design, data center design, and operating systems. His PhD research contributions to the design of high-performance, energy-efficient CPU memory subsystems were recognized by multiple entities. He received the Intel Doctoral Student Honor Program Award in 2012 and the Gold Medal in the ACM Student Research Competition (SRC) held at the 27th International Conference on Supercomputing (ICS 2013). His research interests mainly focus on the design of memory hierarchies in terms of performance, energy efficiency, and reliability for different microprocessors: CPU systems, general-purpose GPUs, and accelerators for computer vision algorithms. Prof. Valero has participated in more than 20 nationally and locally funded research projects and has published more than 30 papers in the main venues of the computer architecture area, such as the IEEE/ACM International Symposium on Microarchitecture (MICRO), the International Conference on Parallel Architectures and Compilation Techniques (PACT), IEEE Transactions on Computers, and IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
He has served as a Technical Program Committee member for a significant number of conferences, workshops, and research competitions, such as the Design Automation and Test in Europe (DATE) conference, the IEEE International Conference on Computer Design (ICCD), the Performance Modeling, Benchmarking, and Simulation of High Performance Computer Systems (PMBS) workshop, and the ACM SRC Grand Finals. He is also a frequent reviewer for top journals in his area, such as IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on Dependable and Secure Computing, and ACM Transactions on Design Automation of Electronic Systems. He received the Outstanding Reviewer Award in the Design Methods and Tools track at the DATE 2024 conference. Prof. Valero is a member of the ACM, the Sociedad de Arquitectura y Tecnología de Computadores (SARTECO), and the Aragon Institute of Engineering Research (I3A), and an affiliated member of the High Performance, Edge, And Cloud Computing (HiPEAC) European Network of Excellence.
PUBLICATIONS
2022
Journal Articles
Muñoz, Nicolás Landeros; Valero, Alejandro; Tejero, Rubén Gran; Zoni, Davide
Gated-CNN: Combating NBTI and HCI aging effects in on-chip activation memories of Convolutional Neural Network accelerators Journal Article
In: Journal of Systems Architecture, vol. 128, pp. 1-13, 2022, ISSN: 1383-7621.
@article{Muñoz2022,
title = {Gated-CNN: Combating NBTI and HCI aging effects in on-chip activation memories of Convolutional Neural Network accelerators},
author = {Nicolás Landeros Muñoz and Alejandro Valero and Rubén Gran Tejero and Davide Zoni},
url = {https://www.sciencedirect.com/science/article/pii/S1383762122001072},
doi = {10.1016/j.sysarc.2022.102553},
issn = {1383-7621},
year = {2022},
date = {2022-07-01},
urldate = {2022-07-01},
journal = {Journal of Systems Architecture},
volume = {128},
pages = {1--13},
abstract = {Negative Bias Temperature Instability (NBTI) and Hot Carrier Injection (HCI) are two of the main reliability threats in current technology nodes. These aging phenomena degrade the transistor’s threshold voltage (Vth) over the lifetime of a digital circuit, resulting in slower transistors that eventually lead to a faulty operation when the critical paths become longer than the processor cycle time. Among all the transistors on a chip, the most vulnerable transistors to such wearout effects are those used to implement SRAM storage, since memory cells are continuously degrading. In particular, NBTI ages PMOS cell transistors when a given logic value is stored for a long period (i.e., a long duty cycle), whereas HCI ages NMOS cell transistors not only when the stored value flips but also when it is accessed. This work focuses on mitigating aging in the on-chip SRAM memories of Convolutional Neural Network (CNN) accelerators storing activations. This paper makes two main contributions. At the software level, we quantify the aging induced by current CNN benchmarks with a characterization study of duty cycle, flip, and access patterns in every activation memory cell. Based on the insights from this study, this work proposes a novel microarchitectural technique, Gated-CNN, that ensures a uniform aging degradation of every memory cell. To do so, Gated-CNN exploits power-gating and address rotation techniques tailored to the memory demands and temporal/spatial localities exhibited by CNN applications, as well as the memory organization and management of CNN accelerators. Experimental results show that, compared to a conventional design, the average Vth degradation savings are at least as much as 49% depending on the type of transistor.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Proceedings Articles
Gracia, Darío Suárez; Valero, Alejandro; Tejero, Rubén Gran; Villarroya-Gaudó, María; Viñals, Víctor
peRISCVcope: A Tiny Teaching-Oriented RISC-V Interpreter Proceedings Article
In: Proceedings of the 37th Conference on Design of Circuits and Integrated Circuits (DCIS 2022), pp. 1-6, 2022, ISBN: 978-1-6654-5950-1.
@inproceedings{Gracia2022,
title = {peRISCVcope: A Tiny Teaching-Oriented RISC-V Interpreter},
author = {Darío Suárez Gracia and Alejandro Valero and Rubén Gran Tejero and María Villarroya-Gaudó and Víctor Viñals},
url = {https://ieeexplore.ieee.org/document/9970050},
doi = {10.1109/DCIS55711.2022.9970050},
isbn = {978-1-6654-5950-1},
year = {2022},
date = {2022-11-16},
urldate = {2022-11-16},
booktitle = {Proceedings of the 37th Conference on Design of Circuits and Integrated Circuits (DCIS 2022)},
pages = {1--6},
abstract = {The fast advances of computer systems translate into a growing demand of methodologies and tools to introduce those novelties into classes. Among the plethora of those advances, virtualization has become an essential technology in almost every relevant system stack, from connected cars to hyperscaled cloud servers. However, introducing those technologies into the classroom remains a challenging task because of the huge complexity of their software components that may hinder the learning process of students. peRISCVcope aims to help in this area by proposing a tiny yet powerful interpreter to dig into virtualization technologies, such as the implementation of trap&emulate hypervisors. With less than 2,000 lines of code, and thanks to the conciseness of the RV32I base instruction set of RISC-V, peRISCVcope enables students to make virtualization knowledge their own. This paper presents our experiences developing and testing a virtualization laboratory where students implement parts of an interpreter. After the practical experience, peRISCVcope has been proved as a useful pedagogical tool, and, most importantly, students have positively rated the experience.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Tárrega, Hugo; Valero, Alejandro; Lorente, Vicente; Petit, Salvador; Sahuquillo, Julio
Fast-Track Cache: a huge racetrack memory L1 data cache Proceedings Article
In: Proceedings of the 36th ACM International Conference on Supercomputing (ICS 2022), pp. 1-12, ACM, 2022, ISBN: 978-1-4503-9281-5.
@inproceedings{Tárrega2022,
title = {Fast-Track Cache: a huge racetrack memory L1 data cache},
author = {Hugo Tárrega and Alejandro Valero and Vicente Lorente and Salvador Petit and Julio Sahuquillo},
url = {https://dl.acm.org/doi/10.1145/3524059.3532383},
doi = {10.1145/3524059.3532383},
isbn = {978-1-4503-9281-5},
year = {2022},
date = {2022-06-28},
urldate = {2022-06-28},
booktitle = {Proceedings of the 36th ACM International Conference on Supercomputing (ICS 2022)},
pages = {1--12},
publisher = {ACM},
abstract = {First-level (L1) caches have been traditionally implemented with Static Random-Access Memory (SRAM) technology, since it is the fastest memory technology, and L1 caches call for tight timing constraints in the processor pipeline. However, one of the main downsides of SRAM is its low density, which prevents L1 caches to improve their storage capacity beyond a few tens of KB. On the other hand, the recent Domain Wall Memory (DWM) technology overcomes such a constraint by arranging multiple bits in a magnetic racetrack, and sharing a header to access those bits. Accessing a bit requires a shift operation to align the target bit under the header. Such shifts increase the final access latency, which is the main reason why DWM has been mostly used to implement slow last-level caches. This paper proposes a novel DWM-based L1 cache data array design, namely Fast-Track Cache (FTC), that allows L1 caches with bigger storage capacities while reducing the shift overhead thanks to an enhanced exploitation of spatial and temporal localities. Experimental results show that most FTC accesses do not require shifts. As a consequence, and due to its larger capacity, FTC improves the processor performance on average by 15% over a conventional SRAM memory subsystem and the state-of-the-art TapeCache architecture based on DWM. At the same time, energy savings are improved on average by 34% over the conventional design.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
2021
Journal Articles
Valero, Alejandro; Tejero, Ruben Gran; Gracia, Darío Suárez; Georgescu, Emanuel A.; Ezpeleta, Joaquín; Álvarez, Pedro; Muñoz, Adolfo; Ramos, Luis M.; Ibáñez, Pablo
A learning experience toward the understanding of abstraction-level interactions in parallel applications Journal Article
In: J. Parallel Distributed Comput., vol. 156, pp. 38–52, 2021.
@article{DBLP:journals/jpdc/ValeroTGGEAMRI21,
title = {A learning experience toward the understanding of abstraction-level interactions in parallel applications},
author = {Alejandro Valero and Ruben Gran Tejero and Darío Suárez Gracia and Emanuel A. Georgescu and Joaquín Ezpeleta and Pedro Álvarez and Adolfo Muñoz and Luis M. Ramos and Pablo Ibáñez},
url = {https://doi.org/10.1016/j.jpdc.2021.05.008},
doi = {10.1016/j.jpdc.2021.05.008},
year = {2021},
date = {2021-01-01},
journal = {J. Parallel Distributed Comput.},
volume = {156},
pages = {38--52},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
2020
Journal Articles
Valero, Alejandro; Gracia, Darío Suárez; Tejero, Rubén Gran
DC-Patch: A Microarchitectural Fault Patching Technique for GPU Register Files Journal Article
In: IEEE Access, vol. 8, pp. 173276-173288, 2020, ISSN: 2169-3536.
@article{Valero2020,
title = {DC-Patch: A Microarchitectural Fault Patching Technique for GPU Register Files},
author = {Alejandro Valero and Darío Suárez Gracia and Rubén Gran Tejero},
url = {https://ieeexplore.ieee.org/document/9203907},
doi = {10.1109/ACCESS.2020.3025899},
issn = {2169-3536},
year = {2020},
date = {2020-09-22},
urldate = {2020-09-22},
journal = {IEEE Access},
volume = {8},
pages = {173276--173288},
abstract = {The ever-increasing parallelism demand of General-Purpose Graphics Processing Unit (GPGPU) applications pushes toward larger and more energy-hungry register files in successive GPU generations. Reducing the supply voltage beyond its safe limit is an effective way to improve the energy efficiency of register files. However, at these operating voltages, the reliability of the circuit is compromised. This work aims to tolerate permanent faults from process variations in large GPU register files operating below the safe supply voltage limit. To do so, this paper proposes a microarchitectural patching technique, DC-Patch, exploiting the inherent data redundancy of applications to compress registers at run-time with neither compiler assistance nor instruction set modifications. Instead of disabling an entire faulty register file entry, DC-Patch leverages the reliable cells within a faulty entry to store compressed register values. Experimental results show that, with more than a third of faulty register entries, DC-Patch ensures a reliable operation of the register file and reduces the energy consumption by 47% with respect to a conventional register file working at nominal supply voltage. The energy savings are 21% compared to a voltage noise smoothing scheme operating at the safe supply voltage limit. These benefits are obtained with less than 2 and 6% impact on the system performance and area, respectively.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}