
Email: alvabre@unizar.es
Address:
Department of Computer Science and Systems Engineering
Universidad de Zaragoza
Calle María de Luna, 1
Ada Byron Building
50018 Zaragoza, Spain
BIOGRAPHY
Alejandro Valero received the BS, MS, and PhD degrees in Computer Engineering from the Universitat Politècnica de València, Spain, in 2009, 2011, and 2013, respectively. From 2013 to 2015 he was a Visiting Researcher with Northeastern University, Boston, MA, USA, and the University of Cambridge, UK. From 2016 to 2021 he was an Assistant Professor with the Department of Computer Science and Systems Engineering, Universidad de Zaragoza, Spain. Since 2021, he has been an Associate Professor with the same department and institution. Prof. Valero has taught several courses on computer organization, including Introduction to Computer Systems, Architecture and Organization of Computer Systems, Operating Systems, Data Center Design, and Programming and Architecture of Heterogeneous Computing Systems. His PhD research contributions to the design of high-performance and energy-efficient CPU on-chip memory hierarchies were recognized by multiple entities. He received the Intel Doctoral Student Honor Program Award in 2012 and the Gold Medal in the ACM Student Research Competition (SRC) Award held at the 27th International Conference on Supercomputing (ICS 2013). His current research interests mainly focus on emerging memory technologies, resource management, and the design of GPU and ASIC architectures in terms of performance, energy efficiency, and reliability. Prof. Valero has participated in more than 20 nationally and locally funded projects, some of them as Lead Researcher. He has published more than 30 papers in the main venues of his area, such as the International Symposium on Microarchitecture (MICRO), the International Conference on Parallel Architectures and Compilation Techniques (PACT), IEEE Transactions on Computers, and IEEE Transactions on VLSI Systems.
He has served as a Program Committee Member for a significant number of conferences, journals, workshops, and research competitions, such as the Design, Automation and Test in Europe (DATE) conference, the International Conference on Computer Design (ICCD), the Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS) workshop, and the ACM SRC Grand Finals. He is also a frequent reviewer for top journals of his area, such as IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on Dependable and Secure Computing, and ACM Transactions on Design Automation of Electronic Systems. He received the Best Reviewer Award in the Design Methods and Tools track at the DATE 2024 conference. Prof. Valero is a member of the Association for Computing Machinery (ACM), the Sociedad de Arquitectura y Tecnología de Computadores (SARTECO), and the Aragon Institute of Engineering Research (I3A).
PUBLICATIONS
2025
Journal Articles
Valero, Alejandro; Lorente, Vicente; Petit, Salvador; Sahuquillo, Julio
Dual Fast-Track Cache: Organizing Ring-Shaped Racetracks to Work as L1 Caches Journal Article
In: IEEE Transactions on Computers, vol. 74, no. 8, pp. 2812-2826, 2025, ISSN: 0018-9340.
@article{Valero2025,
title = {Dual Fast-Track Cache: Organizing Ring-Shaped Racetracks to Work as L1 Caches},
author = {Alejandro Valero and Vicente Lorente and Salvador Petit and Julio Sahuquillo},
url = {https://www.computer.org/csdl/journal/tc/2025/08/11022726/27fzlt4rw88},
doi = {10.1109/TC.2025.3575909},
issn = {0018-9340},
year = {2025},
date = {2025-08-01},
urldate = {2025-08-01},
journal = {IEEE Transactions on Computers},
volume = {74},
number = {8},
pages = {2812-2826},
abstract = {Static Random-Access Memory (SRAM) is the fastest memory technology and has been the common design choice for implementing first-level (L1) caches in the processor pipeline, where speed is a key design issue that must be fulfilled. On the contrary, this technology offers much lower density compared to other technologies like Dynamic RAM, limiting L1 cache sizes of modern processors to a few tens of KB. This paper explores the use of slower but denser Domain Wall Memory (DWM) technology for L1 caches. This technology provides slow access times since it arranges multiple bits sequentially in a magnetic racetrack. To access these bits, they need to be shifted in order to place them under a header. A 1-bit shift usually takes one processor cycle, which can significantly hurt the application performance, making this working behavior inappropriate for L1 caches. Based on the locality (temporal and spatial) principles exploited by caches, this work proposes the Dual Fast-Track Cache (Dual FTC) design, a new approach to organizing a set of racetracks to build set-associative caches. Compared to a conventional SRAM cache, Dual FTC enhances storage capacity by 5× while incurring minimal shifting overhead, thereby rendering it a practical and appealing solution for L1 cache implementations. Experimental results show that the devised cache organization is as fast as an SRAM cache for 78% and 86% of the L1 data cache hits and L1 instruction cache hits, respectively (i.e., no shift is required). Consequently, due to the larger L1 cache capacities, significant system performance gains (by 22% on average) are obtained under the same silicon area.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
2024
Journal Articles
Toca-Díaz, Yamilka; Tejero, Rubén Gran; Valero, Alejandro
Shift-and-Safe: Addressing permanent faults in aggressively undervolted CNN accelerators Journal Article
In: Journal of Systems Architecture, vol. 157, pp. 1-13, 2024, ISSN: 1383-7621.
@article{Toca-Díaz2024,
title = {Shift-and-Safe: Addressing permanent faults in aggressively undervolted CNN accelerators},
author = {Yamilka Toca-Díaz and Rubén Gran Tejero and Alejandro Valero},
url = {https://www.sciencedirect.com/science/article/pii/S1383762124002297},
doi = {10.1016/j.sysarc.2024.103292},
issn = {1383-7621},
year = {2024},
date = {2024-12-01},
urldate = {2024-12-01},
journal = {Journal of Systems Architecture},
volume = {157},
pages = {1-13},
abstract = {Underscaling the supply voltage (Vdd) to ultra-low levels below the safe-operation threshold voltage (Vmin) holds promise for substantial power savings in digital CMOS circuits. However, these benefits come with pronounced challenges due to the heightened risk of bitcell permanent faults stemming from process variations in current technology node sizes. This work delves into the repercussions of such faults on the accuracy of a 16-bit fixed-point Convolutional Neural Network (CNN) inference accelerator powering on-chip activation memories at ultra-low Vdd voltages. Through an in-depth examination of fault patterns, memory usage, and statistical analysis of activation values, this paper introduces Shift-and-Safe: two novel and cost-effective microarchitectural techniques exploiting the presence of outlier activation values and the underutilization of activation memories. Particularly, activation outliers enable a shift-based data representation that reduces the impact of faults on the activation values, whereas the memory underutilization is exploited to maintain a safe replica of affected activations in idle memory regions. Remarkably, these mechanisms do not add any burden to the programmer and are independent of application characteristics, rendering them easily deployable across real-world CNN accelerators. Experimental results show that Shift-and-Safe maintains the CNN accuracy even in the presence of almost a quarter of the total activations with faults. In addition, average energy savings are by 5% and 11% compared to the state-of-the-art approach and a conventional accelerator supplied at Vmin, respectively.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Toca-Díaz, Yamilka; Palacios, Reynier Hernández; Tejero, Rubén Gran; Valero, Alejandro
Flip-and-Patch: A fault-tolerant technique for on-chip memories of CNN accelerators at low supply voltage Journal Article
In: Microprocessors and Microsystems, vol. 106, pp. 1-13, 2024, ISSN: 0141-9331.
@article{Toca-Díaz2024b,
title = {Flip-and-Patch: A fault-tolerant technique for on-chip memories of CNN accelerators at low supply voltage},
author = {Yamilka Toca-Díaz and Reynier Hernández Palacios and Rubén Gran Tejero and Alejandro Valero},
url = {https://www.sciencedirect.com/science/article/pii/S0141933124000188},
doi = {10.1016/j.micpro.2024.105023},
issn = {0141-9331},
year = {2024},
date = {2024-04-01},
urldate = {2024-04-01},
journal = {Microprocessors and Microsystems},
volume = {106},
pages = {1-13},
abstract = {Aggressively reducing the supply voltage (Vdd) below the safe threshold voltage (Vmin) can effectively lead to significant energy savings in digital circuits. However, operating at such low supply voltages poses challenges due to a high occurrence of permanent faults resulting from manufacturing process variations in current technology nodes. This work addresses the impact of permanent faults on the accuracy of a Convolutional Neural Network (CNN) inference accelerator using on-chip activation memories supplied at low Vdd below Vmin. Based on a characterization study of fault patterns, this paper proposes two low-cost microarchitectural techniques, namely Flip-and-Patch, which maintain the original accuracy of CNN applications even in the presence of a high number of faults caused by operating at Vdd < Vmin. Unlike existing techniques, Flip-and-Patch remains transparent to the programmer and does not rely on application characteristics, making it easily applicable to real CNN accelerators.
Experimental results show that Flip-and-Patch ensures the original CNN accuracy with a minimal impact on system performance (less than 0.05% for every application), while achieving average energy savings of 10.5% and 46.6% in activation memories compared to a conventional accelerator operating at safe and nominal supply voltages, respectively. Compared to the state-of-the-art ThUnderVolt technique, which dynamically adjusts the supply voltage at run time and discarding any energy overhead for such an approach, the average energy savings are by 3.2%.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Proceedings Articles
Toca-Díaz, Yamilka; Tejero, Rubén Gran; Valero, Alejandro
Ensuring the Accuracy of CNN Accelerators Supplied at Ultra-Low Voltage Proceedings Article
In: Proceedings of the 42nd IEEE International Conference on Computer Design (ICCD 2024), pp. 92-95, 2024, ISBN: 979-8-3503-8040-8.
@inproceedings{Toca-Díaz2024c,
title = {Ensuring the Accuracy of CNN Accelerators Supplied at Ultra-Low Voltage},
author = {Yamilka Toca-Díaz and Rubén Gran Tejero and Alejandro Valero},
url = {https://ieeexplore.ieee.org/document/10817950},
doi = {10.1109/ICCD63220.2024.00024},
isbn = {979-8-3503-8040-8},
year = {2024},
date = {2024-11-18},
urldate = {2024-11-18},
booktitle = {Proceedings of the 42nd IEEE International Conference on Computer Design (ICCD 2024)},
pages = {92-95},
abstract = {Underscaling the supply voltage (Vdd) to ultra-low levels below the safe-operation threshold voltage (Vmin) brings significant energy savings in digital CMOS circuits but introduces reliability challenges due to increased risk of bitcell permanent faults. This work explores the impact of such faults on the accuracy of a CNN inference accelerator supplying on-chip activation memories at ultra-low Vdd. By examining fault patterns, activation values, and memory usage, this paper proposes two microarchitectural techniques exploiting activation outliers and activation memory underutilization. These approaches are cost-effective, do not require programmer intervention, and are application-independent. Experimental results show that the proposed approaches maintain the original CNN accuracy and achieve energy savings by 2.1% and 8.2% compared to the state-of-the-art technique and a conventional accelerator supplied at Vmin, respectively, with a negligible impact on the system performance (less than 0.25%).},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
2023
Proceedings Articles
Toca-Díaz, Yamilka; Muñoz, Nicolás Landeros; Tejero, Rubén Gran; Valero, Alejandro
On Fault-Tolerant Microarchitectural Techniques for Voltage Underscaling in On-Chip Memories of CNN Accelerators Proceedings Article
In: Proceedings of the 26th Euromicro Conference on Digital System Design (DSD 2023), pp. 138-145, 2023, ISBN: 979-8-3503-4419-6.
@inproceedings{Toca-Díaz2023,
title = {On Fault-Tolerant Microarchitectural Techniques for Voltage Underscaling in On-Chip Memories of CNN Accelerators},
author = {Yamilka Toca-Díaz and Nicolás Landeros Muñoz and Rubén Gran Tejero and Alejandro Valero},
url = {https://ieeexplore.ieee.org/document/10456839},
doi = {10.1109/DSD60849.2023.00029},
isbn = {979-8-3503-4419-6},
year = {2023},
date = {2023-09-06},
urldate = {2023-09-06},
booktitle = {Proceedings of the 26th Euromicro Conference on Digital System Design (DSD 2023)},
pages = {138-145},
abstract = {Aggressively underscaling the supply voltage (Vdd) below the safe voltage (Vmin) margin is an effective solution to attain substantial energy savings. Unfortunately, operating at such low voltages is challenging due to the high number of permanent faults as a result of variations in the manufacturing process of current technology nodes. This work characterizes the impact of permanent faults on the accuracy of a Convolutional Neural Network (CNN) inference accelerator with on-chip activation memories supplied at low Vdd below Vmin. Based on these observations, this paper proposes a couple of low-cost microarchitectural techniques, referred to as flipping and patching, that ensure the accuracy of CNN applications despite the presence of permanent faults. Contrary to prior work, the proposed techniques are transparent to the programmer and do not depend on application characteristics. Experimental results show that the proposed techniques maintain the original CNN accuracy with a minimal impact on system performance (less than 0.05%), while reducing the energy consumption of activation memories by 11.2% and 46.7% compared to those of a conventional accelerator operating at safe and nominal supply voltages, respectively.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}