TransInferSim: Toward Fast and Accurate Evaluation of Embedded Hardware Accelerators for Transformer Networks

dc.contributor.author: Klhůfek, Jan
dc.contributor.author: Marchisio, Alberto
dc.contributor.author: Mrázek, Vojtěch
dc.contributor.author: Sekanina, Lukáš
dc.contributor.author: Shafique, Muhammad
dc.coverage.issue: October
dc.coverage.volume: 13
dc.date.issued: 2025-10-13
dc.description.abstract: Transformers are neural network models that have gained popularity in various advanced AI systems, including embedded/Edge-AI. Due to their architecture, hardware accelerators can leverage massive parallelism, especially when processing attention-head operations. While accelerators for Transformers are being discussed in the literature, efficient scheduling of cache operations and detailed modeling of inference dynamics have not yet been addressed comprehensively. In this paper, we introduce TransInferSim, a novel tool that combines cycle-accurate simulation for performance estimation (including latency, memory usage, memory access counts, and computation counts) with a discrete-event-based scheduler that determines the execution order of compute and memory operations. By combining this tool with the Accelergy tool, the simulator enables accurate estimation of energy consumption and on-chip area, leveraging pre-characterized hardware parameters. The proposed tool allows for the accurate determination of cache misses at different levels and with different victim-selection configurations. It supports different memory hierarchies and offers several strategies for scheduling operations on compute units. In addition, TransInferSim can extract the full execution plan generated during simulation, enabling its further use for behavioral Register Transfer Level (RTL) validation or for deployment in real hardware implementations. This makes the tool applicable not only for high-level design space exploration, but also as a software front-end for hardware execution mapping. Finally, we can optimize the architecture for a particular network, as demonstrated through multiobjective design space exploration to adjust the size of processing arrays.
In our experiments, the introduction of an on-chip memory hierarchy improved the inference speed by 3.5× and reduced energy by 1.9× for the RoBERTa-Base Transformer model, while design space exploration achieved up to 10× latency reduction and 6× area savings for the ViT-Tiny vision Transformer. The tool is available online at https://github.com/ehw-fit/TransInferSim.
dc.format: text
dc.format.extent: 177215-177226
dc.format.mimetype: application/pdf
dc.identifier.citation: IEEE Access. 2025, vol. 13, issue October, p. 177215-177226.
dc.identifier.doi: 10.1109/ACCESS.2025.3621062
dc.identifier.issn: 2169-3536
dc.identifier.orcid: 0009-0003-8399-9699
dc.identifier.orcid: 0000-0002-0689-4776
dc.identifier.orcid: 0000-0002-9399-9313
dc.identifier.orcid: 0000-0002-2693-9011
dc.identifier.orcid: 0000-0002-2607-8135
dc.identifier.other: 193349
dc.identifier.researcherid: U-3706-2019
dc.identifier.researcherid: AAF-8828-2019
dc.identifier.researcherid: E-8394-2014
dc.identifier.scopus: 56559922700
dc.identifier.scopus: 35616481600
dc.identifier.uri: http://hdl.handle.net/11012/255607
dc.language.iso: en
dc.relation.ispartof: IEEE Access
dc.relation.uri: https://ieeexplore.ieee.org/document/11202474
dc.rights: Creative Commons Attribution 4.0 International
dc.rights.access: openAccess
dc.rights.sherpa: http://www.sherpa.ac.uk/romeo/issn/2169-3536/
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: Transformers
dc.subject: hardware accelerators
dc.subject: modeling tools
dc.subject: memory subsystem
dc.subject: evaluation and optimizations
dc.title: TransInferSim: Toward Fast and Accurate Evaluation of Embedded Hardware Accelerators for Transformer Networks
dc.title.alternative: TransInferSim: Toward Fast and Accurate Evaluation of Embedded Hardware Accelerators for Transformer Networks
dc.type.driver: article
dc.type.status: Peer-reviewed
dc.type.version: publishedVersion
eprints.grantNumber: info:eu-repo/grantAgreement/GA0/GA/GA25-15490S
sync.item.dbid: VAV-193349
sync.item.dbtype: VAV
sync.item.insts: 2025.11.20 15:49:21
sync.item.modts: 2025.11.20 15:20:45
thesis.grantor: Vysoké učení technické v Brně. Fakulta informačních technologií

Files

Original bundle

Name: TransInferSim_Toward_Fast_and_Accurate_Evaluation_of_Embedded_Hardware_Accelerators_for_Transformer_Networks.pdf
Size: 1.37 MB
Format: Adobe Portable Document Format