TrainDeeploy: Hardware-Accelerated Parameter-Efficient Fine-Tuning of Small Transformer Models at the Extreme Edge

Abstract

On-device neural network fine-tuning is essential for adapting pre-trained models to individual users and environments while preserving data privacy. However, ultra-low-power edge devices face severe computational and memory constraints that make training challenging, particularly for Transformer architectures. This work introduces a framework for efficient on-device training on RISC-V hardware. Through Low-Rank Adaptation (LoRA), it achieves a 23% reduction in dynamic memory and a 15x reduction in trainable parameters compared to standard backpropagation.
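
To illustrate where the parameter reduction cited above comes from, here is a minimal PyTorch sketch of a LoRA-style linear layer: the pre-trained weight is frozen and only two small low-rank factors are trained. This is not the paper's RISC-V implementation; the layer size, rank r=4, and scaling are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Linear layer with a frozen base weight plus a trainable
    low-rank update: y = W x + (alpha / r) * B A x."""
    def __init__(self, in_features, out_features, r=4, alpha=8):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad = False  # freeze pre-trained weights
        self.base.bias.requires_grad = False
        # Only these low-rank factors are updated during fine-tuning.
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Trainable-parameter count for one hypothetical 256x256 projection:
layer = LoRALinear(256, 256, r=4)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable} / total: {total}")  # 2048 of ~67.8k
```

For this example layer, only about 3% of the weights are trained, which is how low-rank adapters cut both the optimizer state and the gradient memory that full backpropagation would require.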

Publication
In Design, Automation and Test in Europe (DATE) 2026
Run Wang
PhD Student at Integrated Systems Laboratory (IIS), ETH Zürich