TrainDeeploy: Hardware-Accelerated Parameter-Efficient Fine-Tuning at the Extreme Edge
On-device neural network fine-tuning is essential for adapting pre-trained models to individual users and environments while preserving data privacy. However, ultra-low-power edge devices face tight computational and memory constraints that make training challenging, particularly for Transformer architectures.
This work introduces TrainDeeploy, a framework enabling efficient on-device training on RISC-V hardware. By leveraging Low-Rank Adaptation (LoRA), the system achieves a 23% reduction in dynamic memory and a 15x reduction in trainable parameters compared to standard full backpropagation, making parameter-efficient fine-tuning feasible on heterogeneous System-on-Chips (SoCs) at the extreme edge.
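To illustrate why LoRA shrinks the trainable-parameter footprint, here is a minimal NumPy sketch of the general technique. This is not the TrainDeeploy implementation; the layer size and rank are illustrative assumptions chosen only to show the parameter-count arithmetic.

```python
# Minimal sketch of Low-Rank Adaptation (LoRA). Hypothetical shapes:
# a square 256x256 linear layer adapted with rank r = 4.
import numpy as np

d_in, d_out, r = 256, 256, 4        # illustrative layer size and LoRA rank
alpha = 8.0                          # LoRA scaling factor

W = np.random.randn(d_out, d_in)     # frozen pre-trained weight (not trained)
A = np.random.randn(r, d_in) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))             # trainable up-projection, zero-initialized
                                     # so the adapter starts as a no-op

def forward(x):
    # Frozen path plus low-rank update: y = W x + (alpha / r) * B (A x)
    return W @ x + (alpha / r) * (B @ (A @ x))

full_params = W.size                 # parameters updated by full fine-tuning
lora_params = A.size + B.size        # parameters updated by LoRA
print(f"full: {full_params}, LoRA: {lora_params}, "
      f"reduction: {full_params // lora_params}x")
# For these shapes: full: 65536, LoRA: 2048, reduction: 32x
```

During training only `A` and `B` receive gradients, so optimizer state and weight-gradient buffers scale with the adapter, not with `W`; this is the mechanism behind the memory and parameter savings the abstract reports.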
Accepted at DATE 2026 (Design, Automation and Test in Europe).