TrainDeeploy: Hardware-Accelerated Parameter-Efficient Fine-Tuning at the Extreme Edge

On-device fine-tuning of neural networks is essential for adapting pre-trained models to individual users and environments while preserving data privacy. However, ultra-low-power edge devices face severe computational and memory constraints that make training challenging, particularly for Transformer architectures.

This work introduces TrainDeeploy, a framework enabling efficient on-device training on RISC-V hardware. By leveraging Low-Rank Adaptation (LoRA), the system achieves a 23% reduction in dynamic memory and a 15x reduction in trainable parameters compared to standard backpropagation, making parameter-efficient fine-tuning feasible on heterogeneous System-on-Chips (SoCs) at the extreme edge.
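To illustrate where the parameter savings come from, here is a minimal LoRA sketch in NumPy. All names, dimensions, and the rank are illustrative assumptions, not details from the TrainDeeploy implementation: a frozen dense layer W is adapted through two small trainable matrices A and B of rank r, so only r·(d_in + d_out) parameters receive gradients instead of d_out·d_in.

```python
import numpy as np

# Illustrative LoRA sketch; dimensions and rank are assumptions,
# not taken from the TrainDeeploy paper.
d_in, d_out, r = 256, 256, 8
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))     # frozen pre-trained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection (zero init)

def lora_forward(x):
    # y = W x + B (A x): during fine-tuning, gradients flow only into A and B
    return W @ x + B @ (A @ x)

x = rng.standard_normal(d_in)
y = lora_forward(x)  # with B = 0, this initially matches the frozen layer

full_params = d_out * d_in        # parameters updated by full backpropagation
lora_params = r * (d_in + d_out)  # parameters updated with LoRA
print(full_params // lora_params)  # → 16
```

With these toy dimensions the trainable-parameter count drops by 16x, the same order as the 15x reported for TrainDeeploy; the exact ratio depends on the layer shapes and the chosen rank.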

Accepted at DATE 2026 (Design, Automation and Test in Europe).

Run Wang
PhD Student at Integrated Systems Laboratory (IIS), ETH Zürich