TrainDeeploy: Hardware-Accelerated Parameter-Efficient Fine-Tuning at the Extreme Edge
On-device neural network fine-tuning is essential for adapting pre-trained models to individual users and environments while preserving data privacy. However, ultra-low-power edge devices face tight computational and memory constraints that make training challenging, particularly for Transformer architectures.
This work introduces TrainDeeploy, a framework enabling efficient on-device training on RISC-V hardware. By leveraging Low-Rank Adaptation (LoRA), the system achieves a 23% reduction in dynamic memory and a 15x reduction in trainable parameters compared to standard full backpropagation, making parameter-efficient fine-tuning feasible on heterogeneous System-on-Chips (SoCs) at the extreme edge.
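To illustrate why LoRA shrinks the trainable-parameter footprint, here is a minimal NumPy sketch of the general technique. This is not the TrainDeeploy implementation; the layer size and rank are illustrative assumptions chosen only to show the parameter-count arithmetic.

```python
# Minimal sketch of Low-Rank Adaptation (LoRA). Hypothetical shapes:
# a square 256x256 linear layer adapted with rank r = 4.
import numpy as np

d_in, d_out, r = 256, 256, 4        # illustrative layer size and LoRA rank
alpha = 8.0                          # LoRA scaling factor

W = np.random.randn(d_out, d_in)     # frozen pre-trained weight (not trained)
A = np.random.randn(r, d_in) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))             # trainable up-projection, zero-initialized
                                     # so the adapter starts as a no-op

def forward(x):
    # Frozen path plus low-rank update: y = W x + (alpha / r) * B (A x)
    return W @ x + (alpha / r) * (B @ (A @ x))

full_params = W.size                 # parameters updated by full fine-tuning
lora_params = A.size + B.size        # parameters updated by LoRA
print(f"full: {full_params}, LoRA: {lora_params}, "
      f"reduction: {full_params // lora_params}x")
# For these shapes: full: 65536, LoRA: 2048, reduction: 32x
```

During training only `A` and `B` receive gradients, so optimizer state and weight-gradient buffers scale with the adapter, not with `W`; this is the mechanism behind the memory and parameter savings the abstract reports.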
Accepted at DATE 2026 (Design, Automation and Test in Europe).