"Idling Neurons, Appropriately Lenient Workload During Fine-tuning Leads to Better Generalization"

Hongjing Niu*, Hanting Li, Bin Li, Feng Zhao*

Abstract


"Pre-training on large-scale datasets has become a fundamental method for training deep neural networks. Pre-training provides a better set of parameters than random initialization, which reduces the training cost of deep neural networks on the target task. In addition, pre-training also provides a large number of feature representations, which may help improve generalization capabilities. However, this potential advantage has not received enough attention and has been buried by rough fine-tuning. Based on some exploratory experiments, this paper rethinks the fine-tuning process and gives a new perspective on understanding fine-tuning. Moreover, this paper proposes some plug-and-play fine-tuning strategies as alternatives for simple fine-tuning. These fine-tuning strategies all preserve pre-trained features better by creating idling of some neurons, leading to better generalization."

Related Material


[pdf] [DOI]