HVCLIP: High-dimensional Vector in CLIP for Unsupervised Domain Adaptation

Noranart Vesdapunt*, Kah Kuen Fu, Yue Wu, Xu Zhang, Pradeep Natarajan

Abstract


"Recent advancement in the large-scale image-text pre-training model (such as CLIP) has significantly improved unsupervised domain adaptation (UDA) by leveraging the pre-trained knowledge to bridge the source and target domain gap. However, Catastrophic forgetting still remains to be the main challenge, since traditional fine-tuning method to adjust CLIP model weights on a target domain can quickly override CLIP’s pre-trained knowledge. To address the above issue, we propose to convert CLIP’s features into high-dimensional vector (hypervector) space to utilize the robustness property of hypervector. We first study the feature dimension size in the hypervector space to empirically find the dimension threshold that allows enough feature patterns to be redundant to avoid excessive training (thus mitigating catastrophic forgetting). To further utilize the robustness of hypervector, we propose Discrepancy Reduction to reduce the domain shift between source and target domains, and Feature Augmentation to synthesize labeled target domain features from source domain features. We achieved the best results on four public UDA datasets, and showed the generalization of our method to other applications (few-shot learning, continual learning). The proposed method also shows model-agnostic property across vision-language and vision backbones."
