Most and Least Retrievable Images in Visual-Language Query Systems

Liuwan Zhu, Rui Ning, Jiang Li, Chunsheng Xin, Hongyi Wu ;

Abstract


"This is the first work to introduce the Most Retrievable Image(MRI) and Least Retrievable Image(LRI) concepts in modern text-to-image retrieval systems. An MRI is associated with and thus can be retrieved by many unrelated texts, while an LRI is disassociated from and thus not retrievable by related texts. Both of them have important practical applications and implications. Due to their one-to-many nature, it is fundamentally challenging to construct MRI and LRI. This research addresses this nontrivial problem by developing novel and effective loss functions to craft perturbations that essentially corrupt feature correlation between visual and language spaces, thus enabling MRI and LRI. The proposed schemes are implemented based on CLIP, a state-of-the-art image and text representation model, to demonstrate MRI and LRI and their application in privacy-preserved image sharing and malicious advertisement. They are evaluated by extensive experiments based on the modern visual-language models on multiple benchmarks, including Paris, ImageNet, Flickr30k, and MSCOCO. The experimental results show the effectiveness and robustness of the proposed schemes for constructing MRI and LRI."

Related Material


[pdf] [supplementary material] [DOI]