Abstract: The Contrastive Language-Image Pre-Training (CLIP) model, pre-trained on large-scale image-text corpora, has demonstrated significant improvements in vision and language tasks and has been ...