Welcome to an open source implementation of OpenAI's CLIP (Contrastive Language-Image Pre-training).

Using this codebase, we have trained several models on a variety of data sources and compute budgets, ranging from small-scale experiments to larger runs including models trained on datasets such as LAION-400M, LAION-2B and DataComp-1B. Many of our models and their scaling properties are studied in detail in the paper reproducible scaling laws for contrastive language-image learning. Some of our best models and their zero-shot ImageNet-1k accuracy are shown below, along with the ViT-L model trained by OpenAI. We provide more details about our full collection of pretrained models here, and zero-shot results for 38 datasets here. Model cards with additional model-specific details can be found on the Hugging Face Hub under the OpenCLIP library tag.

If you found this repository useful, please consider citing. We welcome anyone to submit an issue or send an email if you have any other requests or suggestions.

Note that portions of src/open_clip/ modelling and tokenizer code are adaptations of OpenAI's official repository.

Example usage:

```python
import torch
from PIL import Image
import open_clip

# Create the model, its preprocessing transform, and the matching tokenizer.
model, _, preprocess = open_clip.create_model_and_transforms('ViT-B-32', pretrained='laion2b_s34b_b79k')
tokenizer = open_clip.get_tokenizer('ViT-B-32')

image = preprocess(Image.open("docs/CLIP.png")).unsqueeze(0)
text = tokenizer(["a diagram", "a dog", "a cat"])

with torch.no_grad(), torch.cuda.amp.autocast():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize both embeddings before computing cosine similarities.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)  # prints: [[1., 0., 0.]]
```
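Since the collection of pretrained weights is large, it can help to enumerate the available model/weight combinations before choosing one. The minimal sketch below assumes the open_clip.list_pretrained() helper exposed by recent versions of the library, which returns (model name, pretrained tag) pairs:

```python
import open_clip

# Print every (model_name, pretrained_tag) pair known to this open_clip install,
# e.g. ('ViT-B-32', 'laion2b_s34b_b79k'). Either value can then be passed to
# create_model_and_transforms as shown above.
for model_name, pretrained_tag in open_clip.list_pretrained():
    print(model_name, pretrained_tag)
```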