Vision Models - Claude vs. GPT 4 Turbo vs. Llava

Vision models are a class of models designed to understand and interpret visual data. They are used in a wide range of applications, such as image classification and object detection. In this blog post, we compare three popular vision models - Claude Haiku from Anthropic, GPT 4 Turbo from OpenAI, and Llava, an open-source model that runs locally. We compare their performance on image interpretation, profile picture assessment, and identifying complex objects. We also compare the cost of using these models.

Image Interpretation

Just to quickly see how these models interpret an image, we fed them a picture of TinyLlama's logo using Msty's split chats feature.

This was the prompt:

Explain this image

Here are the results:

Claude, ChatGPT, and Llava interpret an image of TinyLlama's logo

Claude Haiku's interpretation:

The image depicts a colorful, stylized illustration of a cute and energetic-looking creature that appears to be a fantastical animal, possibly a type of llama or alpaca. The creature has a vibrant, fluffy coat in shades of orange, red, and purple, with large expressive eyes and pointed ears. The background is filled with dynamic bursts of color and abstract shapes, creating a sense of movement and energy around the creature. The overall style of the illustration has a whimsical, playful, and visually striking quality, capturing the lively and joyful nature of the depicted character.

Llava's interpretation:

This is an illustration of a cute, stylized llama or vicuña character that seems to be designed with a friendly and whimsical appearance. The creature has colorful fur in shades of pink, yellow, orange, and white, giving it a playful look. It appears to be running towards the viewer with its front legs raised slightly off the ground as if it's in mid-gallop or dashing forward.

The background is abstract and features vibrant colors that blend together without clear boundaries, which creates a sense of motion and energy around the character. The artwork is likely intended for entertainment purposes, possibly for children given its cartoonish style and appealing nature. It could also be related to video games or animated content where such characters are commonplace.

ChatGPT's interpretation:

This image features a stylized, animated depiction of an alpaca. The alpaca, drawn in a vibrant and energetic style, appears to be leaping or bounding joyfully. It is surrounded by dynamic, colorful bursts and splashes, suggesting motion and energy. The color palette is rich, with shades of orange, red, and purple, which adds a playful and magical quality to the depiction. Overall, the artwork is cheerful and captures a light-hearted, whimsical essence, typical of fantasy or cartoon illustrations.

All three models were able to identify the subject of the image and provided some additional context. This is a bit subjective, but in our opinion Llava's interpretation was better than the others: it explained not just the llama but also a bit of the background.

Profile Picture Assessment

For the second test, we used a profile picture and asked the models to assess the traits of the person in the image and make some actionable improvement suggestions.

This was the prompt:

Assess the professionalism and approachability of my profile picture; provide actionable improvement suggestions.

These were the results:

Claude, ChatGPT, and Llava assess the traits of a person in an image

Claude Haiku's assessment:

The profile picture appears professional and approachable. The person in the image is smiling warmly, which conveys a friendly and welcoming demeanor. The neutral background and attire suggest a professional setting. A few potential improvements could be:

1. Ensuring the lighting is even and flattering, as there are some shadows on the face that could be minimized.

2. Considering a slightly more formal or business-appropriate outfit, such as a collared shirt or blouse, to further emphasize the professional tone.

3. Experimenting with different camera angles or framing to potentially showcase more of the upper body or a wider shot of the surroundings.

Overall, the profile picture effectively communicates a positive, professional, and approachable impression. With a few minor adjustments, it could be even more impactful.

Llava's assessment:

The image you've shared is quite pleasant and approachable. Here are some suggestions to enhance your professional profile picture:

1. Smile: Your smile is genuine and warm, which helps create a friendly first impression. However, it might be helpful to ensure that both your upper and lower teeth can be seen for more balance in the shot. This creates a sense of openness and approachability.

2. Eye Contact: While you make eye contact with the camera, maintaining direct gaze while also looking at whoever is viewing your profile picture might help create a stronger connection with potential contacts or employers.

3. Hair and Makeup: Your hair looks neatly styled and your makeup enhances your natural beauty without overpowering it. If you're seeking to convey more professionalism, consider having a hairstyle that is not too casual, depending on the industry norms of the field you are in.

4. Attire: Your attire seems appropriate for a professional setting and does not distract from your face or message. However, if you're going for a more polished look, ensure clothing is neatly fitted without any visible wrinkles that could be perceived as unprofessional.

5. Background: The background is simple and doesn't detract attention away from the subject (you). This is ideal for professional use. However, if you want to make a statement about your personality or interests, consider having an image with more context that could be relevant to what you do professionally without being too busy or distracting.

6. Confidence: You radiate confidence in the picture which is important for professional networking and job interviews. Maintain this confident demeanor during your interactions, both online and offline.

Remember, these are just suggestions to enhance what seems like a strong profile picture already!

ChatGPT's assessment:

Your profile picture already exudes a sense of approachability due to your friendly smile and direct eye contact, which are great traits for making a positive impression. Additionally, your overall appearance is neat and presentable, contributing to a professional image.

Here are a few actionable suggestions for potentially enhancing the professionalism and approachability of your profile picture:

1. Background Considerations: The current background, although not very distracting, features elements of foliage that might be slightly casual depending on the professional context. Choosing a more neutral background or one that is a solid color could convey a more formal and polished image.

2. Lighting: The lighting in the current photo is good, but ensuring even lighting on the face can further improve visibility and emphasis on facial expressions, providing a smoother and more consistent image.

3. Clothing and Accessories: While what you are wearing isn't fully visible in the picture, ensuring that your attire in the photo aligns with professional attire customary in your field can enhance the professional aspect. For accessories, simple and understated options typically work best for professional settings.

4. Hairstyle: Your current hairstyle looks neat and suits you well; just ensure that hair does not obscure your face, as clear visibility of the face is key for approachable and professional images.

5. Facial Expression: You have a great smiling expression which is very welcoming. For different professional contexts, consider a range of expressions from a warm, approachable smile to a more subdued smile to adjust the level of formality.

6. Photography Quality: Your current picture is of high quality. Always ensure high resolution and sharp focus for a professional finish. Avoid pixelation or blurring.

These tips should help adjust or fine-tune your profile picture depending on various professional environments you may wish to fit into.

Claude and GPT 4 both provided a detailed assessment of the person's traits and suggested some actionable improvements, but the best assessment and suggestions came from Llava, which gave a more detailed analysis of the person's traits. GPT 4's suggestions were really good as well, but Claude's were not as in-depth. This is probably because we used Haiku, the smallest Claude model.

Identifying and Classifying Complex Objects

As a final test, we provided all three models with an image of a complex object - an aquarium with various fish, plants, and rocks. We asked them to identify the objects in the image and categorize them.

This was the prompt:

Explain this image. For any objects identified, make a table of what they are and their colors and what kingdom they belong to.

Here are the results:

Claude, ChatGPT, and Llava identify and classify objects in an image of an aquarium

Claude Haiku's assessment:

The image depicts an aquarium or fish tank filled with a variety of aquatic plants and fish. Let's examine the objects identified in the image and create a table with their names, colors, and the kingdom they belong to.

| Object | Colors | Kingdom |
| --- | --- | --- |
| Angelfish | Orange, white, and black | Animalia (Animal) |
| Tetras | Red and black | Animalia (Animal) |
| Aquatic plants | Green | Plantae (Plant) |
| Driftwood | Brown | Animalia (Animal) |
| Gravel/substrate | Black | Abiotic (Non-living) |
| Aquarium equipment (tubing, etc.) | Black | Abiotic (Non-living) |

The image depicts a well-established and diverse freshwater aquarium ecosystem, with various species of tropical fish, such as angelfish and tetras, swimming among the lush aquatic plants and driftwood. The plants and gravel/substrate provide a natural-looking environment for the fish to thrive in.

Llava's assessment:

This image shows an aquarium containing a diverse collection of tropical fish. The fish in the foreground are primarily yellow and red, suggesting that they may be species from the family Poeciliidae, such as guppies or mollies, which exhibit bright colors to signal their fitness and attract mates.

In addition to these brightly colored fish, there are also other less colorful ones in the background. These could belong to different families of tropical fish. The plants within the aquarium provide a naturalistic setting for these animals, helping them blend into their environment while providing shelter and hiding spots.

Here is a table identifying some objects in the image based on their colors:


ChatGPT's assessment:

The provided image is of a beautifully maintained aquarium featuring an assortment of fish and aquatic plants. The aquarium has fish of various sizes and colors, and plants that enhance its ecological aesthetics, alongside some aquarium equipment like heaters and possibly a filtration system.

Here is a table describing some visible objects in the image along with their color and biological kingdom:

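| Object | Colors | Kingdom |
| --- | --- | --- |
| Fish (Angelfish) | Black, white, orange | Animalia |
| Fish (Tetras) | Red, black, yellow | Animalia |
| Aquarium Plants | Green | Plantae |
| Substrate (Gravel) | Black | N/A (Mineral) |
| Driftwood | Brown | N/A (Material) |
| Heater | Black, silver | N/A (Man-made) |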

The aquarium scene depicted forms a small ecosystem, providing a habitat that mimics a natural environment for the various species residing within. The diversity of plants helps in maintaining the oxygen level and the water quality, which is essential for the health of the fish. The colors of the fish, ranging from stark oranges to subtle greens, add a vivid dynamic to the overall aesthetic appeal of the aquarium.

All three models were able to identify objects in the image and attempt a classification. We weren't expecting them to identify each and every fish and plant in the image, but we were expecting a bit more than what we got. This time, GPT 4 Turbo was the best at identifying the objects and providing a detailed classification. It even identified the driftwood, heater, and gravel, along with their colors. Claude wasn't far behind, and given that it was the cheapest and smallest model, it did a very impressive job. Llava, on the other hand, was the worst at identifying the objects and providing a detailed classification.

Cost Comparison

Now, let's compare the cost of using these models. Of the models we tested, Claude Haiku is the cheapest hosted model, at $0.0000135 per 200x200 px image. ChatGPT is the most expensive, at $0.00255 per 150x150 px image. Llava, on the other hand, is free to use since it runs locally. Of course, there is a cost associated with running the model on your local machine, but at high volumes it is likely to be significantly cheaper than using a cloud-based model like Claude or ChatGPT. If you are doing a lot of image processing, it might be worth investing in a local model like Llava.

| Model | Image Size | $ / Image | $ / 1K Images |
| --- | --- | --- | --- |
| Claude Haiku | 200x200 px | $0.0000135 | $0.01 |
| Claude Sonnet | 200x200 px | $0.000162 | $0.16 |
| Claude Opus | 200x200 px | $0.00081 | $0.81 |
| Claude Haiku | 1000x1000 px | $0.000334 | $0.33 |
| Claude Sonnet | 1000x1000 px | $0.004 | $4.00 |
| Claude Opus | 1000x1000 px | $0.02 | $20.01 |
| Claude Haiku | 1092x1092 px | $0.000467 | $0.47 |
| Claude Sonnet | 1092x1092 px | $0.00477 | $4.77 |
| Claude Opus | 1092x1092 px | $0.02385 | $23.85 |
| GPT4 Vision | 150x150 px | $0.00255 | $2.55 |
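
To make the arithmetic behind the table concrete, here is a minimal Python sketch that scales a per-image price up to a batch and compares two prices. The per-image figures are copied from the table above; the dictionary layout and the helper names `batch_cost` and `cost_ratio` are our own illustrative choices, not part of any vendor SDK, and actual prices may have changed since this post was written.

```python
# Per-image prices in USD, copied from the cost table in this post.
# The keys and helper names below are illustrative assumptions.
PRICE_PER_IMAGE = {
    ("Claude Haiku", "200x200"): 0.0000135,
    ("Claude Sonnet", "1000x1000"): 0.004,
    ("GPT4 Vision", "150x150"): 0.00255,
}


def batch_cost(price_per_image: float, image_count: int) -> float:
    """Return the total cost in USD of processing image_count images."""
    if image_count < 0:
        raise ValueError("image_count must be non-negative")
    return price_per_image * image_count


def cost_ratio(expensive: float, cheap: float) -> float:
    """Return how many times larger one per-image price is than another."""
    return expensive / cheap


if __name__ == "__main__":
    # Print the cost of a 1,000-image batch for each listed model and size.
    for (model, size), price in PRICE_PER_IMAGE.items():
        print(f"{model} ({size} px): ${batch_cost(price, 1000):.2f} per 1K images")

    haiku = PRICE_PER_IMAGE[("Claude Haiku", "200x200")]
    gpt4 = PRICE_PER_IMAGE[("GPT4 Vision", "150x150")]
    print(f"GPT4 Vision vs. Claude Haiku: about {cost_ratio(gpt4, haiku):.0f}x per image")
```

Note that the two prices in the final comparison are for different image sizes, just as in the table, so the ratio is only a rough guide.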

