LLaVA Integration in Msty
In the latest version of Msty, we added support for LLaVA (Large Language and Vision Assistant), an open-source multi-modal model that is mostly used for "chatting" with images.
In most chat apps, you select a model as your main model and start chatting with it; the conversation is then based on that model. Msty takes it a step further with how it integrates and uses LLaVA.
Once Msty senses that a LLaVA model is available, it smartly makes the model part of your conversation.
To begin with, once one of the LLaVA model variants is available, Msty enables an "Attach Images" button and converts the message input box into a drag-and-drop area. You can then start with a base model and summon LLaVA only when you need it.
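As a rough illustration, enabling the image UI could boil down to a check like the sketch below. All names here (InstalledModel, refreshChatInput, the UI stubs) are hypothetical, not Msty's actual code:

```ts
type InstalledModel = { name: string };

function hasLlavaVariant(models: InstalledModel[]): boolean {
  // LLaVA variants are typically tagged like "llava:7b" or "llava:34b"
  return models.some((m) => m.name.toLowerCase().startsWith("llava"));
}

function refreshChatInput(models: InstalledModel[]): void {
  const imagesSupported = hasLlavaVariant(models);
  setAttachButtonVisible(imagesSupported); // show or hide "Attach Images"
  setDropZoneEnabled(imagesSupported);     // toggle drag-and-drop on the input box
}

// Stubs standing in for the real UI updates.
function setAttachButtonVisible(visible: boolean): void { /* update UI */ }
function setDropZoneEnabled(enabled: boolean): void { /* update UI */ }
```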
This doesn't take anything away from you, because using LLaVA the usual way is still possible: just start with LLaVA as your base model.
When you start with a different base model, any prompt sent without images attached goes to that base model; attach images, and the prompt goes to LLaVA instead. So you can effectively use LLaVA on demand, as and when needed. This is a very simple, intuitive, and creative way to use the model.
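Here is a minimal sketch of what this on-demand routing could look like, again with hypothetical names (ChatMessage, pickModel) rather than Msty's actual implementation:

```ts
type ChatMessage = { text: string; images: string[] }; // image file paths

function pickModel(msg: ChatMessage, baseModel: string, llavaModel?: string): string {
  if (msg.images.length > 0 && llavaModel) {
    return llavaModel; // e.g. "llava:7b", summoned just for this prompt
  }
  return baseModel;    // e.g. "mistral", handles image-free prompts
}

// The same conversation flows through both models:
pickModel({ text: "Explain this photo", images: ["cat.png"] }, "mistral", "llava:7b"); // "llava:7b"
pickModel({ text: "Now write a story about it", images: [] }, "mistral", "llava:7b"); // "mistral"
```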
This opens up many ways to use LLaVA creatively for what it is superb at, explaining images, while not using it for what it is not good at: other creative prompts are handled better by models such as Mistral, Mixtral, Llama 2, and even Qwen.
Try it out yourself to see how powerful and intuitive this is. Just go to the Text Module page and download one of the LLaVA variants. You can start with the smallest 7B version to test the waters or go all in with the 34B version.
Here is a video that shows how it works: I ask LLaVA to explain three different images and then ask Mistral, the base model in this case, to write a beautiful story, all without ever leaving the chat. Notice how I also ask Mistral to summarize the story at the end and to refine LLaVA's first output.
This opens up many creative avenues, not just with this feature in itself but also with all the "summoning" features we could start supporting in the future. How about "@codellama, write a function to do X" in the middle of writing a technical document? The possibilities are endless, and we are only getting started!
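To make the idea concrete, such a summon could be as little as parsing a leading @-mention and routing the prompt to the named model. This is purely a speculative sketch of the concept, not an announced design or API:

```ts
// Speculative sketch of "@model" summoning; none of this is a committed feature.
type SummonedPrompt = { model: string | null; text: string };

function parseSummon(input: string): SummonedPrompt {
  // Matches a leading mention like "@codellama, write a function to do X".
  const match = input.match(/^@([\w.:-]+)[,:]?\s*(.*)$/s);
  if (match) {
    return { model: match[1], text: match[2] };
  }
  return { model: null, text: input }; // no summon: fall back to the base model
}

parseSummon("@codellama, write a function to do X");
// → { model: "codellama", text: "write a function to do X" }
```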