Codestral vs GPT-4o

Mistral's first-ever code model, Codestral, is out. Although it's a little late to the market, we are glad it's here. Using Msty's split chats feature, let's see how Codestral and GPT-4o fare at a few coding challenges.

Code Completion

As in our CodeGemma vs CodeLlama blog post, we start with something basic to see how the models handle simple tasks. In this case, we asked both models to write a JavaScript function that checks whether a string is a palindrome.

Codestral and GPT-4o write a function to check if a string is a palindrome in JavaScript

The models gave almost identical answers; the only notable difference was the regular expression Codestral chose for removing non-alphanumeric characters. Both models included appropriate explanations and code comments.
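The screenshots aren't reproduced here, so the following is only a minimal sketch of the kind of function both models wrote, not either model's actual output. It contrasts two common regular expressions for stripping non-alphanumeric characters: `/[^a-z0-9]/g` applied after lowercasing, and `/[\W_]/g`. Which pattern each model used is not shown in this sketch, and the names `isPalindromeA` and `isPalindromeB` are hypothetical.

```javascript
// Variant A: lowercase first, then drop anything that is not a-z or 0-9.
function isPalindromeA(str) {
  const cleaned = str.toLowerCase().replace(/[^a-z0-9]/g, '');
  return cleaned === [...cleaned].reverse().join('');
}

// Variant B: drop non-word characters (\W) plus underscores, then lowercase.
// For ASCII input, [\W_] matches the same characters as [^A-Za-z0-9].
function isPalindromeB(str) {
  const cleaned = str.replace(/[\W_]/g, '').toLowerCase();
  return cleaned === [...cleaned].reverse().join('');
}

const samples = ['A man, a plan, a canal: Panama', 'race_car', 'hello'];
for (const s of samples) {
  console.log(JSON.stringify(s), isPalindromeA(s), isPalindromeB(s));
}
```

Both variants treat `race_car` as a palindrome because the underscore is removed. A plain `/\W/g` would keep it, since `\w` includes `_`, which is why the second pattern adds `_` explicitly.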

Code Review / Debugging

We gave the models the following C program and asked them to explain what it does and to identify any bugs:

#include <stdio.h>
int main() {
  int i;
  for (i = 0; i < 10; i++);
    {
      printf("i = %d\n", i);
    }
}
Codestral and GPT-4o debug a C program

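Both models pointed out that the code is meant to print the numbers 0 to 9 but doesn't, because of the semicolon at the end of the for-loop header. That semicolon gives the loop an empty body, so the braced block runs only once, after the loop ends, and prints i = 10.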

Although both models provided corrected code, GPT-4o's response was more detailed, with a step-by-step walkthrough of the program. From our perspective, Codestral was more precise and got the point across with minimal explanation.
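The fix is to delete the semicolon after the `for (...)` header. To keep the examples in one language, the sketch below reproduces the same pitfall in JavaScript, which shares C's `for` syntax. Because `i` is declared outside the loop, as in the C program, it is still in scope (and equal to 10) when the stray block runs. The function names `collectBuggy` and `collectFixed` are hypothetical.

```javascript
// Buggy: the semicolon gives the loop an empty body, so the braces below
// are an ordinary block that runs once, after the loop has finished.
function collectBuggy() {
  const out = [];
  let i;
  for (i = 0; i < 10; i++);
  {
    out.push(`i = ${i}`);
  }
  return out;
}

// Fixed: without the semicolon, the block is the loop body.
function collectFixed() {
  const out = [];
  for (let i = 0; i < 10; i++) {
    out.push(`i = ${i}`);
  }
  return out;
}

console.log(collectBuggy()); // [ 'i = 10' ]
console.log(collectFixed().length); // 10
```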

Unit Testing

We then asked both models to write Vitest unit tests for the isPalindrome() function Codestral generated earlier.

Codestral and GPT-4o write unit tests in Vitest

GPT-4o generated many more test cases than Codestral, which provided only a few. GPT-4o seems particularly good at covering edge cases, such as an empty string or a string containing only non-alphanumeric characters.
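The generated test files aren't reproduced here. As a rough, framework-free sketch of the edge cases mentioned above, the cases can be written as an input/expected table; in Vitest, the same table could be passed to `test.each`. The `isPalindrome` implementation below is an assumed stand-in, not the function Codestral generated, and treating a string with no alphanumeric characters as a palindrome is a design choice of this sketch.

```javascript
// Assumed implementation under test (a stand-in, not Codestral's output).
function isPalindrome(str) {
  const cleaned = str.toLowerCase().replace(/[^a-z0-9]/g, '');
  return cleaned === [...cleaned].reverse().join('');
}

// [input, expected] pairs covering typical and edge cases.
const cases = [
  ['', true],                               // empty string
  ['a', true],                              // single character
  ['racecar', true],                        // simple palindrome
  ['RaceCar', true],                        // mixed case
  ['A man, a plan, a canal: Panama', true], // punctuation and spaces
  ['12321', true],                          // digits only
  ['!!!', true],                            // no alphanumerics: cleans to ''
  ['hello', false],                         // not a palindrome
  ['ab', false],                            // two different characters
];

// Collect every case whose actual result differs from the expected one.
const failures = cases.filter(([input, expected]) => isPalindrome(input) !== expected);
console.log(`${cases.length - failures.length}/${cases.length} cases passed`);
for (const [input, expected] of failures) {
  console.log(`FAIL: isPalindrome(${JSON.stringify(input)}) should be ${expected}`);
}
```

In a Vitest file, each row would become one test, for example via `test.each(cases)` with `expect(isPalindrome(input)).toBe(expected)` in the test body.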

Coding Interview Questions

To test the models' ability to follow instructions and converse in a human-like tone, we set the model instructions to the following:

You are an interviewee.

We then asked the following question:

What is the difference between the == and === operators in JavaScript?

Codestral and GPT-4o answer a coding interview question in JavaScript

As we can see, both models answered the question correctly, but their tones were vastly different. Codestral's answer read like a real interview response, especially in how it opened with "Sure, I'd be happy to explain that", whereas GPT-4o's answer read more like an exam paper.
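The models' full answers aren't reproduced here, but the core of the difference fits in a few lines: `===` compares without type conversion, while `==` first coerces operands of different types (for example, a string to a number) before comparing. The pairs below are standard illustrations, not taken from either model's answer.

```javascript
// Each row: [description, result of ==, result of ===]
const comparisons = [
  ["0 vs '0'", 0 == '0', 0 === '0'],                              // string coerced to number
  ["'' vs 0", '' == 0, '' === 0],                                 // '' becomes 0
  ['null vs undefined', null == undefined, null === undefined],   // special-cased by ==
  ['false vs 0', false == 0, false === 0],                        // boolean coerced to number
  ['NaN vs NaN', NaN == NaN, NaN === NaN],                        // NaN never equals itself
];

for (const [label, loose, strict] of comparisons) {
  console.log(`${label.padEnd(18)} ==: ${loose}  ===: ${strict}`);
}
```

In every row except the last, `==` reports equality and `===` does not; `NaN` is unequal to itself under both operators.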

Overall, Codestral is excellent at what it is designed to do: write and understand code. Its responses were consistently concise, with correct code formatting where necessary.

Codestral also seems better than GPT-4o at following instructions. Although GPT-4o is a powerful model, it often gives longer answers, which users may not want unless they ask for them. As the unit testing example showed, though, that thoroughness could be put to good use when writing test cases.

We are excited about the improvements that Codestral will bring to our workflow. If you would like to try it out, get started by downloading Msty today.
