CodeGemma Vs CodeLlama

Google recently released the CodeGemma model, an LLM trained to handle coding-related tasks. Google claims the model is faster and more performant than CodeLlama, Meta's equivalent code LLM (see benchmarks below). We decided not to take their word for it and ran tests of our own using Msty's convenient split chats feature.

Note: We will be using the 7B and 7B Instruct variants with similar configurations for both models. If you are interested in exploring further, CodeGemma and CodeLlama are available for download directly through Msty.

Code Completion

Let's start with one of the most basic (and probably one of the first) tasks we learn to solve as programmers - writing a function to reverse a string. Let's ask CodeGemma and CodeLlama to write it for us, this time in JavaScript, a language most of us are already familiar with.

CodeGemma and CodeLlama write a function to reverse a string in JavaScript

CodeGemma answered with included explanations whereas CodeLlama just provided the answer. Also notable is how the two models approached the problem: CodeGemma provided a manual solution without using JavaScript's built-in methods, while CodeLlama chose to use the built-in reverse() method.
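The two approaches can be sketched roughly like this (reconstructed for illustration; the models' exact outputs differed):

```javascript
// Manual approach (similar in spirit to CodeGemma's answer):
// walk the string from the end and build the result character by character.
function reverseStringManual(str) {
  let reversed = "";
  for (let i = str.length - 1; i >= 0; i--) {
    reversed += str[i];
  }
  return reversed;
}

// Built-in approach (similar in spirit to CodeLlama's answer):
// split into characters, reverse the array, and join it back.
function reverseStringBuiltIn(str) {
  return str.split("").reverse().join("");
}

console.log(reverseStringManual("Msty"));  // "ytsM"
console.log(reverseStringBuiltIn("Msty")); // "ytsM"
```

Both produce the same result for simple strings, though neither handles multi-byte characters such as emoji correctly.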

Code Review / Debugging

LLMs can be useful in spotting errors in code that the human eye could easily overlook. In this particular example, we asked the models to debug a VueJS code snippet where we directly modify the value of a prop.

CodeGemma and CodeLlama debug a VueJS code

CodeGemma promptly identified the problem with our code and suggested that we should not modify the value of a prop in a child component. On the other hand, CodeLlama's response was a bit off-topic. It reported that the problem was actually with our syntax rather than the implementation. Both syntaxes are valid, but CodeLlama completely missed the main problem with our code.
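A common fix for this Vue anti-pattern is to copy the prop into local state and notify the parent via an event instead of mutating the prop directly. A minimal sketch (component and prop names invented for illustration):

```vue
<script>
export default {
  name: "ChildComponent",
  props: {
    count: { type: Number, default: 0 },
  },
  data() {
    return {
      // Copy the prop into local state instead of mutating it directly.
      localCount: this.count,
    };
  },
  methods: {
    increment() {
      // Mutating this.count here would trigger a Vue warning;
      // mutate the local copy and let the parent own the source of truth.
      this.localCount++;
      this.$emit("update:count", this.localCount);
    },
  },
};
</script>
```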

Unit Testing

Using the reverseString() function that CodeGemma generated earlier to reverse a string in JavaScript, let's ask our models to write test cases for us.

CodeGemma and CodeLlama generate unit tests for a function that reverses a string in JavaScript

We noticed that CodeGemma's test cases were pretty comprehensive and covered various edge cases, while CodeLlama's tests only covered a few basic scenarios.
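For reference, a reasonable baseline suite using only plain assertions might look like this (the edge cases here are our own picks, not taken verbatim from either model's output):

```javascript
// The function under test: manual string reversal, as generated earlier.
function reverseString(str) {
  let reversed = "";
  for (let i = str.length - 1; i >= 0; i--) {
    reversed += str[i];
  }
  return reversed;
}

// Tiny assertion helper so the suite runs anywhere without a test framework.
function expectEqual(actual, expected, label) {
  if (actual !== expected) {
    throw new Error(`${label}: expected "${expected}", got "${actual}"`);
  }
}

// Basic scenario.
expectEqual(reverseString("hello"), "olleh", "basic word");

// Edge cases: empty string, single character, palindrome, whitespace.
expectEqual(reverseString(""), "", "empty string");
expectEqual(reverseString("a"), "a", "single character");
expectEqual(reverseString("racecar"), "racecar", "palindrome");
expectEqual(reverseString("a b"), "b a", "string with spaces");

console.log("All tests passed");
```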

Text-to-SQL Generation

Code LLMs excel at generating complex database queries. Some models like DuckDB NSQL and SQL Coder are specifically trained for this purpose. In the following example, we gave CodeGemma and CodeLlama a MySQL schema that tracks the attendance of students in classrooms and asked them both to write a query to get the total attendance of a particular classroom on a particular date.

CodeGemma and CodeLlama generate a MySQL query given a schema

CodeLlama took a shortcut approach to the solution and directly queried the attendance table where classroom_id was 'MEB-1'. This would not work as expected because classroom_id is an INT column, not a classroom name. The correct solution is the one proposed by CodeGemma: join the tables, then filter on the classroom's name after the join.
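As a sketch, the join-based approach looks something like this (table and column names here are assumptions for illustration, not our exact schema):

```sql
-- Join attendance to classrooms so we can filter on the classroom's
-- human-readable name rather than its numeric id.
SELECT COUNT(*) AS total_attendance
FROM attendance a
JOIN classrooms c ON c.id = a.classroom_id
WHERE c.name = 'MEB-1'
  AND a.attendance_date = '2024-04-15';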

Coding Interview Questions

The Instruct variants of LLMs are trained to respond in natural language which allows them to converse in a more human-friendly manner. This means we can leverage the models for tasks like interview preparation and quizzes. We can also set system instructions on the models to further fine-tune their conversational style.

For this exercise, we used the 7B Instruct variants of the models and asked them a question about space complexity for a function from the book Cracking the Coding Interview. The given function adds adjacent elements between 0 and n.

CodeGemma Instruct and CodeLlama Instruct analyze space complexity for a function that adds adjacent elements between 0 and n

The correct answer for the space complexity of this function is O(1), but CodeLlama insisted that the complexity is O(n). While both models mentioned that the for loop iterates from 0 to n-1, only CodeGemma noted that this doesn't create new objects or data structures in memory, and thus the space complexity is O(1).
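The function in question looks roughly like this (a JavaScript reconstruction of the book's pairSumSequence example; variable names may differ from the original):

```javascript
// Sums adjacent pairs (i, i + 1) for i from 0 to n - 1.
// Only a handful of scalar variables are used, so space is O(1):
// nothing allocated here grows with n.
function pairSumSequence(n) {
  let sum = 0;
  for (let i = 0; i < n; i++) {
    sum += pairSum(i, i + 1);
  }
  return sum;
}

function pairSum(a, b) {
  return a + b;
}

console.log(pairSumSequence(3)); // (0+1) + (1+2) + (2+3) = 9
```

The loop runs n times (O(n) time), but because each pairSum call returns before the next begins, no more than a constant number of values are live at once.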

Overall, we can see that Google's CodeGemma 7B clearly outperformed CodeLlama 7B in handling diverse coding problems. It's also better at writing unit tests and analyzing code issues and complexities.

During our tests, CodeLlama answered most queries incorrectly, and even its correct answers were often limited and missed many edge case scenarios.

CodeLlama also seemed to struggle with providing language-aware markdown formatting for generated code snippets.

That's it for CodeGemma vs CodeLlama! You can explore more LLMs and compare their responses side-by-side by downloading Msty and using our split chats feature.

This table compares the performance of CodeGemma with other similar models on both single and multi-line code completion tasks. Source: Google for Developers.
