
Building an LLM-based CLI tool to generate commit messages

A simple afternoon project to familiarize myself with the Ollama API.


Introduction

I’ve been chronically late to the LLM party. When ChatGPT first became public, it took me a while to even try it. I had heard a lot about it, but I tend to be quite suspicious of tech hype, so I took my time. Then things accelerated. By now, large language models are everywhere: I’ve used an AI chatbot out of convenience (when it’s useful¹), I’ve integrated GitHub Copilot into my IDE, and I’ve studied the theory behind LLMs on my path to becoming a data scientist.

But somehow, I still don’t see the appeal of making everything “AI-something”. I still keep those chats as a last-resort option when all else fails, and I mostly use Copilot as a (great) auto-complete for boring tasks I really dislike, like writing Python docstrings. But this all changed (slightly) recently: the last part of the training we have at work is on LLMs, and it culminates in an LLM-focused project.

During the introductory part of the module, we went through practical examples of how to use these models both locally and through an API, covered best practices in model development, and did deep dives into tuning and prompt engineering. We also worked through all sorts of RAG examples and spent a healthy amount of time on evaluating the models.² And this is when it clicked! The reason I’ve struggled to adopt LLMs in my own workflow is the chat UX. I like my tools to be simple, functional interfaces, and having to explain my problem through a chat, then copy/paste code back and forth, is just very unpleasant to me. But now that I know how to go beyond that, how to use models in a more programmatic way, it feels like a whole new world has opened up.

So this is how this little exercise/afternoon project was born. I wanted to play around with a local model (I chose Ollama for this, though I appreciate how most frameworks for these tasks are so easy to use that you can just swap them out), and I looked for a real problem I could solve with it. The thing I hate most after docstrings is commit messages, and I figured it would be great to have an assistant help me write them.³ Of course, it’s a bit of an overkill solution for such a simple problem, but simplicity was just not one of my criteria that day.

In this blog post, I want to show you how I built this project, but before I start, let me open with the fact that this is not original by any measure (in fact, I got the idea from a recent HackerNews post), and there are most likely smarter or better ways to do what I did. In particular, this project and this other project were a great source of inspiration. However, I often find that when it comes to specialized scripts or tools you want to integrate tightly into your workflow, writing them yourself from scratch is often better than using ready-made ones, because you can tweak and adapt them to your needs as those change.

LLM commit

Without further ado, let’s dive into the project! I wanted to build a simple CLI tool, so I used the click library to make my life easier. Then I set out to define the parameters and behaviour of the tool:

> llm-commit --help
Usage: llm-commit [OPTIONS]

  CLI tool to analyze git diffs and generate a commit message using a local LLM.

Options:
  --model TEXT
  --max-size-diff INTEGER
  --message-max-length INTEGER
  --type-commit TEXT
  --temperature FLOAT
  --top-p FLOAT
  --top-k INTEGER
  --num-predict INTEGER
  --help                        Show this message and exit.
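
A typical invocation might then look something like this (the flag values here are purely illustrative):

> llm-commit --type-commit fix --temperature 0.2 --message-max-length 80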

As you can see, I wanted to be able to quickly change the LLM parameters, such as the temperature, top-p, and top-k values, to tweak the output of the model. The type-commit parameter can be used to enforce the correct type in the conventional commit format if you already know what you want. When you have to deal with this many inputs, validation can quickly become a pain. For that, of course, you should use Pydantic; here in particular, I used pydanclick to specify the CLI parameters using Pydantic BaseModel subclasses, like this:

import click
from pydanclick import from_pydantic

@click.command()
@from_pydantic(ToolConfiguration)
@from_pydantic(ModelOptions)
def entrypoint(tool_configuration: ToolConfiguration, model_options: ModelOptions):
    pass

The two objects are ToolConfiguration, which defines the general parameters, and ModelOptions, which defines model-specific options. To make this work, you then need to define these classes somewhere:

from pydantic import BaseModel, confloat, conint

class ToolConfiguration(BaseModel):
    """Configuration class.

    Attributes:
        model (str): Name of model to use (available through Ollama). Defaults to Qwen2.5-coder with 3B parameters.
        max_size_diff (int): Maximum size of the git diff to use before truncating. Should be an integer greater than or equal to 1. Defaults to 4096.
        message_max_length (int): Maximal length of the commit message in words. Should be an integer greater than or equal to 1. Defaults to 150.
        type_commit (str | None): Type of the commit to guide the LLM. Defaults to None.
    """

    model: str = "qwen2.5-coder:3b"
    max_size_diff: conint(ge=1) = 4096
    message_max_length: conint(ge=1) = 150
    type_commit: str | None = None

class ModelOptions(BaseModel):
    """Class to store options to give the model.

    Attributes:
        temperature (float): Temperature of the model. Should be between 0.0 and 1.0. Defaults to 0.8.
        top_p (float): Top-p cumulative probability truncation. Should be between 0.0 and 1.0. Defaults to 0.9.
        top_k (int): Top-k tokens truncation. Should be at least 1. Defaults to 40.
        num_predict (int): Maximum number of tokens for the LLM to predict. Should be at least -1, where -1 means no limit. Defaults to 2048.
    """

    temperature: confloat(ge=0.0, le=1.0) = 0.8
    top_p: confloat(ge=0.0, le=1.0) = 0.9
    top_k: conint(ge=1) = 40
    num_predict: conint(ge=-1) = 2048
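
Thanks to the constrained fields, validation comes for free: out-of-range values are rejected before the tool even runs, and model_dump() turns the options into a plain dictionary ready to be passed along. A quick interactive check, just to illustrate:

from pydantic import ValidationError

try:
    ModelOptions(temperature=1.5)  # rejected: above the allowed [0.0, 1.0] range
except ValidationError as error:
    print(error)  # pydantic reports which field failed and why

print(ModelOptions().model_dump())
# {'temperature': 0.8, 'top_p': 0.9, 'top_k': 40, 'num_predict': 2048}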

Let’s take a moment here to discuss some of these choices. I didn’t go through a systematic check of which parameters are best; instead, I modified the defaults to improve the results I was getting as I was developing the package. In a more structured project, you might want an actual strategy to select the LLM hyperparameters (see the sketch after the lists below for what that could look like). So what did I choose?

  • model, for which I chose the qwen2.5 family, as it offers lots of small models that fit on my GPU-poor laptop, in the coder edition since I want to analyze code diffs. I initially tried the 1.5B version, but the results were poor; the 3B one worked much better.
  • max_size_diff truncates the diff if it gets too big, to avoid overflowing the context window. In practice I haven’t needed to touch this (I tend to commit often and aim for small diffs), but you might need to tweak it to your taste.
  • message_max_length restricts the number of words in the commit message description.
  • type_commit, which I explained previously.

As for the more model-specific parameters, we have:

  • temperature defines how stochastically the model behaves. I found that 0.8 works well for my needs: I didn’t get any hallucinations, and the model is creative enough to actually describe the commit instead of listing keywords for the changes.
  • top_p, top_k, and num_predict (the maximum number of tokens to produce) I didn’t touch much; the default values were fine for my needs.
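
For reference, here is a minimal sketch of what a more systematic selection could look like: generate one candidate message per temperature on the same staged diff and compare them side by side. It reuses the helper functions defined later in this post, so treat it as an illustration rather than part of the tool:

from ollama import generate

def compare_temperatures(git_diff: str, temperatures: list[float]) -> None:
    """Generate one candidate commit message per temperature, for manual comparison."""
    prompt = get_prompt(git_diff, length_git_commit=150)
    for temperature in temperatures:
        response = generate(
            model="qwen2.5-coder:3b",
            prompt=prompt,
            system=get_system_prompt(),
            options={"temperature": temperature},
        )
        # parse_output raises a ValueError if the model skipped the <summary> tags
        print(f"--- temperature={temperature} ---")
        print(parse_output(response["response"], message_max_length=150))

compare_temperatures(get_git_diff_raw(), temperatures=[0.2, 0.5, 0.8])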

The rest of the code in the project is then rather simple:

  1. Call the command git diff --cached, capture the output, and truncate it so it fits inside the context window, using
import subprocess
import sys
import click

def get_git_diff_raw() -> str:
    """Return the raw git diff from staged files."""
    result = subprocess.run(["git", "diff", "--cached"], capture_output=True, text=True)
    if result.returncode != 0:
        click.echo(f"Error running the command `git diff --cached`, return code: {result.returncode}")
        sys.exit(1)
    return result.stdout

def truncate_git_diff(git_diff: str, max_size: int) -> str:
    """Truncates the output from the git diff command if it is larger than the requested size."""
    max_size = min(max_size, len(git_diff))
    return git_diff[:max_size]
  2. Craft the user prompt using the git diff and the type_commit parameter, through the function
def get_prompt(git_diff: str, length_git_commit: int, type_commit: str | None = None) -> str:
    """Returns the prompt to feed the LLM.

    Args:
        git_diff (str): Truncated output of the `git diff --cached` command.
        length_git_commit (int): Maximal length in words of the commit message.
        type_commit (str | None): Type of commit to guide the LLM. Defaults to None.

    Returns:
        str: User prompt specifying the query.
    """
    type_commit_str = f"- Make sure to choose {type_commit} as value for TYPE" if type_commit is not None else ""
    return f"""Summarize the diff placed between the XML tags <diff>, following these guidelines:
    - You must keep the response under {length_git_commit} words. 
    - You must focus on why the changes were made.
    - Place the summary inside <summary> XML tags.
    {type_commit_str}
    
<diff>{git_diff}</diff>
"""
  3. Send the prompt to the LLM and get the answer. In the user prompt (see above), I requested that the commit message be placed inside XML <summary> tags, so the message must be extracted from the full response and cleaned of symbols that could interfere with the shell later on
def parse_output(model_response: str, message_max_length: int) -> str:
    """Parses the response of the model to extract the commit message. Raises a ValueError if no XML `<summary>` tag can be found. Truncate the commit message

    Args:
        model_response (str): Response of the LLM.
        message_max_length (int): Maximal length of the commit message. The message will be automatically truncated above this.

    Returns:
        str: Commit message returned by the LLM.

    """
    if "<summary>" not in model_response or "</summary>" not in model_response:
        raise ValueError("No summary found in the model response.")

    # Extract the commit message from the response
    tag_name = "summary"
    position_begin = model_response.find(f"<{tag_name}>") + len(f"<{tag_name}>")  # index just past the opening tag
    position_end = model_response.find(f"</{tag_name}>")
    if position_begin > position_end:
        raise ValueError("Invalid placement of <summary> tags in the model response.")
    commit_message = model_response[position_begin:position_end]
    commit_message = clean_commit_message(commit_message)

    # Truncate the commit message if necessary
    truncated_commit = truncate_sentence(commit_message, max_words=message_max_length)
    return truncated_commit

def truncate_sentence(sentence: str, max_words: int) -> str:
    """Given a sentence, truncates it if it exceeds `max_words` and ensure it ends as a sentence."""
    sentence_split = sentence.split(" ")
    max_words = min(max_words, len(sentence_split))
    sentence_reconstructed = " ".join(sentence_split[:max_words])
    return sentence_reconstructed


def clean_commit_message(message: str) -> str:
    """Cleans the commit message by removing backticks and extra whitespaces."""

    message_no_backticks = message.replace("`", "'").replace('"', "'")
    message_no_whitespace = message_no_backticks.strip().replace("  ", " ")
    return message_no_whitespace
  4. Finally, print the candidate commit message to the user and ask whether they want to finalize the commit. If so, the commit is made using:
import subprocess
import sys
import click

def set_git_commit(commit_message: str) -> None:
    """Set a commit with the given commit message."""
    result = subprocess.run(["git", "commit", "-m", '"commit_message"'])
    if result.returncode != 0:
        click.echo(f'Error running the command `git commit -m "{commit_message}"`, return code: {result.returncode}')
        sys.exit(1)
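
Before wiring everything together, here is a quick sanity check of the parsing helpers on a made-up model response (the response text is entirely hypothetical, just to illustrate the flow):

fake_response = "Sure! Here is the commit message:\n<summary>fix: handle empty diffs in `get_git_diff_raw`</summary>"
print(parse_output(fake_response, message_max_length=150))
# Prints: fix: handle empty diffs in 'get_git_diff_raw'
# Note how clean_commit_message swapped the backticks for single quotes.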

Okay, now all the pieces are in place; they just need to be linked together inside the entrypoint function, like this:

import sys

import click
from ollama import generate
from pydanclick import from_pydantic

@click.command()
@from_pydantic(ToolConfiguration)
@from_pydantic(ModelOptions)
def entrypoint(tool_configuration: ToolConfiguration, model_options: ModelOptions):
    """CLI tool to analyze git diffs and generate a commit message using a local LLM."""

    # Get the git diff of staged files
    git_diff = get_git_diff_raw()
    # Handle empty git diff
    if len(git_diff) == 0:
        click.echo("Empty `git diff`, nothing to do...")
        sys.exit(0)

    git_diff_truncated = truncate_git_diff(git_diff=git_diff, max_size=tool_configuration.max_size_diff)

    # Build prompt and query the LLM
    prompt = get_prompt(
        git_diff=git_diff_truncated,
        length_git_commit=tool_configuration.message_max_length,
        type_commit=tool_configuration.type_commit,
    )
    response = generate(
        model=tool_configuration.model, prompt=prompt, system=get_system_prompt(), options=model_options.model_dump()
    )
    response_content = response["response"]
    try:
        parsed_response = parse_output(response_content, message_max_length=tool_configuration.message_max_length)
        click.echo(parsed_response)
        answer = input("Would you like to make a commit with this message? [y/N]: ")
        if answer.lower() == "y":
            set_git_commit(parsed_response)
    except ValueError as error:
        click.echo(str(error))
        sys.exit(0)

With the Ollama API, it’s very easy to interact with the model through the generate function, and you can override pretty much any option by specifying it inside the options dictionary. The system prompt can also be overridden, and for this tool I used the following prompt⁴:

def get_system_prompt() -> str:
    """Returns the system prompt."""
    system_prompt = """You are a helpful assistant specialized in summarizing code diffs and crafting concise git commit messages. When writing a commit message, think carefully and write a summary in the format:
<summary>TYPE: SUMMARY</summary>
The TYPE parameter can be:
- 'feat' for changes involving new features, 
- 'fix' for small bugfixes,
- 'refactor' for changes which rewrite or restructure the code without changing the logic,
- 'perf' for changes aimed at improving performance,
- 'test' for changes adding, removing or modifying tests,
- 'docs' for changes affecting the documentation, comments or docstrings,
- 'build' for changes affecting the build system such as build tools, CI pipelines, dependencies, project version and manifest.

If the user does not specify a value for TYPE, choose one that matches the changes.
"""
    return system_prompt
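
As an aside, if you want to poke at the model outside the tool, the same generate function works on its own. A minimal sketch, assuming you have already pulled the model with ollama pull qwen2.5-coder:3b and using a throwaway prompt:

from ollama import generate

# Every key in `options` overrides the corresponding default of the model.
response = generate(
    model="qwen2.5-coder:3b",
    prompt="Reply with a single word: ready",
    options={"temperature": 0.0, "num_predict": 8},
)
print(response["response"])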

And here we go: with these simple ingredients, you have a versatile LLM assistant to summarize your code changes. So what does it look like? Well, a GIF is worth 1000 words per frame:

[Demo GIF: llm-commit generating a commit message and asking for confirmation]

Summary

As I mentioned at the beginning of this post, this was a small afternoon project, but it fills a real need for me, and it encapsulates well how LLMs can act as assistants in our day-to-day lives through simple programs. I really enjoyed writing this tool, and I encourage anyone who wants to learn how to work with large language models to try their hand at this kind of small project: it is the best way to learn and to build some intuition for their, at times mysterious, behaviour.

NOTE

All the code for this project can be found on this GitHub repository.

Footnotes

  1. I tend to really like Le Chat by Mistral. Not only is it European, but I also salute their open-source contributions.

  2. As far as I can see, this is something that is often skipped or ignored, and honestly, I’m surprised any company would put an LLM in production without strict checks.

  3. I’ve read some very valid opinions on what exactly should go into a commit message (and I daresay there is little consensus here beyond the agreement that it should be useful), and I agree that fully automating it makes little sense, as an LLM will rarely truly catch your intent (the why) and will mostly end up describing what changed (the what). I personally use it to speed up the process, not as a replacement.

  4. I used the system prompt in this project for inspiration.