Back to articles
Interfaces in Python
Photo by Kelly Sikkema on Unsplash.

Interfaces in Python

Some musings on how to use protocols and abstract classes as interfaces in Python.

7 min read

I’ve been working on some applications of metaheuristics to solving optimization problems (more on that in a future post), and this led me into a rabbit hole on how to properly define my interfaces in Python. It started with a project at work where I’m building a logistics optimization module, using metaheuristics algorithms like simulated annealing. In anticipation of experimenting with various methods, as well as redefining the problem in different ways, I tried to find the correct way to abstract the setup into a Problem interface, a Solution interface and an Optimizer interface.1

My first hunch was to use abstract classes to define the interfaces that my problem should follow. The idea behind abstract classes is to define the methods and attributes a family of classes is expected to have, while leaving the implementation to the children classes. For example, if we assume the Problem interface should evaluate the cost function, and the Solution interface should allow to coerce the abstract solution into a numerical vector, this is what it would look like:

from abc import ABC, abstractmethod
import numpy as np

class Solution(ABC):
    """Abstract solution interface."""

    @abstractmethod
    def get_vector(self) -> np.typing.NDArray[np.float64]:
        pass

class Problem(ABC):
    """Abstract problem interface."""

    @abstractmethod
    def evaluate_cost(self, solution: Solution, *args, **kwargs) -> float:
        pass

To define the abstract classes, you just need to inherit from the ABC class and decorate all the interface methods with @abstractmethod. This decorator enforces that any descendant class will need to implement these methods explicitly. For example, if we want to implement some specific problems, we just subclass the abstract classes:

import numpy as np

class PhiFourProblem(Problem):
    """Problem describing the Phi-4 potential in physical systems."""

    def __init__(self, critical_temperature: float, a: float, b: float) -> None:
        self.critical_temperature = critical_temperature
        self.a = a
        self.b = b

    def evaluate_cost(self, solution: Solution, temperature: float) -> float:
        diff = temperature - self.critical_temperature
        vector = solution.get_vector()
        phi_squared = np.dot(vector, vector)
        potential = self.a * diff * phi_squared + (self.b / 2) * phi_squared ** 2
        return potential

class PhiFourSolution(Solution):
    """Solution of the Phi-4 potential as a vector of real numbers."""
    def __init__(self, vector: np.typing.ArrayLike) -> None:
        self.vector = np.array(vector)

    def get_vector(self) -> np.typing.NDArray[np.float64]:
        return self.vector

Here I chose the typical field theory problem of the ϕ4\phi^4 potential in physics. The cost function associated to this problem provides a good approximation to the free energy of a system near a second-order phase transition: it applies to a wide ranging class of systems, from the physics of magnets with the ferromagnet-paramagnet transition at the Curie temperature to the Higgs potential responsible for subatomic particles having mass.2 This potential can be written in simple terms as

V(ϕ)=a(T) (ϕϕ)+b2(ϕϕ)2 ,a(T)a(TTc)+O((TTc)2) ,b>0 , V(\phi) = a(T) ~ (\phi^\top \cdot \phi) + \frac{b}{2} (\phi^\top \cdot \phi)^2~, \quad a(T) \sim a \, (T - T_c) + \mathcal O((T - T_c)^2)~, \quad b > 0~,

and it has two types of extrema, one when ϕ=0\phi = \bm 0 and the other types when ϕ2=ab|\phi|^2 = -\frac{a}{b}. Above TcT_c, the null vector solution is the only minimum, while below TcT_c, it becomes a maximum and the second condition defines a manifold of solutions.

This is a great way to define abstract interfaces, but it requires a strict inheritance scheme where each implementation must be a subclass of the abstract interface base class. The second method which I considered is a bit more fluid, and is a formalization of the “duck typing” properties of Python.

In case you didn’t know, Python is a dynamically typed language where types can be swapped so long as they implement the same interface. Since Python introduced type hints, there has been a lot of effort towards making the language more statically typed. But beyond external static tools like mypy or pyright, these are not enforced at runtime3. Let’s assume you have a function like

def my_function(element: SomeClass) -> None:
    element.do_something()

where we type hint that the element parameter should inherit from SomeClass, which we can define as

class SomeClass:
    def do_something(self) -> None:
        print("Some action was done!")

Now let’s define some other class with the same method implemented, possibly in a different manner

class SomeOtherClass:
    def do_something(self) -> None:
        print("Some other action was done!")

In a statically typed language, if you plugged an instance of SomeOtherClass to my_function, you’d get an error. However, in Python, this would work fine, mainly because the two classes implement the same method, so calling the do_something method is valid on both SomeClass and on SomeOtherClass instances.

Protocols are basically a formal version of this: you can define a protocol by subclassing the typing.Protocol class, like this

from typing import Protocol

class CanDoSomething(Protocol):
    def do_something(self) -> None:
        ...

Then if you redefine my_function as

def my_function(element: CanDoSomething) -> None:
    element.do_something()

you can plug both instances of SomeClass and SomeOtherClass as argument, as well as instances from any other class implementing do_something, without offending your favorite type checker. Note that from a runtime perspective, nothing has really changed, unlike with abstract classes.

One big advantage of protocols against abstract classes is modularity (which is why I ended up settling for them in the case of my optimization module). Abstract classes require strict inheritance, while protocols are automatically valid for any class implementing a given set of methods. So it’s very easy to define a protocol for each type of behaviour you are interested in, and have concrete classes implementing specific subsets of protocols.

For example, since there are different conceptual types of metaheuristics for optimization problems, a given problem and a given type of solution will need to implement different methods for each. In a previous post, I talked about the difference between simulated annealing (SA) and genetic algorithms (GA). The former relies on local search where a solution is randomly and incrementally changed, while the latter relies on various steps like mutation and genetic crossover. So schematically, you could consider a solution protocol like

from typing import Protocol, Self

class SupportsGetVector(Protocol):
    def get_vector(self) -> np.typing.NDArray[np.float64]:
        ...

class SupportsLocalSearch(Protocol):
    def local_search(self, *args, **kwargs) -> Self:
        """Returns a new instance of the solution in a local neighborhood."""
        ...

class SupportsMutation(Protocol):
    def mutate(self, *args, **kwargs) -> Self:
        """Returns a new instance of the solution after applying a random mutation."""
        ...

class SupportsCrossover(Protocol):
    def crossover(self, other_solution: Self, *args, **kwargs) -> tuple[Self, Self]:
        """Returns a new instance of the solution after applying a random mutation."""
        ...

Then an implementation of the SA algorithm would expect a solution class which implements the SupportsLocalSearch protocol, while the GA algorithm would instead expect an implementation of the SupportsMutation and SupportsCrossover protocols4. In practice, when implementing solutions, I would implement both protocols if I want to compare both methods, like in our previous example:

import numpy as np

class PhiFourSolutionWithProtocols:
    """Solution of the Phi-4 potential as a vector of real numbers."""

    def __init__(self, vector: np.typing.ArrayLike) -> None:
        self.vector = np.array(vector, dtype=np.float64)

    def get_vector(self) -> np.typing.NDArray[np.float64]:
        return self.vector

    def local_search(self, *args, **kwargs) -> Self:
        random_vec = np.random.random(self.vector.shape)
        perturbed_vec = self.vector + random_vec
        return type(self)(perturbed_vec)

    def mutate(self, *args, **kwargs) -> Self:
        random_vec = np.random.random(self.vector.shape)
        perturbed_vec = self.vector + random_vec
        return type(self)(perturbed_vec)

    def crossover(self, other_solution: Self, *args, **kwargs) -> tuple[Self, Self]:
        parent_vector_1 = self.vector
        parent_vector_2 = other_solution.get_vector()
        crossover_point = min(parent_vector_1.shape[0], parent_vector_2.shape[0]) // 2
        child_vector_1 = np.concatenate((parent_vector_1[:crossover_point], parent_vector_2[crossover_point:]))
        child_vector_2 = np.concatenate((parent_vector_2[:crossover_point], parent_vector_1[crossover_point:]))

        return type(self)(child_vector_1), type(self)(child_vector_2)

This way, the PhiFourSolutionWithProtocols class naturally implements all four protocols.

To summarize, both protocols and abstract classes are great ways to define interfaces in Python. The former requires that concrete implementations subclass the interfaces: they define a recipe for how interfaces should be related in a top-down manner. On the other hand, protocols are simply an acknowledgement that duck typing, a cornerstone of the Python language, are all you need to define interfaces. After all, interfaces are really just a way to define public behaviour of classes. Protocols help doing this formally and statically instead of trusting the runtime behaviour of our classes.

Footnotes

  1. As a side note, this is a classic example of premature optimization. As it turns out, we settled on the first optimization algorithm we chose, and only needed to experiment with a couple of problem formulations. There was no need at the time to pre-emptively optimize and generalize the codebase.

  2. Although in this case, the field is more complicated than a simple real vector.

  3. There are some ways to do this with packages, see for example the excellent beartype.

  4. In this situation, you would probably end up creating a subprotocol which would inherit from SupportsMutation, SupportsCrossover, and Protocol to define classes compatible with genetic algorithms.