
Interfaces in Python
Some musings on how to use protocols and abstract classes as interfaces in Python.
I’ve been working on some applications of metaheuristics to solving optimization problems (more on that in a future post), and this led me into a rabbit hole on how to properly define my interfaces in Python. It started with a project at work where I’m building a logistics optimization module, using metaheuristics algorithms like simulated annealing. In anticipation of experimenting with various methods, as well as redefining the problem in different ways, I tried to find the correct way to abstract the setup into a Problem
interface, a Solution
interface and an Optimizer
interface.1
My first hunch was to use abstract classes to define the interfaces that my problem should follow. The idea behind abstract classes is to define the methods and attributes a family of classes is expected to have, while leaving the implementation to the children classes. For example, if we assume the Problem
interface should evaluate the cost function, and the Solution
interface should allow to coerce the abstract solution into a numerical vector, this is what it would look like:
To define the abstract classes, you just need to inherit from the ABC
class and decorate all the interface methods with @abstractmethod
. This decorator enforces that any descendant class will need to implement these methods explicitly. For example, if we want to implement some specific problems, we just subclass the abstract classes:
Here I chose the typical field theory problem of the potential in physics. The cost function associated to this problem provides a good approximation to the free energy of a system near a second-order phase transition: it applies to a wide ranging class of systems, from the physics of magnets with the ferromagnet-paramagnet transition at the Curie temperature to the Higgs potential responsible for subatomic particles having mass.2 This potential can be written in simple terms as
and it has two types of extrema, one when and the other types when . Above , the null vector solution is the only minimum, while below , it becomes a maximum and the second condition defines a manifold of solutions.
This is a great way to define abstract interfaces, but it requires a strict inheritance scheme where each implementation must be a subclass of the abstract interface base class. The second method which I considered is a bit more fluid, and is a formalization of the “duck typing” properties of Python.
In case you didn’t know, Python is a dynamically typed language where types can be swapped so long as they implement the same interface. Since Python introduced type hints, there has been a lot of effort towards making the language more statically typed. But beyond external static tools like mypy or pyright, these are not enforced at runtime3. Let’s assume you have a function like
where we type hint that the element
parameter should inherit from SomeClass
, which we can define as
Now let’s define some other class with the same method implemented, possibly in a different manner
In a statically typed language, if you plugged an instance of SomeOtherClass
to my_function
, you’d get an error. However, in Python, this would work fine, mainly because the two classes implement the same method, so calling the do_something
method is valid on both SomeClass
and on SomeOtherClass
instances.
Protocols are basically a formal version of this: you can define a protocol by subclassing the typing.Protocol
class, like this
Then if you redefine my_function
as
you can plug both instances of SomeClass
and SomeOtherClass
as argument, as well as instances from any other class implementing do_something
, without offending your favorite type checker. Note that from a runtime perspective, nothing has really changed, unlike with abstract classes.
One big advantage of protocols against abstract classes is modularity (which is why I ended up settling for them in the case of my optimization module). Abstract classes require strict inheritance, while protocols are automatically valid for any class implementing a given set of methods. So it’s very easy to define a protocol for each type of behaviour you are interested in, and have concrete classes implementing specific subsets of protocols.
For example, since there are different conceptual types of metaheuristics for optimization problems, a given problem and a given type of solution will need to implement different methods for each. In a previous post, I talked about the difference between simulated annealing (SA) and genetic algorithms (GA). The former relies on local search where a solution is randomly and incrementally changed, while the latter relies on various steps like mutation and genetic crossover. So schematically, you could consider a solution protocol like
Then an implementation of the SA algorithm would expect a solution class which implements the SupportsLocalSearch
protocol, while the GA algorithm would instead expect an implementation of the SupportsMutation
and SupportsCrossover
protocols4. In practice, when implementing solutions, I would implement both protocols if I want to compare both methods, like in our previous example:
This way, the PhiFourSolutionWithProtocols
class naturally implements all four protocols.
To summarize, both protocols and abstract classes are great ways to define interfaces in Python. The former requires that concrete implementations subclass the interfaces: they define a recipe for how interfaces should be related in a top-down manner. On the other hand, protocols are simply an acknowledgement that duck typing, a cornerstone of the Python language, are all you need to define interfaces. After all, interfaces are really just a way to define public behaviour of classes. Protocols help doing this formally and statically instead of trusting the runtime behaviour of our classes.
Footnotes
-
As a side note, this is a classic example of premature optimization. As it turns out, we settled on the first optimization algorithm we chose, and only needed to experiment with a couple of problem formulations. There was no need at the time to pre-emptively optimize and generalize the codebase. ↩
-
Although in this case, the field is more complicated than a simple real vector. ↩
-
There are some ways to do this with packages, see for example the excellent beartype. ↩
-
In this situation, you would probably end up creating a subprotocol which would inherit from
SupportsMutation
,SupportsCrossover
, andProtocol
to define classes compatible with genetic algorithms. ↩