Linklog
A curated collection of links and resources I have found over time.
February 2025
- Prototyping in Rust (corrode.dev)
This article offers practical advice on writing Rust efficiently at the prototype stage. Most introductory material on the language focuses on "proper use of syntax," but prototyping is often a compromise between code quality and coding speed, and this article makes some great suggestions on how to strike that balance.
- Deep dive into LLMs like ChatGPT by Andrej Karpathy (TL;DR) | Anfal Mushtaq (anfalmushtaq.com)
A TL;DR version of Andrej Karpathy's "Deep dive into LLMs like ChatGPT" video. It keeps the essentials but presents them in clear, digestible chunks.
- (Ab)using General Search Algorithms on Dynamic Optimization Problems (dubovik.eu)
An interesting analysis of various optimization algorithms applied to a simple dynamic optimization problem, featuring beautiful visualizations of each algorithm.
- uchū (uchu.style)
uchū is a minimalistic color palette based on the OKLCH color space. I personally find it very aesthetically pleasing.
- flywhl/logis (github.com)
An interesting library for recording ML experiment metadata in commit messages. Even better, it supports a query language to find which commits satisfy a given criterion.
- jj init (v5.chriskrycho.com)
I've been more and more tempted by jujutsu as a drop-in replacement for git. Its default way of handling changes seems so sane compared to git's. This article is a very thorough and accessible introduction to how it works, and it definitely nudged me further along the jj train.
- Luxa CSS (www.luxacss.com)
An interesting CSS framework that borrows some ideas from Tailwind while staying more minimalistic. Bonus points: it was made by the creator of the fantastic Dracula theme.
- A Gentle Intro to Running a Local LLM (www.dbreunig.com)
Lots of great insights on which LLMs to run locally, depending on your needs and the performance available.
- How I Use Git Worktrees (matklad.github.io)
A useful example of a git worktree workflow. Worktrees help you avoid all the stashing and branch hopping of a typical workflow: you can check out the repository multiple times on different branches and work on different features, review pull requests, or run automated tests, all without breaking your flow.
- Binary vector embeddings are so cool (emschwartz.me)
A description of the effect of binary quantization on embeddings. By restricting the dtype of embedding vectors, you trade accuracy in latent space for embedding size. A binary dtype seems to preserve a surprisingly high share of the original information content (about 97%) while also cutting storage by a gigantic amount (about 97% here as well); see the sketch below.
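A minimal numpy sketch of the idea (the vectors here are random placeholders for real model embeddings): quantize each dimension down to its sign bit, pack the bits, and retrieve with Hamming distance instead of cosine similarity.

```python
import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 1024)).astype(np.float32)  # ~4 MB of float32

# Binary quantization: keep only the sign of each dimension (1 bit instead of 32).
bits = (embeddings > 0).astype(np.uint8)
packed = np.packbits(bits, axis=1)  # 1024 bits -> 128 bytes per vector, ~32x smaller

def hamming_distances(query: np.ndarray, corpus: np.ndarray) -> np.ndarray:
    """Count differing bits between one packed query and every packed corpus row."""
    return np.unpackbits(query ^ corpus, axis=1).sum(axis=1)

query = packed[:1]
nearest = hamming_distances(query, packed).argsort()[:5]  # 5 closest vectors
```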
- efugier/smartcat (github.com)
An interesting CLI tool for calling LLMs from the command line with short, isolated prompts. It seems to adhere to the core Unix philosophy, unlike most AI tools out there, and handles both local and hosted LLMs.
- How I program with LLMs (crawshaw.io)
Some interesting reflections on how to use LLMs in daily development work. I personally adhere mostly to the "autocomplete" part with GitHub Copilot, and I'm getting used to the "search" part, where the LLM helps me find information on a language or coding paradigm faster than I could search for it myself. I'm not yet on board with "chat-driven programming".
- A Deep Dive into Memorization in Deep Learning (blog.kjamistan.com)
An interesting series of articles explaining how machine learning models memorize data.
- ben-nour/SQL-tips-and-tricks (github.com)
I'm not a great SQL user: I have experience (mainly from database management in web development and now as a data scientist), but I don't consider myself an SQL wizard. This list of opinionated "tips" was quite useful to me.
- How to fine-tune open LLMs in 2025 with Hugging Face (www.philschmid.de)
An in-depth example of how to fine-tune an LLM using the Hugging Face ecosystem.
- A Visual Guide to How Diffusion Models Work (towardsdatascience.com)
An interesting dive into what makes diffusion models work. In short, diffusion models are trained to recover the original data from noised versions of it, at various levels of noise. They eventually learn the probability distribution of the images in the space of all possible pixel arrangements. You can then iteratively denoise a pure Gaussian noise picture until you generate a new image: this amounts to sampling the learned probability distribution (see the sketch below).
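As a rough illustration, here is a toy DDPM-style sampling loop in PyTorch. It assumes a trained network `model(x, t)` that predicts the noise present in `x` at timestep `t`; the names and schedule values are illustrative, not the article's code.

```python
import torch

def sample(model, shape, n_steps=1000, device="cpu"):
    # Standard linear noise schedule (common DDPM default values).
    betas = torch.linspace(1e-4, 0.02, n_steps, device=device)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(shape, device=device)  # start from pure Gaussian noise
    for t in reversed(range(n_steps)):
        eps = model(x, t)  # the network's estimate of the noise in x
        # Subtract the predicted noise component (DDPM posterior mean)...
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        x = (x - coef * eps) / torch.sqrt(alphas[t])
        if t > 0:  # ...and re-inject a little noise, except at the last step
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x  # a new sample from the learned distribution
```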
January 2025
- Einsum in Depth (einsum.joelburget.com)
A guide on how to use "einsum" in Python for tensor manipulation. Einstein notation made working with tensor algebra a much nicer experience in physics, and it should do the same for anyone doing heavy tensor operations in code. But I always found the Python implementation a bit awkward and difficult to understand; this article really helped with that (a few examples below).
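A few self-contained examples of the notation (numpy shown, but torch.einsum accepts the same strings):

```python
import numpy as np

A = np.random.rand(3, 4)
B = np.random.rand(4, 5)
v = np.random.rand(4)

np.einsum("ij,jk->ik", A, B)  # matrix product, same as A @ B
np.einsum("ij,j->i", A, v)    # matrix-vector product
np.einsum("ij->ji", A)        # transpose
np.einsum("ii->", np.eye(3))  # trace: a repeated index is summed over

# Batched contraction: one matrix-vector product per batch element.
batch_A = np.random.rand(10, 3, 4)
batch_v = np.random.rand(10, 4)
np.einsum("bij,bj->bi", batch_A, batch_v)
```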
December 2024
- Quick software tips for new ML researchers (www.eugenevinitsky.com)
A short list of best practices. Some are obvious from a software development perspective (VCS, package manager, linter), but others have good recommendations on ML-specific tools (Hydra for configs, Optuna for hyperparameter tuning; Optuna's basic loop is sketched below).
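For reference, Optuna's basic loop is only a few lines. A minimal sketch, where the objective body is just a placeholder for a real train-and-validate run:

```python
import optuna

def objective(trial: optuna.Trial) -> float:
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    n_layers = trial.suggest_int("n_layers", 1, 4)
    # Train a model with these hyperparameters and return a validation metric;
    # the expression below is only a stand-in for that run.
    return (lr - 1e-3) ** 2 + 0.1 * n_layers

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```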
- GitHub Actions by Example (www.actionsbyexample.com)
I always have to google GitHub Actions syntax and snippets, or prompt an LLM for them. This is a collection of examples so you never have to google it again.
November 2024
- Perspectives on diffusion – Sander Dieleman (sander.ai)
Some interesting thoughts on diffusion models.
- Thoughts on Riemannian metrics and its connection with diffusion/score matching [Part I] (blog.christianperone.com)
An in-depth description of the connections between diffusion models and Riemannian geometry.
September 2024
- Visualizing Algorithms (bost.ocks.org)
A beautiful set of visualizations of common algorithms. Perfect for truly understanding what happens inside quicksort, or for comparing different sampling algorithms.
- dleemiller/WordLlama (github.com)
A natural language processing toolkit optimized for CPU hardware. I haven't tested it yet, but it looks really useful for quick clustering, deduplication, similarity search, and so on.
August 2024
- Column Names as Contracts (emilyriederer.netlify.app)
An interesting explanation of implicit data contracts through naming conventions.
February 2024
- Python Data Science Handbook (jakevdp.github.io)
A must-read for anyone starting out in data science. Chapter 5 features some great in-depth notebooks on classical machine learning methods like SVMs and random forests.