
Linklog

A curated collection of links and resources I have found over time.

Tags: #algorithms (6) #best-practices (9) #cli (4) #data (10) #data-science (7) #deep-learning (8) #diffusion (3) #library (11) #llm (10) #LLM (1) #markdown (2) #optimization (6) #physics (2) #python (26) #rust (6) #SQL (2) #statistics (4) #tools (7) #vcs (6) #web-dev (4)

May 2025

Week 21
  • The Copilot Delusion (deplet.ing)

    Opinionated take on using Copilot. I love it.

  • Are you more likely to die on your birthday? (pudding.cool) - #data#statistics

    A fun analysis of the birthday effect using actual data and thorough methodology.

Week 20
  • dtolnay/thiserror: derive(Error) for struct and enum error types (github.com) - #rust#library

    This is a very useful crate providing macros that make it easy to wrap various custom error types in a single error enum. Importantly, it is equivalent to using the standard library and does not introduce any custom error handling; it just reduces boilerplate.

  • Adventures in Imbalanced Learning and Class Weight | andersource (andersource.dev) - #data-science

    The question of imbalanced classes is a source of recurring discussions within the data science community. Common wisdom says to weight the samples in inverse proportion to their class frequency, so that all classes get enough representation during training. This post provides a thorough mathematical derivation that this does not work for the F1-score. It does work for different metrics, though, and so has some value there. The important takeaway from this is to always consider the metric most relevant to the problem at hand, and adapt the methodology to that.
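
To make the weighting heuristic concrete, here is a minimal sketch of inverse-frequency class weights in plain Python. It is not taken from the linked post: the function name `inverse_frequency_weights` and the toy labels are made up for illustration, and the normalisation `n_samples / (n_classes * count)` is just one common choice (the same formula scikit-learn documents for `class_weight="balanced"`).

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per class, inversely proportional to its frequency."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    # weight = n_samples / (n_classes * count): rarer classes get larger weights
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy imbalanced dataset (hypothetical): 90 negatives, 10 positives
labels = ["neg"] * 90 + ["pos"] * 10
weights = inverse_frequency_weights(labels)
```

With this 90/10 split, the minority class ends up with nine times the weight of the majority class. Whether that actually helps depends on the metric you care about, which is the point of the post.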

Week 19
  • astral-sh/ty: An extremely fast Python type checker and language server, written in Rust. (github.com) - #python#tools

    I've been waiting for the astral type checker for a while: mypy is just excruciatingly slow, and every astral product tends to be a superperformer. And finally, it looks like it's coming together, with a preview release!

  • zerowidth positive lookahead (zerowidth.com) - #vcs

    I'm still itching to try jujutsu in more detail, but every time I dive into the documentation, I feel like I'm missing a piece. I've been wondering if I just need to "see it at work", see how someone actually uses it, especially from my perspective. This post is much closer to that and I found it useful.

Week 18
  • dataframely — A declarative, 🐻‍❄️-native data frame validation library (tech.quantco.com) - #data#library

    I've been working a lot on our data pipelines at work, switching to polars mostly for performance and introducing rigorous checks and validations of data at various stages. I haven't yet used dataframely, but its principle really resonates with my use case, so I recommend checking it out.

April 2025

Week 17
  • Bloom Filters: A Memory-Saving Solution for Set Membership Checks (www.thecoder.cafe)

    Bloom filters are interesting data structures. This blog post explains them very well!

  • Are polynomial features the root of all evil? (alexshtf.github.io) - #data-science

    This is a great article presenting various polynomial bases used in mathematics (canonical, Legendre, Chebyshev) and how they can be used to fit data. It is well known that polynomial features tend to overfit and are hard to regularize, but by using an appropriate basis for this kind of problem (Bernstein), you can get really good results. Interestingly, the reasoning behind this choice reminds me a lot of the kind of physics reasoning with regard to scaling and units.
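
As a rough illustration of the Bernstein basis mentioned above, the sketch below evaluates the basis polynomials b_i(x) = C(n, i) x^i (1 - x)^(n - i) using only the standard library. The function name `bernstein_basis` is hypothetical and the code assumes the feature has already been rescaled to [0, 1]; it is not the article's implementation.

```python
from math import comb


def bernstein_basis(degree, x):
    """Evaluate the (degree + 1) Bernstein basis polynomials at x in [0, 1]."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be rescaled to [0, 1] first")
    # b_i(x) = C(degree, i) * x^i * (1 - x)^(degree - i)
    return [
        comb(degree, i) * x**i * (1.0 - x) ** (degree - i)
        for i in range(degree + 1)
    ]


# One row of a feature matrix for a degree-2 fit, evaluated at x = 0.5
features = bernstein_basis(2, 0.5)
```

Every basis value is non-negative and the values sum to one at any x in [0, 1], so each row of the resulting feature matrix is a set of convex weights on the fitted coefficients.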

Week 16
  • 14 Advanced Python Features | Edward Li's Blog (blog.edward-li.com) - #python

    You can find all sorts of beginner "top 10 features of X" online, and most of the time, they're basic and barely interesting. This article attempts to go against that experience and, at least in my case, succeeded in teaching me a few things and provided some interesting points of discussion.

Week 15
  • A Visual Exploration of Gaussian Processes (distill.pub) - #optimization

    This is a very thorough and well designed visual introduction to Gaussian Processes and Bayesian optimisation. The article features interactive visualisations, which I found great to truly get a feel for what's happening.

  • A feel for the data | Briefer (briefer.cloud)

    This is a high-quality review of how visualisations shape our understanding of data. Its focus on the strengths of each visualisation type makes it a great learning resource to improve our storytelling skills.

  • Managing friction (arslan.io)

    This article shares an interesting viewpoint on the role of friction in our lives, both as a positive and negative influence.

  • Getting Started with TDD: A Practical Guide to Beginning a Lasting Practice (8thlight.com)

    TDD can feel daunting, and advocacy for strict adherence to TDD principles can be off-putting when you are starting with it. This article does a good job of reminding all of us of the pragmatic take that some testing is better than no testing, and that TDD, just like any other practice, is something you learn with time.

March 2025

Week 14
  • Writing useful Documentation (www.blog.philodev.one)

    A great write-up on how to write good documentation.

Week 12
  • Don't Be Afraid Of Types (lmika.org)

    Adding new types to existing codebases can be daunting, but one shouldn't be shy to do what's necessary. This is a good opinion piece on this topic!

  • "Vibe Coding" vs Reality (cendyne.dev)

    Read this if you want a good reality check on the current "vibe coding" trend.

  • A Visual Guide to LLM Agents (newsletter.maartengrootendorst.com) - #llm

    It is possibly one of the best summaries out there of how LLMs function, broken down by high-level components (memory, tools), and well illustrated.

Week 11
  • Learning Word Embedding (lilianweng.github.io) - #deep-learning

    An old but fantastic reference on vector embeddings.

  • On the Importance of Naming in Programming (wasp.sh) - #best-practices

    Some musings on the importance of good naming conventions in programming.

  • How To Boil the Mediterranean Sea (benbyfax.substack.com)

    This is an extremely interesting take on recent global warming data and the role of sulfur in masking some of the effects.

  • Algorithms Books (algorithmsbook.com) - #optimization

    A fantastic collection of free textbooks on algorithms for optimization, decision making and validation.

  • Slidev (sli.dev) - #tools

    During my PhD, I wrangled with beamer for important presentations, but I always yearned for a simpler markdown-based system for smaller, recurrent presentations. I just discovered slidev, and it has every feature I would want from such a system, and more.

Week 10
  • Succinct data structures (blog.startifact.com) - #data

    Succinct data structures are clever ways to pack a lot of information in lightweight structures like bit vectors. A very interesting read!

  • patrick-kidger/jaxtyping (github.com) - #python#library

    I've been looking for a good numpy and pytorch typing system in Python. Initially written for Jax, this library looks like exactly what I wanted.

  • Understanding Attention in LLMs (bartoszmilewski.com) - #llm

    This is a good example that even if you understand the math behind a concept, there's nothing like good storytelling. I knew how attention worked, but this post brilliantly summarized it and clarified some steps for me. A great read!

  • Death of Best Practices (korshakov.com) - #best-practices

    An interesting take on the rigidity of best practices and how much more productive we can be once we let go of them.

  • Markov Chains explained visually (setosa.io) - #algorithms

    A very neat summary of what Markov chains are and how they work, with beautiful animations.

Week 9
  • Some Advanced Typing Concepts in Python (jellis18.github.io) - #python

    Another article about python's type system. This one is addressed to a more advanced audience. I had been looking for such a resource for a while, and I wasn't disappointed.

  • Abstract Base Classes and Protocols: What Are They? When To Use Them?? Lets Find Out! (jellis18.github.io) - #python

    Very cool breakdown of the difference between abstract classes and protocols in python. Well written and with lots of clear examples.
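
To give a flavour of the distinction, here is a minimal sketch contrasting the two approaches. All class and function names (`Shape`, `Square`, `Circle`, `HasArea`, `total_area`) are made up for illustration and do not come from the linked article: an abstract base class requires explicit inheritance, while a protocol only requires that the right methods exist.

```python
import math
from abc import ABC, abstractmethod
from typing import Protocol, runtime_checkable


class Shape(ABC):
    """Nominal typing: subclasses must inherit from Shape and implement area()."""

    @abstractmethod
    def area(self) -> float: ...


class Square(Shape):
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side**2


@runtime_checkable
class HasArea(Protocol):
    """Structural typing: anything with an area() method matches."""

    def area(self) -> float: ...


class Circle:
    """Does not inherit from anything, yet satisfies HasArea."""

    def __init__(self, radius: float) -> None:
        self.radius = radius

    def area(self) -> float:
        return math.pi * self.radius**2


def total_area(shapes: list[HasArea]) -> float:
    # Accepts both Square (via inheritance) and Circle (via structure)
    return sum(shape.area() for shape in shapes)


total = total_area([Square(2.0), Circle(1.0)])
```

Here `Circle` passes an `isinstance` check against `HasArea` but not against `Shape`, and `Shape` itself cannot be instantiated. Note that `runtime_checkable` only checks that the method exists, not its signature.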

February 2025

Week 9
  • Git Branching for Small Teams (victoria.dev) - #vcs

    A good git workflow for small teams. Reading it, it happens to be the one we use in my team, and I can confirm it's a very effective one!

  • SolracHQ/bmath (github.com) - #tools

    An interesting CLI math tool, with its own language and sane defaults.

  • Do not log (sobolevn.me)

    An interesting analysis of the cost of modern logging infrastructure and its usefulness (or lack thereof).

Week 8
  • Generating Mazes (healeycodes.com) - #algorithms

    A great introduction to maze generation algorithms with informative visuals.

  • Summary of Major Changes Between Python Versions (www.nicholashairs.com) - #python

    This is a very useful reference sheet containing major changes added by every new python version. Extremely handy if you have to update an old codebase multiple versions at once.

  • Prototyping in Rust (corrode.dev) - #rust

    This article presents various pieces of advice and tips on how to efficiently write Rust code at the prototype stage. Most introductory material on the language focuses on "proper use of syntax." But prototyping is often a compromise between code quality and coding efficiency, and this article makes some great suggestions on how to do that.

  • Deep dive into LLMs like ChatGPT by Andrej Karpathy (TL;DR) (anfalmushtaq.com) - #llm

    A TL;DR version of Andrej Karpathy's "Deep dive into LLMs like ChatGPT" video. It manages to keep the essentials while presenting them in clear, digestible chunks.

  • (Ab)using General Search Algorithms on Dynamic Optimization Problems (dubovik.eu) - #optimization#algorithms

    An interesting analysis of various optimization algorithms applied to a simple dynamic programming problem. Features beautiful visualizations of those algorithms.

  • uchū (uchu.style) - #web-dev

    uchū is a minimalistic color palette based on OKLCH color space. I personally find it very aesthetically pleasing.

  • flywhl/logis (github.com) - #python#vcs#library

    An interesting library to record ML experiments metadata through commit messages. Even better, it supports a query language to find which commit satisfies a given criterion.

Week 7
  • jj init (v5.chriskrycho.com) - #vcs

    I've been more and more tempted by jujutsu as a drop-in replacement for git. Its default way to handle changes seems so sane compared to git. This article is a very thorough and accessible introduction to how it works, and it definitely nudged me further along the jj train.

  • Luxa CSS (www.luxacss.com) - #web-dev

    This is an interesting CSS framework which picks some parts out of Tailwind while also being more minimalistic. Bonus point: it was made by the creator of the fantastic Dracula theme.

  • How I Use Git Worktrees (matklad.github.io) - #vcs

    A useful example of a git worktree workflow. Worktrees help avoid all the stashing and branch hopping of a typical workflow. You can check out the repository several times on different branches and work on different features, review pull requests, run automated tests, etc., without having to break your flow.

  • Binary vector embeddings are so cool (emschwartz.me) - #llm#deep-learning#data

    A description of the effect of binary quantization on embeddings. By restricting the dtype of embedding vectors, you can get a tradeoff between accuracy in latent space and size of the embedding. Using binary dtype seems to conserve a surprisingly high amount of the original information content (about 97%) while yielding a gigantic amount of saving in space (about 97% too here).

  • efugier/smartcat (github.com) - #cli#LLM#tools

    An interesting CLI tool designed to call LLMs from the command line with short, isolated prompts. Seems to adhere to core Unix philosophy, unlike most AI tools out there. Handles both local and hosted LLMs.

  • How I program with LLMs (crawshaw.io) - #llm

    Some interesting reflections on how to use LLMs in daily development work. I personally adhere mostly to the "autocomplete" part with GitHub Copilot, and I'm getting used to the "search" part where the LLM helps me find information on some language or coding paradigm faster than I can search for it. I'm not yet on board with "Chat-driven programming".

  • A Deep Dive into Memorization in Deep Learning (blog.kjamistan.com) - #deep-learning

    An interesting series of articles explaining how machine learning models memorize data.

  • ben-nour/SQL-tips-and-tricks (github.com) - #SQL#best-practices

    I'm not a great SQL user: I have experience (mainly from database management in web development and now as a data scientist), but I don't consider myself an SQL wizard. This list of opinionated "tips" was quite useful to me.

  • How to fine-tune open LLMs in 2025 with Hugging Face (www.philschmid.de) - #llm

    An in-depth example of how to fine-tune an LLM using the Hugging Face ecosystem.

  • aneeshnaik/lintsampler (github.com) - #python#library

    A useful Python library to sample custom probability distributions. Looks useful if the PDF is expensive to compute.

  • Skforecast (skforecast.org) - #library#data-science#python

    A Python library for timeseries forecasting with very extensive features. The documentation also features some in-depth pedagogical explanations of how to properly forecast data and what methods can be used to improve results.

  • Bayesian Methods for Hackers (dataorigami.net) - #statistics#python

    An illustrated introduction to Bayesian statistics using Jupyter notebooks. I was always confused about the difference between Bayesian and Frequentist approaches until I read this.

  • Helix (helix-editor.com) - #tools#rust

    A rust-based alternative to neovim with opinionated defaults. After setting up an LSP for Python, it immediately became my daily driver.

  • LoRA (jaketae.github.io) - #llm

    Explanation of LoRA methods for LLMs.

  • Thinking About Recipe Formats More Than Anyone Should (rknight.me) - #markdown

    An interesting reflection on markup languages for recipes. I was surprised other people spent as much time as I did pondering on recipe formats.

  • Rust for the Polyglot Programmer (www.chiark.greenend.org.uk) - #rust

    A book introducing Rust for programmers with experience in other languages. I'm not polyglot enough, but some of it helped me better understand the design choices in the language.

  • Effective Simulated Annealing with Python (nathan.fun) - #optimization#python

    Fantastic introduction to the simulated annealing metaheuristic in Python. This is a powerful method to build good approximate solutions to optimization problems.

  • Taking a Look at Compression Algorithms (cefboud.com) - #algorithms

    A short blog post summarizing the main compression algorithms. It's incredible how little I knew about something I use so much.

  • Linklog (ewintr.nl)

    Example of what a linklog should look like, and what I aim for with my own.

  • Solving differential equations using neural networks (labpresse.com) - #deep-learning#physics

    Toy example of how to use neural networks to solve differential equations. This blew my mind when I first read it.

  • Blogging in Djot instead of Markdown (www.jonashietala.se) - #web-dev#rust#markdown

    Interesting dive on how to handle multiple markup languages in a Rust-based static website generator. My Rust journey hasn't taken me there yet, but it probably will eventually!

Week 6
  • A Visual Guide to How Diffusion Models Work (towardsdatascience.com) - #deep-learning#diffusion

    An interesting dive into what makes diffusion models work. The summary is that diffusion models are models trained on data with noise to find the original data, at various levels of noise. They eventually learn the probability distribution of the images in the space of all possible pixel arrangements. You can then iteratively denoise a pure Gaussian noise picture until you generate a new image: this is like sampling the learned probability distribution.
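
The "data with noise at various levels" part can be sketched in a few lines. The snippet below assumes the variance-preserving parametrisation used in DDPM-style models, x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps with eps drawn from a standard Gaussian, which may differ from the formulation in the linked article. The function name `noisy_sample`, the toy "image" and the chosen noise levels are all hypothetical.

```python
import math
import random


def noisy_sample(x0, alpha_bar, rng):
    """Mix a clean sample x0 with Gaussian noise.

    alpha_bar = 1 keeps the data intact, alpha_bar = 0 gives pure noise.
    Returns the noisy sample and the noise, which is what a denoising
    model would typically be trained to predict.
    """
    if not 0.0 <= alpha_bar <= 1.0:
        raise ValueError("alpha_bar must lie in [0, 1]")
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    xt = [signal * x + noise * e for x, e in zip(x0, eps)]
    return xt, eps


rng = random.Random(0)
x0 = [1.0, -2.0, 0.5]  # a toy three-pixel "image"

# Training pairs at increasing noise levels (decreasing alpha_bar)
pairs = [noisy_sample(x0, a, rng) for a in (0.99, 0.5, 0.01)]
```

Generation runs the process in reverse: start from pure noise (the `alpha_bar = 0` end) and repeatedly apply the trained denoiser, which is not shown here.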

January 2025

Week 5
  • Understanding LSTM Networks (colah.github.io) - #deep-learning

    In-depth explanation of LSTM Networks. The figures on this blog are incredible and truly help explain what happens inside the network.

  • A Brief Introduction to Recurrent Neural Networks (jaketae.github.io) - #deep-learning#python

    Introduction and example of how to build a recurrent neural network from scratch.

  • Data Contracts as Therapy (benrutter.github.io) - #data

    Musings about the use of data contracts to validate data sources. If you've ever been frustrated by a data source suddenly changing its schema or sending unexpected data, this is for you!

Week 4
  • Polars for initial data analysis, Polars for production (pythonspeed.com) - #python#data

    Article about the use of Polars for both production and development stages. When starting with Polars, I found it easy to write production code (usually a long pipeline of LazyFrames ending with a collect), but struggled with writing optimal development code.

  • Modern Polars (kevinheavey.github.io) - #python#data

    Great online book about Polars targeted to Pandas users. If you haven't heard about Polars yet, do yourself a favor and read this.

Week 2
  • Einsum in Depth (einsum.joelburget.com) - #python

    A guide on how to use "einsum" in Python for tensor manipulation. Einstein notation made working with algebra a much nicer experience in physics, and it should do the same for anyone doing heavy tensor operations. But I always found the Python implementation a bit awkward and difficult to understand. This article really helped with that.

  • Building effective agents (www.anthropic.com) - #llm

    Advice on agentic workflow for practical applications from Anthropic. A good read to better understand what structure you should use when establishing your project.

Week 1
  • Hyperparameter Tuning LightGBM (macalusojeff.github.io) - #data-science

    A useful guide for hyperparameter tuning of LGBM models. Mostly, if like me you always forget what parameter range is sensible, you can find it in there.

December 2024

Week 52
  • Software design principles for machine learning applications (github.com) - #best-practices#python

    A series of examples of proper software design in data science beyond Jupyter notebooks. Very good examples of proper refactoring, step by step, from a messy script to a properly encapsulated program.

Week 51
  • Quick software tips for new ML researchers (www.eugenevinitsky.com) - #best-practices

    A short list of best practices. Some are obvious from a software development perspective (VCS, package manager, linter), but some others have some good recommendations on ML specific tools (Hydra for configs, Optuna for hyperparameter tuning).

  • Hands-on Optimization with OR-Tools in Python (kunlei.github.io) - #python#optimization

    Detailed use cases of the OR-Tools library for optimization problems. Many problems can be solved in a more efficient way with linear programming, and this library makes it a breeze to do so.

Week 50
  • GitHub Actions by Example (www.actionsbyexample.com) - #vcs

    I always have to google the GitHub Actions format and snippets, or prompt an LLM for them. This is a collection of examples so you never have to google it again.

Week 49
  • Data Science at the Command Line (jeroenjanssens.com) - #cli#data-science

    Online book on how to use command-line tools for quick data science results. This is for when your boss asks you about some statistics of your recent data output and you don't want to write a whole script for it.

November 2024

Week 47
  • Perspectives on diffusion (sander.ai) - #diffusion

    Some interesting thoughts on diffusion models.

  • Thoughts on Riemannian metrics and its connection with diffusion/score matching [Part I] (blog.christianperone.com) - #physics#diffusion

    An in-depth description of the connections between diffusion models and Riemannian geometry.

Week 46
  • shshemi/tabiew (github.com) - #cli#rust#tools

    A handy rust-based TUI application to view and manipulate data from CSV and databases. Supports SQL syntax to query the data regardless of its sources.

Week 45
  • Algorithm Afternoon (algorithmafternoon.com) - #optimization#algorithms

    This is a collection of all the optimization metaheuristics you can possibly imagine, with comments on how to implement them and what parameters can be tuned. The aim is to take it one algorithm per afternoon.

October 2024

Week 44
  • First aid for figures: all resources (helenajamborwrites.netlify.app) - #data

    A collection of resources to help make better data visualizations. Definitely useful as a refresher or reference before making a report or a presentation.

  • Transformers From Scratch (blog.matdmiller.com) - #deep-learning#llm

    Thorough explanation of the Transformers model. If like me you've been confused about what's so special about transformers compared to RNNs or LSTMs, this might help.

Week 43
  • dry-python/returns (github.com) - #library#python

    Bring some sanity to Python and remove null checks. Clearly inspired by Haskell's Maybe or Rust's Option type. I am mostly familiar with the latter, and I often wish it existed in Python, and now it does.

Week 42
  • Blog of Claudio Jolowicz (cjolowicz.github.io) - #python#best-practices

    Series of articles on best practices around Python coding and tooling. Definitely worth checking it out if you're still building your workflow.

Week 41
  • shap/shap (github.com) - #python#library

    Useful library to estimate feature importance of machine learning models, based on game theory principles. The main idea is to estimate how much each feature contributes to moving a sample's prediction from the mean prediction value to its actual prediction value. It can also be aggregated over samples to understand global feature importance, conditional on feature value.

  • Modern Good Practices for Python Development (www.stuartellis.name) - #python#best-practices

    A set of best-practices in Python development. Given the permissiveness of Python in terms of syntax and design, I find that following community accepted best practices is the best way to learn how to write good code too.

September 2024

Week 40
  • Introduction to Data Science (rafalab.dfci.harvard.edu) - #statistics#data-science

    An online book focusing on the fundamentals of data science (statistics, traditional machine learning). I don't know much about R (on which this book is based) but most of the theory in there is relevant for any junior data scientist.

Week 39
  • Visualizing Algorithms (bost.ocks.org) - #algorithms

    A beautiful set of visualizations of common algorithms. Perfect to truly understand what happens in a quicksort algorithm, or to compare different sampling algorithms.

  • Was Michael Scott the World’s Best Boss? (datacream.substack.com) - #data

    I always love when data scientists take it too far on their hobbies. This is a cool example of data science applied to "The Office", to figure out through sentiment analysis if Michael Scott was truly appreciated.

Week 38
  • dleemiller/WordLlama (github.com) - #library#llm

    Natural language processing toolkit optimized for CPU hardware. I haven't tested it yet but it looks really useful for quick clustering, deduplication, similarity search, etc.

Week 37
  • Pico CSS (picocss.com) - #web-dev#library

    A minimalistic take on CSS frameworks which is simple and lightweight. Hopefully I will one day have the time to rewrite this blog with it. Update: it looks semi-abandoned, but some forks are keeping the torch alive.

Week 36
  • posit-dev/great-tables (github.com) - #library#python

    Library to make great-looking tables from Polars dataframes. It works with Pandas too, but there you can just generate HTML directly, while Polars currently does not have many other options.

August 2024

Week 35
  • sharkdp/hyperfine (github.com) - #cli#tools

    A very useful CLI tool to perform benchmarking tests. It is handy for timing a bash script or a simple script file without any complicated profiling.

  • REDOKU (padolsey.github.io)

    A fun RegExp-based crossword. Not easy though, and you might see English differently afterwards.

Week 33
  • Modern SQL Style Guide (gist.github.com) - #SQL#best-practices

    An interesting and opinionated take on SQL formatting. You might not be able to impose it at work, but you can always try!

Week 31
  • Column Names as Contracts (emilyriederer.netlify.app) - #best-practices#data

    An interesting explanation of implicit data contracts through naming conventions.

July 2024

Week 31
  • A User’s Guide to Statistical Inference and Regression (mattblackwell.github.io) - #statistics

    Brief introductory book to essential statistics. This online book is very clear and helped me understand concepts I always found confusing.

March 2024

Week 10
  • Machine Learning Notebooks (sebastianraschka.com) - #python

    A collection of detailed Python notebooks written by Sebastian Raschka. It's like a big cheatsheet of machine learning methods.

February 2024

Week 7
  • Python Data Science Handbook (jakevdp.github.io) - #python#data-science

    A must-read for anyone beginning in data science. Chapter 5 features some great in-depth notebooks on classical machine learning methods like SVM, random forests, etc.

  • faif/python-patterns (github.com) - #python

    A detailed list of design patterns in Python. While I don't believe you should always look to insert design patterns everywhere you can, knowing them is often the key to writing more robust code when relevant.

Week 5
  • Scientific Computing with Python (caam37830.github.io) - #python

    A reference on how to use Python for efficient computations in science. I wish I had read this before my PhD.

You can follow me via RSS.
© 2025 Nicolas Chagnet. All rights reserved.