
Linklog

A curated collection of links and resources I have found over time.

Tags: #algorithms (8) #best-practices (9) #cli (4) #data (10) #data-science (7) #deep-learning (8) #diffusion (3) #library (11) #llm (10) #LLM (1) #markdown (2) #optimization (6) #physics (3) #python (27) #rust (7) #SQL (2) #statistics (4) #tools (7) #vcs (6) #web-dev (4)

May 2025

Week 20
  • Adventures in Imbalanced Learning and Class Weight | andersource (andersource.dev) - #data-science

    Class imbalance is a recurring topic of discussion in the data science community. Common wisdom says to weight samples in inverse proportion to their class frequency so that all classes get enough representation during training. This post gives a thorough mathematical derivation showing that this does not work for the F1-score. It does work for other metrics, though, so it still has some value there. The important takeaway is to always consider the metric most relevant to the problem at hand and adapt the methodology to it.
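
    For reference, here is a minimal sketch of the inverse-frequency weighting that the common wisdom refers to. It is not taken from the post: the function name and the toy labels are made up for illustration, and the normalization is the one I believe scikit-learn uses for `class_weight="balanced"`.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per class, inversely proportional to its frequency.

    Hypothetical helper: weight_c = n_samples / (n_classes * count_c).
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy, made-up dataset: 90 negatives and 10 positives.
labels = [0] * 90 + [1] * 10
weights = inverse_frequency_weights(labels)

# With these weights, each class contributes the same total weight to the loss.
total_weight_per_class = {cls: weights[cls] * labels.count(cls) for cls in weights}
```

    Whether this weighting actually helps depends on the metric being optimized, which is the point of the post.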

April 2025

Week 17
  • Are polynomial features the root of all evil? (alexshtf.github.io) - #data-science

    This is a great article presenting various polynomial bases used in mathematics (canonical, Legendre, Chebyshev) and how they can be used to fit data. Polynomial features are well known to overfit and to be hard to regularize, but with a basis suited to this kind of problem (Bernstein), you can get really good results. Interestingly, the reasoning behind this choice reminds me a lot of physics reasoning about scaling and units.
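
    To make the idea concrete, here is a rough sketch of a least-squares fit in the Bernstein basis, b_{k,n}(x) = C(n, k) x^k (1 - x)^(n - k) on [0, 1]. It is my own illustration, not code from the article: the helper name, the degree, and the noisy sine data are assumptions, inputs are assumed to be already scaled to [0, 1], and no regularization is applied.

```python
import math

import numpy as np


def bernstein_design_matrix(x, degree):
    """Evaluate all Bernstein basis polynomials of a given degree at x in [0, 1].

    Hypothetical helper: returns an array of shape (len(x), degree + 1).
    """
    x = np.asarray(x, dtype=float)
    k = np.arange(degree + 1)
    binom = np.array([math.comb(degree, j) for j in k], dtype=float)
    return binom * x[:, None] ** k * (1.0 - x[:, None]) ** (degree - k)


# Made-up data: a noisy sine sampled on [0, 1].
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=x.size)

# Ordinary least squares on the Bernstein features.
B = bernstein_design_matrix(x, degree=10)
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
y_fit = B @ coef
```

    Every basis function is non-negative and each row of `B` sums to one, so the fitted curve is a weighted average of the coefficients at every point. That may be part of why the coefficients are easier to reason about than those of the canonical basis.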

February 2025

Week 7
  • Skforecast (skforecast.org) - #library #data-science #python

    A Python library for time series forecasting with a very extensive feature set. The documentation also includes in-depth pedagogical explanations of how to forecast data properly and which methods can be used to improve results.

January 2025

Week 1
  • Hyperparameter Tuning LightGBM (macalusojeff.github.io) - #data-science

    A useful guide for hyperparameter tuning of LightGBM models. If, like me, you always forget which parameter ranges are sensible, you can find them there.

December 2024

Week 49
  • Data Science at the Command Line (jeroenjanssens.com) - #cli #data-science

    Online book on how to use command-line tools for quick data science results. This is for when your boss asks you about some statistics of your recent data output and you don't want to write a whole script for it.

September 2024

Week 40
  • Introduction to Data Science (rafalab.dfci.harvard.edu) - #statistics #data-science

    An online book focusing on the fundamentals of data science (statistics, traditional machine learning). I don't know much about R (on which this book is based), but most of the theory in it is relevant for any junior data scientist.

February 2024

Week 7
  • Python Data Science Handbook (jakevdp.github.io) - #python #data-science

    A must-read for anyone beginning in data science. Chapter 5 features some great in-depth notebooks on classical machine learning methods such as SVMs and random forests.

You can follow me via RSS.
© 2025 Nicolas Chagnet. All rights reserved.