ayan@website $ ./blogs --all _

Below are the articles/tutorials I have written whenever I found the time. Click the one that interests you. If you want me to write about something specific, OR have a suggestion/correction for an existing post, just mail me at `ayan05das`@`gmail.com`. Thank you.

We talked extensively about Directed PGMs in my earlier article and also described one particular model following the principles of Variational Inference (VI). There exists another class of models, conveniently represented by *Undirected* Graphical Models, which are practiced relatively less than modern Deep Learning (DL) methods in the research community. They are also characterized as **Energy Based Models (EBM)** because, as we shall see, they rely on something called *Energy Functions*. In the early days of this Deep Learning *renaissance*, we discovered a few extremely powerful models which helped DL gain momentum. The class of models we are going to discuss has far more theoretical support than modern-day Deep Learning, which, as we know, largely relies on intuition and trial-and-error. In this article, I will introduce you to the general concept of Energy Based Models (EBMs), their difficulties and how we can get over them. We will also look at a specific family of EBMs known as **Boltzmann Machines (BM)**, which are very well known in the literature.

Welcome to another tutorial about probabilistic models, after a primer on PGMs and VAE. This time, I am particularly excited to discuss a topic that doesn’t get as much attention as traditional Deep Learning does. The idea of **Probabilistic Programming** has long been there in the ML literature and has been enriched over time. Before it creates confusion, let’s declutter it right now - it’s not really writing traditional “programs”, rather it’s building Probabilistic Graphical Models (PGMs), but *equipped with imperative programming style* (i.e., iterations, branching, recursion etc). Just like Automatic Differentiation allowed us to compute derivatives of arbitrary computation graphs (in PyTorch, TensorFlow), black-box methods have been developed to “solve” probabilistic programs. In this post, I will provide a generic view of why such a language is indeed possible and how such black-box solvers are materialized. At the end, I will also introduce you to one such *Universal* Probabilistic Programming Language, Pyro, that came out of Uber’s AI lab and has started gaining popularity.
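
To make "PGMs equipped with imperative programming style" a bit more concrete, here is a minimal sketch in plain Python (deliberately not Pyro). The function name `geometric` and the parameter `p` are my own illustrative choices. The point is that the number of random choices is not fixed in advance: it depends on the outcomes of earlier choices, which a static graph cannot express directly. The sketch only covers the forward (generative) side of such a program, not inference.

```python
import random


def geometric(p, rng):
    """Generative 'program': count failed coin flips before the first success.

    The loop runs a random number of times, so the set of random
    variables involved is itself determined by control flow.
    """
    failures = 0
    while rng.random() >= p:  # each flip fails with probability 1 - p
        failures += 1
    return failures


rng = random.Random(0)  # seeded for reproducibility
samples = [geometric(0.5, rng) for _ in range(10_000)]
mean = sum(samples) / len(samples)  # theoretical mean is (1 - p) / p
```

With `p = 0.5` the empirical mean should land close to the theoretical value of 1.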

Welcome folks! This is an article I had been planning to write for a long time. I finally managed to get it done while locked at home due to the global COVID-19 situation. So, it's basically something fun, interesting, attractive and hopefully understandable to most readers. To be specific, my plan is to dive into the world of finding visually appealing patterns in different areas of mathematics. I am going to introduce you to four distinct mathematical concepts by means of which we can generate artistic patterns that are very soothing to human eyes. Most of these use random numbers as the underlying principle of generation. They are not necessarily very useful in real-life problem solving but are widely loved by artists as a tool for content creation. They are sometimes referred to as *Mathematical Art*. I will deliberately keep the fine-grained details out of the way so that the post is accessible to a larger audience. In case you want to reproduce the content in this post, here is the code. **Warning: This post contains quite large images which may take some time to load in your browser; so be patient**.

Neural Ordinary Differential Equation (Neural ODE) is a very recent and first-of-its-kind idea that emerged at NeurIPS 2018. The authors, four researchers from the University of Toronto, reformulated the parameterization of deep networks with differential equations, particularly first-order ODEs. The idea evolved from the fact that ResNet, a very popular deep network, shares quite a bit of similarity with ODEs in its core structure. The paper also offered an efficient algorithm to train such ODE structures as part of a larger computation graph. The architecture is flexible and memory efficient for learning. Since it is a bit non-trivial from a deep network standpoint, I decided to dedicate this article to explaining it in detail, making it easier for everyone to understand. Understanding the whole algorithm requires a fair bit of rigorous mathematics, especially ODEs and their algebraic understanding, which I will try to cover at the beginning of the article. I also provide a (simplified) PyTorch implementation that is easy to follow.
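
The similarity between ResNet and ODEs can be sketched in a few lines: a residual block computes `h + F(h)`, which has the same shape as one explicit Euler step `h + dt * f(h, t)` of the ODE `dh/dt = f(h, t)`. The snippet below is only an illustration under simplifying assumptions. It uses plain Python rather than PyTorch, and the toy dynamics `f(h, t) = -h` (chosen because the exact solution is known) stands in for a learned network. The names `f` and `euler_integrate` are hypothetical.

```python
import math


def f(h, t):
    # Toy dynamics dh/dt = -h; in a Neural ODE this would be a neural network.
    return -h


def euler_integrate(h0, t0, t1, steps):
    """Integrate dh/dt = f(h, t) from t0 to t1 with explicit Euler steps."""
    h, t = h0, t0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        h = h + dt * f(h, t)  # same form as a residual update: h + F(h)
        t += dt
    return h


approx = euler_integrate(1.0, 0.0, 1.0, steps=1000)
exact = math.exp(-1.0)  # closed-form solution h(1) = e^{-1} for h(0) = 1
```

Shrinking the step size (more "layers") brings `approx` closer to `exact`, which is the intuition behind treating depth as continuous.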

In the previous article, I started with Directed Probabilistic Graphical Models (PGMs) and a family of algorithms to do efficient approximate inference on them. Inference problems in Directed PGMs with continuous latent variables are intractable in general and require special attention. The family of algorithms introduced in the last article, namely **Variational Inference (VI)**, is a general formulation for approximating the intractable posterior in such models. The **Variational Autoencoder**, famously known as **VAE**, is an algorithm based on the principles of VI and has gained a lot of attention in the past few years for being extremely efficient. With a few more approximations/assumptions, VAE established a clean mathematical formulation which has later been extended by researchers and used in numerous applications. In this article, I will explain the intuition as well as the mathematical formulation of Variational Autoencoders.

Welcome to the first part of a series of tutorials about Directed Probabilistic Graphical Models (PGMs) & Variational methods. Directed PGMs (OR, Bayesian Networks) are very powerful probabilistic modelling techniques in the machine learning literature and have been studied rigorously by researchers over the years. Variational Methods are a family of algorithms that arise in the context of Directed PGMs when they involve solving intractable integrals. Doing inference on a set of latent variables (given a set of observed variables) involves such an intractable integral. Variational Inference (VI) is a specialised form of variational method that handles this situation. This tutorial is NOT for absolute beginners, as I assume the reader has basic-to-moderate knowledge of Random Variables, probability theory and PGMs. The next tutorial in this series will cover one particular method built on top of VI, namely the “Variational Autoencoder (VAE)”.

Welcome to the very first, introductory article on `typesetting`. If you happen to be from the scientific community, you must have gone through at least one document (maybe in the form of a `.pdf` or a printed paper) which is the result of years of development in typesetting. If you are from a technical/research background, chances are that you have even *typeset* a document before using something called `LaTeX`. Let me assure you that `LaTeX` is neither the beginning nor the end of the entire typesetting ecosystem. In this article, I will provide a brief introduction to what typesetting is and which modern tools are available for use. Specifically, the most popular members of the `TeX` family will be introduced, including `LaTeX`.

Over the years, **Python** has become one of the major general-purpose programming languages that industry and academia care about. But even with a vast community around the language, very few of its users are aware of how Python is actually executed on a computer system. Some of them have a vague idea of how Python is executed, partly because it’s totally possible to know nothing about it and still be a successful Python programmer. They believe that unlike C/C++, Python is “interpreted” instead of “compiled”, i.e. it is *executed one statement at a time* rather than being converted down to some kind of machine code. **It is not entirely correct**. This post is targeted towards programmers with a fair knowledge of Python’s language fundamentals, to clear the fog around *what really goes on* when you hit `python script.py`.
**WARNING**: This is a fairly advanced topic and not for beginners.
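
As a small taste of why "interpreted, one statement at a time" is not the full picture, the sketch below uses only the standard library to show that source text is first compiled into a code object holding bytecode, and only then executed. This describes CPython specifically; the exact opcode names vary between Python versions, so the snippet lists them without assuming particular ones. The variable names are my own.

```python
import dis

source = "x = 1 + 2\ny = x * 10"

# Step 1: compile the whole source into a code object (no execution yet).
code_obj = compile(source, "<example>", "exec")
bytecode = code_obj.co_code  # raw bytecode as a bytes object

# Inspect the instructions the virtual machine will run.
instructions = [ins.opname for ins in dis.get_instructions(code_obj)]

# Step 2: execute the already-compiled code object in a fresh namespace.
namespace = {}
exec(code_obj, namespace)
```

After the `exec` call, `namespace` holds the variables `x` and `y` defined by the compiled source.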

In the last post, we went through the basics of `Distributed computing` and `MPI`, and also demonstrated the steps of setting up a distributed environment. This post will focus on the practical usage of distributed computing strategies to accelerate the training of Deep learning (DL) models. To be specific, we will focus on one particular distributed training algorithm (namely `Synchronous SGD`) and implement it using `PyTorch`’s distributed computing API (i.e., `torch.distributed`). I will use 4 nodes for demonstration purposes, but it can easily be *scaled up* with minor changes. This tutorial assumes the reader has a working knowledge of Deep learning model implementation, as I won’t go over typical concepts of deep learning.

Welcome to an in-depth tutorial on **Distributed Deep learning** with some standard tools and frameworks available to everyone. From the very beginning of my journey with DL as an undergrad student, I realized that it’s not as easy as it seems to achieve what the mainstream industry has achieved with Deep learning, even though I was quite confident about my knowledge of “DL algorithms”. Clearly, algorithms weren’t the only driving force that the industry survives on - it’s also the *scale* at which they execute their well-planned implementations on high-end hardware, which was nearly impossible for me to get access to. So it’s extremely important to understand the concept of *scale* and the consequences that come with it. This tutorial is targeted towards people who have a working knowledge of Deep learning and have access to *somewhat* industry-standard hardware or at least a well-equipped academic research lab. If not, you can still follow along, as the techniques shown here can be scaled down with proper changes.

In my previous post, I laid out the plan for a couple of tutorials in the same series about intermediate-level Python. This series of posts is intended to introduce some intermediate concepts to programmers who are already familiar with the basic concepts of Python. Specifically, I planned to describe `Generator`s, `Decorator`s and `Context Manager`s in detail, among which I have already dedicated a full-fledged post to the first one - `Generator`s. This one will be all about `Decorator`s and some of their lesser-known features/applications. Without further ado, let’s dive into it.

Welcome to the series of *Intermediate* Python tutorials. Before we begin, let me make it very clear that this tutorial is **NOT** for absolute beginners. This is for `Python` programmers who are familiar with the standard concepts and syntax of Python. In this three-part tutorial, we will specifically look at three features of Python, namely `Generators`, `Decorators` and `Context Managers`, which, in my opinion, are not heavily used by average or below-average Python programmers. In my experience, these features are lesser known to programmers whose primary purpose for using Python is not to focus on the language too much but just to get their own applications/algorithms working. This leads to very monotonous, imperative-style code which, in the long run, becomes unmaintainable.

I recently wrote an article explaining the intuitive idea of `capsule`s proposed by Geoffrey Hinton and colleagues, which created a buzz in the deep learning community. In that article, I explained in simple terms the motivation behind the idea of `capsule`s and its (minimal) mathematical formalism. It is highly recommended that you read that article as a prerequisite to this one. In this article, I would like to explain the specific `CapsNet` architecture proposed in the same paper, which managed to achieve state-of-the-art performance on MNIST digit classification.

Recently, Geoffrey Hinton, the godfather of deep learning, argued that one of the key principles in the ConvNet model is flawed, i.e., it doesn’t work the way the human brain does. Hinton also proposed an alternative idea (namely `capsules`), which he thinks is a better model of the human brain. In this post, I will try to present an intuitive explanation of this new proposal by Hinton and colleagues.