machine_learning

Lecture 6: Statistics

Sample variance for normal distribution

Correcting standard deviation for normal distribution

MLE for Gaussian with fixed dispersion

Correlation of 2D points

Explore correlation coefficient for 2D points

Lecture 7: Naive Bayes Classification

Monty Hall paradox

This demo should help convince you that the theory behind the Monty Hall paradox actually makes sense.
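The claim can also be checked by brute force. The following is a minimal simulation sketch, not the demo's own code; the function names `play` and `win_rate` are made up for illustration. It assumes the standard rules: three doors, and the host always opens a door that hides no car and is not the contestant's pick.

```python
import random


def play(switch, rng):
    """Play one round; return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the only remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, rounds=100_000, seed=0):
    """Fraction of rounds won when always switching or always staying."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(rounds)) / rounds


stay_rate = win_rate(switch=False)
switch_rate = win_rate(switch=True)
print(f"stay: {stay_rate:.3f}, switch: {switch_rate:.3f}")
```

With enough rounds the two rates should settle near 1/3 for staying and 2/3 for switching, which is what the theory predicts.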

2D Gaussian Naive Bayes

This demo lets you explore 2D Naive Bayes with a Gaussian as the underlying model.
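For readers who want to see the model itself, here is a small plain-Python sketch of Gaussian Naive Bayes. It is an illustration under simple assumptions (independent features, one mean and variance per feature and class), not the code behind the demo; `fit`, `predict` and the toy data are hypothetical.

```python
import math
from statistics import fmean, pvariance


def fit(points, labels):
    """Estimate a prior and per-feature mean/variance for each class."""
    model = {}
    for c in set(labels):
        rows = [p for p, y in zip(points, labels) if y == c]
        columns = list(zip(*rows))  # one tuple per feature
        model[c] = {
            "prior": len(rows) / len(points),
            "mean": [fmean(col) for col in columns],
            # Small constant keeps the variance strictly positive.
            "var": [pvariance(col) + 1e-9 for col in columns],
        }
    return model


def log_gaussian(x, mean, var):
    """Log density of a 1D normal distribution at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def predict(model, point):
    """Return the class with the highest log posterior (up to a constant)."""
    def score(c):
        params = model[c]
        return math.log(params["prior"]) + sum(
            log_gaussian(x, m, v)
            for x, m, v in zip(point, params["mean"], params["var"])
        )
    return max(model, key=score)


# Hypothetical toy data: two well-separated clusters.
points = [(1.0, 1.2), (0.8, 0.9), (1.3, 1.1), (4.0, 4.2), (4.3, 3.9), (3.8, 4.1)]
labels = ["a", "a", "a", "b", "b", "b"]
model = fit(points, labels)
print(predict(model, (1.1, 1.0)), predict(model, (4.1, 4.0)))
```

The "naive" part is the sum over features in `score`: each coordinate contributes its own 1D Gaussian log-likelihood, as if the features were independent given the class.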

Lecture 8: Mathematical optimization

Explore different optimization methods

Here are a handful of optimization methods defined in the SciPy library. You can tweak the starting point, the number of iterations, and the method used to compute the Hessian and Jacobian (for the methods that use them). You can also choose the function you want to minimize.
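To make the roles of the starting point, the iteration count, the Jacobian and the Hessian concrete, here is a hand-written sketch in plain Python. It is not one of the library methods used in the demo: it compares fixed-step gradient descent (Jacobian only) with Newton's method (Jacobian and Hessian) on a hypothetical quadratic test function whose minimum is at (1, -2).

```python
def f(p):
    """Hypothetical test function with its minimum at (1, -2)."""
    x, y = p
    return (x - 1.0) ** 2 + 10.0 * (y + 2.0) ** 2


def jacobian(p):
    """Gradient of f (first derivatives)."""
    x, y = p
    return (2.0 * (x - 1.0), 20.0 * (y + 2.0))


def hessian(p):
    """Second derivatives of f; constant and diagonal for this quadratic."""
    return ((2.0, 0.0), (0.0, 20.0))


def gradient_descent(start, step=0.04, iterations=200):
    """Repeatedly move against the gradient with a fixed step size."""
    x, y = start
    for _ in range(iterations):
        gx, gy = jacobian((x, y))
        x, y = x - step * gx, y - step * gy
    return x, y


def newton(start, iterations=1):
    """Newton's method: rescale the gradient by the inverse Hessian."""
    x, y = start
    for _ in range(iterations):
        gx, gy = jacobian((x, y))
        (a, b), (c, d) = hessian((x, y))
        det = a * d - b * c
        # Solve H * delta = gradient with the closed-form 2x2 inverse.
        dx = (d * gx - b * gy) / det
        dy = (-c * gx + a * gy) / det
        x, y = x - dx, y - dy
    return x, y


start = (5.0, 5.0)
gd_result = gradient_descent(start)
newton_result = newton(start)
print("gradient descent:", gd_result)
print("Newton:", newton_result)
```

On a quadratic, Newton's method lands on the minimum in a single step because the Hessian describes the curvature exactly, while gradient descent needs many small steps. On non-quadratic functions neither behaviour is guaranteed, which is what makes the choice of method and starting point worth exploring.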

Lecture 9: Regression

Ridge and Lasso regularization

Custom regression with polynomials

Lecture 10: Support Vector Machines

Linear Kernel SVM classification of custom 2D dataset

Kernel SVM classification of custom 2D dataset

Lecture 11: Decision Trees and Random Forests

Decision tree classification of custom set of 2D points

Explore how tree depth influences the result of decision tree classification

Handwritten digits recognition

Draw any digit 0…9 and press “Do recognition”.

Lecture 12: Principal Component Analysis

PCA on custom 2D datapoints

Lecture 13: k-Means Clustering

k-Means, custom implementation

k-Means with scikit-learn

Lecture 14: Gaussian Mixture Models

GMM in 2D

This demo is a playground for building intuition about how a 2D mixture model based on Gaussians works.

Implement a simple GMM algorithm (Expectation-Maximization)

Lecture 15: Kernel Density Estimation (KDE)

1D KDE exploration

This demo aims to show how KDE with different kernels and bandwidths deals with data drawn from a Gaussian mixture distribution (3 Gaussians).
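The effect of the bandwidth can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the demo's implementation; the mixture parameters, the sample size and the names `kde`, `gaussian_kernel` and `tophat_kernel` are assumptions chosen for the example.

```python
import math
import random


def gaussian_kernel(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def tophat_kernel(u):
    """Uniform density on [-1, 1]."""
    return 0.5 if abs(u) <= 1.0 else 0.0


def kde(x, samples, bandwidth, kernel=gaussian_kernel):
    """Density estimate at x: the average of kernels centred on each sample."""
    total = sum(kernel((x - s) / bandwidth) for s in samples)
    return total / (len(samples) * bandwidth)


# Hypothetical data: a mixture of three Gaussians as (weight, mean, std).
components = [(0.3, -4.0, 1.0), (0.4, 0.0, 0.5), (0.3, 5.0, 1.5)]
weights = [w for w, _, _ in components]
rng = random.Random(0)
samples = []
for _ in range(2000):
    _, mean, std = rng.choices(components, weights=weights)[0]
    samples.append(rng.gauss(mean, std))

grid = [i * 0.1 for i in range(-100, 121)]  # -10.0 .. 12.0
for bandwidth in (0.1, 0.5, 2.0):
    density = [kde(x, samples, bandwidth) for x in grid]
    area = sum(density) * 0.1  # Riemann sum; should be close to 1
    print(f"bandwidth={bandwidth}: peak={max(density):.3f}, area={area:.3f}")
```

A small bandwidth tends to give a spiky estimate that follows individual samples, while a large one smooths the three modes together. Passing `kernel=tophat_kernel` to `kde` shows how the kernel shape changes the estimate at a fixed bandwidth.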

KDE on 2D custom dataset

This demo is a playground for exploring 2D KDE with different bandwidths, kernels and metrics on custom user datapoints.

1854 Broad Street cholera outbreak visualization

The Broad Street cholera outbreak was a severe outbreak of cholera that occurred in 1854 near Broad Street (now Broadwick Street) in the Soho district of the City of Westminster, London, England, during the worldwide 1846–1860 cholera pandemic. This outbreak, which killed 616 people, is best known for the physician John Snow’s study of its causes and his hypothesis that germ-contaminated water was the source of cholera, rather than particles in the air. This discovery came to influence public health and the construction of improved sanitation facilities beginning in the mid-19th century.

On 31 August 1854, after several other outbreaks had occurred elsewhere in the city, a major outbreak of cholera occurred in Soho. Snow, the physician who eventually linked the outbreak to contaminated water, later called it “the most terrible outbreak of cholera which ever occurred in this kingdom.”

Over the next three days, 127 people on or near Broad Street died. Within the next week, three quarters of the residents had fled the area. By 10 September, 500 people had died and the mortality rate was 12.8 percent in some parts of the city. By the end of the outbreak, 616 people had died.

Many of the victims were taken to the Middlesex Hospital, where their treatment was superintended by Florence Nightingale (we talked about her in the statistics lecture), who briefly joined the hospital in early September to help with the outbreak.

By talking to local residents, Snow identified the source of the outbreak as the public water pump on Broad Street. Although Snow’s chemical and microscope examination of a water sample from the Broad Street pump did not conclusively prove its danger, his findings about the patterns of illness and death among residents in Soho persuaded the St James parish authorities to disable the well pump by removing its handle.

It’s worth mentioning that the germ theory was not established at this point (Louis Pasteur did not propose it until 1861). Snow did not understand the mechanism by which disease was transmitted, but the evidence led him to believe that it was NOT due to breathing foul air. Based on the pattern of illness among residents, Snow hypothesized that cholera was spread by an agent in contaminated water. He first published his theory in 1849, in an essay titled “On the Mode of Communication of Cholera”.

Lecture 16: Manifold Learning

LLE, MDS and Isomap performance on common manifolds

This demo shows data lying on a few common manifolds in 3D and how common machine learning methods struggle to unfold them.