Kernel Density Estimation (KDE)
Venkata Padmavathi Metta
Parametric vs. Non-parametric Estimation
Parametric probability density estimation involves selecting a common distribution and estimating the parameters of the density function from a data sample. Nonparametric probability density estimation involves fitting a model to the arbitrary distribution of the data without assuming a fixed functional form; kernel density estimation is one such technique.
Kernel Density Estimation (KDE)
Perhaps the most common nonparametric approach for estimating the probability density function of a continuous random variable is kernel smoothing, or kernel density estimation (KDE for short).
Kernel Density Estimation: a nonparametric method for using a dataset to estimate probabilities for new points.
Kernel Density Estimation (KDE)
The kernel function weights the contribution of each observation in the data sample based on its distance from the point being estimated. A parameter called the smoothing parameter, or bandwidth, controls the scope, or window, of observations from the data sample that contribute to the estimate at a given point. As such, kernel density estimation is sometimes referred to as a Parzen-Rosenblatt window, or simply a Parzen window, after the developers of the method.
Smoothing Parameter (bandwidth): a parameter that controls the number of samples, or window of samples, used to estimate the probability for a new point.
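For intuition (a numerical illustration of our own, assuming the Gaussian kernel introduced on the next slide): an observation at distance $d$ from the query point receives weight proportional to $\exp(-d^2 / 2h^2)$, so at $d = h$ the weight is about 0.61 of its maximum, while at $d = 3h$ it has dropped to about 0.011. Doubling the bandwidth therefore roughly doubles the radius within which observations contribute meaningfully.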
Kernel Density Estimation (KDE)
The contribution of samples within the window can be shaped using different functions, sometimes referred to as basis functions, e.g. uniform, normal, etc., with different effects on the smoothness of the resulting density function.
Basis Function (kernel): the function chosen to control the contribution of samples in the dataset toward estimating the probability of a new point. Here we consider the Gaussian kernel.
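For reference (the formula is standard, though not shown on this slide), the Gaussian kernel is

$$K(u) = \frac{1}{\sqrt{2\pi}} \, e^{-u^2/2},$$

a non-negative function that integrates to 1, as required of a kernel.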
Kernel Density Estimation (KDE)
A range of kernel functions are commonly used: uniform, triangular, biweight, triweight, normal, and others.
Let $(x_1, x_2, \ldots, x_n)$ be independent and identically distributed samples drawn from some univariate distribution with an unknown density $f$ at any given point $x$. We are interested in estimating the shape of this function $f$. Its kernel density estimator is

$$\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right),$$

where $K$ is the kernel (a non-negative function which integrates to 1) and $h > 0$ is a smoothing parameter called the bandwidth.
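A minimal NumPy sketch of this estimator with a Gaussian kernel (our own illustration; the function names kde and gaussian_kernel are not from the slides):

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density: non-negative and integrates to 1."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde(x, samples, h):
    """Evaluate f_hat(x) = (1/(n*h)) * sum_i K((x - x_i) / h)."""
    samples = np.asarray(samples, dtype=float)
    # Broadcast: each evaluation point is paired with every sample.
    u = (np.asarray(x, dtype=float)[..., None] - samples) / h
    return gaussian_kernel(u).sum(axis=-1) / (samples.size * h)
```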
KDE Example

Sample:   1     2     3     4    5    6
Value:   -2.1  -1.3  -0.4  1.9  5.1  6.2

Kernel density estimates are closely related to histograms, but can be endowed with properties such as smoothness or continuity by using a suitable kernel. The table above contains six data points. For the histogram, first the horizontal axis is divided into sub-intervals, or bins, which cover the range of the data: in this case, six bins, each of width 2. Whenever a data point falls inside an interval, a box of height 1/12 = 0.083 is placed there (each box then has area 2 × 1/12 = 1/6, so the six boxes together have total area 1). If more than one data point falls inside the same bin, the boxes are stacked on top of each other.
KDE Example
With a bin size of 2 units, the bins can be (-4 to -2), (-2 to 0), (0 to 2), (2 to 4), (4 to 6), (6 to 8).
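A quick NumPy check of this histogram construction (a sketch of our own, using the bin edges above; the normalization matches the box height of 1/12):

```python
import numpy as np

data = np.array([-2.1, -1.3, -0.4, 1.9, 5.1, 6.2])
edges = np.arange(-4, 10, 2)            # bin edges -4, -2, 0, 2, 4, 6, 8
counts, _ = np.histogram(data, bins=edges)
heights = counts / (len(data) * 2)      # each point adds a box of height 1/12

# Expected: 0.083, 0.167, 0.083, 0.0, 0.083, 0.083
print(dict(zip(zip(edges[:-1], edges[1:]), heights)))
```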
KDE Example
For the kernel density estimate, normal kernels with standard deviation 2.25 (shown in the original figure as red dashed lines) are placed on each of the data points $x_i$. The kernels are summed to make the kernel density estimate (the solid blue curve in the figure). The smoothness of the kernel density estimate, compared to the discreteness of the histogram, illustrates how kernel density estimates converge faster to the true underlying density for continuous random variables.
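Continuing the sketch above (this reuses the kde() function defined after the estimator formula; the evaluation grid is our own choice):

```python
import numpy as np

data = [-2.1, -1.3, -0.4, 1.9, 5.1, 6.2]
grid = np.linspace(-10, 14, 481)     # evaluation points covering the data
density = kde(grid, data, h=2.25)    # normal kernels with std. dev. 2.25

# Sanity check: a density estimate should integrate to ~1.
step = grid[1] - grid[0]
print((density * step).sum())        # ~1.0
```

Plotting density against grid reproduces the smooth curve described on this slide.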
KDE for different bandwidths (h): an interactive demonstration at https://mathisonian.github.io/kde/
Different Estimators
We can build classifiers when the underlying densities are known; Bayesian Decision Theory introduced the general formulation. In most situations, however, the true distributions are unknown and must be estimated from data.
Parametric Estimation: assume a particular form for the density (e.g. Gaussian), so only the parameters (e.g. mean and variance) need to be estimated. Examples: Maximum Likelihood Estimation (MLE), Maximum A Posteriori (MAP) Estimation, Bayesian Estimation.
Non-parametric Density Estimation: assume NO knowledge about the density. Example: Kernel Density Estimation.
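To make the contrast concrete, a minimal self-contained sketch (our own illustration, not from the slides) fitting the six example points both ways:

```python
import numpy as np

data = np.array([-2.1, -1.3, -0.4, 1.9, 5.1, 6.2])

# Parametric (MLE for a Gaussian): estimate mean and variance,
# then evaluate the assumed Gaussian form at a new point.
mu, sigma = data.mean(), data.std()   # MLE uses the biased (1/n) variance
def gaussian_pdf(x):
    return np.exp(-0.5 * ((x - mu) / sigma)**2) / (sigma * np.sqrt(2 * np.pi))

# Non-parametric (KDE): no assumed form; the data define the shape directly.
def kde_pdf(x, h=2.25):
    u = (np.asarray(x, dtype=float)[..., None] - data) / h
    return (np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)).sum(axis=-1) / (len(data) * h)

# The parametric fit is unimodal by construction; the KDE can follow
# the two clusters in the data.
print(gaussian_pdf(0.0), kde_pdf(0.0))
```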