Disjoint Set, or Union-Find, is a data structure that tracks a set of elements partitioned into a number of disjoint sets. It is useful in a number of applications, such as speeding up Kruskal's minimum spanning tree algorithm and maximum-spacing k-clustering. It basically supports two operations:
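
The two operations usually meant here are find (which set does an element belong to?) and union (merge two sets). A minimal sketch follows, assuming elements are the integers 0..n-1; the class name `DisjointSet` and the choice of union by size with path compression are illustrative assumptions, not necessarily what the full post uses.

```python
class DisjointSet:
    """Union-Find over the integers 0..n-1 (illustrative sketch)."""

    def __init__(self, n):
        # Every element starts as the root of its own singleton set.
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        """Return the representative (root) of the set containing x."""
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        """Merge the sets containing a and b; return False if already merged."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        # Union by size: attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True


ds = DisjointSet(5)
ds.union(0, 1)
ds.union(3, 4)
same_01 = ds.find(0) == ds.find(1)   # 0 and 1 now share a root
same_13 = ds.find(1) == ds.find(3)   # 1 and 3 are still in different sets
```

In Kruskal's algorithm, `union` returning `False` is what signals that an edge would close a cycle and should be skipped.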

Unsupervised anomaly detection is important when we need to detect novelties in an unlabeled dataset of i.i.d. (independent and identically distributed) samples. There are different approaches to this problem, such as statistical outlier detection (e.g. regression, Gaussian density estimation) and density-based outlier detection (e.g. Local Outlier Factor, kernel density estimation), among others (Aggarwal, 2013).

Local outlier factor (LOF) is an outlier detection algorithm that detects outliers by comparing the local density of a data instance with that of its neighbors, in order to decide whether the instance belongs to a region of similar density. It can detect outliers in a dataset for which the number of clusters is unknown and the clusters differ in density and size. It is inspired by the k-nearest neighbors (KNN) algorithm and is widely used. There is an R implementation available.
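
To make the density comparison concrete, here is a minimal pure-Python sketch of the LOF score. It is not the R implementation mentioned above; the function name `lof_scores` is hypothetical, the distance is assumed to be Euclidean, and each neighborhood is simplified to exactly the k nearest points (the original definition also includes points tied at the k-distance). It also assumes the data has no group of more than k identical points, which would make a local density infinite.

```python
import math


def lof_scores(points, k):
    """Return one LOF score per point (simplified sketch, Euclidean distance)."""
    n = len(points)
    # Pairwise distances; fine for small datasets, O(n^2) memory.
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbors = []   # indices of the k nearest neighbors of each point
    k_distance = []  # distance from each point to its k-th nearest neighbor
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        knn = order[:k]
        neighbors.append(knn)
        k_distance.append(dist[i][knn[-1]])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(i, j) = max(k-distance(j), d(i, j)).
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: average ratio of the neighbors' density to the point's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]


# Four points on a unit square plus one far-away point.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = lof_scores(data, k=2)
most_anomalous = max(range(len(data)), key=lambda i: scores[i])
```

Scores close to 1 mean a point is about as dense as its neighbors; scores well above 1 mean it sits in a sparser region than they do, which is what marks the far-away point here.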
