Umanga Bista's Blog | Umanga Bista

Union Find

Disjoint Set, or Union Find, is a data structure that tracks a set of keys/elements partitioned into a number of disjoint subsets. It is useful in a number of applications, such as speeding up Kruskal's minimum spanning tree algorithm and max-spacing k-clustering. It supports two basic operations:
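The two operations are conventionally called find (which subset does an element belong to?) and union (merge two subsets). Here is a minimal sketch in Python, not the post's own code (the post is tagged scala). The class name `DisjointSet` and its method names are illustrative, and path compression plus union by size are common optimizations assumed here rather than taken from the post.

```python
class DisjointSet:
    """Illustrative union-find with path compression and union by size."""

    def __init__(self, elements):
        # Every element starts as the root of its own one-element set.
        self.parent = {e: e for e in elements}
        self.size = {e: 1 for e in elements}

    def find(self, x):
        """Return the representative (root) of the set containing x."""
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the walked path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        """Merge the sets containing a and b; return False if already merged."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        # Union by size: attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True


ds = DisjointSet(range(5))
ds.union(0, 1)
ds.union(3, 4)
same = ds.find(0) == ds.find(1)       # 0 and 1 now share a root
different = ds.find(1) != ds.find(3)  # {0, 1} and {3, 4} are still separate
```

In Kruskal's algorithm, `union` returning False is exactly the signal that an edge would close a cycle and should be skipped.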

August 18, 2014
union-find
scala
disjoint-set

Principal Component Analysis Based Unsupervised Anomaly Detection

Unsupervised anomaly detection is important when we need to detect novelties in an unlabeled dataset of i.i.d. (independent and identically distributed) samples. There have been different approaches to this problem, such as statistical outlier detection (e.g. regression, Gaussian density estimation) and density-based outlier detection (e.g. Local Outlier Factor, kernel density estimation) (Aggarwal, 2013).

Improving performance of Local outlier factor with KD-Trees

Local outlier factor (LOF) is an outlier detection algorithm that detects outliers by comparing the local density of a data instance with that of its neighbors, in order to decide whether the instance belongs to a region of similar density. It can detect outliers in a dataset where the number of clusters is unknown and the clusters differ in density and size. It is inspired by the KNN (K-Nearest Neighbors) algorithm and is widely used. There is an R implementation available.
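To make the idea concrete, here is a rough pure-Python sketch of LOF in which the neighbor search goes through a KD-tree instead of a full pairwise scan. It is not the post's implementation: the function names (`build_kdtree`, `knn`, `lof_scores`) are hypothetical, exactly k neighbors are kept (ties at the k-distance are dropped, a simplification of the usual definition), and the points are assumed to be distinct.

```python
import heapq
import math


def build_kdtree(points, indices=None, depth=0):
    """Build a KD-tree as nested tuples (index, axis, left, right)."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[0])
    indices.sort(key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return (indices[mid], axis,
            build_kdtree(points, indices[:mid], depth + 1),
            build_kdtree(points, indices[mid + 1:], depth + 1))


def knn(points, tree, query_idx, k):
    """Return the k nearest neighbors of points[query_idx] as (dist, index)."""
    q = points[query_idx]
    heap = []  # max-heap via negated distances: (-dist, index)

    def visit(node):
        if node is None:
            return
        idx, axis, left, right = node
        if idx != query_idx:
            d = math.dist(q, points[idx])
            if len(heap) < k:
                heapq.heappush(heap, (-d, idx))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, idx))
        diff = q[axis] - points[idx][axis]
        near, far = (left, right) if diff < 0 else (right, left)
        visit(near)
        # Cross the splitting plane only if a closer point could lie beyond it.
        if len(heap) < k or abs(diff) < -heap[0][0]:
            visit(far)

    visit(tree)
    return sorted((-nd, i) for nd, i in heap)


def lof_scores(points, k):
    """Return one LOF score per point; values well above 1 suggest outliers."""
    tree = build_kdtree(points)
    neigh = [knn(points, tree, i, k) for i in range(len(points))]
    k_dist = [n[-1][0] for n in neigh]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = [len(n) / sum(max(k_dist[j], d) for d, j in n) for n in neigh]
    # LOF: mean density of the neighbors relative to the point's own density.
    return [sum(lrd[j] for _, j in n) / (len(n) * lrd[i])
            for i, n in enumerate(neigh)]


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
scores = lof_scores(pts, k=2)  # the last point should score far above 1
```

The tree only speeds up the neighbor queries; the scores are meant to match what a brute-force neighbor search would give under the same tie-handling.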

April 21, 2014
outlier
local-outlier
kd-tree