gpytorch.nearest_neighbors¶
These modules provide a set of interfaces for partitioning datasets and establishing neighborhood structures between partitions. This kind of partitioning is required for nearest-neighbor-style Gaussian Process models, and we ensure behind the scenes that nearest-neighbor models based on these partitions still form valid joint density functions.
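As a sketch of why such models can remain valid joint densities, consider the standard block factorization used by nearest-neighbor Gaussian Process approximations. The notation is introduced here for illustration and is not taken from the library: \(y_{b_i}\) is the data in block \(i\) under the chosen ordering, \(B\) is the number of blocks, and \(N(b_i)\) is the set of neighboring blocks on which block \(i\) conditions. The exact joint density factorizes by the chain rule, and the approximation replaces each conditioning set with a subset of the preceding blocks:

\[
p(y) = \prod_{i=1}^{B} p\left(y_{b_i} \mid y_{b_1}, \dots, y_{b_{i-1}}\right) \approx \prod_{i=1}^{B} p\left(y_{b_i} \mid y_{N(b_i)}\right), \qquad N(b_i) \subseteq \{b_1, \dots, b_{i-1}\}
\]

Because every factor conditions only on earlier blocks, the product on the right is itself a proper joint density. This assumes neighbors are restricted to preceding blocks, which is also why the ordering of the blocks matters.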
Indexes¶
Indexes are the interfaces used to partition datasets with clustering algorithms, measure distances between partitions with a distance metric to establish the neighboring structure, and order the data with ordering strategies.
KMeansIndex¶
- class gpytorch.nearest_neighbors.KMeansIndex(data, n_blocks, n_neighbors, distance_metric)
This index performs K-Means clustering on a given feature set, computes neighboring blocks, enables evaluating block membership for test points, and enables reordering of the blocks based on block centroids.
VoronoiIndex¶
- class gpytorch.nearest_neighbors.VoronoiIndex(data, n_blocks, n_neighbors, distance_metric, seed=None)
This index constructs a Voronoi diagram from a given feature set, computes neighboring blocks, enables evaluating block membership for test points, and enables reordering of the blocks based on the inducing points used to construct the diagram.
- Parameters:
data (torch.Tensor) – Features to use for the Voronoi diagram, typically an (n, 2) tensor of spatial lat-long coordinates.
n_blocks (int) – Number of desired polygons. Note that this does not guarantee similarly-sized clusters.
n_neighbors (int) – Number of neighboring polygons per polygon.
seed (int, optional) – Seed for randomly selected inducing points from training points.
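To illustrate how the seed and n_blocks parameters interact, here is a small sketch of a Voronoi-style partition, assuming each cell is the set of points closest to one inducing point drawn at random from the training data. The function name `voronoi_blocks`, the sample coordinates, and the use of Euclidean distance on raw lat-long pairs are assumptions for illustration, not the library's implementation.

```python
import math
import random
from collections import Counter


def voronoi_blocks(points, n_blocks, seed=None):
    """Sample n_blocks inducing points and assign every point to its nearest one."""
    rng = random.Random(seed)  # seed=None gives a different partition on each run
    inducing = rng.sample(points, n_blocks)
    labels = [
        min(range(n_blocks), key=lambda k: math.dist(inducing[k], p))
        for p in points
    ]
    return inducing, labels


# Hypothetical (lat, long) pairs; Euclidean distance is only a rough proxy
# for geographic distance and is used here to keep the sketch short.
coords = [(40.7, -74.0), (40.8, -73.9), (41.9, -87.6), (42.0, -87.7),
          (34.1, -118.2), (34.0, -118.3), (33.9, -118.4), (47.6, -122.3)]

inducing, labels = voronoi_blocks(coords, n_blocks=3, seed=42)
again = voronoi_blocks(coords, n_blocks=3, seed=42)  # same seed, same partition
cell_sizes = Counter(labels)                         # cells need not be equal-sized
```

Fixing the seed makes the partition reproducible. Inspecting `cell_sizes` shows how unevenly the points may fall across cells, which is the caveat noted for n_blocks.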
Distance Metrics¶
Distance metrics define distances between partitions of data. Each index defines the points that represent each block, and the distance between two blocks is the distance between their representatives under the supplied distance metric. The DistanceMetrics class includes methods for the Euclidean and Manhattan distance metrics. A custom distance metric must return a function that takes in vectors of observations and returns the distance matrix for those observations.
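The contract described here, a metric that returns a function from observations to a distance matrix, might look like the following sketch. The factory names (`euclidean_metric`, `manhattan_metric`, `chebyshev_metric`) are hypothetical and do not reflect the actual method names on DistanceMetrics. Plain lists of tuples stand in for tensors.

```python
def euclidean_metric():
    """Return a function computing the pairwise Euclidean distance matrix."""
    def distance_matrix(obs):
        return [[sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5 for y in obs]
                for x in obs]
    return distance_matrix


def manhattan_metric():
    """Return a function computing the pairwise Manhattan distance matrix."""
    def distance_matrix(obs):
        return [[sum(abs(a - b) for a, b in zip(x, y)) for y in obs]
                for x in obs]
    return distance_matrix


def chebyshev_metric():
    """A custom metric in the same shape: largest coordinate-wise difference."""
    def distance_matrix(obs):
        return [[max(abs(a - b) for a, b in zip(x, y)) for y in obs]
                for x in obs]
    return distance_matrix


# Block representatives (for example centroids or inducing points).
representatives = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]

d_euclid = euclidean_metric()(representatives)
d_manhattan = manhattan_metric()(representatives)
d_chebyshev = chebyshev_metric()(representatives)
```

With tensors, a function such as `torch.cdist` could compute the Euclidean and Manhattan matrices directly. The list-based version is used only to keep the sketch free of dependencies.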
DistanceMetrics¶
Ordering Strategies¶
Because nearest-neighbor approximations depend on the ordering of the data they’re trained on, we need a way to order the dataset by different metrics to find the best ordering strategy for a given problem. The OrderingStrategies class includes methods for ordering the data by a given coordinate or by an \(L_p\) norm. Custom ordering strategies can be implemented here and must return a function that takes in a vector of observations and returns a vector of integers indicating the index of each observation under the new ordering.
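As an illustration of this contract, the sketch below builds two ordering strategies as functions that return functions. The names `coordinate_ordering` and `lp_ordering` are hypothetical and not the actual OrderingStrategies methods. The returned vector is assumed to be an argsort-style permutation, meaning position k holds the index of the observation that comes k-th under the new ordering. The library may instead expect ranks.

```python
def coordinate_ordering(coordinate):
    """Order observations by the value of a single coordinate (column index)."""
    def order(obs):
        return sorted(range(len(obs)), key=lambda i: obs[i][coordinate])
    return order


def lp_ordering(p):
    """Order observations by their L_p norm."""
    def order(obs):
        def norm(i):
            return sum(abs(v) ** p for v in obs[i]) ** (1.0 / p)
        return sorted(range(len(obs)), key=norm)
    return order


observations = [(3.0, 0.0), (1.0, 5.0), (2.0, 1.0)]

by_first = coordinate_ordering(0)(observations)   # sort on the first coordinate
by_second = coordinate_ordering(1)(observations)  # sort on the second coordinate
by_l2 = lp_ordering(2)(observations)              # sort on Euclidean norm

# Applying a permutation reorders the data before it is partitioned or modeled.
reordered = [observations[i] for i in by_l2]
```

Comparing model fit across several such orderings is one way to choose a strategy for a given problem. Python's sort is stable, so tied observations keep their original relative order.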