Neighborhood Blending

Generally, the embedding used for any task is already pre-trained and generalized. However, for a specific task, Neighborhood Blending/mixing can be used. Here, we try to use the fact that a specific query/instance is primarily affected by its nearest neighbors only. Neighborhood Blending is a general technique and can be used in text data with pre-trained embedding or on images data. After the queries get encoded using any sentence transformer model and using Facebook AI Similarity Search (FAISS) to search for the similar embedding of multimedia documents, we obtained the top k nearest neighbors (in terms of similarity) of each given query.

Neighborhood Blending is used to make similar queries more distinct and closer to their neighbors. It makes the clusters more coherent.

Official repository: https://github.com/sonisanskar/Neighborhood-Blending

PyPi repository: https://pypi.org/project/NeighborBlend/

Features

Cross-platform: Windows, Mac, and Linux are officially supported.
Works with Python 3.7, 3.8, 3.9

Requires

numpy, pandas , faiss-cpu,and scikit-learn

Installation

We recommend using Python 3.5 or higher.

Install with pip
Install the NeighborBlend using pip

  pip install NeighborBlend

Getting Started

First we list down all the functions which can be leveraged by the user

Function1

  # Funct1 : neighborhood_search(emb,thresh,k_neighbors)

  '''
  Finds the neighbors of all samples embeddings above a given threshold 
  '''
  # Params
  '''
  emb: 2D matrix of embeddings
  thresh : float value of similarity above which we want neighbors
  k_neighbors: Number of neighbors which need to be considered for thresholding
  '''
  # Return
  '''
  match_index_lst: Dictionary with matching indexes
  similarities_lst: Similarities scores corresponding to above indexes
  '''

Function2

  # Funct2 :blend_neighborhood(emb, match_index_lst, similarities_lst)
  '''
  Performs weighted average and normalization of embeddings
  with the simialrity scores of its neighbors
  '''
  # Params
  '''
  emb: 2D matrix of embeddings
  match_index_lst: Dictionary with matching indexes
  similarities_lst: Similarities scores corresponding to above indexes
  '''
  # Return
  '''
  updated_embedding: Embedding obtained after weighted average
  '''

Function3

  # Funct3 :iterative_neighborhood_blending(emb, threshes,k_neighbors):
  '''
  Finds the neighbors of all samples embeddings above a given threshold and weighted average of the samles
  '''
  # Params
  '''
  emb: 2D matrix of embeddings
  threshes: a list of thresholds applied iteratively
  k_neighbors: number of neighbors to be considered
  '''
  # Return
  '''
  updated_embedding: Embedding obtained after weighted average
  match_index_lst
  similarities_lst
  '''

Example Usage

from NeighborBlend import*
import NeighborBlend
#Func 1
match_index_lst, similarities_lst = neighborhood_search(emb, thresh,k_neighbors)

Func3 is used for the final updation of embeddings

#Func 3
updated_emb,match_lst, simi_lst = iterative_neighborhood_blending(emb, threshes,k_neighbors)

Acknowledgements

FAISS

Authors

@sonisanskar

🔗 Links

Support

For support, email soni.sanskar@gmail.com

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
build/lib		build/lib
dist		dist
src		src
LICENSE		LICENSE
README.md		README.md
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Neighborhood Blending

Features

Requires

Installation

Getting Started

Example Usage

Acknowledgements

Authors

🔗 Links

Support

About

Releases 1

Packages

Languages

License

sonisanskar/Neighborhood-Blending

Folders and files

Latest commit

History

Repository files navigation

Neighborhood Blending

Features

Requires

Installation

Getting Started

Example Usage

Acknowledgements

Authors

🔗 Links

Support

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages