Parameters

Common Parameters

| Parameter name | Type | Default | Definition |
|---|---|---|---|
| Samples | int | 1000 | number of points sampled when splitting a tree node |
| TPTNumber | int | 32 | number of TPT trees used to help with graph construction |
| TPTLeafSize | int | 2000 | TPT tree leaf size |
| NeighborhoodSize | int | 32 | number of neighbors each node has in the neighborhood graph |
| GraphNeighborhoodScale | int | 2 | scale factor applied to the neighborhood size during the build stage |
| CEF | int | 1000 | number of results used to construct the RNG (relative neighborhood graph) |
| MaxCheckForRefineGraph | int | 10000 | number of nodes each node visits during graph refinement in the build stage |
| NumberOfThreads | int | 1 | number of threads used to speed up the build |
| DistCalcMethod | string | Cosine | distance calculation method; choose Cosine or L2 |
| MaxCheck | int | 8192 | number of nodes visited per query in the search stage |

BKT

| Parameter name | Type | Default | Definition |
|---|---|---|---|
| BKTNumber | int | 1 | number of BKT trees |
| BKTKMeansK | int | 32 | number of children each tree node has |

KDT

| Parameter name | Type | Default | Definition |
|---|---|---|---|
| KDTNumber | int | 1 | number of KDT trees |

Parameters that will affect the index size

  • NeighborhoodSize
  • BKTNumber
  • KDTNumber

Parameters that will affect the index build time

  • NumberOfThreads
  • TPTNumber
  • TPTLeafSize
  • GraphNeighborhoodScale
  • CEF
  • MaxCheckForRefineGraph

Parameters that will affect the index quality

  • TPTNumber
  • TPTLeafSize
  • GraphNeighborhoodScale
  • CEF
  • MaxCheckForRefineGraph
  • NeighborhoodSize
  • KDTNumber

Parameters that will affect search latency and recall

  • MaxCheck
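
Because MaxCheck applies at the search stage, its trade-off between latency and recall can be explored without rebuilding the index. The following is a minimal sketch that only assembles command lines for a sweep. It assumes that `indexsearcher` accepts `Index.MaxCheck=<value>` in the same `Index.<Name>=<value>` form used for the other options in this document; the helper name `searcher_commands` and the chosen values are hypothetical.

```python
def searcher_commands(max_checks, k=3, index_dir="index"):
    """Return one indexsearcher command line per MaxCheck value."""
    base = ("./indexsearcher %s Index.QueryFile=query.tsv "
            "Index.TruthFile=truth.txt Index.K=%d" % (index_dir, k))
    # Assumption: MaxCheck is passed like the other Index.* options.
    return ["%s Index.MaxCheck=%d" % (base, m) for m in max_checks]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be inspected first.
    for cmd in searcher_commands([1024, 2048, 4096, 8192]):
        print(cmd)
```

Running each printed command against the same index and comparing the reported recall and latency shows where increasing MaxCheck stops paying off.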

NNI for parameter tuning

Prepare the vector data file data.tsv, the query file query.tsv, and the truth file truth.txt in the format described in the Get Started guide.

Install Microsoft NNI, then write the following Python script (nni_sptag.py), parameter search space configuration (search_space.json), and NNI environment configuration (config.yml).

nni_sptag.py

import nni
import os

vector_dimension = 10
vector_type = 'Float'
index_algo = 'BKT'
threads = 32
k = 3

def main():
    # Ask NNI for the next parameter combination to try.
    para = nni.get_next_parameter()
    cmd_build = "./indexbuilder -d %d -v %s -i data.tsv -o index -a %s -t %d " % (vector_dimension, vector_type, index_algo, threads)
    for p, v in para.items():
        # The trailing space keeps consecutive Index.<name>=<value> options separate.
        cmd_build += "Index." + p + "=" + str(v) + " "
    cmd_test = "./indexsearcher index Index.QueryFile=query.tsv Index.TruthFile=truth.txt Index.K=%d" % (k)
    os.system(cmd_build)
    os.system(cmd_test + " > out.txt")
    with open("out.txt", "r") as fd:
        lines = fd.readlines()
        # The summary is read from the second-to-last line of the searcher output.
        res = lines[-2]
        segs = res.split()
        recall = float(segs[-2])
        avg_latency = float(segs[-5])
    # Only recall is reported to NNI; avg_latency is parsed but not used in the score.
    score = recall
    nni.report_final_result(score)

if __name__ == '__main__':
    main()
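
The script parses the average latency but reports only recall. If latency also matters, one option is to fold it into the score. The sketch below is an assumption and not part of the original script: `parse_result_line` reuses the same field positions as nni_sptag.py, while `score`, `latency_budget`, and `penalty` are hypothetical names, and the latency unit is whatever `indexsearcher` reports.

```python
def parse_result_line(line):
    """Extract (recall, avg_latency) using the field positions from nni_sptag.py."""
    segs = line.split()
    return float(segs[-2]), float(segs[-5])


def score(recall, avg_latency, latency_budget, penalty=1.0):
    """Return recall, reduced in proportion to how far latency exceeds the budget."""
    # Fractional overshoot is 0 when the latency is within the budget.
    overshoot = max(0.0, avg_latency - latency_budget) / latency_budget
    return recall - penalty * overshoot


if __name__ == "__main__":
    # Synthetic line for illustration only; real searcher output will differ.
    recall, latency = parse_result_line("x 2.5 y z 0.9 w")
    print(score(recall, latency, latency_budget=5.0))
```

With `optimize_mode: maximize`, a score of this shape would steer the tuner away from configurations that gain recall only by exceeding the latency budget.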

search_space.json

{
        "BKTKmeansK": {"_type": "choice", "_value": [2, 4, 8, 16, 32]},
        "GraphNeighborhoodScale": {"_type": "choice", "_value": [2, 4, 8, 16, 32]}
}
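
To see how large this search space is before starting an experiment, the combinations can be enumerated directly. This is a minimal sketch that handles only `choice` entries like the two above; the helper name `enumerate_choices` is hypothetical and is not part of NNI.

```python
import itertools
import json

# Same content as search_space.json above, inlined to keep the sketch self-contained.
SEARCH_SPACE = json.loads("""
{
    "BKTKmeansK": {"_type": "choice", "_value": [2, 4, 8, 16, 32]},
    "GraphNeighborhoodScale": {"_type": "choice", "_value": [2, 4, 8, 16, 32]}
}
""")


def enumerate_choices(space):
    """List every parameter combination of a space made only of "choice" entries."""
    names = sorted(space)
    values = [space[name]["_value"] for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*values)]


if __name__ == "__main__":
    combos = enumerate_choices(SEARCH_SPACE)
    print(len(combos))
```

The two parameters with five values each give 25 combinations, so an experiment limited to `maxTrialNum: 10` evaluates only part of the space and relies on the tuner to pick promising points.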

config.yml

authorName: default
experimentName: example_sptag
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: local
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution, BatchTuner, MetisTuner
  #SMAC (SMAC should be installed through nnictl)
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
trial:
  command: python3 nni_sptag.py
  codeDir: .
  gpuNum: 0

Then start the tuning (the tuning results can be found at the Web UI URLs printed in the command output):

nnictl create --config config.yml

Stop the tuning:

nnictl stop