Information for building a simple Seeks P2P client
Seeks has an API to send queries and receive results.
With the API, a single call to one Seeks node does the work of calling further nodes and merging their results.
To call on other nodes, Seeks uses a simple protocol that allows it to retrieve results for a range of similar queries around the initial query. This range is referred to as a query halo.
A developer can also write his/her own application that communicates with each Seeks node directly, without going through the high-level API.
This lets the developer retrieve results (not the meta-search engine results) from each Seeks node of interest, and merge and rank them using his/her own secret sauce.
The procedure is as follows:
- First, create the query halo by computing n-grams over the initial query. Seeks works with n=5, though this value is configurable. Most importantly, Seeks uses a special type of n-gram, in which words can be replaced by a <skip> keyword:
The sentence
Houston we have a problem
becomes
Houston
...
Houston <skip> have a problem
Houston we <skip> have a problem
...
Houston <skip> <skip> a problem
...
The <skip> keyword is used to match similar strings. It must be used as is, that is, written exactly as <skip>.
Developers who do not want to bother with this procedure can generate only a subset of the n-grams above. Typically, dropping the <skip> keyword, or computing only unigrams (each single word of the original query), will work. The difference from the reference implementation is that similar queries will be matched less efficiently.
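The halo step can be sketched in Python. This is a minimal approximation, not Seeks' reference generator: `unigrams` is the simplified subset mentioned above, while the hypothetical `skip_ngrams` helper takes every window of up to n consecutive words, keeps its first and last words, and optionally replaces each interior word with `<skip>`. It reproduces patterns such as "Houston <skip> have a problem", but its output is not guaranteed to match the reference implementation.

```python
from itertools import product
from typing import List

SKIP = "<skip>"


def unigrams(query: str) -> List[str]:
    """Simplest halo subset mentioned above: every single word of the query."""
    return query.split()


def skip_ngrams(query: str, n: int = 5) -> List[str]:
    """Approximate skip n-grams (hypothetical helper, not Seeks' own generator).

    For every window of up to n consecutive words, keep the first and last
    words and optionally replace each interior word with <skip>.
    """
    words = query.split()
    grams: List[str] = []
    seen = set()
    for start in range(len(words)):
        for length in range(1, min(n, len(words) - start) + 1):
            window = words[start:start + length]
            interior = window[1:-1]
            # Each interior word is either kept (False) or masked (True).
            for mask in product((False, True), repeat=len(interior)):
                middle = [SKIP if masked else word for word, masked in zip(interior, mask)]
                parts = window if length == 1 else [window[0], *middle, window[-1]]
                gram = " ".join(parts)
                if gram not in seen:  # keep first occurrence, preserve order
                    seen.add(gram)
                    grams.append(gram)
    return grams
```

For example, `skip_ngrams("Houston we have a problem")` includes "Houston", "Houston <skip> have a problem" and "Houston <skip> <skip> a problem" among its results.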
- Hash every generated n-gram with RIPEMD-160
Hashing is straightforward: hash every generated string.
The one rule is that, if you use the <skip> keyword, you must be careful to:
- select only strings that do not contain the keyword,
- sort the words in these strings into alphabetical order before hashing the whole string.
You can check your halo against the reference hashed halo generator by running
./src/lsh/tests/gen_mrf_query_160 "Houston we have a problem" 0 5
where 5 is the value given to n.
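A sketch of the hashing step, assuming Python. `hashlib.new("ripemd160")` works only when the underlying OpenSSL build provides RIPEMD-160, so the block also carries a small pure-Python implementation of the standard algorithm as a fallback. The hypothetical `halo_key` helper applies the two rules above literally; joining the sorted words with single spaces, sorting by code point, and returning a lowercase hex digest are assumptions, so compare its output with `gen_mrf_query_160` before relying on it.

```python
import hashlib
import struct
from typing import Optional

SKIP = "<skip>"
MASK = 0xFFFFFFFF

# Standard RIPEMD-160 constants.
_RHO = [7, 4, 13, 1, 10, 6, 15, 3, 12, 0, 9, 5, 2, 14, 11, 8]
_SHIFTS = [  # rotation amount per round, indexed by message word
    [11, 14, 15, 12, 5, 8, 7, 9, 11, 13, 14, 15, 6, 7, 9, 8],
    [12, 13, 11, 15, 6, 9, 9, 7, 12, 15, 11, 13, 7, 8, 7, 7],
    [13, 15, 14, 11, 7, 7, 6, 8, 13, 14, 13, 12, 5, 5, 6, 9],
    [14, 11, 12, 14, 8, 6, 5, 5, 15, 12, 15, 14, 9, 9, 8, 6],
    [15, 12, 13, 13, 9, 5, 8, 6, 14, 11, 12, 11, 8, 6, 5, 5],
]
_KL = [0x00000000, 0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xA953FD4E]
_KR = [0x50A28BE6, 0x5C4DD124, 0x6D703EF3, 0x7A6D76E9, 0x00000000]

# Word order per round: left line starts from identity, right line from
# (9i + 5) mod 16; each later round applies the permutation _RHO once more.
_LO, _RO = [list(range(16))], [[(9 * i + 5) % 16 for i in range(16)]]
for _ in range(4):
    _LO.append([_RHO[i] for i in _LO[-1]])
    _RO.append([_RHO[i] for i in _RO[-1]])


def _rol(x: int, n: int) -> int:
    return ((x << n) | (x >> (32 - n))) & MASK


def _f(rnd: int, x: int, y: int, z: int) -> int:
    if rnd == 0:
        return x ^ y ^ z
    if rnd == 1:
        return (x & y) | (~x & z)
    if rnd == 2:
        return ((x | ~y) ^ z) & MASK
    if rnd == 3:
        return (x & z) | (y & ~z)
    return (x ^ (y | ~z)) & MASK


def ripemd160_pure(data: bytes) -> bytes:
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    # MD4-style padding: 0x80, zeros, then the bit length as little-endian uint64.
    msg = data + b"\x80" + b"\x00" * ((55 - len(data)) % 64) + struct.pack("<Q", 8 * len(data))
    for off in range(0, len(msg), 64):
        x = struct.unpack("<16I", msg[off:off + 64])
        al, bl, cl, dl, el = h
        ar, br, cr, dr, er = h
        for rnd in range(5):
            for i in range(16):
                wl, wr = _LO[rnd][i], _RO[rnd][i]
                t = _rol((al + _f(rnd, bl, cl, dl) + x[wl] + _KL[rnd]) & MASK, _SHIFTS[rnd][wl]) + el
                al, el, dl, cl, bl = el, dl, _rol(cl, 10), bl, t & MASK
                t = _rol((ar + _f(4 - rnd, br, cr, dr) + x[wr] + _KR[rnd]) & MASK, _SHIFTS[rnd][wr]) + er
                ar, er, dr, cr, br = er, dr, _rol(cr, 10), br, t & MASK
        t = (h[1] + cl + dr) & MASK
        h[1] = (h[2] + dl + er) & MASK
        h[2] = (h[3] + el + ar) & MASK
        h[3] = (h[4] + al + br) & MASK
        h[4] = (h[0] + bl + cr) & MASK
        h[0] = t
    return struct.pack("<5I", *h)


def ripemd160(data: bytes) -> bytes:
    """Prefer OpenSSL's RIPEMD-160; fall back to the pure-Python version."""
    try:
        return hashlib.new("ripemd160", data).digest()
    except ValueError:  # some OpenSSL 3 builds do not expose RIPEMD-160
        return ripemd160_pure(data)


def halo_key(gram: str) -> Optional[str]:
    """Hash one halo string following the two rules above (hypothetical helper).

    Strings containing <skip> are left out (None); the remaining words are
    sorted by code point and joined by single spaces before hashing.
    """
    words = gram.split()
    if SKIP in words:
        return None
    return ripemd160(" ".join(sorted(words)).encode("utf-8")).hex()
```

Sorting before hashing makes the key independent of word order, so "we Houston" and "Houston we" produce the same hash.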
- Select a Seeks node and (cf. API) send it an HTTP POST request whose body holds all the hashes, serialized as a protocol buffer.
You must use the following protocol buffer message structure (see src/plugins/udb_service/halo_msg.proto):
message hash_halo
{
required uint32 expansion = 1;
repeated string key = 2;
}
and use the following HTTP header:
Content-Type: application/x-protobuf
The answer comes as a list of results, serialized as protocol buffers, using the following message structure (see src/plugins/query_capture/db_query_record_msg.proto):
package sp.db;
import "db_record_msg.proto";
message visited_url
{
required string url = 1;
required int32 hits = 2; /* url hits for this query. */
optional string title = 3; /* url title. */
optional string summary = 4; /* snippet summary. */
optional uint32 url_date = 5; /* URL data date. */
}
message visited_urls
{
repeated visited_url vurl = 1;
}
message related_queries
{
repeated related_query rquery = 1;
}
message related_query
{
required uint32 radius = 1; /* similarity radius to the original query. */
required string query = 2; /* query (may be hashed). */
required uint32 query_hits = 3; /* number of query hits. */
required visited_urls vurls = 4; /* visited urls for this query. */
}
extend sp.db.record
{
optional related_queries queries = 4; /* original queries */
}
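The answer can be decoded the same way. This sketch walks the wire format generically and maps the fields of `related_query` and `visited_url` shown above onto small dataclasses; the `related_queries` extension is read as field 4 of `sp.db.record`, since extensions are encoded on the wire like ordinary fields. It assumes the response body is a single serialized record; how a list of records is framed is not specified here, and the record's other fields are simply skipped.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple, Union


def _read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def iter_fields(buf: bytes) -> Iterator[Tuple[int, int, Union[int, bytes]]]:
    """Yield (field_number, wire_type, value) for every field in a message."""
    pos = 0
    while pos < len(buf):
        tag, pos = _read_varint(buf, pos)
        number, wire = tag >> 3, tag & 0x7
        if wire == 0:
            value, pos = _read_varint(buf, pos)
        elif wire == 2:
            length, pos = _read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire in (1, 5):  # fixed 64-bit / 32-bit
            size = 8 if wire == 1 else 4
            value, pos = buf[pos:pos + size], pos + size
        else:
            raise ValueError(f"unsupported wire type {wire}")
        yield number, wire, value


@dataclass
class VisitedUrl:
    url: str = ""
    hits: int = 0
    title: str = ""
    summary: str = ""
    url_date: int = 0


@dataclass
class RelatedQuery:
    radius: int = 0
    query: str = ""
    query_hits: int = 0
    urls: List[VisitedUrl] = field(default_factory=list)


def _text(raw: bytes) -> str:
    return raw.decode("utf-8", errors="replace")


def decode_visited_url(buf: bytes) -> VisitedUrl:
    v = VisitedUrl()
    for number, _, value in iter_fields(buf):
        if number == 1:
            v.url = _text(value)
        elif number == 2:  # int32: negative values arrive as 64-bit two's complement
            v.hits = value - (1 << 64) if value >= 1 << 63 else value
        elif number == 3:
            v.title = _text(value)
        elif number == 4:
            v.summary = _text(value)
        elif number == 5:
            v.url_date = value
    return v


def decode_related_query(buf: bytes) -> RelatedQuery:
    q = RelatedQuery()
    for number, _, value in iter_fields(buf):
        if number == 1:
            q.radius = value
        elif number == 2:
            q.query = _text(value)
        elif number == 3:
            q.query_hits = value
        elif number == 4:  # visited_urls: repeated visited_url vurl = 1
            q.urls = [decode_visited_url(v) for n, _, v in iter_fields(value) if n == 1]
    return q


def decode_record_queries(record: bytes) -> List[RelatedQuery]:
    """Extract the related_queries extension (field 4) from one sp.db.record."""
    queries: List[RelatedQuery] = []
    for number, wire, value in iter_fields(record):
        if number == 4 and wire == 2:  # related_queries: repeated related_query rquery = 1
            queries += [decode_related_query(q) for n, _, q in iter_fields(value) if n == 1]
    return queries
```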
- Error codes: errors are returned as HTML error pages.
- Repeat the HTTP POST for every Seeks node in the ring of interest, and merge the results.
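Merging and ranking are left to the developer. One simple strategy, shown here as a hypothetical `merge_results` helper, sums the hits that each node reports for a URL and sorts by the total; weighting by `radius` or `query_hits` would be natural refinements. The input is assumed to be one list of (url, hits) pairs per node, extracted from its `visited_url` entries.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def merge_results(per_node: Iterable[List[Tuple[str, int]]]) -> List[Tuple[str, int]]:
    """Merge (url, hits) pairs from several Seeks nodes by summing hits per URL.

    Returns (url, total_hits) pairs sorted by descending hits, then by URL
    so that ties are ordered deterministically.
    """
    totals: Dict[str, int] = defaultdict(int)
    for results in per_node:
        for url, hits in results:
            totals[url] += hits
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))
```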