feat: adding full text search support #746

Merged
merged 4 commits into from
Jan 24, 2024
2 changes: 1 addition & 1 deletion .github/workflows/main.yml
@@ -8,7 +8,7 @@ jobs:
   build:
     services:
       couchbase:
-        image: couchbase/server-sandbox:7.0.2
+        image: couchbase/server-sandbox:7.1.3
         ports:
           - 8091-8094:8091-8094
           - 11210:11210
2 changes: 1 addition & 1 deletion .github/workflows/pull_request.yml
@@ -7,7 +7,7 @@ jobs:
   build:
     services:
       couchbase:
-        image: couchbase/server-sandbox:7.0.2
+        image: couchbase/server-sandbox:7.1.3
         ports:
           - 8091-8094:8091-8094
           - 11210:11210
33 changes: 33 additions & 0 deletions __test__/fts.spec.ts
@@ -0,0 +1,33 @@
import { searchQuery, SearchQuery } from '../src';

const maybe = process.env.CI ? test.skip : test;
Collaborator commented:

Does this mean we skip these tests on CI? Is that due to the cb server version not supporting FTS?

Collaborator Author replied:

These tests are skipped in CI because the data lives in an FTS index, and we don't know how long the index takes to be ready and return the correct values. We tried adding delays, but that was unreliable: it only worked a few times with a 15-second delay, which is too long, and even then it failed most of the time.

Note: Testing indexed data in CI can be tricky.


maybe('fts match results', async () => {
const result = await searchQuery('hotels', SearchQuery.match('Gillingham'), { limit: 5 });
expect(result).toBeDefined();
expect(result.rows.length).toBeGreaterThanOrEqual(1);
expect(result.rows[0].id).toBeDefined();
});

maybe('fts matchPhrase basic', async () => {
const result = await searchQuery('hotels', SearchQuery.matchPhrase('Medway Youth Hostel'), { limit: 5 });
expect(result).toBeDefined();
expect(result.rows.length).toBeGreaterThanOrEqual(1);
expect(result.rows[0].id).toBe('hotel_10025');
});

maybe('fts conjuncts results', async () => {
const query = SearchQuery.conjuncts(SearchQuery.match('Berkeley'), SearchQuery.matchPhrase('luxury hotel'));
const result = await searchQuery('hotels', query);
expect(result).toBeDefined();
expect(result.rows.length).toBeGreaterThanOrEqual(1);
expect(result.rows[0].id).toBeDefined();
});

maybe('fts disjunction results', async () => {
const query = SearchQuery.disjuncts(SearchQuery.match('Louvre'), SearchQuery.match('Eiffel'));
const result = await searchQuery('hotels', query);
expect(result).toBeDefined();
expect(result.rows.length).toBeGreaterThanOrEqual(1);
expect(result.rows[0].id).toBeDefined();
});
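
The review thread above notes that fixed delays were unreliable for waiting on the FTS index. One common alternative is to poll until the index returns at least one hit or a timeout expires. The sketch below is not part of this PR: `waitForHits`, its option names, and the default timings are hypothetical, and `runSearch` stands for any function returning a promise of a result with a `rows` array (for example, a closure around `searchQuery`).

```javascript
// Hypothetical helper (not part of this PR): poll a search until it returns
// at least `minRows` rows, or give up after `timeoutMs` milliseconds.
async function waitForHits(runSearch, { minRows = 1, timeoutMs = 30000, intervalMs = 500 } = {}) {
  const deadline = Date.now() + timeoutMs;
  let lastError;
  for (;;) {
    try {
      const result = await runSearch();
      lastError = undefined;
      if (result && Array.isArray(result.rows) && result.rows.length >= minRows) {
        return result;
      }
    } catch (err) {
      // The index may not exist or be ready yet; remember the error and retry.
      lastError = err;
    }
    if (Date.now() >= deadline) {
      const reason = lastError ? `: ${lastError.message}` : '';
      throw new Error(`Search returned fewer than ${minRows} row(s) within ${timeoutMs} ms${reason}`);
    }
    // Wait before the next attempt.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In a test, the call could then be wrapped as `await waitForHits(() => searchQuery('hotels', SearchQuery.match('Gillingham'), { limit: 5 }))`. Polling still needs an upper bound, so it does not remove the uncertainty about indexing time; it only avoids paying the full delay when the index is ready early.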
205 changes: 205 additions & 0 deletions docusaurus/docs/advanced/fts.md
Collaborator commented:

This docs page looks really good 🙂 nice work!

@@ -0,0 +1,205 @@
---
sidebar_position: 0
title: Full Text Search
---

You can use the Full Text Search service (FTS) to create queryable full-text indexes in Couchbase Server.
FTS allows you to create, manage, and query full-text indexes on JSON documents stored in Couchbase buckets.

It uses natural language processing for querying documents, provides relevance scoring on the results of your queries, and has fast indexes for querying a wide range of possible text searches.

Supported query types include simple queries like Match and Term queries; range queries like Date Range and Numeric Range; and compound queries for conjunctions, disjunctions, and/or boolean queries.

Ottoman exposes an API for performing FTS queries which abstracts some of the complexity of using the Node.js SDK.

## Examples

Search queries are executed at the cluster level (not at the bucket or collection level). Each example below logs the result to the console, including the returned rows and their metadata. Each returned row has an index, id, score, and sort value.

#### Match

Using the `travel-sample` [Sample Bucket](https://docs.couchbase.com/server/7.1/manage/manage-settings/install-sample-buckets.html), we define an FTS SearchQuery using the `match()` method to search for the specified term: "five-star".

```javascript
import { searchQuery, SearchQuery } from 'ottoman';

async function ftsMatchWord(term) {
  return await searchQuery('index-hotel-description', SearchQuery.match(term), { limit: 5 });
}

const result = await ftsMatchWord('five-star');
console.log('RESULT:', result);
```

#### Match Phrase

An FTS SearchQuery using the `matchPhrase()` method to find a specified phrase: `"10-minute walk from the"`.

```javascript
import { searchQuery, SearchQuery } from 'ottoman';

async function ftsMatchPhrase(phrase) {
  return await searchQuery('index-hotel-description', SearchQuery.matchPhrase(phrase), { limit: 10 });
}

// Declared here so the snippet also runs on its own as an ES module.
const result = await ftsMatchPhrase('10-minute walk from the');
console.log('RESULT:', result);
```

When searching for a phrase, we get some additional benefits beyond what the `match()` method offers. The match phrase query for `"10-minute walk from the"` will produce the following hits from our travel-sample dataset:

```shell
hits:
hotel_11331: "10-minute walk from village"
hotel_15915: "10 minute walk from Echo Arena"
hotel_3606: "10 minute walk to the centre"
hotel_28259: "10 minute walk to the coastal path"
```

If you run this code, notice that we matched `"10-minute"` and also got three additional hits on `"10 minute"` (without the hyphen).
So we get matches on variations of that term, just as we would with a regular `match()` search.
However, notice that `"walk from the"` also hits on several variations of the phrase:
`"walk from"` (where `"the"` was dropped) and `"walk to the"` (where `"from"` was dropped).
This behavior is specific to phrase searches and helps surface various matches relevant to our search.


#### Date Range

Here we define an FTS SearchQuery that uses the `dateRange()` method to search for hotels where the updated field (`datetime`) falls within a specified date range.

```javascript
import { searchQuery, SearchQuery, Schema, model } from 'ottoman';

async function ftsHotelByDateRange(startDate, endDate) {
  const schema = new Schema({
    name: String,
    updated: Date,
    description: String
  });

  const Hotel = model('hotel', schema, {modelKey: 'type', keyGeneratorDelimiter: '_'});

  // Set the `updated` field of a known document so it falls inside the range.
  await Hotel.updateById('hotel_fts_123', {
    name: 'HotelFTS',
    updated: new Date('2010-11-10 18:33:50 +0300'),
    description: 'a fancy hotel',
  })

  return searchQuery(
    'index-hotel-description',
    SearchQuery.dateRange().start(startDate).end(endDate),
    {
      limit: 5,
    }
  )
}

const result = await ftsHotelByDateRange('2010-11-10', '2010-11-20')
console.log('RESULT:', result)
```

#### Conjunction

A query that must satisfy all of its child queries. The example below returns only two documents, those that hit on both the term `"five-star"`
and the phrase `"luxury hotel"`; no other documents match both criteria.

```javascript
import { searchQuery, SearchQuery } from 'ottoman';

async function ftsConjunction() {
  return await searchQuery(
    'index-hotel-description',
    SearchQuery.conjuncts(
      SearchQuery.match('five-star'),
      SearchQuery.matchPhrase('luxury hotel')
    )
  )
}

var result = await ftsConjunction()
console.log('RESULT:', result)
```

:::info
Our match for `"five-star"` was not exact, but it still produced a result because a similar term, `"Five star"`, was found.
We could also have matched `"5 star"` or the word `"five"`.
With any full-text search, the number of hits you get and their scores can vary.
:::

#### Disjunction

A query that (by default) must satisfy at least one of its child queries.
If a conjunction query can be thought of as an `AND` operator, a disjunction is like an `OR` operator.
The example below will return seven documents hitting on the term `"Louvre"` and five hitting on the term `"Eiffel"`, for a total of 12 rows returned by the disjunction query.

```javascript
import { searchQuery, SearchQuery, TermSearchFacet } from 'ottoman';

async function ftsDisjunction() {
return await searchQuery(
'index-hotel-description',
SearchQuery.disjuncts(
SearchQuery.match('Louvre'),
SearchQuery.match('Eiffel')
),
{
facets: {
Descriptions: new TermSearchFacet('description', 5),
},
limit: 12,
}
)
}

result = await ftsDisjunction()
console.log('RESULT:', result)
```
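
The `AND`/`OR` analogy can be illustrated without a cluster by treating each child query's hits as a list of document IDs: a conjunction then corresponds to the intersection and a disjunction to the union. This is only a model of the matching semantics (it ignores scoring and `limit`), the helper names are hypothetical, and the document IDs are made up for illustration.

```javascript
// Model of conjunction/disjunction semantics over lists of document IDs.
// The IDs below are hypothetical and not taken from travel-sample.
function conjunctionIds(...hitLists) {
  if (hitLists.length === 0) return [];
  const [first, ...rest] = hitLists;
  // Keep only IDs that appear in every child query's hit list.
  return [...new Set(first)].filter((id) => rest.every((list) => list.includes(id)));
}

function disjunctionIds(...hitLists) {
  // Keep every ID that appears in at least one hit list, without duplicates.
  return [...new Set(hitLists.flat())];
}

const louvreHits = ['hotel_a', 'hotel_b', 'hotel_c'];
const eiffelHits = ['hotel_c', 'hotel_d'];

console.log(conjunctionIds(louvreHits, eiffelHits)); // [ 'hotel_c' ]
console.log(disjunctionIds(louvreHits, eiffelHits)); // [ 'hotel_a', 'hotel_b', 'hotel_c', 'hotel_d' ]
```

In this model the union has four entries rather than five, because `hotel_c` matches both child queries. The child hit counts only add up to the total number of rows when no document matches more than one child query.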

## Working with Results

As with all query result types in Ottoman, the search result object contains two properties: the hits for the documents that matched your query, emitted as `rows`, and the metadata, available in the `meta` property.

Metadata holds additional information not directly related to your query, such as the success status, the total number of hits, and how long the query took to execute in the cluster.

#### Iterating over Hits

```javascript
result.rows.forEach((hit, index) => {
  const docId = hit.id
  const score = hit.score
  const resultNum = index + 1
  console.log(`Result #${resultNum} ID: ${docId} Score: ${score}`)
})
```
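
According to the signature in `src/ottoman/ottoman.ts`, `searchQuery` returns the SDK's `StreamableRowPromise`. Besides being awaited, that object should also be consumable as an event emitter. The sketch below assumes the `'row'`, `'meta'`, `'end'`, and `'error'` events of the Couchbase Node.js SDK; `collectHits` is a hypothetical helper name, and it accepts any object exposing `on(event, listener)`.

```javascript
// Hypothetical helper: consume a search result as a stream of events instead
// of awaiting the whole result. Assumes the emitter fires 'row', 'meta',
// 'end' and 'error'.
function collectHits(stream, onRow) {
  return new Promise((resolve, reject) => {
    let count = 0;
    let meta;
    stream.on('row', (row) => {
      count += 1;
      onRow(row, count); // handle each hit as soon as it arrives
    });
    stream.on('meta', (m) => {
      meta = m;
    });
    stream.on('error', reject);
    stream.on('end', () => resolve({ count, meta }));
  });
}
```

A call could look like `await collectHits(searchQuery(indexName, query), (hit, n) => console.log(n, hit.id))`. Streaming is mainly useful for large result sets, since rows are handled one at a time rather than only once the whole result has been received.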

#### Facets

```javascript
var facets = result.meta.facets
console.log('Descriptions facet:', facets.Descriptions)
```

## Scan Consistency and ConsistentWith

By default, all Search queries will return the data from whatever is in the index at the time of query.
These semantics can be tuned if needed so that the hits returned include the most recently performed mutations,
at the cost of slightly higher latency since the index needs to be updated first.

There are two ways to control consistency: either by supplying a custom `SearchScanConsistency` or using `consistentWith`.
At the moment the cluster only supports `consistentWith`, which is why you only see `SearchScanConsistency.NotBounded`
in the enum which is the default setting.
The way to make sure that recently written documents show up in the search works as follows (commonly referred to as "read your own writes", or RYOW):

#### Scan consistency example:

```javascript
import { searchQuery, SearchQuery, SearchScanConsistency } from 'ottoman';

const result = await searchQuery(
  'index-hotel-description',
  SearchQuery.match('swanky'),
  { consistency: SearchScanConsistency.NotBounded }
)
```
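
The example above only passes the default `NotBounded` setting. For the read-your-own-writes flow, the Couchbase Node.js SDK accepts a `consistentWith` search option holding a `MutationState` built from mutation tokens. The sketch below is an assumption about how that could look through Ottoman: `searchAfterWrite` is a hypothetical helper, and its dependencies are passed in explicitly so the flow does not rely on a particular export list.

```javascript
// Hypothetical helper: write a document, then search with `consistentWith`
// so the search waits until the index has processed that write.
// `deps` supplies the collection, searchQuery, SearchQuery and MutationState.
async function searchAfterWrite(deps, key, doc, indexName, term) {
  const { collection, searchQuery, SearchQuery, MutationState } = deps;

  // The mutation result carries a token identifying this specific write.
  const mutation = await collection.upsert(key, doc);

  // Collect the token(s) the search must be consistent with.
  const state = new MutationState(mutation.token);

  return searchQuery(indexName, SearchQuery.match(term), {
    consistentWith: state,
    limit: 5,
  });
}
```

With Ottoman, `deps` could presumably be `{ collection: getCollection(), searchQuery, SearchQuery, MutationState }`, assuming `MutationState` is re-exported alongside the other SDK classes listed in `src/couchbase.ts`. Passing the mutation token trades some latency for the guarantee that the new document is visible to the query.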


1 change: 1 addition & 0 deletions docusaurus/docusaurus.config.js
@@ -73,6 +73,7 @@ const config = {
{
label: 'Advanced',
items: [
{ label: 'Full Text Search', to: '/docs/advanced/fts' },
{ label: 'How Ottoman Works', to: '/docs/advanced/how-ottoman-works' },
{ label: 'Ottoman', to: '/docs/advanced/ottoman' },
{ label: 'Mongoose to Ottoman', to: '/docs/advanced/mongoose-to-couchbase' },
2 changes: 1 addition & 1 deletion package.json
@@ -61,7 +61,7 @@
     "typescript": "4.9.3"
   },
   "dependencies": {
-    "couchbase": "4.2.0",
+    "couchbase": "4.2.8",
     "jsonpath": "1.1.1",
     "lodash": "4.17.21",
     "uuid": "9.0.0"
3 changes: 3 additions & 0 deletions src/couchbase.ts
@@ -22,4 +22,7 @@ export {
QueryProfileMode,
MutationState,
QueryScanConsistency,
SearchQuery,
SearchScanConsistency,
TermSearchFacet,
} from 'couchbase';
1 change: 1 addition & 0 deletions src/index.ts
@@ -8,6 +8,7 @@ export {
getOttomanInstances,
getModel,
Ottoman,
searchQuery,
} from './ottoman/ottoman';
export { Model, IModel } from './model/model';
export { Document, IDocument } from './model/document';
29 changes: 25 additions & 4 deletions src/ottoman/ottoman.ts
@@ -38,6 +38,9 @@
CouchbaseError,
} from 'couchbase';
import { generateUUID } from '../utils/generate-uuid';
import { SearchQuery } from 'couchbase/dist/searchquery';
import { SearchMetaData, SearchQueryOptions, SearchResult, SearchRow } from 'couchbase/dist/searchtypes';
import { StreamableRowPromise } from 'couchbase/dist/streamablepromises';

export interface ConnectOptions extends CouchbaseConnectOptions {
connectionString: string;
@@ -351,6 +354,18 @@
);
}

getIndexes() {
return this.cluster.searchIndexes().getAllIndexes();

Codecov (codecov/patch) warning: added lines #L357-L358 in src/ottoman/ottoman.ts were not covered by tests.
}

searchQuery(
indexName: string,
query: SearchQuery,
options?: SearchQueryOptions,

Codecov (codecov/patch) warning: added line #L364 in src/ottoman/ottoman.ts was not covered by tests.
): StreamableRowPromise<SearchResult, SearchRow, SearchMetaData> {
return this.cluster!.searchQuery(indexName, query, options);

Codecov (codecov/patch) warning: added line #L366 in src/ottoman/ottoman.ts was not covered by tests.
}

/**
* Closes the current connection.
*
@@ -535,12 +550,18 @@
     await __ottoman.close();
   }
 };
-export const start = () => __ottoman && __ottoman.start();
-export const getModel = (name: string) => __ottoman && __ottoman.getModel(name);
+export const start = () => __ottoman?.start();
+export const getModel = (name: string) => __ottoman?.getModel(name);
 export const getCollection = (collectionName = DEFAULT_COLLECTION, scopeName = DEFAULT_SCOPE) =>
-  __ottoman && __ottoman.getCollection(collectionName, scopeName);
+  __ottoman?.getCollection(collectionName, scopeName);
 export const model = <T = any, R = T>(
   name: string,
   schema: Schema | Record<string, unknown>,
   options?: ModelOptions,
-): ModelTypes<T, R> => __ottoman && __ottoman.model<T, R>(name, schema, options);
+): ModelTypes<T, R> => __ottoman?.model<T, R>(name, schema, options);
+
+export const searchQuery = (
+  indexName: string,
+  query: SearchQuery,
+  options?: SearchQueryOptions,
+): StreamableRowPromise<SearchResult, SearchRow, SearchMetaData> => __ottoman?.searchQuery(indexName, query, options);