Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: log search queries and results #166

Merged
merged 9 commits into from
Nov 17, 2024
8 changes: 8 additions & 0 deletions cognee-frontend/src/modules/chat/getHistory.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
import { fetch } from '@/utils';

export default function getHistory() {
return fetch(
'/v1/search',
)
Comment on lines +4 to +6
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider adding query parameters for pagination and filtering.

Search history could grow large. Consider implementing pagination and filtering capabilities.

Here's a suggested implementation:

+interface HistoryParams {
+  page?: number;
+  limit?: number;
+  startDate?: string;
+  endDate?: string;
+}
+
-export default function getHistory() {
+export default function getHistory(params?: HistoryParams) {
+  const queryParams = new URLSearchParams();
+  if (params?.page) queryParams.append('page', params.page.toString());
+  if (params?.limit) queryParams.append('limit', params.limit.toString());
+  if (params?.startDate) queryParams.append('startDate', params.startDate);
+  if (params?.endDate) queryParams.append('endDate', params.endDate);
+
   return fetch(
-    '/v1/search',
+    `/v1/search${queryParams.toString() ? `?${queryParams.toString()}` : ''}`,
   )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
return fetch(
'/v1/search',
)
interface HistoryParams {
page?: number;
limit?: number;
startDate?: string;
endDate?: string;
}
export default function getHistory(params?: HistoryParams) {
const queryParams = new URLSearchParams();
if (params?.page) queryParams.append('page', params.page.toString());
if (params?.limit) queryParams.append('limit', params.limit.toString());
if (params?.startDate) queryParams.append('startDate', params.startDate);
if (params?.endDate) queryParams.append('endDate', params.endDate);
return fetch(
`/v1/search${queryParams.toString() ? `?${queryParams.toString()}` : ''}`,
)

.then((response) => response.json());
}
borisarzentar marked this conversation as resolved.
Show resolved Hide resolved
15 changes: 13 additions & 2 deletions cognee-frontend/src/ui/Partials/SearchView/SearchView.tsx
Original file line number Diff line number Diff line change
@@ -1,9 +1,12 @@
'use client';

import { v4 } from 'uuid';
import classNames from 'classnames';
import { useCallback, useState } from 'react';
import { useCallback, useEffect, useState } from 'react';
import { CTAButton, Stack, Text, DropdownSelect, TextArea, useBoolean } from 'ohmy-ui';
import { fetch } from '@/utils';
import styles from './SearchView.module.css';
import getHistory from '@/modules/chat/getHistory';

interface Message {
id: string;
Expand Down Expand Up @@ -52,6 +55,14 @@ export default function SearchView() {
}, 300);
}, []);

useEffect(() => {
getHistory()
.then((history) => {
setMessages(history);
scrollToBottom();
});
}, [scrollToBottom]);
Comment on lines +58 to +64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add error handling and loading state management

The current implementation has several potential issues:

  1. No error handling for failed history fetches
  2. Missing loading state indicator
  3. Potential memory leak if component unmounts during fetch
  4. Unnecessary re-renders due to scrollToBottom dependency

Consider this improved implementation:

+ const [isLoading, setIsLoading] = useState(false);
  useEffect(() => {
+   let mounted = true;
+   setIsLoading(true);
    getHistory()
      .then((history) => {
+       if (mounted) {
          setMessages(history);
          scrollToBottom();
+       }
      })
+     .catch((error) => {
+       console.error('Failed to fetch chat history:', error);
+       // Consider showing a user-friendly error message
+     })
+     .finally(() => {
+       if (mounted) {
+         setIsLoading(false);
+       }
+     });
+   return () => {
+     mounted = false;
+   };
- }, [scrollToBottom]);
+ }, []); // Remove scrollToBottom from dependencies
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
useEffect(() => {
getHistory()
.then((history) => {
setMessages(history);
scrollToBottom();
});
}, [scrollToBottom]);
const [isLoading, setIsLoading] = useState(false);
useEffect(() => {
let mounted = true;
setIsLoading(true);
getHistory()
.then((history) => {
if (mounted) {
setMessages(history);
scrollToBottom();
}
})
.catch((error) => {
console.error('Failed to fetch chat history:', error);
// Consider showing a user-friendly error message
})
.finally(() => {
if (mounted) {
setIsLoading(false);
}
});
return () => {
mounted = false;
};
}, []); // Remove scrollToBottom from dependencies


const handleSearchSubmit = useCallback((event: React.FormEvent<HTMLFormElement>) => {
event.preventDefault();

Expand All @@ -78,7 +89,7 @@ export default function SearchView() {
'Content-Type': 'application/json',
},
body: JSON.stringify({
query: inputValue,
query: inputValue.trim(),
borisarzentar marked this conversation as resolved.
Show resolved Hide resolved
searchType: searchTypeValue,
}),
})
Expand Down
2 changes: 1 addition & 1 deletion cognee/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
from .api.v1.add import add
from .api.v1.cognify import cognify
from .api.v1.datasets.datasets import datasets
from .api.v1.search import search, SearchType
from .api.v1.search import search, SearchType, get_search_history
from .api.v1.prune import prune

# Pipelines
Expand Down
1 change: 1 addition & 0 deletions cognee/api/v1/search/__init__.py
Original file line number Diff line number Diff line change
@@ -1 +1,2 @@
from .search_v2 import search, SearchType
from .get_search_history import get_search_history
9 changes: 9 additions & 0 deletions cognee/api/v1/search/get_search_history.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
from cognee.modules.search.operations import get_history
from cognee.modules.users.methods import get_default_user
from cognee.modules.users.models import User

async def get_search_history(user: User = None) -> list:
if not user:
user = await get_default_user()

return await get_history(user.id)
borisarzentar marked this conversation as resolved.
Show resolved Hide resolved
borisarzentar marked this conversation as resolved.
Show resolved Hide resolved
29 changes: 25 additions & 4 deletions cognee/api/v1/search/routers/get_search_router.py
Original file line number Diff line number Diff line change
@@ -1,8 +1,11 @@
from cognee.api.v1.search import SearchType
from uuid import UUID
from datetime import datetime
from fastapi import Depends, APIRouter
from fastapi.responses import JSONResponse
from cognee.api.v1.search import SearchType
from cognee.api.DTO import InDTO, OutDTO
from cognee.modules.users.models import User
from fastapi import Depends, APIRouter
from cognee.api.DTO import InDTO
from cognee.modules.search.operations import get_history
from cognee.modules.users.methods import get_authenticated_user


Expand All @@ -13,6 +16,24 @@ class SearchPayloadDTO(InDTO):
def get_search_router() -> APIRouter:
router = APIRouter()

class SearchHistoryItem(OutDTO):
id: UUID
text: str
user: str
created_at: datetime

@router.get("/", response_model = list[SearchHistoryItem])
async def get_search_history(user: User = Depends(get_authenticated_user)):
try:
history = await get_history(user.id)

return history
except Exception as error:
return JSONResponse(
status_code = 500,
content = {"error": str(error)}
)

Comment on lines +25 to +35
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Enhance error handling and add pagination

  1. The generic exception handling could mask specific errors. Consider handling specific exceptions:
     try:
         history = await get_history(user.id)
         return history
-    except Exception as error:
+    except ValueError as error:
         return JSONResponse(
             status_code = 500,
             content = {"error": str(error)}
         )
+    except Exception as error:
+        # Log the unexpected error
+        return JSONResponse(
+            status_code = 500,
+            content = {"error": "Internal server error"}
+        )
  1. Consider adding pagination to handle large result sets:
@router.get("/", response_model=Page[SearchHistoryItem])
async def get_search_history(
    user: User = Depends(get_authenticated_user),
    page: int = Query(1, ge=1),
    size: int = Query(20, ge=1, le=100)
):

@router.post("/", response_model = list)
async def search(payload: SearchPayloadDTO, user: User = Depends(get_authenticated_user)):
""" This endpoint is responsible for searching for nodes in the graph."""
Expand All @@ -28,4 +49,4 @@ async def search(payload: SearchPayloadDTO, user: User = Depends(get_authenticat
content = {"error": str(error)}
)

return router
return router
17 changes: 12 additions & 5 deletions cognee/api/v1/search/search_v2.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,9 @@
import json
from uuid import UUID
from enum import Enum
from typing import Callable, Dict
from cognee.modules.search.operations import log_query, log_result
from cognee.modules.storage.utils import JSONEncoder
from cognee.shared.utils import send_telemetry
from cognee.modules.users.models import User
from cognee.modules.users.methods import get_default_user
Expand All @@ -14,15 +17,17 @@ class SearchType(Enum):
INSIGHTS = "INSIGHTS"
CHUNKS = "CHUNKS"

async def search(search_type: SearchType, query: str, user: User = None) -> list:
async def search(query_type: SearchType, query_text: str, user: User = None) -> list:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we maybe rename SearchType to QueryType?

if user is None:
user = await get_default_user()

if user is None:
raise PermissionError("No user found in the system. Please create a user.")

query = await log_query(query_text, str(query_type), user.id)

own_document_ids = await get_document_ids_for_user(user.id)
search_results = await specific_search(search_type, query, user)
search_results = await specific_search(query_type, query_text, user)

filtered_search_results = []

Expand All @@ -33,19 +38,21 @@ async def search(search_type: SearchType, query: str, user: User = None) -> list
if document_id is None or document_id in own_document_ids:
filtered_search_results.append(search_result)

await log_result(query.id, json.dumps(filtered_search_results, cls = JSONEncoder), user.id)

return filtered_search_results

async def specific_search(search_type: SearchType, query: str, user) -> list:
async def specific_search(query_type: SearchType, query: str, user) -> list:
borisarzentar marked this conversation as resolved.
Show resolved Hide resolved
search_tasks: Dict[SearchType, Callable] = {
SearchType.SUMMARIES: query_summaries,
SearchType.INSIGHTS: query_graph_connections,
SearchType.CHUNKS: query_chunks,
}

search_task = search_tasks.get(search_type)
search_task = search_tasks.get(query_type)

if search_task is None:
raise ValueError(f"Unsupported search type: {search_type}")
raise ValueError(f"Unsupported search type: {query_type}")

send_telemetry("cognee.search EXECUTION STARTED", user.id)

Expand Down
3 changes: 0 additions & 3 deletions cognee/infrastructure/databases/relational/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,3 @@
from .config import get_relational_config
from .create_db_and_tables import create_db_and_tables
from .get_relational_engine import get_relational_engine

# Global data types
from .data_types.UUID import UUID
45 changes: 0 additions & 45 deletions cognee/infrastructure/databases/relational/data_types/UUID.py

This file was deleted.

4 changes: 2 additions & 2 deletions cognee/infrastructure/pipeline/models/Operation.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
from datetime import datetime
from datetime import datetime, timezone
from sqlalchemy.orm import Mapped, MappedColumn
from sqlalchemy import Column, DateTime, ForeignKey, Enum, JSON
from cognee.infrastructure.databases.relational import Base, UUID
Expand All @@ -24,4 +24,4 @@ class Operation(Base):
data_id = Column(UUID, ForeignKey("data.id"))
meta_data: Mapped[dict] = MappedColumn(type_ = JSON)

created_at = Column(DateTime, default = datetime.utcnow)
created_at = Column(DateTime, default = datetime.now(timezone.utc))
4 changes: 2 additions & 2 deletions cognee/modules/data/models/Data.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,8 +2,8 @@
from typing import List
from datetime import datetime, timezone
from sqlalchemy.orm import relationship, Mapped
from sqlalchemy import Column, String, DateTime
from cognee.infrastructure.databases.relational import Base, UUID
from sqlalchemy import Column, String, DateTime, UUID
from cognee.infrastructure.databases.relational import Base
from .DatasetData import DatasetData

class Data(Base):
Expand Down
4 changes: 2 additions & 2 deletions cognee/modules/data/models/Dataset.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,8 +2,8 @@
from typing import List
from datetime import datetime, timezone
from sqlalchemy.orm import relationship, Mapped
from sqlalchemy import Column, Text, DateTime
from cognee.infrastructure.databases.relational import Base, UUID
from sqlalchemy import Column, Text, DateTime, UUID
from cognee.infrastructure.databases.relational import Base
Comment on lines +5 to +6
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Standardize UUID imports across remaining model files

Two model files still use the local UUID import:

  • cognee/modules/pipelines/models/TaskRun.py
  • cognee/modules/pipelines/models/Task.py

These files should be updated to import UUID directly from SQLAlchemy for consistency with the rest of the codebase.

🔗 Analysis chain

LGTM! Good standardization of UUID type handling.

The change to import UUID directly from SQLAlchemy instead of a local module is a good practice as it:

  • Uses SQLAlchemy's native type system
  • Ensures consistent UUID handling across different database backends
  • Reduces dependency on local implementations

Let's verify this change is consistently applied across all model files:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining local UUID imports in model files
# and verify all UUID column definitions are using SQLAlchemy's UUID type

# Check for any remaining imports from local UUID module
echo "Checking for remaining local UUID imports..."
rg "from cognee\.infrastructure\.databases\.relational import.*UUID" "cognee/modules"

# Verify UUID column definitions in model files
echo "Checking UUID column definitions..."
rg "Column\(UUID" "cognee/modules"

Length of output: 3343

from .DatasetData import DatasetData

class Dataset(Base):
Expand Down
4 changes: 2 additions & 2 deletions cognee/modules/data/models/DatasetData.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
from datetime import datetime, timezone
from sqlalchemy import Column, DateTime, ForeignKey
from cognee.infrastructure.databases.relational import Base, UUID
from sqlalchemy import Column, DateTime, ForeignKey, UUID
from cognee.infrastructure.databases.relational import Base

class DatasetData(Base):
__tablename__ = "dataset_data"
Expand Down
5 changes: 3 additions & 2 deletions cognee/modules/pipelines/models/Pipeline.py
Original file line number Diff line number Diff line change
@@ -1,9 +1,10 @@
from uuid import uuid4
from datetime import datetime, timezone
from sqlalchemy import Column, DateTime, String, Text
from sqlalchemy import Column, DateTime, String, Text, UUID
from sqlalchemy.orm import relationship, Mapped
from cognee.infrastructure.databases.relational import Base, UUID
from cognee.infrastructure.databases.relational import Base
from .PipelineTask import PipelineTask
from .Task import Task

class Pipeline(Base):
__tablename__ = "pipelines"
Expand Down
4 changes: 2 additions & 2 deletions cognee/modules/pipelines/models/PipelineRun.py
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
import enum
from uuid import uuid4
from datetime import datetime, timezone
from sqlalchemy import Column, DateTime, JSON, Enum
from cognee.infrastructure.databases.relational import Base, UUID
from sqlalchemy import Column, DateTime, JSON, Enum, UUID
from cognee.infrastructure.databases.relational import Base

class PipelineRunStatus(enum.Enum):
DATASET_PROCESSING_STARTED = "DATASET_PROCESSING_STARTED"
Expand Down
4 changes: 2 additions & 2 deletions cognee/modules/pipelines/models/PipelineTask.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
from datetime import datetime, timezone
from sqlalchemy import Column, DateTime, ForeignKey
from cognee.infrastructure.databases.relational import Base, UUID
from sqlalchemy import Column, DateTime, ForeignKey, UUID
from cognee.infrastructure.databases.relational import Base

class PipelineTask(Base):
__tablename__ = "pipeline_task"
Expand Down
16 changes: 16 additions & 0 deletions cognee/modules/search/models/Query.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
from uuid import uuid4
from datetime import datetime, timezone
from sqlalchemy import Column, DateTime, String, UUID
from cognee.infrastructure.databases.relational import Base

class Query(Base):
__tablename__ = "queries"

id = Column(UUID, primary_key = True, default = uuid4)

text = Column(String)
query_type = Column(String)
user_id = Column(UUID)
Comment on lines +11 to +13
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add constraints for data integrity and security.

Several improvements could be made to enhance data integrity and security:

  1. Add a maximum length constraint to the text column to prevent potential DoS attacks
  2. Add an enum or check constraint for query_type to ensure valid values
  3. Add a foreign key constraint and index for user_id
-    text = Column(String)
-    query_type = Column(String)
-    user_id = Column(UUID)
+    text = Column(String(1024), nullable=False)
+    query_type = Column(String(50), nullable=False)
+    user_id = Column(UUID, ForeignKey('users.id'), nullable=False, index=True)

Also consider adding an enum for query_type:

from enum import Enum
class QueryType(str, Enum):
    SEARCH = "search"
    FILTER = "filter"
    # add other types as needed

query_type = Column(Enum(QueryType), nullable=False)


created_at = Column(DateTime(timezone = True), default = lambda: datetime.now(timezone.utc))
updated_at = Column(DateTime(timezone = True), onupdate = lambda: datetime.now(timezone.utc))
16 changes: 16 additions & 0 deletions cognee/modules/search/models/Result.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
from datetime import datetime, timezone
from uuid import uuid4
from sqlalchemy import Column, DateTime, Text, UUID
from cognee.infrastructure.databases.relational import Base

class Result(Base):
__tablename__ = "results"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we rename results to query_results?


id = Column(UUID, primary_key = True, default = uuid4)

value = Column(Text)
query_id = Column(UUID)
user_id = Column(UUID, index = True)
Comment on lines +11 to +13
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider adding a foreign key constraint for query_id.

The query_id field appears to reference another table but lacks a foreign key constraint. This could lead to data integrity issues.

-    query_id = Column(UUID)
+    query_id = Column(UUID, ForeignKey("queries.id"), nullable=False)

Also, fix PEP 8 spacing in user_id definition:

-    user_id = Column(UUID, index = True)
+    user_id = Column(UUID, index=True)

Committable suggestion skipped: line range outside the PR's diff.


created_at = Column(DateTime(timezone = True), default = lambda: datetime.now(timezone.utc))
updated_at = Column(DateTime(timezone = True), onupdate = lambda: datetime.now(timezone.utc))
Comment on lines +15 to +16
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Fix timezone handling inconsistencies and standardize datetime usage

There are several inconsistencies in datetime handling across the codebase:

  • Most SQLAlchemy models correctly use timezone-aware DateTime columns with UTC timestamps
  • However, there are problematic cases that need attention:
    • cognee/infrastructure/pipeline/models/Operation.py uses datetime.utcnow without timezone parameter
    • tools/daily_twitter_stats.py, cognee/shared/utils.py, and cognee/infrastructure/databases/graph/falkordb/adapter.py use naive datetime.now() calls
    • The commented code in falkordb adapter shows string-formatted timestamps

Suggested fixes:

  1. Update Operation.py to use timezone-aware UTC timestamps like other models
  2. Replace naive datetime.now() calls with datetime.now(timezone.utc)
  3. Use ISO format with timezone info when string formatting is needed
🔗 Analysis chain

Fix PEP 8 spacing and verify timestamp handling.

  1. Remove spaces around the = operator in keyword arguments.
-    created_at = Column(DateTime(timezone = True), default = lambda: datetime.now(timezone.utc))
-    updated_at = Column(DateTime(timezone = True), onupdate = lambda: datetime.now(timezone.utc))
+    created_at = Column(DateTime(timezone=True), default=lambda: datetime.now(timezone.utc))
+    updated_at = Column(DateTime(timezone=True), onupdate=lambda: datetime.now(timezone.utc))
  1. Ensure consistent timestamp handling across the codebase.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify consistent timezone handling across the codebase
# Look for potential timezone-related issues or inconsistencies

# Check for datetime usage without explicit timezone
echo "Checking for datetime usage without explicit timezone..."
rg "datetime\.now\(\)" --type py

# Check for timezone-aware datetime column definitions
echo "Checking DateTime column definitions..."
rg "Column\(DateTime" --type py -A 1

Length of output: 5063

Comment on lines +1 to +16
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Add a foreign key constraint for query_id referencing Query.id

The Query model exists at cognee/modules/search/models/Query.py with a matching UUID primary key, but the Result model's query_id column lacks a proper foreign key constraint. You should modify the query_id column definition to:

query_id = Column(UUID, ForeignKey("queries.id"))
  • Location: cognee/modules/search/models/Result.py, line 12
🔗 Analysis chain

Verify Query model exists for the foreign key relationship.


Let me gather more information about the Query model structure to verify the relationship properly.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify Query model exists and its structure

# Look for Query model definition
echo "Searching for Query model definition..."
rg -l "class Query.*Base" --type py

# Check Query model structure
echo "Checking Query model structure..."
ast-grep --pattern 'class Query {
  $$$
  id = Column($$$)
  $$$
}'

Length of output: 311


Script:

#!/bin/bash
# Get the content of Query model to verify its structure
echo "Content of Query model:"
cat cognee/modules/search/models/Query.py

# Check if there are any existing relationships between Query and Result
echo -e "\nChecking for existing relationships:"
rg "relationship|ForeignKey.*results|ForeignKey.*queries" --type py

Length of output: 11320

3 changes: 3 additions & 0 deletions cognee/modules/search/operations/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
from .log_query import log_query
from .log_result import log_result
from .get_history import get_history
31 changes: 31 additions & 0 deletions cognee/modules/search/operations/get_history.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
from uuid import UUID
from sqlalchemy import literal, select
from cognee.infrastructure.databases.relational import get_relational_engine
from ..models.Query import Query
from ..models.Result import Result

async def get_history(user_id: UUID, limit: int = 10) -> list[Result]:
db_engine = get_relational_engine()

Comment on lines +7 to +9
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add error handling and input validation.

Consider these improvements for robustness:

  1. Add error handling for database connection failures
  2. Validate that the limit parameter is positive
 async def get_history(user_id: UUID, limit: int = 10) -> list[Result]:
+    if limit <= 0:
+        raise ValueError("Limit must be positive")
+
     db_engine = get_relational_engine()
+    if not db_engine:
+        raise RuntimeError("Failed to connect to database")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async def get_history(user_id: UUID, limit: int = 10) -> list[Result]:
db_engine = get_relational_engine()
async def get_history(user_id: UUID, limit: int = 10) -> list[Result]:
if limit <= 0:
raise ValueError("Limit must be positive")
db_engine = get_relational_engine()
if not db_engine:
raise RuntimeError("Failed to connect to database")

queries_query = select(
Query.id,
Query.text.label("text"),
Query.created_at,
literal("user").label("user")
) \
.filter(Query.user_id == user_id)

results_query = select(
Result.id,
Result.value.label("text"),
Result.created_at,
literal("system").label("user")
) \
.filter(Result.user_id == user_id)

history_query = queries_query.union(results_query).order_by("created_at").limit(limit)

async with db_engine.get_async_session() as session:
history = (await session.execute(history_query)).all()

return history
Comment on lines +26 to +31
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add error handling and result processing.

The query execution should handle potential database errors and ensure proper result type conversion.

     history_query = queries_query.union(results_query).order_by("created_at").limit(limit)

     async with db_engine.get_async_session() as session:
-        history = (await session.execute(history_query)).all()
-
-        return history
+        try:
+            result = await session.execute(history_query)
+            history = result.all()
+            return [
+                Result(
+                    id=item.id,
+                    text=item.text,
+                    created_at=item.created_at,
+                    user=item.user
+                ) for item in history
+            ]
+        except Exception as e:
+            raise RuntimeError(f"Failed to fetch history: {str(e)}") from e

Committable suggestion was skipped due to low confidence.

Comment on lines +1 to +31
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Database indices needed for query performance

The codebase analysis reveals that neither the models nor migrations define indices for frequently queried columns. For optimal performance of the history query, you should add the following indices:

  • user_id and created_at columns in both queries and results tables
  • Consider a composite index on (user_id, created_at) as they are used together in filtering and sorting
🔗 Analysis chain

Verify database index coverage for performance.

The queries filter and order by frequently accessed columns. Ensure appropriate indices exist for optimal performance.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for index definitions in model files and migrations

# Look for index definitions in the Query model
echo "Checking Query model indices..."
rg -A 5 "Index|index" "cognee/modules/search/models/Query.py"

# Look for index definitions in the Result model
echo "Checking Result model indices..."
rg -A 5 "Index|index" "cognee/modules/search/models/Result.py"

# Look for index creation in migrations
echo "Checking migrations for indices..."
fd -e py . migrations/ --exec rg -A 5 "create_index|CreateIndex"

Length of output: 517


Script:

#!/bin/bash
# Let's try a different approach to find model definitions and migrations

# First, let's locate the model files to understand their structure
echo "Locating model files..."
fd -e py Query.py Result.py

# Look for any SQLAlchemy model definitions and index configurations
echo "Checking model definitions..."
rg -A 10 "class Query|class Result" 

# Look for migration files in common locations
echo "Finding migration files..."
fd -e py . alembic migrations database/migrations db/migrations

# Check for any SQL files that might contain index definitions
echo "Checking SQL files..."
fd -e sql . --exec rg -l "CREATE INDEX|create index"

Length of output: 2454


Script:

#!/bin/bash
# Let's check the initial migration file for table creation and indices
echo "Checking initial migration content..."
cat "alembic/versions/8057ae7329c2_initial_migration.py"

# Also check the model imports to ensure we haven't missed any index definitions
echo "Checking model imports..."
rg -A 5 "from sqlalchemy" "cognee/modules/search/models/Query.py" "cognee/modules/search/models/Result.py"

Length of output: 1792

17 changes: 17 additions & 0 deletions cognee/modules/search/operations/get_queries.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
from uuid import UUID
from sqlalchemy import select
from cognee.infrastructure.databases.relational import get_relational_engine
from ..models.Query import Query

async def get_queries(user_id: UUID, limit: int) -> list[Query]:
db_engine = get_relational_engine()

async with db_engine.get_async_session() as session:
queries = (await session.scalars(
select(Query)
.filter(Query.user_id == user_id)
.order_by(Query.created_at.desc())
.limit(limit)
)).all()

return queries
Loading
Loading