Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: log search queries and results #166

Merged
merged 9 commits into from
Nov 17, 2024

Conversation

borisarzentar
Copy link
Contributor

@borisarzentar borisarzentar commented Oct 25, 2024

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced a new search history retrieval feature, allowing users to access their previous search queries.
    • Added a new FastAPI endpoint for fetching search history for authenticated users.
    • Enhanced the SearchView component to automatically fetch chat history on mount.
    • Implemented asynchronous functions to log queries and results, improving search tracking capabilities.
  • Bug Fixes

    • Improved input handling by trimming whitespace from search queries.
  • Tests

    • Updated tests to validate the new search history functionality and ensure consistent parameter naming in search calls.

@borisarzentar borisarzentar self-assigned this Oct 25, 2024
Copy link
Contributor

coderabbitai bot commented Oct 25, 2024

Important

Review skipped

Review was skipped as selected files did not have any reviewable changes.

💤 Files selected but had no reviewable changes (1)
  • cognee/tests/integration/run_toy_tasks/data/cognee_db

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The pull request introduces several significant changes across multiple files, primarily focused on enhancing search functionality and managing search history. A new function getHistory is added to fetch chat history, which is integrated into the SearchView component. Additionally, a new router is created for handling search-related operations, including a GET endpoint for retrieving search history. Several asynchronous functions for logging queries and results, as well as fetching user-specific queries and results, are also introduced. The changes improve the overall API structure and enhance the handling of search-related data.

Changes

File Change Summary
cognee-frontend/src/modules/chat/getHistory.ts Added getHistory function to fetch and return chat history as JSON.
cognee-frontend/src/ui/Partials/SearchView/SearchView.tsx Integrated getHistory in useEffect to fetch chat history on component mount; updated handleSearchSubmit to trim inputValue.
cognee/__init__.py Added import for get_search_history function from api.v1.search.
cognee/api/v1/search/__init__.py Added import for get_search_history function.
cognee/api/v1/search/get_search_history.py Introduced get_search_history function to retrieve user-specific search history.
cognee/api/v1/search/search_v2.py Updated parameter names in search and specific_search functions for clarity; enhanced logging functionality.
cognee/modules/search/models/Query.py Added Query class representing the "queries" database table with relevant attributes.
cognee/modules/search/models/Result.py Added Result class representing the "results" database table with relevant attributes.
cognee/modules/search/operations/__init__.py Added imports for log_query, log_result, and get_history.
cognee/modules/search/operations/get_history.py Introduced get_history function to retrieve combined query and result records for a user.
cognee/modules/search/operations/get_queries.py Introduced get_queries function to fetch user-specific queries.
cognee/modules/search/operations/get_results.py Introduced get_results function to fetch user-specific results.
cognee/modules/search/operations/log_query.py Introduced log_query function to log a new query in the database.
cognee/modules/search/operations/log_result.py Introduced log_result function to log a new result in the database.
cognee/tests/test_library.py Updated search function calls to use query_text and added validation for search history retrieval.
cognee/tests/test_neo4j.py Updated search function calls and added search history validation.
cognee/tests/test_pgvector.py Updated search function calls and added search history validation.
cognee/tests/test_qdrant.py Updated search function calls and added search history validation.
cognee/tests/test_weaviate.py Updated search function calls and added search history validation.
cognee/api/v1/search/routers/get_search_router.py Introduced a new router with a GET endpoint for retrieving search history and defined SearchHistoryItem class.
cognee/infrastructure/pipeline/models/Operation.py Updated created_at column to use timezone-aware default value.
cognee/infrastructure/databases/relational/__init__.py Removed import for UUID data type.
cognee/infrastructure/databases/relational/data_types/UUID.py Deleted file that contained a custom SQLAlchemy type decorator for UUID.
cognee/modules/data/models/Data.py Changed import for UUID to be sourced directly from sqlalchemy.
cognee/modules/data/models/Dataset.py Changed import for UUID to be sourced directly from sqlalchemy.
cognee/modules/data/models/DatasetData.py Changed import for UUID to be sourced directly from sqlalchemy.
cognee/modules/pipelines/models/Pipeline.py Changed import for UUID to be sourced directly from sqlalchemy; added relationship with Task.
cognee/modules/pipelines/models/PipelineRun.py Changed import for UUID to be sourced directly from sqlalchemy.
cognee/modules/pipelines/models/PipelineTask.py Changed import for UUID to be sourced directly from sqlalchemy.
cognee/modules/users/models/ACL.py Changed import for UUID to be sourced directly from sqlalchemy; updated resources attribute to use Mapped.
cognee/modules/users/models/ACLResources.py Changed import for UUID to be sourced directly from sqlalchemy.
cognee/modules/users/models/Group.py Changed import for UUID to be sourced directly from sqlalchemy.
cognee/modules/users/models/Permission.py Changed import for UUID to be sourced directly from sqlalchemy.
cognee/modules/users/models/Principal.py Changed import for UUID to be sourced directly from sqlalchemy.
cognee/modules/users/models/Resource.py Changed import for UUID to be sourced directly from sqlalchemy.
cognee/modules/users/models/User.py Changed import for UUID to be sourced directly from sqlalchemy; added relationship with Group.
cognee/modules/users/models/UserGroup.py Changed import for UUID to be sourced directly from sqlalchemy.
cognee/tests/integration/run_toy_tasks/conftest.py Modified copy_cognee_db_to_target_location function to a no-op.

Possibly related PRs

  • Cog 170 pgvector adapter #158: Introduces getHistory function to fetch chat history, directly related to the get_search_history function for retrieving historical data.
  • Cog 334 structure routing #164: Changes in get_search_router include a new endpoint that may utilize the getHistory function for fetching search history.

🐇 In the land of code where rabbits play,
New functions hop in, brightening the day.
Fetching histories, logging with glee,
Search queries dance, as clear as can be.
With every change, our features grow wide,
Hopping through code, with joy as our guide! 🐇


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 24

🧹 Outside diff range and nitpick comments (23)
cognee-frontend/src/modules/chat/getHistory.ts (1)

3-8: Add JSDoc documentation.

Adding documentation will help other developers understand the purpose and usage of this function.

Add this documentation above the function:

+/**
+ * Fetches the search history from the API.
+ * @returns Promise containing the search history items
+ * @throws {Error} If the network request fails or returns non-OK status
+ */
export default function getHistory() {
cognee/api/v1/search/get_search_history.py (1)

5-5: Add comprehensive docstring with return type details.

The function lacks documentation about its purpose and the structure of the returned list.

Add a docstring:

 async def get_search_history(user: User = None) -> list:
+    """Retrieve search history for a given user or the default user.
+
+    Args:
+        user (User, optional): The user whose search history to retrieve.
+            If None, the default user's history will be fetched.
+
+    Returns:
+        list: A list of search history entries, where each entry is a dict containing
+            search query details (specify the actual structure here).
+
+    Raises:
+        ValueError: If unable to retrieve the default user or search history.
+    """
cognee/modules/search/operations/log_result.py (1)

5-15: Consider architectural improvements for scalability.

As this function handles search logging, consider these architectural improvements:

  1. For high-volume scenarios, consider implementing batch processing of logs
  2. Add metrics collection for monitoring (e.g., logging latency, success rate)
  3. Implement a data retention policy for search logs
  4. Consider adding an index on (user_id, query_id) for efficient querying

Would you like assistance in implementing any of these improvements?

cognee/modules/search/operations/log_query.py (1)

5-19: Consider adding query result logging.

Since this is a search query logging system, it might be valuable to also log the number of results returned or other relevant metrics about the search operation.

Consider extending the Query model and this function to include:

  • Number of results returned
  • Search execution time
  • Filters applied
  • Sort criteria
    This would provide valuable analytics data for improving search functionality.
cognee/modules/search/models/Result.py (3)

6-7: Add a docstring to document the model's purpose.

Adding a docstring would improve code documentation and help other developers understand the purpose and usage of this model.

 class Result(Base):
+    """
+    Represents a search result in the database.
+    
+    This model stores individual search results along with their associated
+    query and user information, enabling search history tracking and analysis.
+    """
     __tablename__ = "results"

9-9: Fix PEP 8 spacing around operators.

Remove spaces around the = operator in keyword arguments to follow PEP 8 style guidelines.

-    id = Column(UUID, primary_key = True, default = uuid4)
+    id = Column(UUID, primary_key=True, default=uuid4)

11-13: Add foreign key constraints and indexes for better data integrity and performance.

Consider the following improvements:

  1. Add foreign key constraints for query_id and user_id to ensure referential integrity
  2. Add indexes for fields that will be frequently queried
-    query_id = Column(UUID)
-    user_id = Column(UUID)
+    query_id = Column(UUID, ForeignKey("queries.id"), index=True)
+    user_id = Column(UUID, ForeignKey("users.id"), index=True)
+
+    # Add relationships for easier querying
+    query = relationship("Query", back_populates="results")
+    user = relationship("User", back_populates="search_results")

Don't forget to import the necessary SQLAlchemy components:

from sqlalchemy import ForeignKey
from sqlalchemy.orm import relationship
cognee/modules/search/models/Query.py (3)

6-7: Consider using a more specific table name.

The table name "queries" is quite generic and could potentially conflict with other tables in different schemas. Consider using a more specific name like "search_queries" or defining a specific schema for this table.

 class Query(Base):
-    __tablename__ = "queries"
+    __tablename__ = "search_queries"

9-9: Fix spacing around operators to match PEP 8.

While the UUID primary key implementation is good, the spacing around operators should be consistent with PEP 8.

-    id = Column(UUID, primary_key = True, default = uuid4)
+    id = Column(UUID, primary_key=True, default=uuid4)

15-16: Simplify timestamp implementations and fix spacing.

The timestamp columns can be simplified by removing the lambda functions, and the spacing should be consistent with PEP 8.

-    created_at = Column(DateTime(timezone = True), default = lambda: datetime.now(timezone.utc))
-    updated_at = Column(DateTime(timezone = True), onupdate = lambda: datetime.now(timezone.utc))
+    created_at = Column(DateTime(timezone=True), default=datetime.now(timezone.utc))
+    updated_at = Column(DateTime(timezone=True), onupdate=datetime.now(timezone.utc))
cognee/modules/search/operations/get_queries.py (1)

6-17: Consider performance optimizations for scalability.

  1. Consider implementing cursor-based pagination for better performance with large datasets.
  2. Select only necessary fields instead of retrieving all columns, especially if the Query model contains large fields.

Example optimization:

select(Query.id, Query.text, Query.created_at)  # Select specific fields

Also, consider adding an index on (user_id, created_at) to optimize the query performance.

cognee/modules/search/operations/get_results.py (1)

9-15: Consider adding pagination support.

While the limit parameter helps control the result set size, implementing proper pagination would be beneficial for handling large datasets efficiently.

Consider adding offset or cursor-based pagination to allow clients to fetch results in chunks. This would improve performance and resource utilization when dealing with large result sets.

cognee/api/v1/search/search_v2.py (1)

44-44: Keep parameter naming consistent across functions.

For consistency with the main search function, consider renaming the query parameter to query_text.

-async def specific_search(query_type: SearchType, query: str, user) -> list:
+async def specific_search(query_type: SearchType, query_text: str, user) -> list:
cognee/tests/test_library.py (1)

38-43: Enhance test assertions for INSIGHTS search.

While the basic non-empty check is good, consider adding more specific assertions:

  1. Validate the structure and content of search results
  2. Remove or guard print statements behind a debug flag

Example improvement:

     search_results = await cognee.search(SearchType.INSIGHTS, query_text = random_node_name)
-    assert len(search_results) != 0, "The search results list is empty."
-    print("\n\nExtracted sentences are:\n")
-    for result in search_results:
-        print(f"{result}\n")
+    assert len(search_results) > 0, "The search results list is empty."
+    for result in search_results:
+        assert isinstance(result, str), "Search result should be a string"
+        assert len(result.strip()) > 0, "Search result should not be empty"
+        if logging.getLogger().isEnabledFor(logging.DEBUG):
+            print(f"Found insight: {result}")
cognee-frontend/src/ui/Partials/SearchView/SearchView.tsx (1)

Line range hint 1-92: Consider architectural improvements for better maintainability

The component currently handles multiple responsibilities. Consider these improvements:

  1. Extract message management logic into a custom hook (e.g., useMessages)
  2. Move API calls to a separate service layer
  3. Implement request debouncing/throttling for search submissions
  4. Add retry mechanism for failed API calls

Example structure:

// hooks/useMessages.ts
export function useMessages() {
  const [messages, setMessages] = useState<Message[]>([]);
  const addMessage = useCallback((message: Message) => {
    setMessages(prev => [...prev, message]);
  }, []);
  return { messages, addMessage };
}

// services/searchService.ts
export class SearchService {
  static async search(query: string, type: string) {
    return fetch('/v1/search', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ query, searchType: type }),
    });
  }
}
cognee/tests/test_neo4j.py (1)

54-58: Fix typo in print statement.

There's a typo in the print statement: "\n\Extracted" should be "\n\nExtracted" for consistency with other print statements.

-    print("\n\Extracted summaries are:\n")
+    print("\n\nExtracted summaries are:\n")
cognee/tests/test_weaviate.py (1)

41-41: LGTM! Consider enhancing test assertions.

The parameter rename from query to query_text improves clarity. While the basic non-empty assertions are good, consider adding more specific assertions to verify:

  • Result content/format matches expected search type
  • Result relevance to the search query
  • Result ordering/scoring if applicable

Example enhancement for INSIGHTS search:

def test_insight_result_format(result):
    assert isinstance(result, str), "Insight should be a string"
    assert len(result.split()) >= 5, "Insight should be a complete sentence"
    assert random_node_name.lower() in result.lower(), "Result should contain search term"

for result in search_results:
    test_insight_result_format(result)

Also applies to: 47-47, 53-53

cognee/tests/test_pgvector.py (4)

71-71: Fix spacing around the assignment operator.

The parameter rename from query to query_text improves clarity, but the spacing around = should be fixed to comply with PEP 8.

-    search_results = await cognee.search(SearchType.INSIGHTS, query_text = random_node_name)
+    search_results = await cognee.search(SearchType.INSIGHTS, query_text=random_node_name)

77-77: Fix spacing around the assignment operator.

The parameter rename improves clarity, but the spacing needs to be fixed.

-    search_results = await cognee.search(SearchType.CHUNKS, query_text = random_node_name)
+    search_results = await cognee.search(SearchType.CHUNKS, query_text=random_node_name)

83-83: Fix spacing around the assignment operator.

The parameter rename improves clarity, but the spacing needs to be fixed.

-    search_results = await cognee.search(SearchType.SUMMARIES, query_text = random_node_name)
+    search_results = await cognee.search(SearchType.SUMMARIES, query_text=random_node_name)

Line range hint 10-91: Consider restructuring tests using pytest fixtures and separate test functions.

The current structure combines multiple test scenarios into a single function, making it harder to maintain and debug. Consider breaking this into separate test functions using pytest fixtures for common setup.

Example structure:

import pytest

@pytest.fixture
async def setup_cognee():
    # Current setup code (config, data directory, etc.)
    await cognee.prune.prune_data()
    await cognee.prune.prune_system(metadata=True)
    # Add test data
    yield
    # Cleanup if needed

@pytest.mark.asyncio
async def test_insights_search(setup_cognee):
    # Test INSIGHTS search
    
@pytest.mark.asyncio
async def test_chunks_search(setup_cognee):
    # Test CHUNKS search
    
@pytest.mark.asyncio
async def test_summaries_search(setup_cognee):
    # Test SUMMARIES search
    
@pytest.mark.asyncio
async def test_search_history(setup_cognee):
    # Test search history
cognee/api/client.py (1)

Line range hint 326-353: Add logging to the search endpoint.

To fulfill the PR objective of logging search queries and results, the search endpoint should also include logging.

Here's the suggested implementation:

 @app.post("/api/v1/search", response_model = list)
 async def search(payload: SearchPayloadDTO, user: User = Depends(get_authenticated_user)):
     """ This endpoint is responsible for searching for nodes in the graph."""
     from cognee.api.v1.search import search as cognee_search
 
+    logger.info(f"Processing search request - Type: {payload.search_type}, Query: {payload.query}, User: {user.id}")
     try:
         results = await cognee_search(payload.search_type, payload.query, user)
+        logger.info(f"Search completed - Found {len(results)} results")
         return results
     except Exception as error:
+        logger.error(f"Search failed - Error: {str(error)}")
         return JSONResponse(
             status_code = 409,
             content = {"error": str(error)}
         )
🧰 Tools
🪛 Ruff

372-372: Module level import not at top of file

(E402)

notebooks/cognee_demo.ipynb (1)

Line range hint 803-848: Add example for viewing search history

Since this PR introduces search logging functionality, it would be valuable to include an example demonstrating how to retrieve and display the search history.

Consider adding a new cell with something like:

# Demonstrate how to access search history
history = await cognee.getHistory()
print("\nSearch History:\n")
for entry in history:
    print(f"Query: {entry.query}")
    print(f"Type: {entry.search_type}")
    print(f"Timestamp: {entry.timestamp}\n")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 1088b58 and e73243f.

📒 Files selected for processing (21)
  • cognee-frontend/src/modules/chat/getHistory.ts (1 hunks)
  • cognee-frontend/src/ui/Partials/SearchView/SearchView.tsx (3 hunks)
  • cognee/init.py (1 hunks)
  • cognee/api/client.py (2 hunks)
  • cognee/api/v1/search/init.py (1 hunks)
  • cognee/api/v1/search/get_search_history.py (1 hunks)
  • cognee/api/v1/search/search_v2.py (3 hunks)
  • cognee/modules/search/models/Query.py (1 hunks)
  • cognee/modules/search/models/Result.py (1 hunks)
  • cognee/modules/search/operations/init.py (1 hunks)
  • cognee/modules/search/operations/get_history.py (1 hunks)
  • cognee/modules/search/operations/get_queries.py (1 hunks)
  • cognee/modules/search/operations/get_results.py (1 hunks)
  • cognee/modules/search/operations/log_query.py (1 hunks)
  • cognee/modules/search/operations/log_result.py (1 hunks)
  • cognee/tests/test_library.py (1 hunks)
  • cognee/tests/test_neo4j.py (1 hunks)
  • cognee/tests/test_pgvector.py (1 hunks)
  • cognee/tests/test_qdrant.py (1 hunks)
  • cognee/tests/test_weaviate.py (1 hunks)
  • notebooks/cognee_demo.ipynb (3 hunks)
🔥 Files not summarized due to errors (1)
  • notebooks/cognee_demo.ipynb: Error: Server error: no LLM provider could handle the message
✅ Files skipped from review due to trivial changes (1)
  • cognee/modules/search/operations/init.py
🔇 Additional comments (26)
cognee/api/v1/search/__init__.py (1)

2-2: LGTM! Clean import that follows best practices.

The import statement is well-organized and makes the new search history functionality available at the package level, which aligns with the PR's objective of logging search queries and results.

cognee-frontend/src/modules/chat/getHistory.ts (1)

4-6: Verify authentication requirements.

The endpoint might require authentication. Let's verify if authentication headers are needed.

✅ Verification successful

Authentication headers are automatically added by the fetch utility

The custom fetch utility in utils/fetch.ts automatically injects the Authorization header with a Bearer token for all API calls. No explicit authentication headers are needed in the getHistory.ts implementation as they are handled by the utility wrapper.

  • The fetch utility adds Authorization: Bearer ${localStorage.getItem('access_token')} to all requests
  • All API calls are prefixed with http://127.0.0.1:8000/api base URL
  • Server errors are automatically handled via handleServerErrors
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if other API calls in the codebase use authentication headers
rg -l "fetch.*headers.*Authorization" --type ts

Length of output: 49


Script:

#!/bin/bash
# Let's try a broader search for fetch calls with headers
rg "fetch.*headers" --type ts -A 3

# Also search for any authentication-related code
rg "Authorization.*Bearer" --type ts

# Check if there's any auth middleware or interceptors
fd -e ts -e js auth

Length of output: 459


Script:

#!/bin/bash
# Let's check the fetch utility implementation since we found it
cat cognee-frontend/src/utils/fetch.ts

# Also check if the search endpoint is used elsewhere with headers
rg "/v1/search" --type ts -A 3

Length of output: 1050

cognee/__init__.py (1)

5-5: LGTM! Clean addition of the search history function import.

The import is well-organized, grouped logically with related search functionality, and follows the existing import patterns.

Let's verify that the imported function exists in the source module:

✅ Verification successful

Import verified and correctly implemented

The get_search_history function is properly defined in cognee/api/v1/search/get_search_history.py and correctly exported through cognee/api/v1/search/__init__.py. The function is also actively used across multiple test files, confirming its integration into the codebase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the existence of get_search_history function in the search module
# Expected: The function should be defined in the search module

ast-grep --pattern 'def get_search_history'

Length of output: 45


Script:

#!/bin/bash
# Let's try a different approach to locate the function definition
# First, let's find the search.py file
fd "search.py" --type f

# Then let's examine its content to find the function
rg "get_search_history" -A 3

Length of output: 2032

cognee/api/v1/search/get_search_history.py (1)

1-3: LGTM! Well-organized imports.

The imports are properly structured and all are utilized in the implementation.

cognee/modules/search/operations/log_result.py (1)

1-3: LGTM! Clean and well-organized imports.

The imports are properly structured with standard library imports first, followed by internal infrastructure and models.

cognee/modules/search/operations/log_query.py (2)

1-3: LGTM! Imports are well-organized.

The imports follow the standard pattern of stdlib first, followed by local imports, and all imports are utilized in the code.


5-19: Verify database index for query lookups.

Ensure that appropriate indexes exist for efficient query retrieval, especially if you plan to analyze search patterns or implement search history features.

Let's check if indexes exist for the Query table:

cognee/modules/search/models/Result.py (1)

1-4: LGTM! All necessary imports are present.

cognee/modules/search/models/Query.py (2)

1-4: LGTM! Imports are well-organized and specific.

The imports are correctly structured and include all necessary dependencies for the model.


1-16: Verify no duplicate query logging implementations exist.

Let's ensure this new query logging implementation doesn't duplicate existing functionality.

✅ Verification successful

Let me check the content of the query logging related files to understand their implementation.


Let me check one final thing to see if there are any other tables or models that might be storing queries in a different way.


Based on the comprehensive analysis of the codebase, I can now provide the final response:

No duplicate query logging implementations found - implementation is well organized

The current Query model works together with the Result model in a clean architecture where:

  • Query model stores the search queries
  • Result model stores the corresponding results
  • log_query.py and log_result.py provide the logging operations

The implementation is properly isolated in the search module without any duplicate functionality elsewhere in the codebase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for other potential query logging implementations
echo "Searching for other query-related models..."
ast-grep --pattern 'class $_(Base):
  $$$
  query = Column($$$)
  $$$'

echo "Searching for query logging related code..."
rg -l 'log.*query|query.*log' --type py

Length of output: 656


Script:

#!/bin/bash
# Check the implementation of query logging operations
echo "Content of log_query.py:"
cat cognee/modules/search/operations/log_query.py

echo -e "\nContent of log_result.py:"
cat cognee/modules/search/operations/log_result.py

# Search for any other Query model definitions
echo -e "\nSearching for Query class definitions:"
rg "class Query" -A 5

Length of output: 1697


Script:

#!/bin/bash
# Look for any other database tables or models that might store queries
echo "Searching for table definitions:"
rg "__tablename__.*=.*" -B2 -A2

# Look for any other query-related columns in models
echo -e "\nSearching for query-related columns:"
ast-grep --pattern "Column($$$query$$$)"

Length of output: 7986

cognee/modules/search/operations/get_queries.py (1)

1-4: LGTM! Imports are well-organized.

The imports follow a logical order and include all necessary dependencies.

cognee/modules/search/operations/get_results.py (1)

1-4: LGTM: Imports are well organized.

The imports follow a good organization pattern (stdlib -> third-party -> local) and include all necessary dependencies.

cognee/modules/search/operations/get_history.py (3)

1-6: LGTM! Imports are well-organized and complete.

The imports cover all necessary dependencies for database operations and model access.


10-16: LGTM! Query construction is secure and well-structured.

The query properly selects necessary fields and includes source identification through the "user" label.


18-24: LGTM! Results query matches queries structure for proper union.

The field mapping and filtering are correctly implemented for union compatibility.

cognee/api/v1/search/search_v2.py (2)

19-19: LGTM! Parameter names are more descriptive now.

The renaming of parameters from search_type to query_type and query to query_text improves code clarity by better reflecting their purpose.


40-40: Verify JSON serialization of search results.

The json.dumps(filtered_search_results) call might fail if the results contain non-serializable objects (e.g., datetime objects, custom classes).

Let's check the search results structure:

✅ Verification successful

JSON serialization of search results is safe

Based on the code analysis:

  1. The filtered_search_results is constructed from search_results which only contains simple dictionary objects with basic types:
    • document_id is converted to UUID or None
    • Other fields are accessed through dictionary access
  2. The query functions return simple data structures:
    • query_summaries returns a list of payloads
    • query_chunks returns a list of dictionaries
    • query_graph_connections returns a list of string tuples
  3. No complex objects (like datetime) are included in the search results structure

The JSON serialization is safe as the data structure contains only basic types that are JSON-serializable.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for potential non-serializable types in search results

# Look for return type hints or docstrings in the query functions
echo "Checking query functions return types..."
rg -A 5 "def query_(summaries|graph_connections|chunks)"

# Look for actual result construction
echo "Checking result construction in query functions..."
rg "return.*{.*}" --type py

Length of output: 1728


Script:

#!/bin/bash
# Let's check the actual construction of filtered_search_results and the search results

# Check the filtered_search_results construction
echo "Checking filtered_search_results construction..."
rg -B 5 -A 2 "filtered_search_results" 

# Check the actual return values from query functions
echo "Checking return values from query functions..."
rg -B 3 "return" cognee/tasks/summarization/query_summaries.py cognee/tasks/graph/query_graph_connections.py cognee/tasks/chunking/query_chunks.py

# Check if there are any datetime or complex object usages
echo "Checking for datetime and complex object usage..."
rg "datetime|date|time" --type py

Length of output: 24297

cognee/tests/test_library.py (1)

38-38: Verify consistent parameter naming across codebase.

The parameter rename from query to query_text should be consistently applied across all search function calls in the codebase.

Also applies to: 44-44, 50-50

✅ Verification successful

Let me verify the implementation details of the search function to ensure the parameter naming is consistent.


Let me try a different approach to check the search function implementation.


Parameter naming is consistent across the codebase

The search implementation in cognee/api/v1/search/search_v2.py and all test files correctly use query_text parameter. The old query parameter in search.legacy.py is part of the legacy implementation and doesn't affect the current codebase. All vector database adapters and search modules consistently use query_text when needed.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any remaining instances of the old parameter name
echo "Checking for old parameter usage..."
rg -l 'cognee\.search\([^)]*query\s*=' --type py

echo "Verifying new parameter usage..."
rg -l 'cognee\.search\([^)]*query_text\s*=' --type py

Length of output: 406


Script:

#!/bin/bash
# Check the search function implementation and its parameter name
ast-grep --pattern 'async def search($$$) {$$$}'

Length of output: 50


Script:

#!/bin/bash
# Search for the search function definition and its usage
echo "Checking search function implementation..."
rg -p "async def search" --type py -A 5

echo "Checking all search function calls..."
rg "\.search\(" --type py -A 1

Length of output: 10831

cognee-frontend/src/ui/Partials/SearchView/SearchView.tsx (1)

9-9: LGTM: Clean import addition

The new import for getHistory is properly structured and follows the project's module organization.

cognee/tests/test_qdrant.py (2)

43-43: LGTM! Parameter renaming improves API clarity.

The consistent renaming of query to query_text across all search types makes the API more explicit about the expected input type.

Also applies to: 49-49, 55-55


61-63: ⚠️ Potential issue

Improve test robustness and documentation for search history assertion.

The assertion uses a magic number (6) without explaining why this exact number is expected. This makes the test brittle and harder to maintain.

Consider these improvements:

     history = await cognee.get_search_history()
 
-    assert len(history) == 6, "Search history is not correct."
+    # We expect 2 entries per search operation (query + results)
+    expected_history_length = 2 * 3  # 3 search operations
+    assert len(history) == expected_history_length, f"Expected {expected_history_length} history entries from 3 search operations, but got {len(history)}"

Additionally, consider adding test cases to verify:

  1. The content/structure of history entries
  2. How history behaves with failed searches
  3. History entry order (newest first/last)

Let's check if other test files have similar assertions:

cognee/tests/test_neo4j.py (3)

42-46: LGTM! Parameter rename improves clarity.

The change from query to query_text makes the parameter's purpose more explicit.


48-52: LGTM! Consistent with previous parameter rename.


60-62: 🛠️ Refactor suggestion

Improve search history verification robustness.

The current test has several potential issues:

  1. Using a magic number (6) makes the test brittle and unclear
  2. Only verifying the count without checking the actual history content
  3. No cleanup of search history between test runs

Let's verify the search history implementation:

Consider improving the test:

     history = await cognee.get_search_history()
-    assert len(history) == 6, "Search history is not correct."
+    # Verify history length matches the number of searches performed
+    expected_entries = 3  # Number of search calls in this test
+    assert len(history) == expected_entries, f"Expected {expected_entries} history entries, got {len(history)}"
+    
+    # Verify history content
+    assert any(entry["query"] == random_node_name for entry in history), "Search query not found in history"
+    assert all(entry["search_type"] in [SearchType.INSIGHTS, SearchType.CHUNKS, SearchType.SUMMARIES] 
+              for entry in history), "Invalid search type in history"
cognee/tests/test_weaviate.py (1)

59-59: 🛠️ Refactor suggestion

Verify search history entry structure.

Since this PR's objective is to log search queries and results, we should verify that each history entry contains all required logging fields.

Consider adding structure verification:

def verify_history_entry(entry):
    required_fields = {"type", "query", "timestamp", "results"}
    assert all(field in entry for field in required_fields), \
        f"History entry missing required fields. Expected {required_fields}, got {entry.keys()}"
    assert isinstance(entry["timestamp"], (int, float)), "Timestamp should be numeric"
    assert isinstance(entry["results"], list), "Results should be a list"

for entry in history:
    verify_history_entry(entry)
cognee/api/client.py (1)

18-18: LGTM!

The import statement is correctly placed and follows Python naming conventions.

Comment on lines +4 to +6
return fetch(
'/v1/search',
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider adding query parameters for pagination and filtering.

Search history could grow large. Consider implementing pagination and filtering capabilities.

Here's a suggested implementation:

+interface HistoryParams {
+  page?: number;
+  limit?: number;
+  startDate?: string;
+  endDate?: string;
+}
+
-export default function getHistory() {
+export default function getHistory(params?: HistoryParams) {
+  const queryParams = new URLSearchParams();
+  if (params?.page) queryParams.append('page', params.page.toString());
+  if (params?.limit) queryParams.append('limit', params.limit.toString());
+  if (params?.startDate) queryParams.append('startDate', params.startDate);
+  if (params?.endDate) queryParams.append('endDate', params.endDate);
+
   return fetch(
-    '/v1/search',
+    `/v1/search${queryParams.toString() ? `?${queryParams.toString()}` : ''}`,
   )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
return fetch(
'/v1/search',
)
interface HistoryParams {
page?: number;
limit?: number;
startDate?: string;
endDate?: string;
}
export default function getHistory(params?: HistoryParams) {
const queryParams = new URLSearchParams();
if (params?.page) queryParams.append('page', params.page.toString());
if (params?.limit) queryParams.append('limit', params.limit.toString());
if (params?.startDate) queryParams.append('startDate', params.startDate);
if (params?.endDate) queryParams.append('endDate', params.endDate);
return fetch(
`/v1/search${queryParams.toString() ? `?${queryParams.toString()}` : ''}`,
)

cognee/api/v1/search/get_search_history.py Show resolved Hide resolved
cognee/api/v1/search/get_search_history.py Show resolved Hide resolved
Comment on lines +5 to +15
async def log_result(query_id: UUID, result: str, user_id: UUID):
db_engine = get_relational_engine()

async with db_engine.get_async_session() as session:
session.add(Result(
value = result,
query_id = query_id,
user_id = user_id,
))

await session.commit()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add error handling and input validation.

While the basic implementation works, several important aspects need to be addressed:

  1. Add error handling for database operations
  2. Validate the result string (e.g., check for empty strings, max length)
  3. Add proper transaction handling with rollback support
  4. Include logging for operation status and errors

Here's a suggested implementation:

 from uuid import UUID
+from typing import Optional
+import logging
 from cognee.infrastructure.databases.relational import get_relational_engine
 from ..models.Result import Result
+from sqlalchemy.exc import SQLAlchemyError

+logger = logging.getLogger(__name__)
+
+async def log_result(query_id: UUID, result: str, user_id: UUID) -> Optional[UUID]:
+    if not result or not result.strip():
+        raise ValueError("Result cannot be empty")
+    
+    if len(result) > 10000:  # adjust max length as needed
+        raise ValueError("Result exceeds maximum length")
 
-async def log_result(query_id: UUID, result: str, user_id: UUID):
     db_engine = get_relational_engine()
 
     async with db_engine.get_async_session() as session:
-        session.add(Result(
-            value = result,
-            query_id = query_id,
-            user_id = user_id,
-        ))
+        async with session.begin():
+            try:
+                new_result = Result(
+                    value=result.strip(),
+                    query_id=query_id,
+                    user_id=user_id,
+                )
+                session.add(new_result)
+                await session.commit()
+                logger.info(f"Successfully logged result for query {query_id}")
+                return new_result.id
+            except SQLAlchemyError as e:
+                await session.rollback()
+                logger.error(f"Failed to log result for query {query_id}: {str(e)}")
+                raise

The changes include:

  • Input validation for the result string
  • Proper error handling with rollback support
  • Logging for operation tracking
  • Return value for operation confirmation
  • Transaction management using session.begin()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async def log_result(query_id: UUID, result: str, user_id: UUID):
db_engine = get_relational_engine()
async with db_engine.get_async_session() as session:
session.add(Result(
value = result,
query_id = query_id,
user_id = user_id,
))
await session.commit()
from uuid import UUID
from typing import Optional
import logging
from cognee.infrastructure.databases.relational import get_relational_engine
from ..models.Result import Result
from sqlalchemy.exc import SQLAlchemyError
logger = logging.getLogger(__name__)
async def log_result(query_id: UUID, result: str, user_id: UUID) -> Optional[UUID]:
if not result or not result.strip():
raise ValueError("Result cannot be empty")
if len(result) > 10000: # adjust max length as needed
raise ValueError("Result exceeds maximum length")
db_engine = get_relational_engine()
async with db_engine.get_async_session() as session:
async with session.begin():
try:
new_result = Result(
value=result.strip(),
query_id=query_id,
user_id=user_id,
)
session.add(new_result)
await session.commit()
logger.info(f"Successfully logged result for query {query_id}")
return new_result.id
except SQLAlchemyError as e:
await session.rollback()
logger.error(f"Failed to log result for query {query_id}: {str(e)}")
raise

cognee/tests/test_weaviate.py Show resolved Hide resolved
Comment on lines +89 to +91
history = await cognee.get_search_history()

assert len(history) == 6, "Search history is not correct."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Improve the search history verification test.

The current test has several potential issues:

  1. The magic number 6 is not explained (3 searches × 2?)
  2. The assertion message could be more descriptive
  3. The exact count check might be brittle

Consider making the test more robust:

     history = await cognee.get_search_history()
 
-    assert len(history) == 6, "Search history is not correct."
+    # We expect history entries for INSIGHTS, CHUNKS, and SUMMARIES searches
+    expected_searches = 3
+    assert len(history) >= expected_searches, f"Search history should contain at least {expected_searches} entries (one per search type), but found {len(history)}"
+    
+    # Verify that all search types are present in the history
+    search_types = {entry['type'] for entry in history}
+    assert search_types.issuperset({SearchType.INSIGHTS, SearchType.CHUNKS, SearchType.SUMMARIES}), "Not all search types were recorded in history"

Committable suggestion was skipped due to low confidence.

cognee/api/client.py Outdated Show resolved Hide resolved
Comment on lines 360 to 370
@app.get("/api/v1/search", response_model = list[SearchHistoryItem])
async def get_search_history(user: User = Depends(get_authenticated_user)):
try:
history = await get_history(user.id)

return history
except Exception as error:
return JSONResponse(
status_code = 500,
content = {"error": str(error)}
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Enhance error handling and add logging.

Several improvements could be made to align with the codebase patterns and PR objectives:

  1. Error response structure should match other endpoints (e.g., see /api/v1/datasets).
  2. Add logging to track API usage and errors.
  3. Consider using more specific error handling.

Here's the suggested implementation:

 @app.get("/api/v1/search", response_model = list[SearchHistoryItem])
 async def get_search_history(user: User = Depends(get_authenticated_user)):
+    logger.info(f"Retrieving search history for user: {user.id}")
     try:
         history = await get_history(user.id)
+        logger.info(f"Retrieved {len(history)} search history items")
         return history
     except Exception as error:
+        logger.error(f"Error retrieving search history: {str(error)}")
-        return JSONResponse(
-            status_code = 500,
-            content = {"error": str(error)}
-        )
+        raise HTTPException(
+            status_code = 500,
+            detail = f"Error retrieving search history: {str(error)}"
+        ) from error

Committable suggestion was skipped due to low confidence.

Comment on lines 845 to 848
"search_results = await cognee.search(SearchType.INSIGHTS, query_text = node_name)\n",
"print(\"\\n\\nExtracted sentences are:\\n\")\n",
"for result in search_results:\n",
" print(f\"{result}\\n\")"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix inconsistent terminology in print message

The print message says "Extracted sentences" while the search type is INSIGHTS. This could be confusing for users following the demo.

-                "print(\"\\n\\nExtracted sentences are:\\n\")\n",
+                "print(\"\\n\\nExtracted insights are:\\n\")\n",
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"search_results = await cognee.search(SearchType.INSIGHTS, query_text = node_name)\n",
"print(\"\\n\\nExtracted sentences are:\\n\")\n",
"for result in search_results:\n",
" print(f\"{result}\\n\")"
"search_results = await cognee.search(SearchType.INSIGHTS, query_text = node_name)\n",
"print(\"\\n\\nExtracted insights are:\\n\")\n",
"for result in search_results:\n",
" print(f\"{result}\\n\")"

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (6)
cognee/api/v1/search/routers/get_search_router.py (4)

1-8: Consider reorganizing imports for better readability

Consider organizing imports into standard library, third-party, and local application imports with a blank line between each group:

# Standard library
from uuid import UUID
from datetime import datetime

# Third-party
from fastapi import Depends, APIRouter
from fastapi.responses import JSONResponse

# Local application
from cognee.api.v1.search import SearchType
from cognee.api.DTO import InDTO, OutDTO
from cognee.modules.users.models import User
from cognee.modules.search.operations import get_history
from cognee.modules.users.methods import get_authenticated_user

19-24: Add field constraints and documentation

Consider adding:

  1. Field constraints (e.g., max length for text)
  2. Class docstring describing the purpose
  3. Field descriptions using Field(..., description="...")
class SearchHistoryItem(OutDTO):
+    """Data transfer object representing a search history item."""
     id: UUID
-    text: str
+    text: str = Field(..., max_length=1000, description="The search query text")
     user: str
     created_at: datetime

Line range hint 37-51: Standardize error handling across endpoints

The search endpoint uses status code 409 while the history endpoint uses 500. Consider standardizing error handling:

  1. Use consistent status codes for similar error types
  2. Implement a common error response structure
from enum import Enum
from pydantic import BaseModel

class ErrorCode(Enum):
    SEARCH_ERROR = "search_error"
    HISTORY_ERROR = "history_error"

class ErrorResponse(BaseModel):
    code: ErrorCode
    message: str
    details: dict | None = None

Line range hint 25-51: Add rate limiting to protect endpoints

Both search and history endpoints could benefit from rate limiting to prevent abuse. Consider implementing rate limiting using FastAPI's dependencies:

from fastapi import Request
from fastapi.middleware.throttling import ThrottlingMiddleware

async def rate_limit(request: Request):
    # Implement rate limiting logic
    pass

@router.get("/", dependencies=[Depends(rate_limit)])
async def get_search_history(...):
    ...

@router.post("/", dependencies=[Depends(rate_limit)])
async def search(...):
    ...
notebooks/cognee_demo.ipynb (2)

794-794: Consider implementing search result caching

Since the same node_name is being used across multiple search types, implementing a caching mechanism could improve performance and reduce redundant searches.

Example implementation:

from functools import lru_cache
from typing import List, Any

class SearchCache:
    @lru_cache(maxsize=1000)
    async def get_search_results(self, search_type: SearchType, query: str) -> List[Any]:
        return await self._perform_search(search_type, query)

Also applies to: 815-815, 836-836


794-794: Add structured logging with metadata

To make search analytics more valuable, consider adding structured logging with additional metadata such as timestamp, user context, and search performance metrics.

Example log structure:

{
    "timestamp": "2024-11-20T10:00:00Z",
    "search_type": "SUMMARIES",
    "query": "node_name",
    "results_count": len(results),
    "execution_time_ms": execution_time,
    "user_id": user_context.id
}

Also applies to: 815-815, 836-836

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between e73243f and 9274faf.

📒 Files selected for processing (11)
  • cognee/api/v1/search/routers/get_search_router.py (3 hunks)
  • cognee/api/v1/search/search_v2.py (3 hunks)
  • cognee/infrastructure/pipeline/models/Operation.py (2 hunks)
  • cognee/modules/search/operations/get_queries.py (1 hunks)
  • cognee/modules/search/operations/get_results.py (1 hunks)
  • cognee/tests/test_library.py (1 hunks)
  • cognee/tests/test_neo4j.py (1 hunks)
  • cognee/tests/test_pgvector.py (1 hunks)
  • cognee/tests/test_qdrant.py (1 hunks)
  • cognee/tests/test_weaviate.py (1 hunks)
  • notebooks/cognee_demo.ipynb (3 hunks)
🔥 Files not summarized due to errors (1)
  • notebooks/cognee_demo.ipynb: Error: Server error: no LLM provider could handle the message
🚧 Files skipped from review as they are similar to previous changes (8)
  • cognee/api/v1/search/search_v2.py
  • cognee/modules/search/operations/get_queries.py
  • cognee/modules/search/operations/get_results.py
  • cognee/tests/test_library.py
  • cognee/tests/test_neo4j.py
  • cognee/tests/test_pgvector.py
  • cognee/tests/test_qdrant.py
  • cognee/tests/test_weaviate.py
🔇 Additional comments (3)
cognee/infrastructure/pipeline/models/Operation.py (2)

1-1: LGTM! Good practice adding explicit timezone import.

The addition of the timezone import supports proper timezone handling in the datetime operations.


27-27: Verify handling of timezone-aware timestamps in dependent code.

The change from datetime.utcnow to datetime.now(timezone.utc) is a good improvement as it provides timezone-aware timestamps. However, this change requires verification of dependent code that might assume naive datetimes.

Let's verify the usage of this timestamp in queries and dependent code:

notebooks/cognee_demo.ipynb (1)

794-794: Verify error handling for search operations

The search operations should include proper error handling to ensure failed searches are also logged appropriately.

Comment on lines +25 to +35
@router.get("/", response_model = list[SearchHistoryItem])
async def get_search_history(user: User = Depends(get_authenticated_user)):
try:
history = await get_history(user.id)

return history
except Exception as error:
return JSONResponse(
status_code = 500,
content = {"error": str(error)}
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Enhance error handling and add pagination

  1. The generic exception handling could mask specific errors. Consider handling specific exceptions:
     try:
         history = await get_history(user.id)
         return history
-    except Exception as error:
+    except ValueError as error:
         return JSONResponse(
             status_code = 500,
             content = {"error": str(error)}
         )
+    except Exception as error:
+        # Log the unexpected error
+        return JSONResponse(
+            status_code = 500,
+            content = {"error": "Internal server error"}
+        )
  1. Consider adding pagination to handle large result sets:
@router.get("/", response_model=Page[SearchHistoryItem])
async def get_search_history(
    user: User = Depends(get_authenticated_user),
    page: int = Query(1, ge=1),
    size: int = Query(20, ge=1, le=100)
):

@@ -791,7 +791,7 @@
"node = (await vector_engine.search(\"Entity_name\", \"sarah.nguyen@example.com\"))[0]\n",
"node_name = node.payload[\"text\"]\n",
"\n",
"search_results = await cognee.search(SearchType.SUMMARIES, query = node_name)\n",
"search_results = await cognee.search(SearchType.SUMMARIES, query_text = node_name)\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider implementing search result logging middleware

Given that this PR aims to log search queries and results, and there are multiple search types (SUMMARIES, CHUNKS, INSIGHTS), consider implementing a middleware or decorator to handle logging consistently across all search operations.

Example implementation:

from functools import wraps
from typing import Any, Callable
import logging

def log_search(func: Callable) -> Callable:
    @wraps(func)
    async def wrapper(self, search_type: SearchType, query_text: str, *args, **kwargs) -> Any:
        try:
            results = await func(self, search_type, query_text, *args, **kwargs)
            await self._log_search_operation(
                search_type=search_type,
                query=query_text,
                results=results,
                status="success"
            )
            return results
        except Exception as e:
            await self._log_search_operation(
                search_type=search_type,
                query=query_text,
                error=str(e),
                status="error"
            )
            raise
    return wrapper

# Usage:
# @log_search
# async def search(self, search_type: SearchType, query_text: str) -> List[Any]:

Also applies to: 815-815, 836-836

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (4)
cognee/modules/search/models/Result.py (1)

15-16: Fix PEP 8 spacing while maintaining correct timezone handling.

The timestamp fields correctly use timezone-aware DateTime with UTC, which is great! However, there are spacing issues to fix:

-    created_at = Column(DateTime(timezone = True), default = lambda: datetime.now(timezone.utc))
-    updated_at = Column(DateTime(timezone = True), onupdate = lambda: datetime.now(timezone.utc))
+    created_at = Column(DateTime(timezone=True), default=lambda: datetime.now(timezone.utc))
+    updated_at = Column(DateTime(timezone=True), onupdate=lambda: datetime.now(timezone.utc))
cognee/modules/users/models/User.py (1)

Line range hint 25-34: Consider enhancing schema documentation

The comment "Keep these schemas in sync with User model" could be more specific about what needs to be synchronized. Consider documenting:

  • Which fields from the User model should be reflected in each schema
  • When synchronization is required (e.g., after adding new User fields)
-# Keep these schemas in sync with User model
+# These Pydantic schemas must be updated when adding/modifying fields in the User model:
+# - UserRead: Fields exposed in API responses
+# - UserCreate: Fields required when creating a new user
+# - UserUpdate: Fields that can be modified after user creation
cognee/modules/users/models/ACL.py (1)

Line range hint 23-28: Consider updating other relationship type hints for consistency.

Good update using Mapped type hint for the resources relationship. For consistency, consider updating the other relationships (principal and permission) to also use the Mapped type hint:

-    principal = relationship("Principal")
-    permission = relationship("Permission")
+    principal: Mapped["Principal"] = relationship("Principal")
+    permission: Mapped["Permission"] = relationship("Permission")

This follows SQLAlchemy 2.0 style guidelines and improves type safety.

cognee/modules/data/models/Data.py (1)

Line range hint 34-34: Consider cleaning up commented code in to_json method.

The commented line for datasets serialization should either be implemented or removed to maintain code cleanliness.

-            # "datasets": [dataset.to_json() for dataset in self.datasets]

If you need the datasets in the JSON representation, consider implementing it with proper lazy loading control:

            "datasets": [dataset.to_json() for dataset in self.datasets] if self.datasets.loaded else None
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 9274faf and 4efdeda.

📒 Files selected for processing (19)
  • cognee/api/v1/search/search_v2.py (3 hunks)
  • cognee/infrastructure/databases/relational/__init__.py (0 hunks)
  • cognee/infrastructure/databases/relational/data_types/UUID.py (0 hunks)
  • cognee/modules/data/models/Data.py (1 hunks)
  • cognee/modules/data/models/Dataset.py (1 hunks)
  • cognee/modules/data/models/DatasetData.py (1 hunks)
  • cognee/modules/pipelines/models/Pipeline.py (1 hunks)
  • cognee/modules/pipelines/models/PipelineRun.py (1 hunks)
  • cognee/modules/pipelines/models/PipelineTask.py (1 hunks)
  • cognee/modules/search/models/Query.py (1 hunks)
  • cognee/modules/search/models/Result.py (1 hunks)
  • cognee/modules/users/models/ACL.py (1 hunks)
  • cognee/modules/users/models/ACLResources.py (1 hunks)
  • cognee/modules/users/models/Group.py (1 hunks)
  • cognee/modules/users/models/Permission.py (1 hunks)
  • cognee/modules/users/models/Principal.py (1 hunks)
  • cognee/modules/users/models/Resource.py (1 hunks)
  • cognee/modules/users/models/User.py (1 hunks)
  • cognee/modules/users/models/UserGroup.py (1 hunks)
💤 Files with no reviewable changes (2)
  • cognee/infrastructure/databases/relational/init.py
  • cognee/infrastructure/databases/relational/data_types/UUID.py
✅ Files skipped from review due to trivial changes (9)
  • cognee/modules/data/models/DatasetData.py
  • cognee/modules/pipelines/models/PipelineRun.py
  • cognee/modules/pipelines/models/PipelineTask.py
  • cognee/modules/users/models/ACLResources.py
  • cognee/modules/users/models/Group.py
  • cognee/modules/users/models/Permission.py
  • cognee/modules/users/models/Principal.py
  • cognee/modules/users/models/Resource.py
  • cognee/modules/users/models/UserGroup.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • cognee/api/v1/search/search_v2.py
  • cognee/modules/search/models/Query.py
🔇 Additional comments (11)
cognee/modules/search/models/Result.py (3)

1-4: LGTM! Imports are well-organized and complete.


6-7: LGTM! Class definition follows SQLAlchemy conventions.


9-9: Fix PEP 8 spacing violations.

Remove spaces around the = operator in keyword arguments.

-    id = Column(UUID, primary_key = True, default = uuid4)
+    id = Column(UUID, primary_key=True, default=uuid4)
cognee/modules/pipelines/models/Pipeline.py (3)

3-5: LGTM! Good improvement in import organization

The switch to using SQLAlchemy's UUID type directly instead of a custom implementation improves standardization and reduces infrastructure dependencies. The imports are also well-organized following the standard pattern (stdlib → third-party → local).


Line range hint 13-13: LGTM! Well-structured primary key definition

The UUID primary key with automatic uuid4 generation is a good practice for distributed systems and data portability.


Line range hint 21-24: Verify the bidirectional relationship configuration

The relationship with Task model is well-defined using modern SQLAlchemy 2.0 style with proper type hints. Let's verify the corresponding relationship in the Task model and the association table structure.

✅ Verification successful

Let me try a different search pattern to find the relationship configuration.


Based on the search results, I can now generate the final response as I have found all the necessary relationship configurations.

The bidirectional relationship is correctly configured

The relationship configuration is properly set up between Pipeline and Task models through the PipelineTask association table:

  • Pipeline model has tasks relationship with back_populates="pipeline"
  • Task model has datasets relationship with back_populates="task"
  • PipelineTask association table has correct foreign keys:
    • pipeline_id = Column("pipeline", UUID, ForeignKey("pipeline.id"))
    • task_id = Column("task", UUID, ForeignKey("task.id"))

However, there's a minor inconsistency in the Task model where the relationship is named datasets instead of pipelines, which might be confusing but doesn't affect functionality.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify relationship configuration across related models

# Check Task model for corresponding relationship
echo "Checking Task model for 'pipeline' relationship..."
ast-grep --pattern 'class Task($_):
  $$$
  pipeline = $$$relationship($$$)
  $$$'

# Check PipelineTask association table structure
echo "Checking PipelineTask association table definition..."
ast-grep --pattern 'class PipelineTask($_):
  $$$
  pipeline_id = Column(UUID, ForeignKey($$$))
  task_id = Column(UUID, ForeignKey($$$))
  $$$'

Length of output: 465


Script:

#!/bin/bash
# Search for Task and PipelineTask class definitions with more relaxed patterns
echo "Searching for Task class definition..."
rg "class Task" -A 20

echo "\nSearching for PipelineTask class definition..."
rg "class PipelineTask" -A 20

# Also search for any relationship definitions involving Pipeline or Task
echo "\nSearching for relationship definitions..."
rg "relationship\(" -A 2 -B 2

Length of output: 8492

cognee/modules/users/models/User.py (3)

1-8: LGTM! Clear separation of UUID types

The imports correctly distinguish between uuid.UUID for schema type hints and sqlalchemy.UUID for column definitions, following FastAPI+SQLAlchemy best practices.


Line range hint 10-22: LGTM! Well-structured SQLAlchemy 2.0 relationship

The groups relationship is properly defined using:

  • Type-safe Mapped[list["Group"]] annotation
  • Correct secondary table reference
  • Bi-directional relationship with back_populates

Line range hint 10-22: Verify search history relationship requirements

Given that this PR implements search query logging, should the User model include a relationship to track search history?

Let's check if there's a separate search history model that references users:

cognee/modules/users/models/ACL.py (1)

4-5: LGTM! Good practice using direct imports from sqlalchemy.

The change to import UUID directly from sqlalchemy instead of through an intermediate module is a good practice as it reduces indirection and potential dependency issues.

cognee/modules/data/models/Data.py (1)

5-5: LGTM! Verify database compatibility.

The change to import UUID directly from SQLAlchemy aligns with best practices and the project's standardization efforts.

Let's verify if this change affects any existing migrations or data:

✅ Verification successful

Import change is safe and consistent with codebase patterns

The change to import UUID directly from SQLAlchemy is verified to be safe because:

  1. The Data model uses UUID consistently as a primary key type (id = Column(UUID, primary_key = True, default = uuid4))
  2. Other models in the codebase consistently import and use UUID from SQLAlchemy for database columns (e.g., Result, Query, Pipeline models)
  3. The pattern of importing UUID from SQLAlchemy for column definitions and from uuid for type hints/operations is well established across the codebase
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any UUID-related migrations and verify column type consistency

# Look for existing migrations containing UUID definitions
rg -l "UUID" | grep -i "migration"

# Check for any UUID column definitions in models
ast-grep --pattern 'Column($_, UUID, $$$)'

Length of output: 347


Script:

#!/bin/bash
# Let's check the original UUID import and usage in the Data model
rg -A 5 "class Data" cognee/modules/data/models/Data.py

# Check for any other models using UUID to understand the pattern
rg "from.*UUID" --type py

Length of output: 5275

Comment on lines +11 to +13
value = Column(Text)
query_id = Column(UUID)
user_id = Column(UUID, index = True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider adding a foreign key constraint for query_id.

The query_id field appears to reference another table but lacks a foreign key constraint. This could lead to data integrity issues.

-    query_id = Column(UUID)
+    query_id = Column(UUID, ForeignKey("queries.id"), nullable=False)

Also, fix PEP 8 spacing in user_id definition:

-    user_id = Column(UUID, index = True)
+    user_id = Column(UUID, index=True)

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +1 to +16
from datetime import datetime, timezone
from uuid import uuid4
from sqlalchemy import Column, DateTime, Text, UUID
from cognee.infrastructure.databases.relational import Base

class Result(Base):
__tablename__ = "results"

id = Column(UUID, primary_key = True, default = uuid4)

value = Column(Text)
query_id = Column(UUID)
user_id = Column(UUID, index = True)

created_at = Column(DateTime(timezone = True), default = lambda: datetime.now(timezone.utc))
updated_at = Column(DateTime(timezone = True), onupdate = lambda: datetime.now(timezone.utc))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Add a foreign key constraint for query_id referencing Query.id

The Query model exists at cognee/modules/search/models/Query.py with a matching UUID primary key, but the Result model's query_id column lacks a proper foreign key constraint. You should modify the query_id column definition to:

query_id = Column(UUID, ForeignKey("queries.id"))
  • Location: cognee/modules/search/models/Result.py, line 12
🔗 Analysis chain

Verify Query model exists for the foreign key relationship.


Let me gather more information about the Query model structure to verify the relationship properly.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify Query model exists and its structure

# Look for Query model definition
echo "Searching for Query model definition..."
rg -l "class Query.*Base" --type py

# Check Query model structure
echo "Checking Query model structure..."
ast-grep --pattern 'class Query {
  $$$
  id = Column($$$)
  $$$
}'

Length of output: 311


Script:

#!/bin/bash
# Get the content of Query model to verify its structure
echo "Content of Query model:"
cat cognee/modules/search/models/Query.py

# Check if there are any existing relationships between Query and Result
echo -e "\nChecking for existing relationships:"
rg "relationship|ForeignKey.*results|ForeignKey.*queries" --type py

Length of output: 11320

Comment on lines +5 to +6
from sqlalchemy import Column, Text, DateTime, UUID
from cognee.infrastructure.databases.relational import Base
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Standardize UUID imports across remaining model files

Two model files still use the local UUID import:

  • cognee/modules/pipelines/models/TaskRun.py
  • cognee/modules/pipelines/models/Task.py

These files should be updated to import UUID directly from SQLAlchemy for consistency with the rest of the codebase.

🔗 Analysis chain

LGTM! Good standardization of UUID type handling.

The change to import UUID directly from SQLAlchemy instead of a local module is a good practice as it:

  • Uses SQLAlchemy's native type system
  • Ensures consistent UUID handling across different database backends
  • Reduces dependency on local implementations

Let's verify this change is consistently applied across all model files:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining local UUID imports in model files
# and verify all UUID column definitions are using SQLAlchemy's UUID type

# Check for any remaining imports from local UUID module
echo "Checking for remaining local UUID imports..."
rg "from cognee\.infrastructure\.databases\.relational import.*UUID" "cognee/modules"

# Verify UUID column definitions in model files
echo "Checking UUID column definitions..."
rg "Column\(UUID" "cognee/modules"

Length of output: 3343

@0xideas 0xideas force-pushed the feat/COG-341-log-interactions branch 2 times, most recently from 4efdeda to 09f27e8 Compare November 14, 2024 09:46
@0xideas 0xideas force-pushed the feat/COG-341-log-interactions branch from bcf9965 to e1a3241 Compare November 14, 2024 10:12
Copy link
Contributor

@0xideas 0xideas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have two questions on naming, but looks good to me overall!

@@ -14,15 +17,17 @@ class SearchType(Enum):
INSIGHTS = "INSIGHTS"
CHUNKS = "CHUNKS"

async def search(search_type: SearchType, query: str, user: User = None) -> list:
async def search(query_type: SearchType, query_text: str, user: User = None) -> list:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we maybe rename SearchType to QueryType?

from cognee.infrastructure.databases.relational import Base

class Result(Base):
__tablename__ = "results"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we rename results to query_results?

Copy link

gitguardian bot commented Nov 17, 2024

⚠️ GitGuardian has uncovered 2 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secrets in your pull request
GitGuardian id GitGuardian status Secret Commit Filename
13947935 Triggered Generic High Entropy Secret 0e14b0c cognee/shared/utils.py View secret
9573981 Triggered Generic Password 8b2f8c2 notebooks/cognee_llama_index.ipynb View secret
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secrets safely. Learn here the best practices.
  3. Revoke and rotate these secrets.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

@borisarzentar borisarzentar merged commit d8b6eed into main Nov 17, 2024
27 of 28 checks passed
@borisarzentar borisarzentar deleted the feat/COG-341-log-interactions branch November 17, 2024 10:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants