This documentation presents the Graph-based Framework for NLP, a system that applies graph techniques to natural language processing, inspired by graph neural networks (GNNs) and graph transformer models. Its core is a dynamic graph in which entities are nodes linked by contextual relevance, which captures relationships and context that linear text analysis misses. The underlying database was built from three years of detailed data collection. The framework includes command line interfaces (CLIs) for straightforward database management and supports customizable graph search algorithms for flexible data exploration. Together, these combine graph-based analysis with customizable algorithms to serve diverse analytical needs.
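To make the idea of entities linked by contextual relevance concrete, here is a minimal sketch. It is not the framework's actual code: the `Node` and `Graph` classes, their fields, the `link` and `related` methods, and the relevance weights are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical entity node; the field names are illustrative assumptions."""
    name: str
    # Maps a neighbor's name to an assumed relevance weight in [0, 1].
    neighbors: dict[str, float] = field(default_factory=dict)


class Graph:
    """Small undirected, weighted graph that can grow at any time."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}

    def add_node(self, name: str) -> Node:
        # Reuse an existing node so repeated mentions do not duplicate entities.
        return self.nodes.setdefault(name, Node(name))

    def link(self, a: str, b: str, relevance: float) -> None:
        # Store the edge on both endpoints because the graph is undirected.
        self.add_node(a).neighbors[b] = relevance
        self.add_node(b).neighbors[a] = relevance

    def related(self, name: str, min_relevance: float = 0.0) -> list[str]:
        # Neighbors at or above the threshold, most relevant first.
        node = self.nodes[name]
        hits = [(w, n) for n, w in node.neighbors.items() if w >= min_relevance]
        return [n for w, n in sorted(hits, key=lambda p: (-p[0], p[1]))]


graph = Graph()
graph.link("graph", "node", 0.9)
graph.link("graph", "edge", 0.8)
graph.link("graph", "sentence", 0.2)
print(graph.related("graph", min_relevance=0.5))
```

Because edges are stored per node, new entities and links can be added at any time without rebuilding the structure, which is one simple way to read "dynamic" here.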
Warning
This project is already available for use but is currently under review for improvements.
The following documentation provides an exhaustive study of all components of the system, especially those related to interaction with the CLIs.
- Database Preparation
  - Cleaning Initial Node Database
- SK Components Design
  - Design of Node Class
  - Design of NodeSet
  - Design of Graph
- Implementation of Search Algorithms
  - Design of Centrality Algorithm
  - Development of Density Search Algorithm
  - Optional Design of Shortest Path Algorithm
- Organizing All Files into a Single Directory
- Development of Main Command Line Interface (CLI)
  - Implementation of Basic Commands (cd, ls, etc.)
  - Development of Helper Functions
  - Creation of User Interfaces for Specific Functions
  - Design of Auxiliary CLIs
    - LS_Interface
    - VG_Interface
    - NW_Interface
    - GB_Interface
  - Integration of Centrality, Density Search, and Shortest Path Algorithms
  - Edge case testing
- AI for data processing
- Study existing heuristic methods applicable to network analysis.
- Create a test corpus aligned with the principles of "atomicity" and "proximity".
- Large Language Model (LLM) Integration
  - Research and select an appropriate LLM for the task.
  - Design the display used to rate success (granularity + contextual placement).
  - Testing and Validation
  - Implement the heuristic model on a subset of data.
  - Monitor performance and adjust prompting to maximize success rate.
- CLI Integration
  - Design a pipeline that connects generative AI capabilities to a gateway in the CLI.
  - Test success rate with updated tooling.
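
The outline above names centrality and shortest path among the search algorithms. As a rough illustration of what such searches compute, the sketch below uses two textbook techniques, degree centrality and breadth-first shortest path, on a plain adjacency dictionary. The function names, the data layout, and the toy graph are assumptions and do not describe the framework's own implementations. The density search is omitted because its definition is not given here.

```python
from collections import deque


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(adj)
    if n < 2:
        return {node: 0.0 for node in adj}
    return {node: len(neigh) / (n - 1) for node, neigh in adj.items()}


def shortest_path(adj, start, goal):
    """Breadth-first search; returns a fewest-hops path, or None if unreachable."""
    previous = {start: None}  # Also serves as the visited set.
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk the predecessor links back to the start.
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in adj.get(current, ()):
            if neighbor not in previous:
                previous[neighbor] = current
                queue.append(neighbor)
    return None


# Hypothetical toy graph: each key maps to the set of its neighbors.
adj = {
    "token": {"sentence"},
    "sentence": {"token", "document"},
    "document": {"sentence", "corpus"},
    "corpus": {"document"},
}

print(degree_centrality(adj))
print(shortest_path(adj, "token", "corpus"))
```

Breadth-first search treats every edge as one hop. If edges carried relevance weights, a weighted method such as Dijkstra's algorithm would be the usual substitute.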