CLM System Architecture Design

Design Patterns

1. Monolithic Architecture

Single FastAPI application with modular components:

  • Document Ingestion Module: Handles multiple file formats (PDF, DOCX, TXT)
  • RAG Module: Manages vector embeddings and retrieval
  • AI Agent Module: Daily contract monitoring and reporting
  • Chatbot Module: User interface for contract queries

2. Direct File Operations

  • Simple utility functions for file I/O
  • Direct file system operations for document storage
  • No abstraction layer needed for this scope
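
A minimal sketch of what such utility functions might look like, using only the standard library. The names `save_document` and `load_document` and the storage layout are assumptions for illustration, not part of the design.

```python
from pathlib import Path


def save_document(storage_dir: Path, filename: str, data: bytes) -> Path:
    """Write an uploaded document into storage_dir and return its final path."""
    storage_dir.mkdir(parents=True, exist_ok=True)
    # Keep only the last path component so a name such as "../x.pdf"
    # cannot place the file outside storage_dir.
    target = storage_dir / Path(filename).name
    target.write_bytes(data)
    return target


def load_document(path: Path) -> bytes:
    """Read a stored document back as raw bytes."""
    return path.read_bytes()
```

Plain functions like these keep the storage code easy to read; an abstraction layer could be introduced later if a second storage backend is ever needed.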

3. Direct File Processing

  • Simple file type detection and processing functions
  • Direct embedding generation using selected model
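
File type detection could be as simple as an extension lookup. The sketch below assumes the three formats listed for the Document Ingestion Module; `SUPPORTED_TYPES` and `detect_file_type` are hypothetical names, and checking only the extension is a simplification (it does not inspect file contents).

```python
from pathlib import Path

# Formats named for the Document Ingestion Module; the mapping is an assumption.
SUPPORTED_TYPES = {".pdf": "pdf", ".docx": "docx", ".txt": "txt"}


def detect_file_type(path: str) -> str:
    """Return a short type label based on the file extension (case-insensitive)."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_TYPES:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
    return SUPPORTED_TYPES[suffix]
```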

4. Strategy Pattern

  • ChunkingStrategy: Basic fixed-size chunking
  • EmbeddingModel: Single model (OpenAI or local)
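
One way the chunking side of this could look is sketched below. The `ChunkingStrategy` name comes from the list above; the `chunk` method, the `FixedSizeChunker` class, and the default size of 500 characters are assumptions made for the example.

```python
from typing import Protocol


class ChunkingStrategy(Protocol):
    """Anything with a chunk() method can be used as a chunking strategy."""

    def chunk(self, text: str) -> list[str]: ...


class FixedSizeChunker:
    """Basic strategy: split text into consecutive pieces of at most `size` characters."""

    def __init__(self, size: int = 500) -> None:
        if size <= 0:
            raise ValueError("size must be positive")
        self.size = size

    def chunk(self, text: str) -> list[str]:
        return [text[i:i + self.size] for i in range(0, len(text), self.size)]


def chunk_document(text: str, strategy: ChunkingStrategy) -> list[str]:
    # The caller depends only on the strategy interface, so a different
    # chunker can be passed in later without changing this function.
    return strategy.chunk(text)
```

For example, `chunk_document("abcdefghij", FixedSizeChunker(size=4))` splits the text into `"abcd"`, `"efgh"`, and `"ij"`.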

5. Chain of Responsibility

  • Document processing pipeline, in order:
    FileValidator → OCRProcessor → TextExtractor → Chunker → Embedder → VectorStore
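
The pipeline can be sketched as a list of stages, each taking a document and handing the result to the next. Only three of the six stages are shown, as plain functions; the `Document` dataclass, the function names, and the placeholder text extraction are assumptions, and OCR, embedding, and vector storage are left out.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Document:
    name: str
    raw: bytes
    text: str = ""
    chunks: list[str] = field(default_factory=list)


Stage = Callable[[Document], Document]


def validate(doc: Document) -> Document:
    """Reject empty uploads before any further work is done."""
    if not doc.raw:
        raise ValueError(f"{doc.name} is empty")
    return doc


def extract_text(doc: Document) -> Document:
    # Placeholder: real PDF/DOCX extraction (and OCR) would go here.
    doc.text = doc.raw.decode("utf-8", errors="replace")
    return doc


def chunk(doc: Document, size: int = 20) -> Document:
    """Split the extracted text into fixed-size pieces."""
    doc.chunks = [doc.text[i:i + size] for i in range(0, len(doc.text), size)]
    return doc


def run_pipeline(doc: Document, stages: list[Stage]) -> Document:
    """Pass the document through each stage in order."""
    for stage in stages:
        doc = stage(doc)
    return doc
```

A stage that raises stops the chain, so an invalid file never reaches the later, more expensive steps.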

6. Singleton Pattern

  • ConfigurationManager: Global config access
  • VectorDatabaseConnection: Single connection
  • Logger: Basic error logging
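
In Python, a cached factory function is one lightweight way to get singleton behaviour for configuration. This sketch is an assumption about how ConfigurationManager could be realised; the `Settings` fields and the `CLM_*` environment variable names are hypothetical.

```python
import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Settings:
    chroma_path: str
    chunk_size: int


@lru_cache(maxsize=1)
def get_settings() -> Settings:
    """Build the settings once; later calls return the same cached object."""
    return Settings(
        chroma_path=os.environ.get("CLM_CHROMA_PATH", "./chroma"),
        chunk_size=int(os.environ.get("CLM_CHUNK_SIZE", "500")),
    )
```

Because the result is cached, every module that calls `get_settings()` shares one instance, and tests can reset it with `get_settings.cache_clear()`.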

Data Flow

  1. Document Ingestion: File → Validation → Processing → Storage
  2. Query Processing: User Query → RAG Pipeline → Context Retrieval → Response Generation
  3. Daily Monitoring: Scheduled Trigger → Contract Scan → Conflict Detection → Report Generation

Technology Stack

  • Framework: FastAPI (async support, automatic OpenAPI docs)
  • Vector DB: ChromaDB (lightweight, easy setup)
  • LLM Framework: LangChain
  • Container: Docker + Docker Compose

Implementation Priority

  1. Document ingestion and indexing
  2. Basic RAG pipeline
  3. AI agent for daily reports
  4. Simple chatbot interface
  5. Document similarity function
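
The design does not say how document similarity will be computed. One common choice, assuming each document is represented by an embedding vector, is cosine similarity; the sketch below uses only the standard library, and the function name is illustrative.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat a zero vector as having no similarity rather than dividing by zero.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Vectors pointing in the same direction score close to 1, and orthogonal vectors score 0.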