CLM System Architecture Design
Design Patterns
1. Monolithic Architecture
Single FastAPI application with modular components:
- Document Ingestion Module: Handles multiple file formats (PDF, DOCX, TXT)
- RAG Module: Manages vector embeddings and retrieval
- AI Agent Module: Daily contract monitoring and reporting
- Chatbot Module: User interface for contract queries
2. Direct File Operations
- Simple utility functions for file I/O
- Direct file system operations for document storage
- No abstraction layer needed for this scope
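The following is a minimal sketch of the "simple utility functions" for direct file I/O. It uses only `pathlib` and `shutil`. The storage directory `data/documents` and the function names `save_document`, `read_document_bytes` and `list_documents` are hypothetical; the design does not name them.

```python
import shutil
from pathlib import Path

# Hypothetical storage root; the design does not specify a directory.
STORAGE_DIR = Path("data/documents")


def save_document(src: Path, storage_dir: Path = STORAGE_DIR) -> Path:
    """Copy an uploaded file into the document store and return its new path."""
    storage_dir.mkdir(parents=True, exist_ok=True)
    dest = storage_dir / src.name
    shutil.copy2(src, dest)  # copy2 also preserves timestamps
    return dest


def read_document_bytes(path: Path) -> bytes:
    """Read a stored document as raw bytes for later processing."""
    return path.read_bytes()


def list_documents(storage_dir: Path = STORAGE_DIR) -> list[Path]:
    """Return stored files sorted by name (empty if the store does not exist yet)."""
    if not storage_dir.exists():
        return []
    return sorted(p for p in storage_dir.iterdir() if p.is_file())
```

Keeping these as plain functions matches the "no abstraction layer" decision. If storage later moves to object storage, only these three functions would need to change.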
3. Direct File Processing
- Simple file type detection and processing functions
- Direct embedding generation using selected model
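The file type detection and processing functions could look like the sketch below. It dispatches on the lower-cased file extension through a plain dictionary. The extractor names are hypothetical, and only the TXT path is implemented. PDF and DOCX extraction would need a parsing library (for example `pypdf` or `python-docx`), which is assumed here rather than shown. Checking magic bytes (a PDF file starts with `%PDF`) would be a stricter alternative to trusting the extension.

```python
from pathlib import Path
from typing import Callable


def extract_txt(path: Path) -> str:
    """Plain-text files are read directly."""
    return path.read_text(encoding="utf-8")


def extract_pdf(path: Path) -> str:
    # Placeholder: a PDF library (and OCR for scanned files) would go here.
    raise NotImplementedError("PDF extraction is not implemented in this sketch")


def extract_docx(path: Path) -> str:
    # Placeholder: a DOCX library would go here.
    raise NotImplementedError("DOCX extraction is not implemented in this sketch")


# Supported formats mapped to their processing functions.
EXTRACTORS: dict[str, Callable[[Path], str]] = {
    ".txt": extract_txt,
    ".pdf": extract_pdf,
    ".docx": extract_docx,
}


def detect_file_type(path: Path) -> str:
    """Return the normalized extension, or raise if the format is unsupported."""
    suffix = path.suffix.lower()
    if suffix not in EXTRACTORS:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
    return suffix


def extract_text(path: Path) -> str:
    """Detect the file type, then call the matching processing function."""
    return EXTRACTORS[detect_file_type(path)](path)
```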
4. Strategy Pattern
- ChunkingStrategy: Basic fixed-size chunking
- EmbeddingModel: Single model (OpenAI or local)
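The sketch below shows the two strategy interfaces, using Python `typing.Protocol` for the contracts. `FixedSizeChunker` implements basic fixed-size chunking. `HashingEmbedder` is a hypothetical toy stand-in for the OpenAI or local model: it hashes words into a small normalized vector, so it captures word overlap only, not meaning. `index_text` is likewise a hypothetical helper.

```python
import hashlib
import math
from typing import Protocol


class ChunkingStrategy(Protocol):
    def chunk(self, text: str) -> list[str]: ...


class EmbeddingModel(Protocol):
    def embed(self, texts: list[str]) -> list[list[float]]: ...


class FixedSizeChunker:
    """Basic fixed-size chunking: consecutive, non-overlapping character slices."""

    def __init__(self, chunk_size: int = 500) -> None:
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        self.chunk_size = chunk_size

    def chunk(self, text: str) -> list[str]:
        return [text[i:i + self.chunk_size] for i in range(0, len(text), self.chunk_size)]


class HashingEmbedder:
    """Toy stand-in for a real embedding model (OpenAI or local)."""

    def __init__(self, dim: int = 64) -> None:
        self.dim = dim

    def embed(self, texts: list[str]) -> list[list[float]]:
        vectors = []
        for text in texts:
            vec = [0.0] * self.dim
            for token in text.lower().split():
                # md5 gives a stable bucket across runs (built-in hash() is salted).
                idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % self.dim
                vec[idx] += 1.0
            norm = math.sqrt(sum(v * v for v in vec)) or 1.0
            vectors.append([v / norm for v in vec])  # L2-normalize
        return vectors


def index_text(text: str, chunker: ChunkingStrategy, embedder: EmbeddingModel):
    """Works with any strategy pair; callers never depend on a concrete class."""
    chunks = chunker.chunk(text)
    return list(zip(chunks, embedder.embed(chunks)))
```

To swap in a real model, you would write another class with the same `embed` method. `index_text` and its callers stay unchanged, which is the point of the pattern even while there is only one implementation of each strategy.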
5. Chain of Responsibility
- Document processing pipeline:
  FileValidator → OCRProcessor → TextExtractor → Chunker → Embedder → VectorStore
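Here is one way to wire the pipeline. In this sketch, each link performs its own step and then forwards the document to the next link. This is a pipeline-style variant of Chain of Responsibility, in which a link ends the chain by raising an exception. Only the six stage names come from the design. The `Document` dataclass, the `Handler` base class and the `build_pipeline` helper are hypothetical, and the OCR, extraction and embedding steps are placeholders.

```python
from __future__ import annotations

import os
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Document:
    """State passed along the chain; each handler fills in one more field."""
    filename: str
    raw: bytes
    text: str = ""
    chunks: list[str] = field(default_factory=list)
    embeddings: list[list[float]] = field(default_factory=list)


class Handler:
    """Base link: do this step's work, then forward to the next link if any."""

    def __init__(self) -> None:
        self._next: Optional[Handler] = None

    def set_next(self, handler: Handler) -> Handler:
        self._next = handler
        return handler  # returning the next link allows fluent chaining

    def handle(self, doc: Document) -> Document:
        doc = self.process(doc)
        return self._next.handle(doc) if self._next else doc

    def process(self, doc: Document) -> Document:
        raise NotImplementedError


class FileValidator(Handler):
    ALLOWED = {".pdf", ".docx", ".txt"}

    def process(self, doc: Document) -> Document:
        ext = os.path.splitext(doc.filename)[1].lower()
        if ext not in self.ALLOWED or not doc.raw:
            raise ValueError(f"Rejected {doc.filename!r}")  # stops the chain
        return doc


class OCRProcessor(Handler):
    def process(self, doc: Document) -> Document:
        return doc  # placeholder: OCR for scanned PDFs would run here


class TextExtractor(Handler):
    def process(self, doc: Document) -> Document:
        # Simplification: treat bytes as UTF-8 text; real PDF/DOCX parsing omitted.
        doc.text = doc.raw.decode("utf-8", errors="replace")
        return doc


class Chunker(Handler):
    def __init__(self, size: int = 500) -> None:
        super().__init__()
        self.size = size

    def process(self, doc: Document) -> Document:
        doc.chunks = [doc.text[i:i + self.size] for i in range(0, len(doc.text), self.size)]
        return doc


class Embedder(Handler):
    def process(self, doc: Document) -> Document:
        # Placeholder vectors; the selected embedding model would be called here.
        doc.embeddings = [[float(len(chunk))] for chunk in doc.chunks]
        return doc


class VectorStore(Handler):
    def __init__(self) -> None:
        super().__init__()
        self.records: list[tuple[str, str, list[float]]] = []  # in-memory stand-in

    def process(self, doc: Document) -> Document:
        for chunk, vector in zip(doc.chunks, doc.embeddings):
            self.records.append((doc.filename, chunk, vector))
        return doc


def build_pipeline(store: VectorStore, chunk_size: int = 500) -> Handler:
    head = FileValidator()
    (head.set_next(OCRProcessor())
         .set_next(TextExtractor())
         .set_next(Chunker(chunk_size))
         .set_next(Embedder())
         .set_next(store))
    return head
```

Because the validator raises before anything is stored, a rejected file never reaches `VectorStore`.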
6. Singleton Pattern
- ConfigurationManager: Global config access
- VectorDatabaseConnection: Single connection
- Logger: Basic error logging
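The three singletons could be sketched as follows:

- `ConfigurationManager` uses a lock-guarded `__new__`.
- The vector DB connection is cached with `functools.lru_cache(maxsize=1)`.
- The logger relies on `logging.getLogger`, which already returns the same logger object for a given name.

The setting keys, the environment variable names and the placeholder `VectorDatabaseConnection` class are assumptions. With ChromaDB, the cached factory would create the real client instead.

```python
import logging
import os
import threading
from functools import lru_cache


class ConfigurationManager:
    """Process-wide configuration: the first call creates it, later calls reuse it."""
    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        if cls._instance is None:
            with cls._lock:  # double-checked so concurrent first calls create one object
                if cls._instance is None:
                    instance = super().__new__(cls)
                    # Hypothetical keys and environment variables.
                    instance.settings = {
                        "chroma_path": os.getenv("CHROMA_PATH", "./chroma_db"),
                        "chunk_size": int(os.getenv("CHUNK_SIZE", "500")),
                    }
                    cls._instance = instance
        return cls._instance

    def get(self, key, default=None):
        return self.settings.get(key, default)


class VectorDatabaseConnection:
    """Placeholder for the real vector DB client."""

    def __init__(self, path: str) -> None:
        self.path = path


@lru_cache(maxsize=1)
def get_vector_db() -> VectorDatabaseConnection:
    # Created once on first call; every later call returns the cached object.
    return VectorDatabaseConnection(ConfigurationManager().get("chroma_path"))


def get_logger() -> logging.Logger:
    logger = logging.getLogger("clm")  # same object for the same name
    if not logger.handlers:  # configure only once
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.ERROR)  # "basic error logging"
    return logger
```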
Data Flow
- Document Ingestion: File → Validation → Processing → Storage
- Query Processing: User Query → RAG Pipeline → Context Retrieval → Response Generation
- Daily Monitoring: Scheduled Trigger → Contract Scan → Conflict Detection → Report Generation
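The design does not define what counts as a conflict. The sketch below therefore assumes two hypothetical rules:

- a contract that ends within the next 30 days;
- two contracts with the same counterparty whose date ranges overlap.

The `Contract` fields and the function names are likewise illustrative. The scheduled trigger itself (a cron job or a scheduler library) is left out; it would call `daily_report` once a day.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from itertools import combinations


@dataclass
class Contract:
    contract_id: str
    counterparty: str
    start: date
    end: date


def find_expiring(contracts: list[Contract], today: date, window_days: int = 30) -> list[Contract]:
    """Hypothetical rule 1: contracts ending between today and today + window."""
    limit = today + timedelta(days=window_days)
    return [c for c in contracts if today <= c.end <= limit]


def find_overlaps(contracts: list[Contract]) -> list[tuple[str, str]]:
    """Hypothetical rule 2: same counterparty with overlapping date ranges."""
    conflicts = []
    for a, b in combinations(contracts, 2):
        if a.counterparty == b.counterparty and a.start <= b.end and b.start <= a.end:
            conflicts.append((a.contract_id, b.contract_id))
    return conflicts


def daily_report(contracts: list[Contract], today: date) -> str:
    """Contract scan → conflict detection → plain-text report."""
    lines = [f"Contract report for {today.isoformat()}"]
    for c in find_expiring(contracts, today):
        lines.append(f"EXPIRING: {c.contract_id} ({c.counterparty}) ends {c.end.isoformat()}")
    for a, b in find_overlaps(contracts):
        lines.append(f"CONFLICT: {a} overlaps {b}")
    if len(lines) == 1:
        lines.append("No issues found.")
    return "\n".join(lines)
```

Two date ranges overlap exactly when each one starts no later than the other ends. That is the condition tested in `find_overlaps`.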
Technology Stack
- Framework: FastAPI (async support, automatically generated OpenAPI docs)
- Vector DB: ChromaDB (lightweight, easy setup)
- LLM Framework: LangChain
- Container: Docker + Docker Compose
Implementation Priority
- Document ingestion and indexing
- Basic RAG pipeline
- AI agent for daily reports
- Simple chatbot interface
- Document similarity function
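The design does not say how document similarity is computed. A simple stand-in is cosine similarity over word-count vectors, sketched below. In the RAG setup, the same formula could instead be applied to the embedding vectors from the selected model. The function name `document_similarity` is hypothetical.

```python
import math
import re
from collections import Counter


def _term_counts(text: str) -> Counter:
    """Lower-cased alphanumeric word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def document_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine similarity of word-count vectors: 1.0 = same word mix, 0.0 = no shared words."""
    a, b = _term_counts(doc_a), _term_counts(doc_b)
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)
```

For example, "termination notice period" and "notice period for renewal" share two words. That gives 2 / (√3 · 2) ≈ 0.58.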