# CLM System Architecture Design

## Design Patterns

### 1. Monolithic Architecture

Single FastAPI application with modular components:

- **Document Ingestion Module**: Handles multiple file formats (PDF, DOCX, TXT)
- **RAG Module**: Manages vector embeddings and retrieval
- **AI Agent Module**: Daily contract monitoring and reporting
- **Chatbot Module**: User interface for contract queries

### 2. Direct File Operations

- Simple utility functions for file I/O
- Direct file system operations for document storage
- No abstraction layer needed for this scope

### 3. Direct File Processing

- Simple file type detection and processing functions
- Direct embedding generation using selected model

### 4. Strategy Pattern

- `ChunkingStrategy`: Basic fixed-size chunking
- `EmbeddingModel`: Single model (OpenAI or local)

### 5. Chain of Responsibility

- Document processing pipeline:
  1. `FileValidator`
  2. `OCRProcessor`
  3. `TextExtractor`
  4. `Chunker`
  5. `Embedder`
  6. `VectorStore`

### 6. Singleton Pattern

- `ConfigurationManager`: Global config access
- `VectorDatabaseConnection`: Single connection
- `Logger`: Basic error logging

## Data Flow

1. **Document Ingestion**: File → Validation → Processing → Storage
2. **Query Processing**: User Query → RAG Pipeline → Context Retrieval → Response Generation
3. **Daily Monitoring**: Scheduled Trigger → Contract Scan → Conflict Detection → Report Generation

## Technology Stack

- **Framework**: FastAPI (async support, automatic docs)
- **Vector DB**: ChromaDB (lightweight, easy setup)
- **LLM Framework**: LangChain
- **Container**: Docker + Docker Compose

## Implementation Priority

1. Document ingestion and indexing
2. Basic RAG pipeline
3. AI agent for daily reports
4. Simple chatbot interface
5. Document similarity function
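
To make the Strategy and Chain of Responsibility patterns listed under Design Patterns more concrete, here is a minimal Python sketch of two of the six pipeline stages, `FileValidator` and `Chunker`, with a fixed-size `ChunkingStrategy`. The class names come from the design; everything else (`DocumentContext`, `Handler`, `FixedSizeChunking`, `set_next`, `handle`, `process`, `split`, and the default chunk size) is a hypothetical choice made for illustration. The OCR, text extraction, embedding, and vector store stages are omitted because they depend on external libraries, so the sketch assumes the text has already been extracted.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional

# File formats named in the design (PDF, DOCX, TXT).
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt"}


@dataclass
class DocumentContext:
    """Hypothetical carrier object passed from one handler to the next."""
    filename: str
    text: str = ""
    chunks: list[str] = field(default_factory=list)


class Handler(ABC):
    """One link in the chain: process the context, then delegate onward."""

    def __init__(self) -> None:
        self._next: Optional[Handler] = None

    def set_next(self, handler: Handler) -> Handler:
        self._next = handler
        return handler  # returning the new link allows a.set_next(b).set_next(c)

    def handle(self, ctx: DocumentContext) -> DocumentContext:
        ctx = self.process(ctx)
        return self._next.handle(ctx) if self._next else ctx

    @abstractmethod
    def process(self, ctx: DocumentContext) -> DocumentContext:
        ...


class FileValidator(Handler):
    """Rejects files whose extension is not in the supported set."""

    def process(self, ctx: DocumentContext) -> DocumentContext:
        suffix = Path(ctx.filename).suffix.lower()
        if suffix not in SUPPORTED_EXTENSIONS:
            raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
        return ctx


class ChunkingStrategy(ABC):
    """Strategy interface: how a document's text is split into chunks."""

    @abstractmethod
    def split(self, text: str) -> list[str]:
        ...


class FixedSizeChunking(ChunkingStrategy):
    """Splits text into consecutive pieces of at most `size` characters."""

    def __init__(self, size: int = 500) -> None:  # 500 is an arbitrary default
        if size <= 0:
            raise ValueError("size must be positive")
        self.size = size

    def split(self, text: str) -> list[str]:
        return [text[i:i + self.size] for i in range(0, len(text), self.size)]


class Chunker(Handler):
    """Pipeline stage that delegates the splitting to the injected strategy."""

    def __init__(self, strategy: ChunkingStrategy) -> None:
        super().__init__()
        self.strategy = strategy

    def process(self, ctx: DocumentContext) -> DocumentContext:
        ctx.chunks = self.strategy.split(ctx.text)
        return ctx
```

A pipeline could then be assembled with `head = FileValidator(); head.set_next(Chunker(FixedSizeChunking(size=500)))` and run with `head.handle(DocumentContext(filename="contract.txt", text=...))`. Because `Chunker` only depends on the `ChunkingStrategy` interface, a different splitting approach could be swapped in later without touching the chain itself.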