Initial implementation by kimi k2 0905

2025-09-06 10:47:22 +05:30
commit bfb761238f
35 changed files with 8037 additions and 0 deletions
--- a/PLANNING/design.md
+++ b/PLANNING/design.md
@@ -0,0 +1,53 @@
+# CLM (Contract Lifecycle Management) System Architecture Design
+
+## Design Patterns
+
+### 1. Monolithic Architecture
+Single FastAPI application with modular components:
+- **Document Ingestion Module**: Handles multiple file formats (PDF, DOCX, TXT)
+- **RAG Module**: Manages vector embeddings and retrieval
+- **AI Agent Module**: Runs daily contract monitoring and reporting
+- **Chatbot Module**: Provides the user interface for contract queries
+
+### 2. Direct File Operations
+- Simple utility functions for file I/O
+- Direct file system operations for document storage
+- No abstraction layer needed for this scope
+
+### 3. Direct File Processing
+- Simple file type detection and processing functions
+- Direct embedding generation using selected model
+
+### 4. Strategy Pattern
+- `ChunkingStrategy`: Basic fixed-size chunking
+- `EmbeddingModel`: Single model (OpenAI or local)
+
+### 5. Chain of Responsibility
+- Document processing pipeline:
+  `FileValidator` → `OCRProcessor` → `TextExtractor` → `Chunker` → `Embedder` → `VectorStore`
+
+### 6. Singleton Pattern
+- `ConfigurationManager`: Global config access
+- `VectorDatabaseConnection`: Single connection
+- `Logger`: Basic error logging
+
+## Data Flow
+
+1. **Document Ingestion**: File → Validation → Processing → Storage
+2. **Query Processing**: User Query → RAG Pipeline → Context Retrieval → Response Generation
+3. **Daily Monitoring**: Scheduled Trigger → Contract Scan → Conflict Detection → Report Generation
+
+## Technology Stack
+
+- **Framework**: FastAPI (async support, automatic OpenAPI docs)
+- **Vector DB**: ChromaDB (lightweight, easy setup)
+- **LLM Framework**: LangChain
+- **Container**: Docker + Docker Compose
+
+## Implementation Priority
+
+1. Document ingestion and indexing
+2. Basic RAG pipeline
+3. AI agent for daily reports
+4. Simple chatbot interface
+5. Document similarity function
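As a rough illustration of the Chain of Responsibility pipeline described in design.md, the sketch below links two of the six stages (`FileValidator` and `Chunker`) so that each handler processes the document and passes it on. The `Handler` base class, the `Document` dataclass, the `set_next`/`handle`/`process` method names, and the `chunk_size` parameter are assumptions made for illustration; they are not taken from the commit. OCR, text extraction, embedding, and vector storage are left out.

```python
from __future__ import annotations

import os
from dataclasses import dataclass, field
from typing import List, Optional

# File formats listed for the Document Ingestion Module.
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt"}


@dataclass
class Document:
    """Hypothetical state object passed from handler to handler."""
    filename: str
    text: str = ""
    chunks: List[str] = field(default_factory=list)


class Handler:
    """Base link in the chain: process the document, then delegate onward."""

    def __init__(self) -> None:
        self._next: Optional[Handler] = None

    def set_next(self, nxt: Handler) -> Handler:
        self._next = nxt
        return nxt  # returning the next link allows a.set_next(b).set_next(c)

    def handle(self, doc: Document) -> Document:
        doc = self.process(doc)
        return self._next.handle(doc) if self._next else doc

    def process(self, doc: Document) -> Document:
        return doc  # default: pass the document through unchanged


class FileValidator(Handler):
    """Rejects files whose extension is not in ALLOWED_EXTENSIONS."""

    def process(self, doc: Document) -> Document:
        ext = os.path.splitext(doc.filename)[1].lower()
        if ext not in ALLOWED_EXTENSIONS:
            raise ValueError(f"unsupported file type: {doc.filename}")
        return doc


class Chunker(Handler):
    """Basic fixed-size chunking over the already extracted text."""

    def __init__(self, chunk_size: int = 500) -> None:
        super().__init__()
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        self.chunk_size = chunk_size

    def process(self, doc: Document) -> Document:
        doc.chunks = [
            doc.text[i:i + self.chunk_size]
            for i in range(0, len(doc.text), self.chunk_size)
        ]
        return doc


# Usage: build the chain once, then push documents through its head.
head = FileValidator()
head.set_next(Chunker(chunk_size=20))
result = head.handle(Document(filename="contract.txt", text="x" * 45))
print([len(chunk) for chunk in result.chunks])  # [20, 20, 5]
```

Raising an exception in a handler stops the chain early, which is one simple way to keep invalid files from reaching later stages. Whether the actual implementation signals failures this way is not stated in the design document.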