# CLM System Architecture Design

## Design Patterns

### 1. Monolithic Architecture
Single FastAPI application with modular components:
- **Document Ingestion Module**: Handles multiple file formats (PDF, DOCX, TXT)
- **RAG Module**: Manages vector embeddings and retrieval
- **AI Agent Module**: Daily contract monitoring and reporting
- **Chatbot Module**: User interface for contract queries

### 2. Direct File Operations
- Simple utility functions for file I/O
- Direct file system operations for document storage
- No storage abstraction layer; it is not needed at this scope
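
As a rough illustration of what a direct file operation could look like here, the sketch below copies an uploaded file into a storage directory using only the standard library. The `save_document` name and the `data/documents` path are assumptions made for this example, not part of the design.

```python
import shutil
from pathlib import Path

# Hypothetical storage location; the design does not name one.
STORAGE_DIR = Path("data/documents")


def save_document(source: str, storage_dir: Path = STORAGE_DIR) -> Path:
    """Copy a file into the storage directory and return the new path."""
    storage_dir.mkdir(parents=True, exist_ok=True)
    destination = storage_dir / Path(source).name
    # copy2 also preserves file metadata such as modification time.
    shutil.copy2(source, destination)
    return destination
```

A file with the same name is overwritten in this sketch; a real deployment would probably need a naming or versioning rule.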

### 3. Direct File Processing
- Simple file type detection and processing functions
- Direct embedding generation using the selected model

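File type detection can be as small as an extension lookup. The sketch below assumes detection by file suffix only (no content sniffing); `SUPPORTED_TYPES` and `detect_file_type` are hypothetical names chosen for illustration, covering the three formats listed for the ingestion module.

```python
from pathlib import Path

# Assumed mapping from extension to a short type label.
SUPPORTED_TYPES = {".pdf": "pdf", ".docx": "docx", ".txt": "txt"}


def detect_file_type(path: str) -> str:
    """Return the type label for a path, matching the extension case-insensitively."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_TYPES:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
    return SUPPORTED_TYPES[suffix]
```
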
### 4. Strategy Pattern
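- `ChunkingStrategy`: Interface for chunking; the initial implementation is basic fixed-size chunking
- `EmbeddingModel`: Interface for embeddings; a single model is used at a time (OpenAI or local)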
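
A minimal sketch of how `ChunkingStrategy` might be expressed, assuming character-based chunks. The `FixedSizeChunking` class, its `chunk_size` and `overlap` parameters, and their defaults are assumptions for illustration; the design only states that chunking is fixed-size.

```python
from abc import ABC, abstractmethod


class ChunkingStrategy(ABC):
    """Common interface so chunking approaches can be swapped."""

    @abstractmethod
    def chunk(self, text: str) -> list[str]:
        ...


class FixedSizeChunking(ChunkingStrategy):
    """Split text into fixed-size character windows with optional overlap."""

    def __init__(self, chunk_size: int = 500, overlap: int = 50) -> None:
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        if not 0 <= overlap < chunk_size:
            raise ValueError("overlap must be in [0, chunk_size)")
        self.chunk_size = chunk_size
        self.overlap = overlap

    def chunk(self, text: str) -> list[str]:
        # Each window starts `step` characters after the previous one.
        step = self.chunk_size - self.overlap
        return [text[i:i + self.chunk_size] for i in range(0, len(text), step)]
```

Callers depend only on `ChunkingStrategy`, so a sentence-based or token-based strategy could be added later without touching the pipeline.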

### 5. Chain of Responsibility
- Document processing pipeline, in order:
  `FileValidator` → `OCRProcessor` → `TextExtractor` → `Chunker` → `Embedder` → `VectorStore`
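
The chaining mechanism could be sketched as follows. Only three of the six stages named above are shown, with stand-in logic; the `Document` dataclass, the `Handler` base class, and the `trace` field are assumptions introduced for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Document:
    filename: str
    raw: bytes
    text: str = ""
    chunks: list[str] = field(default_factory=list)
    trace: list[str] = field(default_factory=list)  # stages that ran, in order


class Handler:
    """Base stage: process the document, then pass it to the next stage."""

    def __init__(self) -> None:
        self._next: Optional["Handler"] = None

    def set_next(self, handler: "Handler") -> "Handler":
        self._next = handler
        return handler  # returning the next handler allows fluent chaining

    def handle(self, doc: Document) -> Document:
        doc = self.process(doc)
        doc.trace.append(type(self).__name__)
        return self._next.handle(doc) if self._next else doc

    def process(self, doc: Document) -> Document:
        return doc


class FileValidator(Handler):
    def process(self, doc: Document) -> Document:
        if not doc.filename.lower().endswith((".pdf", ".docx", ".txt")):
            raise ValueError(f"Unsupported file: {doc.filename}")
        return doc


class TextExtractor(Handler):
    def process(self, doc: Document) -> Document:
        # Stand-in: treats the payload as UTF-8 text; real PDF/DOCX parsing differs.
        doc.text = doc.raw.decode("utf-8", errors="replace")
        return doc


class Chunker(Handler):
    def __init__(self, size: int = 500) -> None:
        super().__init__()
        self.size = size

    def process(self, doc: Document) -> Document:
        doc.chunks = [doc.text[i:i + self.size] for i in range(0, len(doc.text), self.size)]
        return doc
```

A chain is assembled with `first = FileValidator(); first.set_next(TextExtractor()).set_next(Chunker())` and run with `first.handle(doc)`. A stage that raises stops the chain, so invalid files never reach later stages.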

### 6. Singleton Pattern
- `ConfigurationManager`: Global config access
- `VectorDatabaseConnection`: Single shared connection
- `Logger`: Basic error logging
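
One possible shape for `ConfigurationManager`, sketched with a lock-guarded `__new__`. The setting keys and the `CLM_CHUNK_SIZE` and `CLM_VECTOR_DB_PATH` environment variables are hypothetical and exist only to make the example concrete.

```python
import os
import threading


class ConfigurationManager:
    """Process-wide configuration; every call to the constructor returns the same object."""

    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        if cls._instance is None:
            with cls._lock:
                # Re-check inside the lock so two threads cannot both create an instance.
                if cls._instance is None:
                    instance = super().__new__(cls)
                    # Hypothetical settings, read once at first use.
                    instance._settings = {
                        "chunk_size": int(os.environ.get("CLM_CHUNK_SIZE", "500")),
                        "vector_db_path": os.environ.get("CLM_VECTOR_DB_PATH", "./chroma"),
                    }
                    cls._instance = instance
        return cls._instance

    def get(self, key: str, default=None):
        return self._settings.get(key, default)
```

In Python, a module-level instance often achieves the same effect with less code; the class form is shown because the design names the pattern explicitly.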

## Data Flow

1. **Document Ingestion**: File → Validation → Processing → Storage
2. **Query Processing**: User Query → RAG Pipeline → Context Retrieval → Response Generation
3. **Daily Monitoring**: Scheduled Trigger → Contract Scan → Conflict Detection → Report Generation

## Technology Stack

- **Framework**: FastAPI (async support, automatic docs)
- **Vector DB**: ChromaDB (lightweight, easy setup)
- **LLM Framework**: LangChain
- **Container**: Docker + Docker Compose

## Implementation Priority

1. Document ingestion and indexing
2. Basic RAG pipeline
3. AI agent for daily reports
4. Simple chatbot interface
5. Document similarity function
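
The design does not specify how document similarity is computed. One common choice for embedding vectors is cosine similarity, sketched below as an assumption; the `cosine_similarity` function name and the zero-vector behavior are illustrative choices.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumed convention: a zero vector is treated as similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)
```

Document-level similarity could then be computed on one embedding per document, for example the mean of its chunk embeddings, though that aggregation is also a design decision left open here.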