CLM System - Low Level Design

Minimal Folder Structure (Python + Streamlit)

clm-system/
├── app.py                  # Main Streamlit chat interface
├── requirements.txt        # Dependencies
├── config.py               # Configuration settings
├── data/                   # Synthetic contract documents
│   ├── contracts/          # PDF, DOCX, TXT files
│   └── metadata/           # Document metadata
├── src/
│   ├── __init__.py
│   ├── ingestion.py        # Document processing & indexing
│   ├── rag.py              # RAG pipeline
│   ├── agent.py            # Manual trigger agent
│   └── utils.py            # Helper functions
├── scripts/
│   ├── manual_scan.py      # Manual trigger script
│   └── generate_reports.py # Report generation script
└── tests/                  # Basic tests
    └── test_ingestion.py

Setup Instructions

Create the module with: uv init clm-system --module

Core Components

1. Streamlit Interface (app.py)

Chat interface for contract queries
Document similarity search
Upload new contracts
Manual trigger button for daily scan

2. Document Ingestion (src/ingestion.py)

File validation and type detection
OCR for scanned PDFs
Text extraction from PDF/DOCX/TXT
LanceDB vector storage
Basic chunking strategy
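
The validation and chunking steps above might look roughly like the sketch below. The design only says "basic chunking strategy", so the fixed-size character window with overlap, the function names detect_file_type and chunk_text, and the default sizes are all assumptions made for illustration, not the project's actual code.

```python
from pathlib import Path

# Assumed set of accepted extensions, taken from the data/contracts/ comment.
SUPPORTED_TYPES = {".pdf": "pdf", ".docx": "docx", ".txt": "txt"}


def detect_file_type(path: str) -> str:
    """Return 'pdf', 'docx' or 'txt'; raise ValueError for anything else."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_TYPES:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
    return SUPPORTED_TYPES[suffix]


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

The overlap keeps a clause that straddles a chunk boundary retrievable from at least one chunk. Extension checks alone do not distinguish scanned PDFs from text PDFs, so the OCR decision would still need a separate check.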

3. RAG Pipeline (src/rag.py)

LangChain retrieval
Context-aware querying
Source citation (document name, page)
Embedding generation
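
One way to carry source citations (document name, page) through to the response is to number each retrieved chunk in the prompt. The sketch below is plain Python and deliberately avoids LangChain and LanceDB calls; RetrievedChunk, format_citation and build_prompt are hypothetical names, and the prompt wording is an assumption.

```python
from dataclasses import dataclass


@dataclass
class RetrievedChunk:
    """Hypothetical shape of one retrieval hit; not a LangChain or LanceDB type."""
    document: str
    page: int
    text: str


def format_citation(chunk: RetrievedChunk) -> str:
    """Render the citation as 'document name, p. N'."""
    return f"{chunk.document}, p. {chunk.page}"


def build_prompt(question: str, chunks: list[RetrievedChunk]) -> str:
    """Number each chunk so the answer can cite sources as [1], [2], ..."""
    context_lines = [
        f"[{i}] ({format_citation(c)}) {c.text}"
        for i, c in enumerate(chunks, start=1)
    ]
    context = "\n".join(context_lines)
    return (
        "Answer using only the context below and cite sources by number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

Storing the document name and page alongside each chunk at ingestion time is what makes this possible, so the metadata has to be written when the chunk is indexed.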

4. Manual Agent (src/agent.py)

Manual trigger via script
Expiration date detection (alert for contracts expiring within 30 days)
Conflict identification
Email report generation
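
The 30-day expiration check could be sketched as follows, assuming expiration dates have already been extracted into a mapping of contract name to date. The function name expiring_contracts and the choice to skip contracts that have already expired are assumptions, not part of the design above.

```python
from datetime import date, timedelta


def expiring_contracts(
    contracts: dict[str, date],
    today: date,
    window_days: int = 30,
) -> list[tuple[str, int]]:
    """Return (name, days_left) for contracts expiring within the window, soonest first."""
    cutoff = today + timedelta(days=window_days)
    hits = [
        (name, (expires - today).days)
        for name, expires in contracts.items()
        if today <= expires <= cutoff  # already-expired contracts are skipped here
    ]
    return sorted(hits, key=lambda item: item[1])
```

Passing today in as a parameter rather than calling date.today() inside the function keeps the check easy to test with fixed dates.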

5. Manual Triggers

scripts/manual_scan.py: Run daily scan
scripts/generate_reports.py: Generate reports
Both scripts can be scheduled with cron or run manually
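
A report script could assemble and send its email with the standard library alone, as in the sketch below. The function names, the subject line format, and the SMTP host and port are placeholders chosen for illustration; a real deployment would likely need authentication and TLS, which are omitted here.

```python
import smtplib
from email.message import EmailMessage


def build_report_email(sender: str, recipient: str, findings: list[str]) -> EmailMessage:
    """Build a plain-text report with one bullet line per finding."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"CLM daily scan: {len(findings)} finding(s)"
    body = "\n".join(f"- {finding}" for finding in findings) or "No findings."
    msg.set_content(body)
    return msg


def send_report(msg: EmailMessage, host: str = "localhost", port: int = 25) -> None:
    """Send through a plain SMTP relay; host and port are placeholders."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)
```

Keeping message construction separate from sending means the report content can be checked in tests without an SMTP server.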

Technology Stack

Framework: Streamlit (chat interface)
Vector DB: LanceDB (lightweight, local)
LLM Framework: LangChain
File Processing: PyPDF2, python-docx
OCR: pytesseract
Email: smtplib

Data Flow

Ingestion: File → Validation → Processing → LanceDB
Query: User Input → RAG → Context Retrieval → Response
Manual Scan: Trigger → Contract Scan → Analysis → Email Report
