CLM System - Agent Guidelines

Important Notes

  • Always use uv with the --active flag for dependency management
  • Consult the docs through context7 whenever you are in doubt or need to confirm the right way to do something

Build/Run Commands

# Install dependencies
uv add --active streamlit langchain langchain-community pypdf2 python-docx pytesseract lancedb

# Run Streamlit app
streamlit run app.py

# Manual scan
python scripts/manual_scan.py

# Generate reports
python scripts/generate_reports.py

Code Style

  • Framework: Streamlit + LangChain + LanceDB
  • Structure: Monolithic with modular components in src/
  • Imports: Standard library first, then third-party, then local modules
  • Naming: snake_case for functions/variables, PascalCase for classes
  • Error Handling: Use try/except blocks and log errors to the Logger singleton
  • Types: Use type hints where beneficial; focus on readability
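
The conventions above can be seen together in one small module. This is a minimal sketch, not the project's actual code: the Logger body (a thin wrapper over the standard logging module) and the read_text_file helper are hypothetical, written only to illustrate import ordering, naming, type hints, and error handling.

```python
# Standard library imports come first; third-party and local imports
# would follow in their own groups.
import logging
from typing import Optional


class Logger:
    """Hypothetical singleton wrapper around the standard logging module."""

    _instance: Optional["Logger"] = None
    _log: logging.Logger

    def __new__(cls) -> "Logger":
        # Create the shared instance on first use, then reuse it.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._log = logging.getLogger("clm_system")
        return cls._instance

    def error(self, message: str) -> None:
        self._log.error(message)


def read_text_file(file_path: str) -> Optional[str]:
    """Return the file's text, or None if it cannot be read (hypothetical helper)."""
    try:
        with open(file_path, encoding="utf-8") as handle:
            return handle.read()
    except OSError as exc:
        # Report the failure through the singleton instead of printing.
        Logger().error(f"Could not read {file_path}: {exc}")
        return None
```

Every call to Logger() returns the same object, so modules can log without passing a logger around.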

Key Patterns

  • Document Processing Pipeline: FileValidator → OCRProcessor → TextExtractor → Chunker → Embedder → VectorStore
  • Singletons: ConfigurationManager, VectorDatabaseConnection, Logger
  • Strategy Pattern: ChunkingStrategy (basic fixed-size), EmbeddingModel (single model)
  • Direct File Operations: Simple utility functions for file I/O
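
The Strategy Pattern entry can be sketched as follows. Only the names ChunkingStrategy and Chunker come from the list above; the FixedSizeChunking class, the chunk and run method names, and the chunk_size parameter are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import List


class ChunkingStrategy(ABC):
    """Strategy interface: each implementation decides how text is split."""

    @abstractmethod
    def chunk(self, text: str) -> List[str]:
        ...


class FixedSizeChunking(ChunkingStrategy):
    """Basic fixed-size strategy; chunk_size is an assumed parameter."""

    def __init__(self, chunk_size: int = 500) -> None:
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        self.chunk_size = chunk_size

    def chunk(self, text: str) -> List[str]:
        # Slice the text into consecutive windows of chunk_size characters.
        return [
            text[start:start + self.chunk_size]
            for start in range(0, len(text), self.chunk_size)
        ]


class Chunker:
    """Pipeline stage that delegates to whichever strategy it was given."""

    def __init__(self, strategy: ChunkingStrategy) -> None:
        self.strategy = strategy

    def run(self, text: str) -> List[str]:
        return self.strategy.chunk(text)
```

Because Chunker depends only on the interface, a different strategy can later be swapped in without touching the rest of the pipeline.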

Testing

# Run basic tests
python -m pytest tests/

# Test single component
python -m pytest tests/test_ingestion.py -v

Linting and Type Checking

# Run ruff linter (auto-fix issues)
ruff check --fix .

# Run pyright type checker
pyright

# Run both after making changes
cd clm-system && ruff check --fix . && pyright

Vector DB Choice

Use LanceDB: it is lightweight, runs locally, and needs no server setup, which suits this project's scope.

STRICT RULES

  • Do not add sys.path.append fixes to any code. Always understand where you are executing code from.
  • Do not use pathlib or os.path; always use importlib.resources, and define resources in pyproject.toml.
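
A minimal sketch of reading a bundled file under the second rule, assuming Python 3.9+ (where importlib.resources.files is available). The read_package_text helper, the clm_system package name, and the default_prompt.txt file are hypothetical. The data file must also be declared as package data in pyproject.toml, and the exact syntax depends on the build backend in use.

```python
from importlib.resources import files


def read_package_text(package: str, resource_name: str) -> str:
    """Read a text resource bundled inside an importable package."""
    # files() returns a traversable object rooted at the package, so no
    # filesystem path arithmetic or sys.path manipulation is needed.
    return files(package).joinpath(resource_name).read_text(encoding="utf-8")


# Hypothetical usage inside the project (package and file names are assumptions):
# prompt = read_package_text("clm_system", "default_prompt.txt")
```

Because the lookup is keyed on the package name and not on the current working directory, it behaves the same wherever the code is executed from.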