CLM System - Agent Guidelines
Important Notes
- Always use uv with the --active flag for dependency management
- Read docs from context7 whenever you are in doubt or need confirmation on how to do things the right way
Build/Run Commands
# Install dependencies
uv add --active streamlit langchain langchain-community pypdf2 python-docx pytesseract lancedb
# Run Streamlit app
streamlit run app.py
# Manual scan
python scripts/manual_scan.py
# Generate reports
python scripts/generate_reports.py
Code Style
- Framework: Streamlit + LangChain + LanceDB
- Structure: Monolithic with modular components in src/
- Imports: Standard library first, then third-party, then local modules
- Naming: snake_case for functions/variables, PascalCase for classes
- Error Handling: Use try/except blocks with logging to the Logger singleton
- Types: Use type hints where beneficial, focus on readability
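The style rules above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the Logger class body, the "clm_system" logger name, and the parse_page_count helper are all assumptions made up for the example.

```python
# Standard library imports come first.
import logging
from typing import Optional

# Third-party imports (e.g. streamlit, langchain) would follow here,
# and local modules from src/ would come last.


class Logger:
    """Hypothetical singleton wrapper around the standard logging module."""

    _instance: Optional["Logger"] = None

    def __new__(cls) -> "Logger":
        # Create the shared instance on first use, then keep returning it.
        if cls._instance is None:
            instance = super().__new__(cls)
            instance._log = logging.getLogger("clm_system")  # assumed name
            cls._instance = instance
        return cls._instance

    def error(self, message: str) -> None:
        self._log.error(message)


def parse_page_count(raw_value: str) -> Optional[int]:
    """Illustrative helper: snake_case name, type hints, try/except with logging."""
    try:
        return int(raw_value)
    except ValueError:
        Logger().error(f"Invalid page count: {raw_value!r}")
        return None
```

Calling Logger() anywhere in the codebase returns the same object, so error handling stays a one-liner inside each except block.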
Key Patterns
- Document Processing Pipeline: FileValidator → OCRProcessor → TextExtractor → Chunker → Embedder → VectorStore
- Singletons: ConfigurationManager, VectorDatabaseConnection, Logger
- Strategy Pattern: ChunkingStrategy (basic fixed-size), EmbeddingModel (single model)
- Direct File Operations: Simple utility functions for file I/O
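As a rough sketch of how the strategy and pipeline patterns above might fit together: only the names ChunkingStrategy and Chunker come from these guidelines. FixedSizeChunking, run_pipeline, the method names, and the default chunk size are assumptions for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, List, Sequence


class ChunkingStrategy(ABC):
    """Interface that every chunking strategy implements."""

    @abstractmethod
    def chunk(self, text: str) -> List[str]:
        ...


class FixedSizeChunking(ChunkingStrategy):
    """Basic fixed-size strategy: split text every chunk_size characters."""

    def __init__(self, chunk_size: int = 500) -> None:  # default is an assumption
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        self.chunk_size = chunk_size

    def chunk(self, text: str) -> List[str]:
        return [
            text[start:start + self.chunk_size]
            for start in range(0, len(text), self.chunk_size)
        ]


class Chunker:
    """Pipeline stage that delegates the actual splitting to a strategy."""

    def __init__(self, strategy: ChunkingStrategy) -> None:
        self.strategy = strategy

    def run(self, text: str) -> List[str]:
        return self.strategy.chunk(text)


def run_pipeline(stages: Sequence[Callable[[Any], Any]], payload: Any) -> Any:
    """Feed the output of each stage into the next, in order."""
    for stage in stages:
        payload = stage(payload)
    return payload
```

For example, run_pipeline([str.strip, Chunker(FixedSizeChunking(4)).run], "  abcdefghij  ") strips the input and then splits it into "abcd", "efgh", and "ij". Swapping in a different ChunkingStrategy would not require touching Chunker or the pipeline runner.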
Testing
# Run basic tests
python -m pytest tests/
# Test single component
python -m pytest tests/test_ingestion.py -v
Linting and Type Checking
# Run ruff linter (auto-fix issues)
ruff check --fix .
# Run pyright type checker
pyright
# Run both after making changes
cd clm-system && ruff check --fix . && pyright
Vector DB Choice
Use LanceDB - lightweight, local, no server setup required for this scope
STRICT RULES
- Do not make sys.path.append fixes to any code. Always understand where you are executing code from.
- Do not use pathlib or os.path; always use importlib.resources and define resources in pyproject.toml.
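A minimal sketch of the importlib.resources approach, assuming Python 3.9 or newer. The helper name read_text_resource is hypothetical, and how resources are declared in pyproject.toml depends on the build backend, which these guidelines do not name.

```python
from importlib.resources import files


def read_text_resource(package: str, resource_name: str) -> str:
    """Read a text file that ships inside an importable package.

    The lookup is relative to the package itself, so it works no matter
    which directory the code is executed from.
    """
    return files(package).joinpath(resource_name).read_text(encoding="utf-8")
```

Since the project's package name is not stated here, a standard-library package can stand in for a quick check: read_text_resource("json", "__init__.py") returns the source text of the json package's __init__.py without any manual path handling.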