# CLM System - Agent Guidelines

## Important Notes

- **Always use `uv` with the `--active` flag** for dependency management
- **Read the docs from context7** whenever you are in doubt or need confirmation on the right way to do something

## Build/Run Commands

```bash
# Install dependencies
uv add --active streamlit langchain langchain-community pypdf2 python-docx pytesseract lancedb

# Run Streamlit app
streamlit run app.py

# Manual scan
python scripts/manual_scan.py

# Generate reports
python scripts/generate_reports.py
```

## Code Style

- **Framework**: Streamlit + LangChain + LanceDB
- **Structure**: Monolithic with modular components in `src/`
- **Imports**: Standard library first, then third-party, then local modules
- **Naming**: snake_case for functions/variables, PascalCase for classes
- **Error Handling**: Use try/except blocks and log through the `Logger` singleton
- **Types**: Use type hints where beneficial, focus on readability

## Key Patterns

- **Document Processing Pipeline**: FileValidator → OCRProcessor → TextExtractor → Chunker → Embedder → VectorStore
- **Singletons**: ConfigurationManager, VectorDatabaseConnection, Logger
- **Strategy Pattern**: ChunkingStrategy (basic fixed-size), EmbeddingModel (single model)
- **Direct File Operations**: Simple utility functions for file I/O

## Testing

```bash
# Run basic tests
python -m pytest tests/

# Test single component
python -m pytest tests/test_ingestion.py -v
```

## Linting and Type Checking

```bash
# Run ruff linter (auto-fix issues)
ruff check --fix .

# Run pyright type checker
pyright

# Run both after making changes
cd clm-system && ruff check --fix . && pyright
```

## Vector DB Choice

Use LanceDB - lightweight, local, no server setup required for this scope

# STRICT RULES

- Do not add `sys.path.append` fixes to any code. Always understand where you are executing code from.
- Do not use `pathlib` or `os.path`. Always use `importlib.resources` and define resources in `pyproject.toml`.
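
The `importlib.resources` rule can be followed along these lines. This is a minimal sketch, not code from this repository: `read_text_resource` is a hypothetical helper, and the demo reads from the standard-library `json` package only because it is always importable. In this project the package and file names would be whatever is declared as package data in `pyproject.toml`.

```python
from importlib.resources import files


def read_text_resource(package: str, name: str, encoding: str = "utf-8") -> str:
    """Return the text of a file shipped inside an importable package.

    Hypothetical helper for illustration; not part of the CLM codebase.
    """
    # files() locates the package through the import system, so the lookup
    # does not depend on the current working directory or on sys.path edits.
    resource = files(package).joinpath(name)
    if not resource.is_file():
        raise FileNotFoundError(f"{name!r} is not a resource of package {package!r}")
    return resource.read_text(encoding=encoding)


if __name__ == "__main__":
    # Stand-in package: `json` from the standard library.
    # A project call might look like
    # read_text_resource("clm_system.config", "settings.toml")  (assumed names).
    text = read_text_resource("json", "__init__.py")
    print(f"read {len(text)} characters")
```

Because the lookup is keyed on the package name rather than a filesystem path, the same call works no matter which directory the app, the scripts, or the tests are launched from.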