# CLM System - Agent Guidelines
## Important Notes
- **Always use `uv` with the `--active` flag** for dependency management
- **Read the docs via context7** whenever you are in doubt or need to confirm the right way to do something
## Build/Run Commands
```bash
# Install dependencies
uv add --active streamlit langchain langchain-community pypdf2 python-docx pytesseract lancedb
# Run Streamlit app
streamlit run app.py
# Manual scan
python scripts/manual_scan.py
# Generate reports
python scripts/generate_reports.py
```
## Code Style
- **Framework**: Streamlit + LangChain + LanceDB
- **Structure**: Monolithic with modular components in `src/`
- **Imports**: Standard library first, then third-party, then local modules
- **Naming**: snake_case for functions/variables, PascalCase for classes
- **Error Handling**: Use try/except blocks that log errors through the `Logger` singleton
- **Types**: Use type hints where beneficial, focus on readability
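
A minimal sketch of the import-order and error-handling conventions above. It assumes `Logger` is a thin singleton around the standard `logging` module. The class body, the logger name `clm_system`, and `decode_document` are hypothetical illustrations, not the project's actual code. `Logger` is defined inline here so the sketch runs on its own.

```python
# Standard library imports come first...
import logging
from typing import Optional

# ...then third-party packages (e.g. streamlit, langchain)...
# ...then local modules (e.g. a hypothetical `from src.logger import Logger`).


class Logger:
    """Hypothetical process-wide singleton wrapping the stdlib logging module."""

    _instance: Optional["Logger"] = None
    log: logging.Logger

    def __new__(cls) -> "Logger":
        # Build the instance on first use; every later call returns the same object.
        if cls._instance is None:
            instance = super().__new__(cls)
            instance.log = logging.getLogger("clm_system")
            cls._instance = instance
        return cls._instance


def decode_document(data: bytes) -> Optional[str]:
    """Decode raw document bytes as UTF-8; log and return None on failure."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # exception() logs at ERROR level and appends the current traceback.
        Logger().log.exception("Could not decode document as UTF-8")
        return None
```
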
## Key Patterns
- **Document Processing Pipeline**: FileValidator → OCRProcessor → TextExtractor → Chunker → Embedder → VectorStore
- **Singletons**: ConfigurationManager, VectorDatabaseConnection, Logger
- **Strategy Pattern**: ChunkingStrategy (basic fixed-size), EmbeddingModel (single model)
- **Direct File Operations**: Simple utility functions for file I/O
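
The pipeline and the chunking Strategy above can be sketched as follows, assuming each pipeline stage is a plain callable and `ChunkingStrategy` is an abstract base class. `FixedSizeChunker`, its `chunk_size`/`overlap` parameters, and `run_pipeline` are hypothetical names. Only the chunking step is modeled. Validation, OCR, extraction, embedding, and storage would be further stages in the same list.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Sequence


class ChunkingStrategy(ABC):
    """Strategy interface: each subclass decides how text is split into chunks."""

    @abstractmethod
    def split(self, text: str) -> list[str]: ...


class FixedSizeChunker(ChunkingStrategy):
    """Basic fixed-size chunking with an optional character overlap."""

    def __init__(self, chunk_size: int = 500, overlap: int = 0) -> None:
        if chunk_size <= 0 or not 0 <= overlap < chunk_size:
            raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
        self.chunk_size = chunk_size
        self.overlap = overlap

    def split(self, text: str) -> list[str]:
        if not text:
            return []
        step = self.chunk_size - self.overlap
        # Start positions stop before len - overlap, so the last chunk reaches the
        # end of the text and no trailing chunk consists only of overlap.
        stop = max(len(text) - self.overlap, 1)
        return [text[i : i + self.chunk_size] for i in range(0, stop, step)]


# A pipeline stage is any callable that takes the previous stage's output.
Stage = Callable[[Any], Any]


def run_pipeline(stages: Sequence[Stage], data: Any) -> Any:
    """Feed the output of each stage into the next one, in order."""
    for stage in stages:
        data = stage(data)
    return data
```

For example, `run_pipeline([str.strip, FixedSizeChunker(chunk_size=4).split], "  abcdefghij  ")` yields `["abcd", "efgh", "ij"]`. A different chunker would only need another `ChunkingStrategy` subclass. `run_pipeline` and the other stages would not change.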
## Testing
```bash
# Run basic tests
python -m pytest tests/
# Test single component
python -m pytest tests/test_ingestion.py -v
```
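
pytest collects plain `test_*` functions that use bare `assert`, so a test in `tests/test_ingestion.py` can be as small as the sketch below. `chunk_text` is a hypothetical stand-in for the project's chunker. It is defined inline only so the example runs standalone; a real test would import it from the project package.

```python
def chunk_text(text: str, size: int) -> list[str]:
    """Hypothetical fixed-size chunker, defined inline so this file runs alone."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i : i + size] for i in range(0, len(text), size)]


def test_chunks_reassemble_to_original_text() -> None:
    text = "The supplier shall deliver the goods. " * 20
    # Fixed-size chunks without overlap must concatenate back to the input.
    assert "".join(chunk_text(text, 64)) == text


def test_no_chunk_exceeds_requested_size() -> None:
    chunks = chunk_text("x" * 130, 64)
    assert [len(c) for c in chunks] == [64, 64, 2]
```
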
## Linting and Type Checking
```bash
# Run ruff linter (auto-fix issues)
ruff check --fix .
# Run pyright type checker
pyright
# Run both after making changes
cd clm-system && ruff check --fix . && pyright
```
## Vector DB Choice
Use LanceDB: it is lightweight, runs locally, and needs no server setup, which fits the scope of this project.
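
One way to pair LanceDB with the `VectorDatabaseConnection` singleton is sketched below, assuming the connection should be opened lazily and shared across the process. The default URI `data/lancedb` is a hypothetical placeholder. Only the first caller's `uri` takes effect; later arguments are ignored. A lock guards creation in case several threads construct the object at the same time.

```python
import threading
from typing import Any, Optional


class VectorDatabaseConnection:
    """Hypothetical singleton holding one lazily opened LanceDB connection."""

    _instance: Optional["VectorDatabaseConnection"] = None
    _lock = threading.Lock()
    uri: str
    _db: Any

    def __new__(cls, uri: str = "data/lancedb") -> "VectorDatabaseConnection":
        # The lock stops two threads from both creating an instance at once.
        with cls._lock:
            if cls._instance is None:
                instance = super().__new__(cls)
                instance.uri = uri  # only the first caller's uri is used
                instance._db = None
                cls._instance = instance
        return cls._instance

    def db(self) -> Any:
        # Nothing touches LanceDB until the first call; then the connection is reused.
        if self._db is None:
            import lancedb  # third-party, installed with `uv add --active lancedb`

            self._db = lancedb.connect(self.uri)
        return self._db
```

A caller would then write something like `VectorDatabaseConnection().db().open_table("chunks")`, where the table name is hypothetical. Inside Streamlit, `st.cache_resource` is an alternative way to share one connection across reruns.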
## STRICT RULES
- Do not add `sys.path.append` workarounds to any code. Always understand where you are executing code from.
- Do not use `pathlib` or `os.path`; always use `importlib.resources` and declare the resources in `pyproject.toml`.
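
The `importlib.resources` rule above can be followed with a small helper like the sketch below. The package `clm_system.resources` and the file `prompts.txt` in the usage comment are hypothetical. How the file is declared in `pyproject.toml` depends on the build backend; with setuptools it goes under `[tool.setuptools.package-data]`.

```python
from importlib.resources import files


def read_resource(package: str, name: str) -> str:
    """Return the text of a data file shipped inside an importable package.

    files() returns a Traversable rather than a filesystem path, so this also
    works when the package is installed from a zip archive.
    """
    return files(package).joinpath(name).read_text(encoding="utf-8")


# Hypothetical usage inside the app:
# prompt = read_resource("clm_system.resources", "prompts.txt")
```

Because lookup is by package name, the result does not depend on the current working directory. That is also why the `sys.path.append` workarounds forbidden above are unnecessary.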