# CLM System - Agent Guidelines

## Important Notes
- **Always use `uv` with the `--active` flag** for dependency management
- **Read docs from context7** whenever you are in doubt or need confirmation on the right way to do something

## Build/Run Commands
```bash
# Install dependencies
uv add --active streamlit langchain langchain-community pypdf2 python-docx pytesseract lancedb

# Run Streamlit app
streamlit run app.py

# Manual scan
python scripts/manual_scan.py

# Generate reports
python scripts/generate_reports.py
```

## Code Style
- **Framework**: Streamlit + LangChain + LanceDB
- **Structure**: Monolithic with modular components in `src/`
- **Imports**: Standard library first, then third-party, then local modules
- **Naming**: snake_case for functions/variables, PascalCase for classes
- **Error Handling**: Use try/except blocks with logging to the `Logger` singleton
- **Types**: Use type hints where beneficial; prioritize readability
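
The conventions above can be illustrated with a minimal sketch. The `Logger` class below is only a hypothetical stand-in for the project's singleton (its real interface is not shown in this document), and `parse_page_count` is an invented helper used purely for illustration.

```python
# Standard library imports come first; third-party and local imports
# would follow in their own groups.
import logging
from typing import Optional


class Logger:
    """Hypothetical stand-in for the project's Logger singleton."""

    _instance: Optional["Logger"] = None

    def __new__(cls) -> "Logger":
        # Create the shared instance once, then keep returning it.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._log = logging.getLogger("clm")
        return cls._instance

    def error(self, message: str) -> None:
        self._log.error(message)


def parse_page_count(raw_value: str) -> Optional[int]:
    """Return the page count as an int, or None if the value is invalid."""
    try:
        return int(raw_value)
    except ValueError as exc:
        # Log through the shared singleton instead of failing silently.
        Logger().error(f"Invalid page count {raw_value!r}: {exc}")
        return None
```

Catching the narrow `ValueError` rather than a bare `except` keeps unrelated failures visible while still routing the expected error through the logger.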

## Key Patterns
- **Document Processing Pipeline**: FileValidator → OCRProcessor → TextExtractor → Chunker → Embedder → VectorStore
- **Singletons**: ConfigurationManager, VectorDatabaseConnection, Logger
- **Strategy Pattern**: ChunkingStrategy (basic fixed-size), EmbeddingModel (single model)
- **Direct File Operations**: Simple utility functions for file I/O
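
As a rough sketch of how the strategy pattern might plug into the `Chunker` stage of the pipeline: the names `ChunkingStrategy` and `Chunker` come from the list above, while `FixedSizeChunking`, the `split` and `run` methods, and the `chunk_size` parameter are assumptions made for this example, not the project's actual interface.

```python
from abc import ABC, abstractmethod
from typing import List


class ChunkingStrategy(ABC):
    """Interface that every chunking strategy implements."""

    @abstractmethod
    def split(self, text: str) -> List[str]:
        """Split text into chunks."""


class FixedSizeChunking(ChunkingStrategy):
    """Basic fixed-size strategy: cut the text every chunk_size characters."""

    def __init__(self, chunk_size: int = 500) -> None:
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        self.chunk_size = chunk_size

    def split(self, text: str) -> List[str]:
        return [
            text[start:start + self.chunk_size]
            for start in range(0, len(text), self.chunk_size)
        ]


class Chunker:
    """Pipeline stage that delegates to whichever strategy it was given."""

    def __init__(self, strategy: ChunkingStrategy) -> None:
        self.strategy = strategy

    def run(self, text: str) -> List[str]:
        return self.strategy.split(text)


chunks = Chunker(FixedSizeChunking(chunk_size=4)).run("abcdefghij")
print(chunks)  # ['abcd', 'efgh', 'ij']
```

Because `Chunker` depends only on the `ChunkingStrategy` interface, a different strategy could later be swapped in without touching the rest of the pipeline.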

## Testing
```bash
# Run basic tests
python -m pytest tests/

# Test single component
python -m pytest tests/test_ingestion.py -v
```

## Linting and Type Checking
```bash
# Run ruff linter (auto-fix issues)
ruff check --fix .

# Run pyright type checker
pyright

# Run both after making changes
cd clm-system && ruff check --fix . && pyright
```

## Vector DB Choice
Use LanceDB: it is lightweight, runs locally, and requires no server setup, which suits this project's scope.

# STRICT RULES
- Do not add `sys.path.append` workarounds to any code. Always know which directory you are executing code from.
- Do not use `pathlib` or `os.path`; always use `importlib.resources` and declare resources in `pyproject.toml`.
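
A minimal sketch of the `importlib.resources` approach, assuming Python 3.9 or newer. The helper name `load_text_resource` is hypothetical, and the standard-library `json` package is used only so the snippet runs without the project installed.

```python
from importlib.resources import files


def load_text_resource(package: str, resource_name: str) -> str:
    """Read a text file shipped inside an importable package."""
    # files() locates the package through the import system, so the result
    # does not depend on the current working directory.
    return files(package).joinpath(resource_name).read_text(encoding="utf-8")


# Demonstration with a standard-library package; in the project the arguments
# would be the project's own package and resource names.
source = load_text_resource("json", "__init__.py")
print(source.splitlines()[0])
```

For project resources to be found this way, the files must be included in the installed package by the build backend configured in `pyproject.toml`; the exact setting depends on the backend in use.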