# VISION

Factory Droid needs a skill that can launch `droid exec` for multiple tasks.

## Available Models

| Model Name       | Alias           |
|------------------|-----------------|
| `opus_4.6`       | Opus 4.6        |
| `gpt_5.2`        | GPT 5.2         |
| `gpt_5.3_codex`  | GPT 5.3 Codex   |
| `kimi_k2.5`      | Kimi k2.5       |

## Model Selection Criteria

| Role                        | Recommended Model            | Reason                               |
|-----------------------------|------------------------------|--------------------------------------|
| The workhorse               | `kimi_k2.5`                  | Fast and cost-effective              |
| The critic                  | `opus_4.6`                   | Good at reviewing and finding issues |
| The brainy one              | `gpt_5.2`                    | Highest code intelligence            |
| The coder                   | `gpt_5.3_codex`              | Specialized for code generation      |
| The fast one                | `kimi_k2.5`                  | Fastest response time                |
| Good instruction following  | `kimi_k2.5`, `gpt_5.3_codex` | Strong adherence to requirements     |
| The vision model            | `kimi_k2.5`                  | Fast vision processing               |

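The role table above could be kept as data so the skill can look models up rather than hard-coding them. This is only a minimal sketch: the role keys (`workhorse`, `critic`, and so on) and the helper `model_for` are hypothetical names chosen for illustration; the model names come from the table.

```python
# Role-to-model table from "Model Selection Criteria", expressed as data.
# The dictionary keys are hypothetical labels invented for this sketch.
ROLE_MODELS: dict[str, list[str]] = {
    "workhorse": ["kimi_k2.5"],
    "critic": ["opus_4.6"],
    "brainy": ["gpt_5.2"],
    "coder": ["gpt_5.3_codex"],
    "fast": ["kimi_k2.5"],
    "instruction_following": ["kimi_k2.5", "gpt_5.3_codex"],
    "vision": ["kimi_k2.5"],
}


def model_for(role: str) -> str:
    """Return the first recommended model for a role."""
    try:
        return ROLE_MODELS[role][0]
    except KeyError:
        raise ValueError(f"unknown role: {role!r}") from None
```

For example, `model_for("critic")` returns `"opus_4.6"`, and an unknown role raises `ValueError` instead of silently falling back to a default.
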
## Coding Task Breakdown

1. Code exploration
2. Planning/spec generation
3. Code generation
4. Formatting, linting, type checking, and other quality checks
5. Review and find bugs
6. Build, test, or run the code

## Model Rejection Criteria

### `gpt_5.2` and `gpt_5.3_codex`

- Too slow and expensive for the workhorse role
- Not recommended for exploration or tool calls
- Strictly for planning/spec gen and code gen

### `opus_4.6`

- Produces buggy code and looks for shortcuts in code gen
- Can be a good critic and reviewer
- Never use for code gen

### `kimi_k2.5`

- OK in all areas and fast
- Never use as the primary model for large code gen
- Can be used for a second opinion

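The rejection rules above can be sketched as a small guard that the skill calls before launching a droid. The task labels (`"exploration"`, `"spec"`, `"codegen"`, `"review"`) and the function `rejection_reason` are assumptions made for this example, not names defined elsewhere in this document.

```python
from typing import Optional

# Tasks the GPT models are allowed to take, per the rules above.
# The task label strings are hypothetical.
GPT_ALLOWED_TASKS = {"spec", "codegen"}


def rejection_reason(model: str, task: str, large: bool = False) -> Optional[str]:
    """Return why `model` must not run `task`, or None if it is allowed."""
    if model in ("gpt_5.2", "gpt_5.3_codex") and task not in GPT_ALLOWED_TASKS:
        return f"{model} is strictly for planning/spec gen and code gen"
    if model == "opus_4.6" and task == "codegen":
        return "opus_4.6 must never be used for code gen"
    if model == "kimi_k2.5" and task == "codegen" and large:
        return "kimi_k2.5 must not be the primary model for large code gen"
    return None
```

Returning a reason string rather than a bare boolean lets the orchestrator report why a model was refused.
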
## Model Performance Comparison

### Cost (High to Low)

| Rank | Model            |
|------|------------------|
| 1    | `opus_4.6`       |
| 2    | `gpt_5.2`        |
| 3    | `gpt_5.3_codex`  |
| 4    | `kimi_k2.5`      |

### Speed (Fast to Slow)

| Rank | Model            |
|------|------------------|
| 1    | `kimi_k2.5`      |
| 2    | `opus_4.6`       |
| 3    | `gpt_5.3_codex`  |
| 4    | `gpt_5.2`        |

### Code Intelligence (High to Low)

| Rank | Model            |
|------|------------------|
| 1    | `gpt_5.2`        |
| 2    | `gpt_5.3_codex`  |
| 3    | `opus_4.6`       |
| 4    | `kimi_k2.5`      |

### Overthinking (High to Low)

| Rank | Model            |
|------|------------------|
| 1    | `gpt_5.2`        |
| 2    | `gpt_5.3_codex`  |
| 3    | `opus_4.6`       |
| 4    | `kimi_k2.5`      |

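The four rankings above can be stored as ordered lists so that tie-breaking between candidate models is mechanical. This is an illustrative sketch; `RANKINGS`, `rank`, `cheapest`, and `fastest` are hypothetical names, and the orderings are copied from the tables.

```python
# Each list is ordered exactly as in the corresponding table (rank 1 first).
RANKINGS: dict[str, list[str]] = {
    "cost": ["opus_4.6", "gpt_5.2", "gpt_5.3_codex", "kimi_k2.5"],  # high to low
    "speed": ["kimi_k2.5", "opus_4.6", "gpt_5.3_codex", "gpt_5.2"],  # fast to slow
    "code_intelligence": ["gpt_5.2", "gpt_5.3_codex", "opus_4.6", "kimi_k2.5"],
    "overthinking": ["gpt_5.2", "gpt_5.3_codex", "opus_4.6", "kimi_k2.5"],
}


def rank(metric: str, model: str) -> int:
    """Return the 1-based rank of `model` in the table for `metric`."""
    return RANKINGS[metric].index(model) + 1


def cheapest(candidates: list[str]) -> str:
    """Cost is ranked high to low, so the cheapest model has the largest rank."""
    return max(candidates, key=lambda m: rank("cost", m))


def fastest(candidates: list[str]) -> str:
    """Speed is ranked fast to slow, so the fastest model has the smallest rank."""
    return min(candidates, key=lambda m: rank("speed", m))
```

Given the candidates `gpt_5.2` and `opus_4.6`, `cheapest` picks `gpt_5.2` and `fastest` picks `opus_4.6`.
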
## Flow

1. Start with a good instruction follower (`kimi_k2.5` or `gpt_5.3_codex`); the user asks it a question or gives it a task.
2. Make a todo list.
3. Exploration is always needed: launch multiple explorer droids with `kimi_k2.5`, asking questions in natural language.
4. After exploration, evaluate the context with a spec droid using `gpt_5.2`.
5. Confirm the spec with the user.
6. For code gen, use `gpt_5.3_codex` for large code gen or `kimi_k2.5` for small code gen.
7. After code gen, run a quality-check droid with `kimi_k2.5`.
8. Run a review droid with `opus_4.6` to find bugs and issues.
9. Run a build/test/run droid with `kimi_k2.5`.
10. Provide a summary.
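
The flow above might be represented as an ordered plan that the skill walks through, one droid per step. This is a sketch under stated assumptions: `Step`, `build_plan`, and `droid_command` are hypothetical names, and the `--model` flag on `droid exec` is assumed here rather than confirmed by this document, so check `droid exec --help` before relying on it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    name: str
    model: Optional[str]  # None: handled by the orchestrator or the user


def build_plan(large_codegen: bool) -> List[Step]:
    """Return the flow as ordered steps; code gen model depends on task size."""
    codegen_model = "gpt_5.3_codex" if large_codegen else "kimi_k2.5"
    return [
        Step("todo", None),
        Step("explore", "kimi_k2.5"),
        Step("spec", "gpt_5.2"),
        Step("confirm_spec", None),
        Step("codegen", codegen_model),
        Step("quality_check", "kimi_k2.5"),
        Step("review", "opus_4.6"),
        Step("build_test_run", "kimi_k2.5"),
        Step("summary", None),
    ]


def droid_command(step: Step, prompt: str) -> List[str]:
    """Build an argv list for one step. ASSUMPTION: `--model` is the right flag."""
    if step.model is None:
        raise ValueError(f"step {step.name!r} does not launch a droid")
    return ["droid", "exec", "--model", step.model, prompt]
```

Building an argv list instead of a shell string avoids quoting problems when the natural-language prompt contains spaces or quotes; the list can be passed directly to `subprocess.run`.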