new models
45
VISION.md
@@ -6,9 +6,9 @@ Need a skill for factory droid which can launch `droid exec` for multiple things

 | Model Name | Alias |
 |------------------|-----------------|
-| `opus_4.5` | Opus 4.5 |
+| `opus_4.6` | Opus 4.6 |
 | `gpt_5.2` | GPT 5.2 |
-| `gpt_5.2_codex` | GPT 5.2 Codex |
+| `gpt_5.3_codex` | GPT 5.3 Codex |
 | `kimi_k2.5` | Kimi k2.5 |

 ## Model Selection Criteria
@@ -16,11 +16,12 @@ Need a skill for factory droid which can launch `droid exec` for multiple things
 | Role | Recommended Model | Reason |
 |-----------------------------|---------------------------|-------------------------------------|
 | The workhorse | `kimi_k2.5` | Fast and cost-effective |
-| The critic | `opus_4.5` | Good at reviewing and finding issues|
+| The critic | `opus_4.6` | Good at reviewing and finding issues|
 | The brainy one | `gpt_5.2` | Highest code intelligence |
-| The coder | `gpt_5.2_codex` | Specialized for code generation |
+| The coder | `gpt_5.3_codex` | Specialized for code generation |
 | The fast one | `kimi_k2.5` | Fastest response time |
-| Good Instructions Following | `kimi_k2.5`, `gpt_5.2_codex` | Strong adherence to requirements |
+| Good Instructions Following | `kimi_k2.5`, `gpt_5.3_codex` | Strong adherence to requirements |
+| The vision model | `kimi_k2.5` | Fast vision processing |

 ## Coding Task Breakdown

@@ -33,13 +34,13 @@ Need a skill for factory droid which can launch `droid exec` for multiple things

 ## Model Rejection Criteria

-### `gpt_5.2` and `gpt_5.2_codex`
+### `gpt_5.2` and `gpt_5.3_codex`

 - Too slow and expensive for the workhorse role
 - Not at all suggested for exploration or tool calls
 - Strictly for planning/spec gen and code gen

-### `opus_4.5`
+### `opus_4.6`

 - Very buggy and looks for shortcuts in code gen
 - Can be a good critic and reviewer
@@ -57,9 +58,9 @@ Need a skill for factory droid which can launch `droid exec` for multiple things

 | Rank | Model |
 |------|------------------|
-| 1 | `opus_4.5` |
+| 1 | `opus_4.6` |
 | 2 | `gpt_5.2` |
-| 3 | `gpt_5.2_codex` |
+| 3 | `gpt_5.3_codex` |
 | 4 | `kimi_k2.5` |

 ### Speed (Fast to Slow)
@@ -67,8 +68,8 @@ Need a skill for factory droid which can launch `droid exec` for multiple things
 | Rank | Model |
 |------|------------------|
 | 1 | `kimi_k2.5` |
-| 2 | `opus_4.5` |
-| 3 | `gpt_5.2_codex` |
+| 2 | `opus_4.6` |
+| 3 | `gpt_5.3_codex` |
 | 4 | `gpt_5.2` |

 ### Code Intelligence (High to Low)
@@ -76,8 +77,8 @@ Need a skill for factory droid which can launch `droid exec` for multiple things
 | Rank | Model |
 |------|------------------|
 | 1 | `gpt_5.2` |
-| 2 | `gpt_5.2_codex` |
-| 3 | `opus_4.5` |
+| 2 | `gpt_5.3_codex` |
+| 3 | `opus_4.6` |
 | 4 | `kimi_k2.5` |

 ### Overthinking (High to Low)
@@ -85,6 +86,20 @@ Need a skill for factory droid which can launch `droid exec` for multiple things
 | Rank | Model |
 |------|------------------|
 | 1 | `gpt_5.2` |
-| 2 | `gpt_5.2_codex` |
-| 3 | `opus_4.5` |
+| 2 | `gpt_5.3_codex` |
+| 3 | `opus_4.6` |
 | 4 | `kimi_k2.5` |
+
+## Flow
+
+-> Start with a good instruction follower (kimi_k2.5 or gpt_5.3_codex).
+User asks a question or gives a task.
+-> Make a todo list.
+-> Exploration is always needed. Launch multiple explorer droids with kimi_k2.5, asking questions in natural language.
+-> After exploration, evaluate context with spec droid with gpt_5.2.
+-> Confirm spec with user.
+-> For code gen, use gpt_5.3_codex for large code gen, or kimi_k2.5 for small code gen.
+-> After code gen, run quality check droid with kimi_k2.5.
+-> Run review droid with opus_4.6 to find bugs and issues.
+-> Run build/test/run droid with kimi_k2.5.
+-> Provide summary.
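Review note: the Flow added in the last hunk reads as a fixed sequence of stages, each tied to one model. A minimal Python sketch of that mapping follows, assuming the stage order and model choices exactly as written in the Flow. The `Stage` class, the `build_flow` helper, and the stage labels are hypothetical names used only for illustration. The sketch does not invoke `droid exec`, and the todo-list and summary steps are left to whichever instruction follower drives the flow.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Stage:
    """One step of the flow; model is None when no droid is launched."""
    name: str              # hypothetical label, not a name from VISION.md
    model: Optional[str]   # model identifier as listed in VISION.md


def build_flow(large_codegen: bool) -> List[Stage]:
    """Return the stages of the Flow section in order (illustrative helper)."""
    # Flow: gpt_5.3_codex for large code gen, kimi_k2.5 for small code gen.
    codegen_model = "gpt_5.3_codex" if large_codegen else "kimi_k2.5"
    return [
        Stage("explore", "kimi_k2.5"),
        Stage("spec", "gpt_5.2"),
        Stage("confirm_spec_with_user", None),  # user step, no model involved
        Stage("codegen", codegen_model),
        Stage("quality_check", "kimi_k2.5"),
        Stage("review", "opus_4.6"),
        Stage("build_test_run", "kimi_k2.5"),
    ]


if __name__ == "__main__":
    for stage in build_flow(large_codegen=True):
        print(f"{stage.name:<24} {stage.model or '(user)'}")
```

Keeping the code-gen choice as a single boolean mirrors the large/small split in the Flow; how "large" is decided is not specified in the document and is left to the caller.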