new models

2026-02-06 09:36:51 +05:30
parent 10027abf0b
commit ac213793b0
3 changed files with 261 additions and 15 deletions
--- a/DROIDS.md
+++ b/DROIDS.md
@@ -0,0 +1,138 @@
 # Factory Droids
 A system for orchestrating AI droids to handle complex coding tasks through specialized roles.
 ## Overview
 Factory Droids uses `droid exec` to run AI agents non-interactively, each specializing in different aspects of software development.
 ## Available Commands
 ```bash
 droid exec --help              # Show exec command options (includes model list)
 droid --help                   # Show all droid commands
 droid exec --list-tools        # List available tools for a model
 ```
 > **Tip:** Run `droid exec --help` to see all available models including BYOK custom models.
 ## Quick Start
 ```bash
 # Read-only analysis (default)
 droid exec "analyze the codebase structure"
 # With file input
 droid exec -f prompt.txt
 # With specific model
 droid exec --model custom:kimi-k2.5 "explore the project"
 # Low autonomy - safe file operations
 droid exec --auto low "add JSDoc comments"
 # Medium autonomy - development tasks
 droid exec --auto medium "install deps and run tests"
 # High autonomy - production operations
 droid exec --auto high "fix, test, commit and push"
 ```
 ## Available Models (BYOK)
 | Model ID                          | Name                 | Reasoning |
 |-----------------------------------|----------------------|-----------|
 | `custom:kimi-k2.5`                | Kimi K2.5            | Yes       |
 | `custom:claude-opus-4-6`          | Claude Opus 4.6      | Yes       |
 | `custom:gpt-5.3-codex`            | GPT 5.3 Codex        | Yes       |
 | `custom:gpt-5.2`                  | GPT 5.2              | Yes       |
 ## Droid Roles
 | Droid      | Model                         | Purpose                               | Auto Level |
 |------------|-------------------------------|---------------------------------------|------------|
 | Explorer   | `custom:kimi-k2.5`            | Code exploration and research         | high       |
 | Spec       | `custom:gpt-5.2`              | Planning and specification generation | high       |
 | Coder      | `custom:gpt-5.3-codex`        | Large code generation                 | high       |
 | Coder-lite | `custom:kimi-k2.5`            | Small code generation and fixes       | high       |
 | Quality    | `custom:kimi-k2.5`            | Formatting, linting, type checking    | high       |
 | Reviewer   | `custom:claude-opus-4-6`      | Code review and bug finding           | high       |
 | Runner     | `custom:kimi-k2.5`            | Build, test, and execution            | high       |
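 The role table can be read as a lookup from role to model. A minimal sketch, assuming lowercase role names; `role_model` and `role_cmd` are hypothetical helpers, not part of the `droid` CLI, and the prompt is only an example:
 ```bash
 # Hypothetical helper: map a droid role from the table above to its model ID.
 role_model() {
   case "$1" in
     explorer|coder-lite|quality|runner) echo "custom:kimi-k2.5" ;;
     spec)     echo "custom:gpt-5.2" ;;
     coder)    echo "custom:gpt-5.3-codex" ;;
     reviewer) echo "custom:claude-opus-4-6" ;;
     *) echo "unknown role: $1" >&2; return 1 ;;
   esac
 }
 # Hypothetical helper: print (not run) the droid exec command for a role.
 # Every role in the table uses --auto high, so the flag is fixed here.
 role_cmd() {
   model="$(role_model "$1")" || return 1
   printf 'droid exec --model %s --auto high "%s"\n' "$model" "$2"
 }
 ```
 For example, `role_cmd reviewer "review the latest diff"` prints the matching `droid exec` line for inspection instead of running it.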
 ## Workflow
 1. **Start** with a good instruction follower (`custom:kimi-k2.5` or `custom:gpt-5.3-codex`)
 2. **Make** a todo list
 3. **Explore** - Launch multiple explorer droids with `custom:kimi-k2.5` in parallel
 4. **Spec** - Evaluate context with spec droid using `custom:gpt-5.2`
 5. **Confirm** spec with user
 6. **Code** - Use `custom:gpt-5.3-codex` for large code gen, `custom:kimi-k2.5` for small
 7. **Quality** - Run quality check droid with `custom:kimi-k2.5 --auto high`
 8. **Review** - Run review droid with `custom:claude-opus-4-6 --auto high`
 9. **Run** - Run build/test droid with `custom:kimi-k2.5 --auto high`
 10. **Summarize** - Provide final summary
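 Steps 3 to 9 above can be sketched as a script. This is a rough outline under several assumptions: `run_droid`, `workflow`, `DRY_RUN`, and all prompts are hypothetical; each `droid exec` call is independent, so passing exploration results or the spec between steps is left out; step 5 stays a manual pause.
 ```bash
 # DRY_RUN=1 (the default in this sketch) prints each command instead of running it.
 DRY_RUN="${DRY_RUN:-1}"
 # usage: run_droid <model> <prompt>
 run_droid() {
   if [ "$DRY_RUN" = "1" ]; then
     printf 'droid exec --model %s --auto high "%s"\n' "$1" "$2"
   else
     droid exec --model "$1" --auto high "$2"
   fi
 }
 # usage: workflow <task description>
 workflow() {
   # Step 3: explorers run in parallel; wait for all of them to finish.
   run_droid custom:kimi-k2.5 "explore the code relevant to: $1" &
   run_droid custom:kimi-k2.5 "explore the tests relevant to: $1" &
   wait
   # Step 4: spec. Step 5 (confirming the spec with the user) is not automated.
   run_droid custom:gpt-5.2 "write a spec for: $1"
   # Steps 6-9: code, quality, review, run.
   run_droid custom:gpt-5.3-codex "implement the spec for: $1"
   run_droid custom:kimi-k2.5 "format, lint, and type-check the changes"
   run_droid custom:claude-opus-4-6 "review the changes for bugs"
   run_droid custom:kimi-k2.5 "build the project and run the tests"
 }
 ```
 Calling `workflow "add login"` with the default `DRY_RUN=1` only prints the seven commands; the two explorer lines may appear in either order because they run concurrently.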
 ## Autonomy Levels
 | Level   | Flag                        | Description                                          |
 |---------|-----------------------------|------------------------------------------------------|
 | Default | (none)                      | Read-only - safest for reviewing planned changes     |
 | Low     | `--auto low`                | Basic file operations, no system changes             |
 | Medium  | `--auto medium`             | Development ops - install packages, build, local git |
 | High    | `--auto high`               | Production ops - git push, deploy, migrations        |
 | Unsafe  | `--skip-permissions-unsafe` | Bypass all checks - DANGEROUS!                       |
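 A wrapper script can turn a level name into the matching flag and refuse anything else. A minimal sketch; `auto_flag` is a hypothetical helper, and it deliberately has no branch for `--skip-permissions-unsafe`:
 ```bash
 # Hypothetical helper: translate a level name from the table into droid exec flags.
 auto_flag() {
   case "$1" in
     default|"") ;;                        # read-only: no flag at all
     low|medium|high) echo "--auto $1" ;;
     *) echo "unsupported autonomy level: $1" >&2; return 1 ;;
   esac
 }
 # Usage (left unquoted on purpose so the result splits into two words):
 #   droid exec $(auto_flag low) "add JSDoc comments"
 ```
 Keeping the unsafe flag out of the helper means it has to be typed explicitly on the command line.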
 ## Command Options
 ```
 Usage: droid exec [options] [prompt]
 Arguments:
  prompt                      The prompt to execute
 Options:
  -o, --output-format <format>    Output format (default: "text")
  --input-format <format>         Input format: stream-json for multi-turn
  -f, --file <path>               Read prompt from file
  --auto <level>                  Autonomy level: low|medium|high
  --skip-permissions-unsafe       Skip ALL permission checks (unsafe)
  -s, --session-id <id>           Existing session to continue
  -m, --model <id>                Model ID (default: claude-opus-4-5-20251101)
  -r, --reasoning-effort <level>  Reasoning effort (model-specific)
  --enabled-tools <ids>           Enable specific tools
  --disabled-tools <ids>          Disable specific tools
  --cwd <path>                    Working directory path
  --log-group-id <id>             Log group ID for filtering logs
  --list-tools                    List available tools and exit
  -h, --help                      Display help
 ```
 ## Authentication
 Create API key: https://app.factory.ai/settings/api-keys
 ```bash
 export FACTORY_API_KEY=fk-... && droid exec "fix the bug"
 ```
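 In scripts it can help to fail early when the key is missing. A small sketch; `require_factory_key` is a hypothetical helper that only checks that the variable is non-empty, not that the key is valid:
 ```bash
 # Hypothetical guard: stop before calling droid if FACTORY_API_KEY is unset or empty.
 require_factory_key() {
   if [ -z "${FACTORY_API_KEY:-}" ]; then
     echo "FACTORY_API_KEY is not set; create one at https://app.factory.ai/settings/api-keys" >&2
     return 1
   fi
 }
 # Usage:
 #   require_factory_key && droid exec "fix the bug"
 ```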
 ## Examples
 ```bash
 # Analysis (read-only)
 droid exec "Review the codebase for security vulnerabilities"
 # Documentation
 droid exec --auto low "add JSDoc comments to all functions"
 droid exec --auto low "fix typos in README.md"
 # Development
 droid exec --auto medium "install deps, run tests, fix issues"
 droid exec --auto medium "update packages and resolve conflicts"
 # Production
 droid exec --auto high "fix bug, test, commit, and push to main"
 droid exec --auto high "deploy to staging after running tests"
 # Continue session
 droid exec -s <session-id> "continue previous task"
 ```
--- a/VISION.md
+++ b/VISION.md
@@ -6,9 +6,9 @@ Need a skill for factory droid which can launch `droid exec` for multiple things
 | Model Name       | Alias           |
 |------------------|-----------------|
-| `opus_4.5`       | Opus 4.5        |
+| `opus_4.6`       | Opus 4.6        |
 | `gpt_5.2`        | GPT 5.2         |
-| `gpt_5.2_codex`  | GPT 5.2 Codex   |
+| `gpt_5.3_codex`  | GPT 5.3 Codex   |
 | `kimi_k2.5`      | Kimi k2.5       |
 ## Model Selection Criteria
@@ -16,11 +16,12 @@ Need a skill for factory droid which can launch `droid exec` for multiple things
 | Role                        | Recommended Model         | Reason                              |
 |-----------------------------|---------------------------|-------------------------------------|
 | The workhorse               | `kimi_k2.5`               | Fast and cost-effective             |
-| The critic                  | `opus_4.5`                | Good at reviewing and finding issues|
+| The critic                  | `opus_4.6`                | Good at reviewing and finding issues|
 | The brainy one              | `gpt_5.2`                 | Highest code intelligence           |
-| The coder                   | `gpt_5.2_codex`           | Specialized for code generation     |
+| The coder                   | `gpt_5.3_codex`           | Specialized for code generation     |
 | The fast one                | `kimi_k2.5`               | Fastest response time               |
-| Good Instructions Following | `kimi_k2.5`, `gpt_5.2_codex` | Strong adherence to requirements |
+| Good Instructions Following | `kimi_k2.5`, `gpt_5.3_codex` | Strong adherence to requirements |
 | The vision model            | `kimi_k2.5`               | Fast vision processing              |
 ## Coding Task Breakdown
@@ -33,13 +34,13 @@ Need a skill for factory droid which can launch `droid exec` for multiple things
 ## Model Rejection Criteria
-### `gpt_5.2` and `gpt_5.2_codex`
+### `gpt_5.2` and `gpt_5.3_codex`
 - Too slow and expensive for the workhorse role
 - Not recommended for exploration or tool calls
 - Strictly for planning/spec gen and code gen
-### `opus_4.5`
+### `opus_4.6`
 - Produces buggy code and looks for shortcuts in code gen
 - Can be a good critic and reviewer
@@ -57,9 +58,9 @@ Need a skill for factory droid which can launch `droid exec` for multiple things
 | Rank | Model            |
 |------|------------------|
-| 1    | `opus_4.5`       |
+| 1    | `opus_4.6`       |
 | 2    | `gpt_5.2`        |
-| 3    | `gpt_5.2_codex`  |
+| 3    | `gpt_5.3_codex`  |
 | 4    | `kimi_k2.5`      |
 ### Speed (Fast to Slow)
@@ -67,8 +68,8 @@ Need a skill for factory droid which can launch `droid exec` for multiple things
 | Rank | Model            |
 |------|------------------|
 | 1    | `kimi_k2.5`      |
-| 2    | `opus_4.5`       |
+| 2    | `opus_4.6`       |
-| 3    | `gpt_5.2_codex`  |
+| 3    | `gpt_5.3_codex`  |
 | 4    | `gpt_5.2`        |
 ### Code Intelligence (High to Low)
@@ -76,8 +77,8 @@ Need a skill for factory droid which can launch `droid exec` for multiple things
 | Rank | Model            |
 |------|------------------|
 | 1    | `gpt_5.2`        |
-| 2    | `gpt_5.2_codex`  |
+| 2    | `gpt_5.3_codex`  |
-| 3    | `opus_4.5`       |
+| 3    | `opus_4.6`       |
 | 4    | `kimi_k2.5`      |
 ### Overthinking (High to Low)
@@ -85,6 +86,20 @@ Need a skill for factory droid which can launch `droid exec` for multiple things
 | Rank | Model            |
 |------|------------------|
 | 1    | `gpt_5.2`        |
-| 2    | `gpt_5.2_codex`  |
+| 2    | `gpt_5.3_codex`  |
-| 3    | `opus_4.5`       |
+| 3    | `opus_4.6`       |
 | 4    | `kimi_k2.5`      |
 ## Flow
 -> Start with a good instruction follower (kimi_k2.5 or gpt_5.3_codex).
 The user asks a question or gives a task.
 -> Make a todo list.
 -> Exploration is always needed: launch multiple explorer droids with kimi_k2.5, asking questions in natural language.
 -> After exploration, evaluate the context with a spec droid using gpt_5.2.
 -> Confirm the spec with the user.
 -> For code gen, use gpt_5.3_codex for large changes or kimi_k2.5 for small ones.
 -> After code gen, run a quality check droid with kimi_k2.5.
 -> Run a review droid with opus_4.6 to find bugs and issues.
 -> Run a build/test/run droid with kimi_k2.5.
 -> Provide a summary.
--- a/settings.json
+++ b/settings.json
@@ -0,0 +1,93 @@
 {
  "logoAnimation": "off",
  "customModels": [
    {
      "model": "Kimi-K2.5",
      "id": "custom:Kimi-K2.5-(BYOK)-0",
      "index": 0,
      "baseUrl": "http://localhost:8383",
      "apiKey": "sk-abcd",
      "displayName": "Kimi K2.5 (BYOK)",
      "maxOutputTokens": 131072,
      "noImageSupport": false,
      "provider": "anthropic"
    },
    {
      "model": "Kimi-for-Coding",
      "id": "custom:Kimi-for-Coding-(BYOK)-1",
      "index": 1,
      "baseUrl": "http://localhost:8383",
      "apiKey": "sk-abcd",
      "displayName": "Kimi for Coding (BYOK)",
      "noImageSupport": false,
      "provider": "anthropic"
    },
    {
      "model": "Opus-4.5",
      "id": "custom:Opus-4.5-(BYOK)-2",
      "index": 2,
      "baseUrl": "http://localhost:8383",
      "apiKey": "sk-abcd",
      "displayName": "Opus 4.5 (BYOK)",
      "maxOutputTokens": 128000,
      "extraArgs": {
        "parallel_tool_calls": true,
        "thinking": {
          "type": "enabled",
          "budget_tokens": 120000
        }
      },
      "noImageSupport": true,
      "provider": "anthropic"
    },
    {
      "model": "Gpt-5.3-Codex",
      "id": "custom:Gpt-5.3-Codex-(BYOK)-3",
      "index": 3,
      "baseUrl": "http://localhost:8383/v1",
      "apiKey": "sk-abcd",
      "displayName": "Gpt 5.3 Codex (BYOK)",
      "maxOutputTokens": 128000,
      "extraArgs": {
        "parallel_tool_calls": true,
        "reasoning": {
          "effort": "xhigh"
        }
      },
      "noImageSupport": true,
      "provider": "openai"
    },
    {
      "model": "Gpt-5.2",
      "id": "custom:Gpt-5.2-(BYOK)-4",
      "index": 4,
      "baseUrl": "http://localhost:8383/v1",
      "apiKey": "sk-abcd",
      "displayName": "Gpt 5.2 (BYOK)",
      "maxOutputTokens": 128000,
      "extraArgs": {
        "parallel_tool_calls": true,
        "reasoning": {
          "effort": "xhigh"
        }
      },
      "noImageSupport": true,
      "provider": "openai"
    }
  ],
  "sessionDefaultSettings": {
    "model": "custom:Gpt-5.3-Codex-(BYOK)-3",
    "autonomyMode": "auto-low",
    "specModeReasoningEffort": "none",
    "reasoningEffort": "none"
  },
  "cloudSessionSync": false,
  "ideAutoConnect": true,
  "includeCoAuthoredByDroid": false,
  "showTokenUsageIndicator": true,
  "showThinkingInMainView": true,
  "allowBackgroundProcesses": true,
  "ideExtensionPromptedAt": {
    "vscode": 1769532708384
  }
 }