A Brief Discussion on Copilot CLI's Autopilot and YOLO Mode Mechanisms and Quota Pitfalls

I recently used an old project to test the Copilot CLI. OpenAI had just released GPT-5.4 mini and I thought my quota was sufficient, so I gave it a try. I had misunderstood how Autopilot deducts quota and stepped on an unexpected quota landmine, so I looked up the relevant information to check whether my understanding was correct.

Features You Need to Know for Automated Execution

When an AI agent performs a task, by default it pauses and waits for user input whenever it reaches an action that requires confirmation. This is reasonable for security, but if you want it to run through an entire process without stopping, you need to configure its execution behavior.

WARNING

Automated execution carries risks. Before running, ensure your code is under version control, and exercise caution if the code involves external interfaces or database connections.

YOLO Mode

YOLO (You Only Live Once) mode controls whether the system "auto-approves" all high-risk actions, including read/write, delete, and terminal execution requests.

  • How to enable:
    • Add the parameter at startup: gh copilot --allow-all (or the --yolo parameter commonly used by the community).
    • If the Copilot interface is already open, enter the slash command: /yolo or /allow-all.
  • How it works:
    • Normally, even if the AI decides the next step is to run rm -rf, the system shows a confirmation prompt by default before executing it.
    • Once YOLO is enabled, these confirmations are bypassed silently.
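
The approval behavior described above can be pictured as a single gate in front of every action. This is only a toy model for illustration, not Copilot CLI's actual implementation: the names is_approved, needs_confirmation, and ask_user are all hypothetical.

```python
from typing import Callable


def is_approved(
    needs_confirmation: bool,
    yolo: bool,
    ask_user: Callable[[str], bool],
    description: str = "",
) -> bool:
    """Toy approval gate (hypothetical, not the real CLI logic)."""
    if not needs_confirmation:
        return True  # nothing to confirm, the action simply runs
    if yolo:
        return True  # YOLO: the confirmation step is skipped silently
    return ask_user(description)  # default: the user decides


# A stand-in for a user who refuses every request.
def deny_all(description: str) -> bool:
    return False


print(is_approved(True, yolo=False, ask_user=deny_all, description="rm -rf build"))  # False
print(is_approved(True, yolo=True, ask_user=deny_all, description="rm -rf build"))   # True
```

The point of the sketch is that with YOLO enabled the user callback is never reached, which is why version control matters before turning it on.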

I am accustomed to using "Add Copilot CLI Session" in VS Code, which presents the interface in tabs rather than a separate window, making it easier to track which window belongs to which workspace. Since I am already logged in when entering this mode, I usually just type /yolo in the interface to enable it.

Execution Modes

In the Copilot CLI interactive interface, you can cycle through the following three modes using Shift + Tab:

  • Standard: The default interactive mode where the user provides instructions step-by-step. The AI responds and waits for the next input, with the pace of task progression controlled by the user.
  • Plan: The AI first clarifies the scope of the requirement through questions, then creates a structured implementation plan. Execution only begins after the plan is confirmed. This is suitable for cross-file or complex logic tasks.
  • Autopilot: The AI enters an autonomous loop, without waiting for user input at every step, until the task is completed, an error is encountered, the user manually presses Ctrl+C, or the maximum number of continuations is reached. If full tool permissions are not granted, actions requiring approval will be automatically rejected, which may prevent the task from completing. You can use the --max-autopilot-continues parameter to limit the maximum number of autonomous executions. Official Documentation: Autopilot Mode Details
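
To make the Autopilot stopping conditions concrete, here is a minimal simulation of the loop. It is a simplified sketch under my own assumptions, not the CLI's real logic: the step dictionaries, the run_autopilot function, and the stop-reason strings are hypothetical, and a denied action is modeled as ending the run even though in practice it only may prevent completion.

```python
def run_autopilot(steps, max_continues, allow_all):
    """Toy Autopilot loop (hypothetical model).

    steps: list of dicts with optional keys "needs_approval" and "done".
    Returns (stop_reason, continuations_used).
    """
    continues = 0
    for step in steps:
        if step.get("needs_approval") and not allow_all:
            # Without full permissions the action is rejected automatically;
            # this toy model treats that as the task stalling.
            return ("blocked", continues)
        if step.get("done"):
            return ("completed", continues)
        if continues >= max_continues:
            # Mirrors the idea behind --max-autopilot-continues.
            return ("limit_reached", continues)
        continues += 1  # the agent continues on its own to the next step
    return ("completed", continues)


normal_task = [{}, {}, {"done": True}]
chatty_task = [{}] * 10  # a model that never decides it is finished
risky_task = [{}, {"needs_approval": True}, {"done": True}]

print(run_autopilot(normal_task, max_continues=5, allow_all=True))   # ('completed', 2)
print(run_autopilot(chatty_task, max_continues=3, allow_all=True))   # ('limit_reached', 3)
print(run_autopilot(risky_task, max_continues=5, allow_all=False))   # ('blocked', 1)
```

The second case is the one that matters for quota: without a cap, a model that keeps finding reasons to continue would keep spending continuations.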

VS Code has a similar setting called chat.agent.maxRequests, but the two differ in both purpose and billing:

| | --max-autopilot-continues | chat.agent.maxRequests |
| --- | --- | --- |
| Tool | Copilot CLI | VS Code |
| Limit Target | Autopilot's autonomous continuation count | Agent's AI model call rounds |
| Billing Timing | Each autonomous continuation step deducts one premium request | Only user-issued prompts are billed; tool calls and clicking "Continue" are not billed separately |
| After reaching the limit | Execution stops immediately | Asks whether to continue |
| Design Purpose | Prevents infinite loops | Prevents the agent from executing in the wrong direction, keeping the developer in control |

Currently, there is no corresponding setting for chat.agent.maxRequests in the Copilot CLI.

Autopilot's Quota Pitfalls

The mechanism of Autopilot is: when it is time for user confirmation, if the user does not respond, it will reply on your behalf and continue execution. Each "reply on your behalf" round consumes quota.

GPT-related models have a habit (other models might as well, but GPT is quite proactive): after a task is completed, it will actively ask if you want to perform further actions. Under normal circumstances, you can decide whether to continue, but when paired with Autopilot, it will directly reply for you and trigger the next step.

The scenario I encountered was: low-tier GPT model + low reasoning effort + just asking a question (not actually performing a task). In this combination, the model answered without thinking carefully, and after answering it wanted to confirm again from another angle. It kept looping, and I saw Continuing autonomously (0.33 premium requests) appear 5 to 6 times. This scenario is relatively easy to reproduce. Because it only consumes quota at a low-tier model's rate, the loss is limited, but it still feels bad Q.Q

What is more noteworthy is the other direction: if you switch to a high-billing model like Claude Opus, the cost of each meaningless trigger becomes much higher when Autopilot cannot terminate correctly.
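
As a rough back-of-the-envelope check, the cost of such a loop is just the number of autonomous continuations times the per-continuation cost. The 0.33 figure and the 5 to 6 repetitions come from what I observed above; HYPOTHETICAL_HIGH_RATE is a made-up placeholder for a more expensive model, not an actual published rate, so check the current billing documentation for real numbers.

```python
def autopilot_cost(continuations: int, cost_per_continuation: float) -> float:
    """Premium requests spent on autonomous continuations (rounded to 2 places)."""
    return round(continuations * cost_per_continuation, 2)


OBSERVED_LOW_RATE = 0.33        # seen in "Continuing autonomously (0.33 premium requests)"
HYPOTHETICAL_HIGH_RATE = 3.0    # assumption for illustration only, not a real rate

for n in (5, 6):
    low = autopilot_cost(n, OBSERVED_LOW_RATE)
    high = autopilot_cost(n, HYPOTHETICAL_HIGH_RATE)
    print(f"{n} continuations: {low} at the low rate, {high} at the assumed high rate")
```

With the observed rate, 5 to 6 pointless continuations cost about 1.65 to 1.98 premium requests. The same loop at the assumed higher rate would cost roughly nine times as much, which is why a cap on continuations is worth setting before switching models.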

In fact, many users online have reported that Autopilot fails to terminate correctly after a task is completed, burning a large amount of quota in the background.

Summary

When the quota is sufficient, providing enough context for the AI to judge the direction on its own, combined with a model with strong execution capabilities like GPT-5.4, makes it worth considering enabling YOLO + Autopilot to let it optimize autonomously. However, in most scenarios, YOLO is enough, and Autopilot is not necessarily needed. If you are just asking questions rather than executing tasks, adding Autopilot is more likely to cause unnecessary quota consumption.

Changelog

    • Initial document creation.