A Brief Discussion on Copilot CLI's Autopilot and YOLO Mode Mechanisms and Quota Pitfalls
I recently tested the Copilot CLI on an old project. Having just noticed that GPT-5.4 mini had been released, and figuring my quota was sufficient, I gave it a try. It turned out I had misunderstood Autopilot's quota deduction mechanism and accidentally stepped into a quota trap, so I looked up the relevant information to check whether my understanding was correct.
Features You Need to Know for Automated Execution
When an AI Agent executes tasks, it defaults to pausing and waiting for user input when it encounters actions that require confirmation. This is reasonable for security, but if you want it to run through an entire process, you need to configure some execution behaviors.
WARNING
Automated execution carries risks. Before running, ensure your code is under version control and carefully evaluate if there are external interfaces or database connections involved.
YOLO Mode
YOLO (You Only Live Once) mode controls whether the system "auto-approves" all high-risk actions, including read/write, delete, and terminal execution requests.
- How to enable:
  - Add a parameter at startup: `gh copilot --allow-all` (or the community-standard `--yolo` parameter).
  - If the Copilot interface is already open, enter the slash command `/yolo` or `/allow-all`.
- Actual behavior:
  - Normally, even if the AI decides the next step is to run `rm -rf`, the system pops up a confirmation prompt by default.
  - After enabling YOLO, these confirmations are silently approved.
I personally prefer using "New Copilot CLI Session" in VS Code, which opens the interface in a tab rather than a separate window, making it easier to track which window belongs to which workspace. Since I am already logged in when I enter this way, I usually enable it by typing `/yolo` directly in the interface.
Execution Modes
In the Copilot CLI interactive interface, you can cycle through the following three modes using Shift + Tab:
- Standard: The default interactive mode where the user provides instructions step-by-step. The AI responds and waits for the next input, with the pace of task progression controlled by the user.
- Plan: The AI first clarifies questions to confirm the scope of requirements, then creates a structured implementation plan. It only executes after the plan is confirmed, making it suitable for cross-file or complex logical tasks.
- Autopilot: The AI enters an autonomous loop, no longer waiting for user input at every step, until the task completes, an error occurs, the user presses Ctrl+C, or the continuation limit is reached. If full tool permissions are not granted, operations requiring approval are automatically rejected, which may prevent the task from completing. You can use the `--max-autopilot-continues` parameter to cap the number of autonomous continuations. Official documentation: Autopilot Mode Details
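The stopping conditions described above can be sketched as a simple control loop. This is purely illustrative: the function and field names below are my own invention based on the documented behavior, not Copilot CLI internals.

```python
# Illustrative sketch of Autopilot's stopping conditions as described above.
# All names (autopilot, needs_approval, etc.) are hypothetical — this is
# NOT Copilot CLI source code, just a model of the documented behavior.

def autopilot(task_steps, max_continues, allow_all_tools):
    continues = 0
    for step in task_steps:
        if step["needs_approval"] and not allow_all_tools:
            # Without full tool permissions, approval requests are
            # auto-rejected, which may leave the task unfinished.
            return "stopped: approval rejected"
        continues += 1
        if continues > max_continues:
            # A --max-autopilot-continues style cap: stop immediately.
            return "stopped: continuation limit reached"
    return "completed"

print(autopilot([{"needs_approval": False}] * 3, max_continues=5, allow_all_tools=True))
print(autopilot([{"needs_approval": True}], max_continues=5, allow_all_tools=False))
print(autopilot([{"needs_approval": False}] * 10, max_continues=5, allow_all_tools=True))
```

The three calls end in the three different ways: normal completion, auto-rejection without full permissions, and hitting the continuation cap.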
VS Code also has a similar setting, `chat.agent.maxRequests`, but the two differ in both positioning and billing:
| | `--max-autopilot-continues` | `chat.agent.maxRequests` |
|---|---|---|
| Tool | Copilot CLI | VS Code |
| What it limits | Autopilot's autonomous continuation count | The agent's AI model call turns |
| Billing timing | Each autonomous continuation deducts one premium request | Only user-issued prompts are billed; tool calls and clicking "Continue" are not counted separately |
| On reaching the limit | Execution stops immediately | Asks whether to continue |
| Design purpose | Prevent infinite loops | Prevent the agent from running in the wrong direction, keeping the developer in control |
So far I have not seen a Copilot CLI setting that corresponds to `chat.agent.maxRequests`.
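For reference, the VS Code side is a plain entry in `settings.json` (the value 25 here is just an example; check your VS Code version for the actual default):

```json
{
  // Ask for confirmation after this many agent-mode requests
  // instead of letting the agent continue indefinitely.
  "chat.agent.maxRequests": 25
}
```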
The Autopilot Quota Trap
The mechanism of Autopilot is: when it is time for user confirmation, if the user does not respond, it will reply on your behalf and continue execution. Each "reply on your behalf" round-trip deducts from your quota.
GPT-family models have a habit (other models might too, but GPT is especially proactive): after a task is completed, they actively ask whether you want to perform further actions. Normally you can decide for yourself whether to continue, but paired with Autopilot, it replies for you and triggers the next step.
The scenario I encountered was: low-tier GPT model + low reasoning + just asking a question (not actually executing a task). Under that combination, the model replied without thinking carefully, and after replying it wanted to confirm again from another angle. It kept looping, and I watched `Continuing autonomously (0.33 premium requests)` appear 5 or 6 times. This scenario is fairly easy to reproduce. Since it was deducting a low-tier model's quota the loss was limited, but it still felt bad Q.Q
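Rough arithmetic on what that loop cost. The 0.33 multiplier is the figure shown in the CLI message above; the 6 iterations are my observed count, so treat both as example inputs rather than fixed constants.

```python
# Cost of an Autopilot self-continue loop: each "Continuing autonomously"
# step deducts a fraction of a premium request.
per_continue = 0.33   # multiplier shown for the low-tier model in my session
loops = 6             # times the message appeared before the loop stopped

wasted = per_continue * loops
print(f"{wasted:.2f} premium requests burned")  # → 1.98 premium requests burned

# The same loop with a higher-multiplier model is far more expensive, e.g.
# a hypothetical 10x model: 10 * 6 = 60 premium requests for zero useful work.
```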
What is more noteworthy is the other direction: if you switch to a high-billing model like Claude Opus, when Autopilot cannot end properly, the cost of each meaningless trigger is much higher.
In fact, many users online have reported that Autopilot cannot end correctly after a task is completed, leading to a large amount of quota being burned in the background:
- GitHub Issue #1532: Infinite loop issue in Autopilot mode
- GitHub Issue #1477: Discussion on quota consumption for follow-up requests
Summary
When the quota is sufficient, providing enough context for the AI to judge the direction on its own, combined with a model with strong execution capabilities like GPT-5.4, you can consider enabling YOLO + Autopilot to let it optimize autonomously. However, in most scenarios, using YOLO is enough; you don't necessarily need to enable Autopilot. If you are just asking questions rather than executing tasks, adding Autopilot is more likely to cause unnecessary quota consumption.
Changelog
- 2026-03-22 Initial document creation.
