Google's new model does text-prompted image editing, and the results are seriously impressive.

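I used a single prompt: "Change the cat in the picture to a Shiba Inu."

I did not expect results this good.

The feature is already live in the API, it is free, and you can call it straight from the command line! Google is being really generous.

One complaint, though: Google encodes every image as base64, which makes the API's JSON requests and responses huge. It also makes streaming image loading hard to implement (the server returns PNG; with JPG you could at least decode and display the image progressively as the data arrives).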
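
To put a rough number on the base64 complaint: base64 encodes every 3 bytes as 4 ASCII characters, so an image grows by about a third once it is embedded in JSON. A minimal sketch; `base64_overhead` is a hypothetical helper, and the 3 MB size is an arbitrary stand-in for a returned PNG.

```python
import base64


def base64_overhead(num_bytes: int) -> float:
    """Return encoded_size / raw_size for a payload of num_bytes bytes."""
    raw = bytes(num_bytes)  # zero-filled stand-in; only the length matters here
    encoded = base64.b64encode(raw)
    return len(encoded) / len(raw)


# 3 MB is an assumed image size, chosen only for illustration.
ratio = base64_overhead(3 * 1024 * 1024)
print(f"base64 is {ratio:.2f}x the raw size")
```

This counts only the encoded image itself; the surrounding JSON adds a little more on top.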

14 replies • 2025-03-15 16:29:35 +08:00

tool3d

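15 hours 37 minutes ago

Here is how to call the API, since the official site does not document it yet.

First, send the request to https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-exp:streamGenerateContent?key=%s (replace %s with your API key).

Encode the cat image you want to upload as base64 and put it in the data field of inline_data in the JSON body.

Then add this to the request JSON: "generationConfig":{"response_modalities":["Text","Image"]}. With that, the response comes back in mixed text-and-image mode.

Note: there is no OpenAI-compatible way to call this yet, so going through an API relay such as OpenRouter probably cannot generate or edit images. You have to call the official Google API directly.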
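
The steps above could be sketched roughly as follows. This only builds the JSON body and sends nothing; `build_edit_request` is a hypothetical helper name, the image bytes are a placeholder, and the `mime_type` field inside `inline_data` is an assumption based on how the API is used elsewhere in this thread.

```python
import base64
import json


def build_edit_request(image_bytes: bytes, mime_type: str, prompt: str) -> dict:
    """Build a request body with an inline base64 image and text+image output."""
    return {
        "contents": [{
            "parts": [
                {"text": prompt},
                {
                    "inline_data": {
                        "mime_type": mime_type,  # assumed field name
                        "data": base64.b64encode(image_bytes).decode("ascii"),
                    }
                },
            ]
        }],
        # Ask for both text and image parts in the response.
        "generationConfig": {"response_modalities": ["Text", "Image"]},
    }


# Placeholder bytes; in practice read them from the image file on disk.
body = build_edit_request(
    b"placeholder image bytes",
    "image/png",
    "Change the cat in the picture to a Shiba Inu",
)
payload = json.dumps(body)
```

The resulting `payload` would then be sent as an HTTP POST with `Content-Type: application/json` to the URL given above.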

leighton

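14 hours 46 minutes ago

```
One complaint, though: Google encodes every image as base64, which makes the API's JSON requests and responses huge. It also makes streaming image loading hard to implement (the server returns PNG; with JPG you could at least decode and display the image progressively as the data arrives).
```

What would the ideal design look like?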

77158158

14 hours 42 minutes ago

This feels like a good fit for batch-editing e-commerce product photos?

bskfz

14 hours 10 minutes ago

@tool3d https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-exp:streamGenerateContent?key=%s This link won't open.

crackidz

9 hours 51 minutes ago

If you use AI Studio, you can just click "Get Code" in the top-right corner, right?

binux

9 hours 22 minutes ago

You can upload the image with the File API (https://ai.google.dev/api/files) first and then reference it in the prompt with FileData.

iorilu

8 hours 55 minutes ago

Is there a model that can remove subtitles burned into a video?

kneo

8 hours 43 minutes ago via Android

This is not a new feature; the Doubao app has had it for a long time. I don't know about API access, though.

mayli

6 hours 43 minutes ago

The API is nice, and being able to use it for free puts it way ahead.

timelessg

2 hours 33 minutes ago

import base64
import json
import mimetypes
import os

import requests

BASE_URL = "https://generativelanguage.googleapis.com"
GEMINI_API_KEY = ""  # your API key
IMG_PATH_2 = ""  # path to the input image
DISPLAY_NAME = "TEXT"

# Get the MIME type and file size
MIME_TYPE, _ = mimetypes.guess_type(IMG_PATH_2)
NUM_BYTES = os.path.getsize(IMG_PATH_2)

# Send the initial resumable upload request
data = {"file": {"display_name": DISPLAY_NAME}}
headers = {
    "X-Goog-Upload-Protocol": "resumable",
    "X-Goog-Upload-Command": "start",
    "X-Goog-Upload-Header-Content-Length": str(NUM_BYTES),
    "X-Goog-Upload-Header-Content-Type": MIME_TYPE,
    "Content-Type": "application/json",
}

response = requests.post(
    f"{BASE_URL}/upload/v1beta/files?key={GEMINI_API_KEY}",
    headers=headers,
    json=data,
)

# Extract the upload URL
upload_url = response.headers.get("X-Goog-Upload-URL")
if not upload_url:
    raise SystemExit("Failed to get upload URL.")

# Read the file data and upload it
with open(IMG_PATH_2, "rb") as f:
    file_data = f.read()

headers = {
    "Content-Length": str(NUM_BYTES),
    "X-Goog-Upload-Offset": "0",
    "X-Goog-Upload-Command": "upload, finalize",
}

response = requests.post(upload_url, headers=headers, data=file_data)
file_info = response.json()
file_uri = file_info.get("file", {}).get("uri")
if not file_uri:
    raise SystemExit("Failed to get file URI.")

print(f"file_uri={file_uri}")

# Content generation request
data = {
    "contents": [{
        "parts": [
            {"text": "Replace every face with a cat head"},
            # Use the detected MIME type instead of hard-coding image/jpeg
            {"file_data": {"mime_type": MIME_TYPE, "file_uri": file_uri}},
        ]
    }],
    "generationConfig": {"response_modalities": ["Text", "Image"]},
}
headers = {"Content-Type": "application/json"}

response = requests.post(
    f"{BASE_URL}/v1beta/models/gemini-2.0-flash-exp:streamGenerateContent?key={GEMINI_API_KEY}",
    headers=headers,
    json=data,
)

# Keep the raw response on disk for inspection
response_json = response.json()
with open("response.json", "w", encoding="utf-8") as f:
    json.dump(response_json, f, ensure_ascii=False, indent=4)


def save_first_image(chunks):
    """Decode the first inline image found in the response and save it to disk."""
    for item in chunks:  # the response is a list of chunks
        for candidate in item.get("candidates", []):
            for part in candidate.get("content", {}).get("parts", []):
                inline_data = part.get("inlineData")
                if inline_data and "data" in inline_data:
                    mime_type = inline_data.get("mimeType", "image/png")
                    # Pick the matching file extension
                    ext = "jpg" if "jpeg" in mime_type else "png"
                    output_file = f"output.{ext}"
                    # Decode the Base64 payload and write the image
                    with open(output_file, "wb") as img_file:
                        img_file.write(base64.b64decode(inline_data["data"]))
                    return output_file  # stop at the first image
    return None


try:
    output_file = save_first_image(response_json)
    if output_file:
        print(f"Image saved as {output_file}")
except Exception as e:
    print(f"Error while processing Base64 data: {e}")

metalvest

2 hours 24 minutes ago

@kneo Doubao has too many miscellaneous features, and its marketing hasn't kept up.

PositionZero

2 hours 8 minutes ago

I uploaded an anime-style female character and it keeps reporting unsafe content.

Haku

1 hour 17 minutes ago

@PositionZero There must be NSFW restrictions for that kind of image, right?

PositionZero

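1 hour 4 minutes ago

@Haku It is definitely SFW, and it still fails even with all safety settings turned off. Maybe because the character in the image looks like a possible minor?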
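
For readers wondering what "all safety settings turned off" might look like in a request, here is a hedged sketch. `with_safety_off` is a hypothetical helper; the `safetySettings` field, the four category names, and the `BLOCK_NONE` threshold are assumptions based on the public Gemini API safety settings, not something shown in this thread. As the reply above reports, this did not prevent the unsafe-content error in that case.

```python
def with_safety_off(body: dict) -> dict:
    """Return a copy of a request body with every safety threshold set to BLOCK_NONE."""
    categories = [  # assumed category names
        "HARM_CATEGORY_HARASSMENT",
        "HARM_CATEGORY_HATE_SPEECH",
        "HARM_CATEGORY_SEXUALLY_EXPLICIT",
        "HARM_CATEGORY_DANGEROUS_CONTENT",
    ]
    new_body = dict(body)  # shallow copy so the caller's dict is not modified
    new_body["safetySettings"] = [
        {"category": category, "threshold": "BLOCK_NONE"} for category in categories
    ]
    return new_body


base_body = {"generationConfig": {"response_modalities": ["Text", "Image"]}}
relaxed_body = with_safety_off(base_body)
```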