kevan's recent timeline updates
kevan

6t6t.com 风之轩 Easy Listening ...
V2EX member #21701, joined on 2012-05-31 17:34:09 +08:00
Today's activity rank 10261
kevan's recent replies
@KaiWuBOSS I'll try it out and report back as soon as I get home from work. I've been following this for ages, haha. Have to support it.
Could you put out an optimized build that works on the 50 series?
@KaiWuBOSS Boss, why does it still fail after I downloaded version 0.1.6??

>kaiwu run Qwen3-30B-A3B-UD-Q3_K_XL.gguf --reset

██╗ ██╗ █████╗ ██╗██╗ ██╗██╗ ██╗
██║ ██╔╝██╔══██╗██║██║ ██║██║ ██║
█████╔╝ ███████║██║██║ █╗ ██║██║ ██║
██╔═██╗ ██╔══██║██║██║███╗██║██║ ██║
██║ ██╗██║ ██║██║╚███╔███╔╝╚██████╔╝
╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚══╝╚══╝ ╚═════╝
Local LLM deployer vv0.1.6 · llama.cpp b8864
by llmbbs.ai · local AI tech community

[1/6] Probing hardware...
GPU: NVIDIA GeForce RTX 5070 Ti (SM120, 16303 MB VRAM, 896 GB/s)
RAM: 31 GB UNKNOWN
OS: windows amd64
⚠️ CUDA 13.2 detected — known bug with low-bit quantization
If you see garbled output, downgrade driver to CUDA 13.1
Warning: RTX 50 series with CUDA 13.2 detected
Kaiwu will use CUDA 12.4 binary for stability.

[2/6] Selecting configuration...
Model: Qwen3-30B-A3B (moe, 29B total / 2B active)
Quant: Q3_K_M (12.9 GB)
Mode: full_gpu
Accel: Flash Attention

[3/6] Checking files...
Using bundled iso3 binary: llama-server-cuda.exe
Binary: llama-server-cuda.exe [cached]
Model: Qwen3-30B-A3B-UD-Q3_K_XL.gguf [cached]

[4/6] Preflight check...
⚠ RTX 50 series needs JIT compilation on first launch (~30s), please wait...
llama-server does not support iso3, falling back to q8_0/q4_0
✓ VRAM sufficient

[5/6] Warmup benchmark...
Cache cleared, re-probing
Probe 1: ctx=8K ... OOM
Probe 2: ctx=4K ... OOM
⚠️ Warmup failed: all ctx probes failed (tried down to 4K)
Using default parameters

[6/6] Starting server...
Waiting for llama-server to be ready (port 11434)...
⚠️ Not enough VRAM, lowering context to 4K and retrying...
Waiting for llama-server to be ready (port 11434)...
Error: failed to start llama-server: failed to start 2 times in a row, cannot run even at the minimum context (4K)

NVIDIA GeForce RTX 5070 Ti: 16303 MB VRAM
Model Qwen3-30B-A3B: ~13189 MB
KV cache (4K, q4_0): ~96 MB
Estimated total required: ~14309 MB

Suggestions:
1. Choose a smaller quantization (Q4_K_M or Q2_K)
2. Choose a smaller model

Usage:
kaiwu run <model> [flags]

Flags:
--bench Run benchmark after starting
--ctx-size int Manually set the context size (0 = auto)
--fast Skip warmup, use cached profile
-h, --help help for run
--llama-server string Use a custom llama-server binary (full path)
--reset Clear the cache and re-run warmup to probe for the best parameters


C:\Kevan\AI\kaiwu-windows-amd64>
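A quick sanity check on the diagnostic above, using only numbers from the log itself; the ~1024 MB term is simply the gap between the tool's stated total and the two listed components, not something it prints:

13189 MB (model weights) + 96 MB (KV cache, 4K q4_0) + ~1024 MB (implied allowance) ≈ 14309 MB

That estimate sits below the reported 16303 MB of VRAM, yet both context probes still hit OOM, so the real runtime buffers evidently need more headroom than the estimate accounts for.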
@KaiWuBOSS I wasn't really motivated before, but reading your introduction made me feel reborn. Foolproof installation, could you recommend the concrete installation steps? The things you described are too advanced for me.
kaiwu.exe run Qwen3-30B-A3B

██╗ ██╗ █████╗ ██╗██╗ ██╗██╗ ██╗
██║ ██╔╝██╔══██╗██║██║ ██║██║ ██║
█████╔╝ ███████║██║██║ █╗ ██║██║ ██║
██╔═██╗ ██╔══██║██║██║███╗██║██║ ██║
██║ ██╗██║ ██║██║╚███╔███╔╝╚██████╔╝
╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚══╝╚══╝ ╚═════╝
Local LLM deployer vv0.1.2 · llama.cpp b8864
by llmbbs.ai · local AI tech community

[1/6] Probing hardware...
GPU: NVIDIA GeForce RTX 5070 Ti (SM120, 16303 MB VRAM, 0 GB/s)
RAM: 31 GB UNKNOWN
OS: windows amd64

[2/6] Selecting configuration...
Model: Qwen3-30B-A3B (moe, 30B total / 3B active)
Quant: ud-q3-k-xl (14.0 GB)
Mode: full_gpu
Accel: Flash Attention + MTP (native)

[3/6] Checking files...
Using bundled iso3 binary: llama-server-cuda.exe
Binary: llama-server-cuda.exe [cached]
Model: Qwen3-30B-A3B-UD-Q3_K_XL.gguf [cached]

[4/6] Preflight check...
llama-server does not support iso3, falling back to q8_0/q4_0
✓ VRAM sufficient

[5/6] Warmup benchmark...
Probe 1: ctx=8K ... OOM
Probe 2: ctx=4K ... OOM
⚠️ Warmup failed: all ctx probes failed (tried down to 4K)
Using default parameters

[6/6] Starting server...
Waiting for llama-server to be ready (port 11434)...
⚠️ Not enough VRAM, lowering context to 4K and retrying...
Waiting for llama-server to be ready (port 11434)...
Error: failed to start llama-server: failed to start 2 times in a row, cannot run even at the minimum context (4K)

NVIDIA GeForce RTX 5070 Ti: 16303 MB VRAM
Model Qwen3-30B-A3B: ~14336 MB
KV cache (4K, q4_0): ~96 MB
Estimated total required: ~15456 MB

Suggestions:
1. Choose a smaller quantization (Q2_K)
2. Choose a smaller model
3. Use MoE offload (put the experts in CPU RAM)

Usage:
kaiwu run <model> [flags]

Flags:
--bench Run benchmark after starting
--ctx-size int Manually set the context size (0 = auto)
--fast Skip warmup, use cached profile
-h, --help help for run
--llama-server string Use a custom llama-server binary (full path)
--reset Clear the cache and re-run warmup to probe for the best parameters

I'm speechless. Why is this nothing like the introduction?
The introduction said it runs even on 8GB of VRAM, so why doesn't it work with my 16GB?
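Before giving up on this card, the help text above already exposes the two knobs one would reach for: --ctx-size to pin the context manually instead of letting warmup probe it, and --llama-server to point kaiwu at a different llama-server binary. A minimal sketch of both, reusing only flags and the model name shown above; the Windows path is a placeholder, not from the original post:

kaiwu run Qwen3-30B-A3B --ctx-size 4096
kaiwu run Qwen3-30B-A3B --llama-server "C:\path\to\llama-server.exe"

Whether a swapped-in binary actually enables the MoE offload route from suggestion 3 depends on that particular build; the log does not confirm it.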
Boss, quick question: with a 5070 Ti 16GB + 32GB of RAM, which model would be a good fit? I want to use it to run 小龙虾.
Apr 23
Replied to a topic by intoext in Industry Gossip: Has the big swindler Jia Yueting come back to China?
If he comes back, he's guaranteed to get mobbed.
@mnoputd20adfadf3 I've registered. ID: am95bXVzaWNAMTYzLmNvbQo=
@stark123 Could you tell me which model the player on pdd is? I can't find it in search, thanks.
@baoshu Limescale.