kevan's recent timeline updates
kevan

6t6t.com 风之轩 Easy Listening ...
V2EX member #21701, joined on 2012-05-31 17:34:09 +08:00
Today's activity rank 10261
kevan's recent replies
@KaiWuBOSS I'll try it out and report back as soon as I get home from work. I've been following this for ages, haha. Have to support it.
Could you put out an optimized build that works on the 50 series?
@KaiWuBOSS Boss, why does it still fail after I downloaded version 0.1.6??

>kaiwu run Qwen3-30B-A3B-UD-Q3_K_XL.gguf --reset

██╗ ██╗ █████╗ ██╗██╗ ██╗██╗ ██╗
██║ ██╔╝██╔══██╗██║██║ ██║██║ ██║
█████╔╝ ███████║██║██║ █╗ ██║██║ ██║
██╔═██╗ ██╔══██║██║██║███╗██║██║ ██║
██║ ██╗██║ ██║██║╚███╔███╔╝╚██████╔╝
╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚══╝╚══╝ ╚═════╝
Local LLM deployer vv0.1.6 · llama.cpp b8864
by llmbbs.ai · local AI tech community

[1/6] Probing hardware...
GPU: NVIDIA GeForce RTX 5070 Ti (SM120, 16303 MB VRAM, 896 GB/s)
RAM: 31 GB UNKNOWN
OS: windows amd64
⚠️ CUDA 13.2 detected — known bug with low-bit quantization
If you see garbled output, downgrade driver to CUDA 13.1
Warning: RTX 50 series with CUDA 13.2 detected
Kaiwu will use CUDA 12.4 binary for stability.

[2/6] Selecting configuration...
Model: Qwen3-30B-A3B (moe, 29B total / 2B active)
Quant: Q3_K_M (12.9 GB)
Mode: full_gpu
Accel: Flash Attention

[3/6] Checking files...
Using bundled iso3 binary: llama-server-cuda.exe
Binary: llama-server-cuda.exe [cached]
Model: Qwen3-30B-A3B-UD-Q3_K_XL.gguf [cached]

[4/6] Preflight check...
⚠ RTX 50 series needs JIT compilation on first launch (~30s), please wait...
llama-server does not support iso3, falling back to q8_0/q4_0
✓ VRAM sufficient

[5/6] Warmup benchmark...
Cache cleared, re-probing
Probe 1: ctx=8K ... OOM
Probe 2: ctx=4K ... OOM
⚠️ Warmup failed: all ctx probes failed (tried down to 4K)
Using default parameters

[6/6] Starting server...
Waiting for llama-server to be ready (port 11434)...
⚠️ Not enough VRAM, lowering context to 4K and retrying...
Waiting for llama-server to be ready (port 11434)...
Error: failed to start llama-server: failed to start 2 times in a row, cannot run even at the minimum context (4K)

NVIDIA GeForce RTX 5070 Ti: 16303 MB VRAM
Model Qwen3-30B-A3B: ~13189 MB
KV cache (4K, q4_0): ~96 MB
Estimated total required: ~14309 MB

Suggestions:
1. Choose a smaller quantization (Q4_K_M or Q2_K)
2. Choose a smaller model

Usage:
kaiwu run <model> [flags]

Flags:
--bench Run benchmark after starting
--ctx-size int Manually set the context size (0 = auto)
--fast Skip warmup, use cached profile
-h, --help help for run
--llama-server string Use a custom llama-server binary (full path)
--reset Clear the cache and re-run warmup to probe for the best parameters


C:\Kevan\AI\kaiwu-windows-amd64>
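A quick sanity check on the diagnostic above, using only numbers from the log itself; the ~1024 MB term is simply the gap between the tool's stated total and the two listed components, not something it prints:

13189 MB (model weights) + 96 MB (KV cache, 4K q4_0) + ~1024 MB (implied allowance) ≈ 14309 MB

That estimate sits below the reported 16303 MB of VRAM, yet both context probes still hit OOM, so the real runtime buffers evidently need more headroom than the estimate accounts for.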
@KaiWuBOSS I wasn't really motivated before, but reading your introduction made me feel reborn. Foolproof installation, could you recommend the concrete installation steps? The things you described are too advanced for me.
kaiwu.exe run Qwen3-30B-A3B

██╗ ██╗ █████╗ ██╗██╗ ██╗██╗ ██╗
██║ ██╔╝██╔══██╗██║██║ ██║██║ ██║
█████╔╝ ███████║██║██║ █╗ ██║██║ ██║
██╔═██╗ ██╔══██║██║██║███╗██║██║ ██║
██║ ██╗██║ ██║██║╚███╔███╔╝╚██████╔╝
╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚══╝╚══╝ ╚═════╝
Local LLM deployer vv0.1.2 · llama.cpp b8864
by llmbbs.ai · local AI tech community

[1/6] Probing hardware...
GPU: NVIDIA GeForce RTX 5070 Ti (SM120, 16303 MB VRAM, 0 GB/s)
RAM: 31 GB UNKNOWN
OS: windows amd64

[2/6] Selecting configuration...
Model: Qwen3-30B-A3B (moe, 30B total / 3B active)
Quant: ud-q3-k-xl (14.0 GB)
Mode: full_gpu
Accel: Flash Attention + MTP (native)

[3/6] Checking files...
Using bundled iso3 binary: llama-server-cuda.exe
Binary: llama-server-cuda.exe [cached]
Model: Qwen3-30B-A3B-UD-Q3_K_XL.gguf [cached]

[4/6] Preflight check...
llama-server does not support iso3, falling back to q8_0/q4_0
✓ VRAM sufficient

[5/6] Warmup benchmark...
Probe 1: ctx=8K ... OOM
Probe 2: ctx=4K ... OOM
⚠️ Warmup failed: all ctx probes failed (tried down to 4K)
Using default parameters

[6/6] Starting server...
Waiting for llama-server to be ready (port 11434)...
⚠️ Not enough VRAM, lowering context to 4K and retrying...
Waiting for llama-server to be ready (port 11434)...
Error: failed to start llama-server: failed to start 2 times in a row, cannot run even at the minimum context (4K)

NVIDIA GeForce RTX 5070 Ti: 16303 MB VRAM
Model Qwen3-30B-A3B: ~14336 MB
KV cache (4K, q4_0): ~96 MB
Estimated total required: ~15456 MB

Suggestions:
1. Choose a smaller quantization (Q2_K)
2. Choose a smaller model
3. Use MoE offload (put the experts in CPU RAM)

Usage:
kaiwu run <model> [flags]

Flags:
--bench Run benchmark after starting
--ctx-size int Manually set the context size (0 = auto)
--fast Skip warmup, use cached profile
-h, --help help for run
--llama-server string Use a custom llama-server binary (full path)
--reset Clear the cache and re-run warmup to probe for the best parameters

I'm speechless. Why is this nothing like the introduction?
The introduction said it runs even on 8GB of VRAM, so why doesn't it work with my 16GB?
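Before giving up on this card, the help text above already exposes the two knobs one would reach for: --ctx-size to pin the context manually instead of letting warmup probe it, and --llama-server to point kaiwu at a different llama-server binary. A minimal sketch of both, reusing only flags and the model name shown above; the Windows path is a placeholder, not from the original post:

kaiwu run Qwen3-30B-A3B --ctx-size 4096
kaiwu run Qwen3-30B-A3B --llama-server "C:\path\to\llama-server.exe"

Whether a swapped-in binary actually enables the MoE offload route from suggestion 3 depends on that particular build; the log does not confirm it.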
Boss, quick question: with a 5070 Ti 16GB + 32GB of RAM, which model would be a good fit? I want to use it to run 小龙虾.
Apr 23
Replied to a topic by intoext in Industry Gossip: Has the big swindler Jia Yueting come back to China?
If he comes back, he's guaranteed to get mobbed.
@mnoputd20adfadf3 I've registered. ID: am95bXVzaWNAMTYzLmNvbQo=
@stark123 Could you tell me which model the player on pdd is? I can't find it in search, thanks.
@baoshu Limescale.