rechardwong0522

V2EX member #701629, joined on 2024-07-19 12:43:19 +08:00


Per rechardwong0522's settings, the topics list is hidden

Deals info, including closed deals, is not hidden

rechardwong0522's recent replies

5 days ago

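Replied to a topic by huangyin0514 › Promotions › [$10 giveaway] A hand-tuned "full-power" Claude 4.7 / GPT-5.5 / Gemini API gateway: Link-AI invites V2EX members to a closed beta, built not just for stability but to avoid degraded model quality

id 120
Thanks, boss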

5 days ago

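Replied to a topic by l534891619 › Promotions › [Stress test] New Codex GPT-5.5 relay station now open: 300 million free tokens for everyone, comment to instantly receive a $300/month membership

FRE-93e5b055
Wishing you great fortune, boss!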

5 days ago

Replied to a topic by cxzweb › Promotions › # GPT-5.4 / 5.5 / 5.3-codex / image2 relay station, comment to get $15

id: 467
Thanks, boss, and wishing you prosperity!

Apr 27

Replied to a topic by KaiWuBOSS › Local LLM › I built a tool that takes a 30B model on an 8GB GPU from 3 tok/s to 21 tok/s; notes on the technical findings

Still not working on the latest version.

PS E:\kaiwu-windows-amd64> .\kaiwu.exe run .\Qwen3-30B-A3B-UD-Q3_K_XL.gguf

██╗ ██╗ █████╗ ██╗██╗ ██╗██╗ ██╗
██║ ██╔╝██╔══██╗██║██║ ██║██║ ██║
█████╔╝ ███████║██║██║ █╗ ██║██║ ██║
██╔═██╗ ██╔══██║██║██║███╗██║██║ ██║
██║ ██╗██║ ██║██║╚███╔███╔╝╚██████╔╝
╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚══╝╚══╝ ╚═════╝
Local LLM deployer vv0.2.3 · llama.cpp b8864
by llmbbs.ai · local AI tech community

[1/6] Probing hardware...
GPU: NVIDIA GeForce GTX 1070 Ti (SM61, 8192 MB VRAM, 256 GB/s)
RAM: 31 GB UNKNOWN
OS: windows amd64

[2/6] Selecting configuration...
Model: Qwen3-30B-A3B (moe, 29B total / 2B active)
Quant: Q3_K_M (12.9 GB)
Mode: moe_offload (experts on CPU)

[3/6] Checking files...
Using bundled iso3 binary: llama-server-cuda.exe
Binary: llama-server-cuda.exe [cached]
Model: Qwen3-30B-A3B-UD-Q3_K_XL.gguf [cached]

[4/6] Preflight check...
iso3 unavailable (MinSM61 or non-turboquant binary), falling back to q8_0/q4_0
✓ VRAM sufficient

[5/6] Warmup benchmark...
Probe 1: ctx=32K ... OOM
Probe 2: ctx=16K ... OOM
Probe 3: ctx=8K ... OOM
⚠️ Warmup failed: all ctx probes failed (tried down to 4K)
Using default parameters

[6/6] Starting server...
Waiting for llama-server to be ready (port 11434)...
⚠️ Insufficient VRAM, lowering context to 4K and retrying...
Waiting for llama-server to be ready (port 11434)...
Error: failed to start llama-server: 2 consecutive startup failures; cannot run even at the minimum context (4K)

NVIDIA GeForce GTX 1070 Ti: 8192 MB VRAM
Model Qwen3-30B-A3B: ~13189 MB
KV cache (4K, q4_0): ~96 MB
Estimated total required: ~14309 MB

Shortfall: 6117 MB

Suggestions:
1. Choose a smaller quantization (Q4_K_M or Q2_K)
2. Choose a smaller model

Usage:
kaiwu run <model> [flags]

Flags:
--bench Run benchmark after starting
--ctx-size int Manually set the context size (0 = auto)
--fast Skip warmup, use cached profile
-h, --help help for run
--host string Listen address (defaults to 127.0.0.1; use 0.0.0.0 to expose on the LAN) (default "127.0.0.1")
--llama-server string Use a custom llama-server binary (full path)
--reset Clear the cache and rerun warmup to probe for optimal parameters
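
The numbers in the failure summary above can be cross-checked with a few lines of arithmetic. The sketch below only reuses the figures printed in the log; how kaiwu actually derives its estimate is not shown in the output, so the "implied overhead" term is an assumption (simply whatever remains after subtracting the model and KV cache from the reported total), and the variable names are made up for illustration.

```python
# Figures copied from the kaiwu failure summary (all in MB).
VRAM_MB = 8192             # GTX 1070 Ti VRAM
MODEL_MB = 13189           # reported model size
KV_CACHE_MB = 96           # reported KV cache at 4K context, q4_0
ESTIMATED_TOTAL_MB = 14309  # reported estimated total

# Assumption: anything in the total beyond model + KV cache is runtime
# overhead. The log does not say what this remainder actually covers.
implied_overhead_mb = ESTIMATED_TOTAL_MB - MODEL_MB - KV_CACHE_MB

# The reported shortfall is the estimated total minus available VRAM.
shortfall_mb = ESTIMATED_TOTAL_MB - VRAM_MB

# The model size in GB, for comparison with the "Quant" line in step [2/6].
model_gb = round(MODEL_MB / 1024, 1)

print(f"implied overhead: {implied_overhead_mb} MB")
print(f"shortfall: {shortfall_mb} MB")
print(f"model size: {model_gb} GB")
```

The shortfall matches the 6117 MB printed by the tool, and the model size agrees with the 12.9 GB shown in the configuration step, so the summary is internally consistent. Note that the total counts the full model size against VRAM even though the selected mode is moe_offload.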

Apr 23

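Replied to a topic by drooloo › Workplace › Our company's AI customer service got replaced by real humans

Personally, I don't think AI customer service needs the kind of compute a large model brings. The corpus in a vertical domain is limited, so no amount of fine-tuning will reliably produce accurate answers, and multi-turn conversations are harder still. On top of that, LLM hallucinations seriously erode user trust. As many people have said, the first thing they say when the call connects is "transfer me to a human," and I have yet to see a solid real-world deployment.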

Apr 22

Replied to a topic by rechardwong0522 › Newbie Help › Is this website (https://global.v2ex.co/) a fake?

@sddyzm OK, thanks

Apr 2

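Replied to a topic by gefangshuai › Share & Create › After much soul-searching and a long period of deliberation, I've decided to open-source PasteMemo, a smart clipboard manager for macOS, starting today!

Thanks for open-sourcing this. For a Swift beginner, are PasteMemo's architecture and difficulty level suitable for learning from?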

Feb 9

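Replied to a topic by Moyyyyyyyyyyye › Share & Create › Spent 4 months and $30,000 building coolvibe.io, an Agent web support tool: viewable on mobile and PC, supports self-hosting, free subscriptions up for grabs!

Y3UJ55YR5Y Giving it a try, thanks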
