LLM Evaluation - V2EX


This topic was created 78 days ago; the information in it may have changed since then.

Today I came across an interesting repo in which the author uses a DSL to evaluate the capabilities of LLMs.

DSL

"Write a C program that draws an american flag to stdout." >> LLMRun() >> CRun() >> \
    LLMRun("What flag is shown in this image?") >> \
        (SubstringEvaluator("United States") | SubstringEvaluator("USA") | SubstringEvaluator("America"))
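The repo's own implementation isn't shown here, but a pipeline DSL like this can be built in Python with operator overloading: `>>` (`__rshift__`, plus the reflected `__rrshift__` so a plain string can start the chain) feeds one stage's output into the next, and `|` (`__or__`) combines evaluators so that any single match counts as a pass. The following is a minimal hypothetical sketch, not the author's code: `Node`, `Const`, `Chain`, `AnyOf` and the `model` parameter of `LLMRun` are assumptions for illustration, the model is a stub rather than a real LLM call, and `CRun` is omitted.

```python
from typing import Any, Callable


class Node:
    """A pipeline stage (hypothetical); subclasses implement run()."""

    def run(self, data: Any) -> Any:
        raise NotImplementedError

    def __rshift__(self, other: "Node") -> "Node":
        # a >> b: feed a's output into b
        return Chain(self, other)

    def __rrshift__(self, prompt: Any) -> "Node":
        # "text" >> node: str defines no >> for Node, so Python falls back to
        # this reflected method; the string becomes the pipeline's first value
        return Chain(Const(prompt), self)

    def __or__(self, other: "Node") -> "Node":
        # a | b: the check passes if either evaluator passes
        return AnyOf(self, other)


class Const(Node):
    def __init__(self, value: Any) -> None:
        self.value = value

    def run(self, data: Any) -> Any:
        return self.value  # ignores its input and emits the fixed value


class Chain(Node):
    def __init__(self, first: Node, second: Node) -> None:
        self.first, self.second = first, second

    def run(self, data: Any) -> Any:
        return self.second.run(self.first.run(data))


class AnyOf(Node):
    def __init__(self, left: Node, right: Node) -> None:
        self.left, self.right = left, right

    def run(self, data: Any) -> bool:
        return self.left.run(data) or self.right.run(data)


class LLMRun(Node):
    """Hypothetical LLM stage: `model` is any str -> str callable (a stub here)."""

    def __init__(self, model: Callable[[str], str], question: str = "") -> None:
        self.model, self.question = model, question

    def run(self, data: Any) -> str:
        # with a question, ask it about the previous stage's output
        prompt = f"{self.question}\n{data}" if self.question else str(data)
        return self.model(prompt)


class SubstringEvaluator(Node):
    def __init__(self, needle: str) -> None:
        self.needle = needle

    def run(self, data: Any) -> bool:
        return self.needle in str(data)


def fake_model(prompt: str) -> str:
    # stands in for a real LLM call
    return "This looks like the flag of the USA."


pipeline = "Describe this flag." >> LLMRun(fake_model) >> (
    SubstringEvaluator("United States") | SubstringEvaluator("USA")
)
print(pipeline.run(None))  # True
```

Because `>>` binds more tightly than `|` in Python, the `|`-combined evaluators have to be parenthesized, exactly as they are in the original example.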

One of the tests, a Python-to-C conversion, is particularly interesting (I didn't spot the catch myself at first, haha): every LLM got it wrong.

Question


def foo(x):
    sum = 0
    for i in range(x):
        x += i
        sum += x
    return sum
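The catch is on the Python side: `range(x)` is evaluated once, when the loop starts, so the loop runs exactly as many times as the *original* `x`, even though the body keeps increasing `x`. The sketch below traces `foo(5)` and checks it against a closed form; the helper names `foo_traced` and `foo_closed_form` are hypothetical. After the iteration with index k, x = x₀ + k(k+1)/2, so summing over the n = x₀ iterations gives n² + (n−1)n(n+1)/6 for n ≥ 0.

```python
def foo_traced(x):
    """Same logic as foo, but also records (i, x, sum) after each iteration."""
    trace = []
    total = 0  # `sum` in the original; renamed to avoid shadowing the builtin
    for i in range(x):  # range(x) is built once, from the ORIGINAL x
        x += i
        total += x
        trace.append((i, x, total))
    return total, trace


def foo_closed_form(n):
    """Closed form of foo(n), valid for n >= 0 (hypothetical helper)."""
    # (n - 1) * n * (n + 1) is a product of 3 consecutive integers, so divisible by 6
    return n * n + (n - 1) * n * (n + 1) // 6


total, trace = foo_traced(5)
for step in trace:
    print(step)  # prints (0, 5, 5), (1, 6, 11), (2, 8, 19), (3, 11, 30), (4, 15, 45), one per line
print(total, foo_closed_form(5))  # 45 45
```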

LLM OUTPUT

#include <stdio.h>

int foo(int x) {
    int sum = 0;
    for (int i = 0; i < x; i++) {
        x += i;
        sum += x;
    }
    return sum;
}

int main() {
    int result = foo(5); // Example call, replace 5 with any integer to test with different values
    printf("Result: %d\n", result);
    return 0;
}
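This translation breaks because a C `for` condition is re-evaluated before every iteration, so `i < x` compares against the *updated* `x`. Since `x` grows roughly like i²/2 while `i` grows by 1, for any starting `x >= 2` the condition never becomes false, and the loop keeps going until the `int` arithmetic overflows, which is undefined behavior for signed integers in C. A faithful translation would copy the bound first, e.g. `int n = x; for (int i = 0; i < n; i++)`. The Python sketch below simulates the C loop, assuming a 32-bit `int`; `c_style_foo` is a hypothetical helper name.

```python
from typing import Tuple

INT_MAX = 2**31 - 1  # assumption: 32-bit signed int, as on common platforms


def c_style_foo(x: int) -> Tuple[int, int, bool]:
    """Simulate the LLM's C loop, where `i < x` is re-checked against the
    updated x before every iteration.

    Returns (sum so far, iterations completed, overflowed). The simulation
    stops before any addition that would exceed INT_MAX, because signed
    overflow is undefined behavior in C.
    """
    total = 0
    i = 0
    while i < x:  # C re-reads x here on every pass
        if x + i > INT_MAX or total + x + i > INT_MAX:
            return total, i, True
        x += i
        total += x
        i += 1
    return total, i, False


for start in (0, 1, 2, 5):
    total, iters, overflowed = c_style_foo(start)
    print(f"x={start}: iterations={iters}, overflowed={overflowed}")
```

Only inputs of 1 or less finish normally (and then agree with the Python version); for `x = 5` the simulated loop runs far past the 5 iterations Python performs before it reaches `INT_MAX`.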

Reference

python_to_c_loop_update

1 reply • 2024-02-21 19:38:03 +08:00

devliu1

78 days ago

Nice.