Postgres 中的 upsert 方法为什么在冲突 DO NOTHING 的话就不返回 row?

V2EX = way to explore

V2EX 是一个关于分享和探索的地方

现在注册

已注册用户请登录

这是一个创建于 460 天前的主题，其中的信息可能已经有所发展或是发生改变。

我知道 Postgres 没有提供专门的 upsert 方法，但是提供了等效的方法：on conflict。

但是，在冲突发生的时候，如果没有修改该记录，reuturning是没有记录返回的。下面的操作记录，第一次插入没有冲突，有记录插入，返回了记录的 id ；第二次插入的时候冲突，没有修改记录，所以没有返回任何记录。

create table mytab(id serial primary key, name text unique);

dbtests=$ insert into mytab(name) values('aaron') on conflict(name) do nothing returning id;
 id
----
  1
(1 row)

INSERT 0 1

dbtests=$ insert into mytab(name) values('aaron') on conflict(name) do nothing returning id;
 id
----
(0 rows)

INSERT 0 0

发现如果冲突后，强制将 mytab.id 刷新一下，又造成了写负担；如果 union 查询一下，又造成了读负担。难道没有一个可以没有额外开销的语句么？

10 条回复 • 2023-10-06 10:24:59 +08:00

h19981126g

2023-10-05 11:37:51 +08:00

用 ON CONFLICT UPDATE
https://stackoverflow.com/questions/34708509/how-to-use-returning-with-on-conflict-in-postgresql

zhwguest

2023-10-05 11:43:16 +08:00

UPDATE 会造成写操作；
重新查询会造成读负担；
想咨询有没有好一点的方法，为什么设计成这样，是为了效率么？

dw2693734d

2023-10-05 11:57:48 +08:00

@zhwguest 这能有多大的负担，如果影响读的负担了，直接读 replica 就行了啊

adoal

2023-10-05 12:06:45 +08:00

你这个疑问隐含的假设是 retunring 和 on conflict 差不多是同期出现在 pg 语法里的，出于某些你不知道（所以在这里问）的原因，设计成 returning 只返回实际修改的行而非有可能出现的有冲突的行。

实际上，returning 早在 7.2 就有了，而 on conflict 是在 9.5 才出现的。returning 一开始的语义就是 modified rows ，那时候还没有 on cofclict ，自然不会考虑。等有了 on conflict ，要保持语义的一致性，自然不会改变默认行为。

你想要的大概是一个扩展语法，类似 on conflict do nothing returning conflicted id 这样的，引入一个弱关键词。就看社区能说得上话的大佬们有多少同样想法了😃

zhwguest

2023-10-05 13:01:19 +08:00

@dw2693734d 感谢大神指导...受益匪浅

zhwguest

2023-10-05 13:03:40 +08:00

@adoal 谢谢，我的确是接触 pg 不久，原来是这么个历史原因，受教了。看样子得想别的办法了，多查询一次和最初设计的 io 负荷不匹配，得仔细点考虑。搜索了一下，类似的提问还不少，希望有代表民意的大佬们。再次感谢~！

CRVV

2023-10-05 14:02:49 +08:00

这里有另外一个原因
比如
create table test(x int unique, y int unique, z int);
insert into test select 1, 1, 1;
insert into test select 2, 2, 2;
insert into test select 3, 3, 3;

然后执行 upsert ，insert into test select 1, 2, 3 on conflict do nothing;
这样当然没问题

如果要 returning 冲突的结果，insert into test select 1, 2, 3 on conflict do nothing returning *;
你希望它返回这个结果么？通常不期望这个结果，在 unique 的东西上只返回一行比较符合直觉。另外要返回这个结果的话，数据库需要把所有的 unique constraint 都扫一遍。
x | y | z
---+---+---
1 | 1 | 1
2 | 2 | 2
(2 rows)

如果要 update ，insert into test select 1, 2 on conflict do update set z = 0;（当然这一句执行不了，do update 需要有一个指定的 unique constraint ，叫 conflict_target ）
你希望它把表修改成这样么？同上，通常也不期望这个结果。
x | y | z
---+---+---
1 | 1 | 0
2 | 2 | 0
3 | 3 | 3
(3 rows)

这个功能的设计应该是说在 do update 的时候只修改一行，所以必须指定 conflict_target
要写成 insert into test select 1, 2, 0 on conflict (x) do update set z = 0;
表被修改成
x | y | z
---+---+---
2 | 2 | 2
3 | 3 | 3
1 | 1 | 0
(3 rows)

当然楼主写的 SQL 里面有指定 conflict_target ，但有没有 conflict_target 的结果需要做成一致的，所以只有现在这样应该是唯一的合理结果。

pger

2023-10-06 03:48:11 +08:00

同为程序员，我很理解你的这种简洁的癖好，和对极致性能的追求。

创建个 PL/pgSQL 函数，包装一下插入逻辑：
insert into mytab(name) values('aaron') on conflict(name) do nothing returning id;
如果返回的 id 值为 NULL ，再使用 SELECT 查询对应 name 的 id ，作为函数返回值；

参考：
创建 PL/pgSQL 函数:
https://www.rockdata.net/zh-cn/tutorial/plpgsql-create-function/

使用 INSERT ON CONFLICT 语句进行更新插入:
https://www.rockdata.net/zh-cn/tutorial/dml-upsert/

pger

2023-10-06 03:51:45 +08:00

当然 UNION 也是一种选择：

insert into mytab(name) values('aaron') on conflict(name) do nothing returning id
UNION
select id from mytab where name = 'aaron';

参考：
UNION: 组合多个查询的结果集
https://www.rockdata.net/zh-cn/tutorial/dml-union/

zhwguest

2023-10-06 10:24:59 +08:00

@pger 感谢热心回复。

我这里因为是一些工业场合，追求的是固定配置下的大 io 容量。（不像互联网领域，用户量上来了就增加配置，比较弹性）

最后讨论的方案是，锁表然后先查，没有的情况下再插入。这种方式在 conflict 的情况下，只需要查询一次。

再次感谢~