求 SQL 优化建议

This topic created in 687 days ago, the information mentioned may be changed or developed.

A 表 40w 数据 B 表 42W 左右, A 和 B 是一对多的关系，现在需要分页查询所以如果用 Aleft join B 数据会变多，group by 的话速度很慢，目前用的 select * from A where exists （ select 1 from B where a.order_id = b.order_id ）页面有些查询条件存在 A 有的存在 B ，查询条件速度很慢，B 的查询条件页面响应速度在 6s 左右，看 sql 执行需要 2.5s

SQL

优化

分页

36 replies • 2024-08-27 11:00:28 +08:00

fengpan567

Aug 26, 2024

不要用子查询，用 join B ，再加个<if>条件判断，如果没有 B 的查询字段，直接跳过 join 只查 A 表

ZZ74

Aug 26, 2024

查询条件只有 order_id ？优化 sql 不如优化索引

Geon97

Aug 26, 2024

数据库架构是什么？什么索引？什么引擎？ group by 的字段是索引吗？使用 join 的方式呢？

utop1a

Aug 26, 2024

这个要看具体情况，比如你查出来的列表不需要 b ，只需要 a 的字段，可以考虑下面这两种，结合执行计划看下索引命中情况。如果 ab 联表查询慢，但是单独查 b 的情况不慢，也可以程序中先单独根据条件查符合 b 的 order_id 再执行 a 的查询

select distinct a.xx, a.xx,a.xx
from a
join b on a.order_id = b.order_id
where
a.xx = ?
and b.xx=?
或者
select a.xx, a.xx,a.xx
from a
where a.order_id in (
select order_id from b where order_id is not null
and xxx=? and xxx=?
)

Vesc

Aug 26, 2024

@fengpan567 #1 多数的情况 A 和 B 都有查询条件

Vesc

Aug 26, 2024

@ZZ74 #2 索引加了的，能加的都加了

Vesc

Aug 26, 2024

@Geon97 #3 mysql InnerDB 引擎，group by 的字段是主键

Vesc

Aug 26, 2024

@jov1 #4 目前是想使用单独查 b 的方法，但是又有一些查询条件既需要 A 又需要 B 的列，而且是 or 的关系

utop1a

Aug 26, 2024

@Vesc 如果打算程序先查 B ，然后再查 A ，这个没关系，比如原来可能是这样
select
a.row1 ,a. row2
from a
join b on b.order_id = a.order_id
<where>
<if test="name != null and namelength()>0">

</where>
where
那你程序无非就是先一天 sql 查询

venicid

Aug 26, 2024

我们经常这么使用，仅供参考
select * from
(
select * from A
where A.xx = xxx
) as t
left join b on t.order_id = b.order_id

utop1a

Aug 26, 2024

@Vesc 如果打算程序先查 B ，然后再查 A ，这个没关系，比如原来可能是这样
select
a.row1 ,a. row2
from a
join b on b.order_id = a.order_id
<where>
a.age > 18
<if test="name != null and namelength()>0">
and (a.name = 'xx' or b.name = 'xx')
</if>
</where>

那你程序就是先单独查 b 的,比如 name 是需要 ab 表都查的，，再将这个 orderIds 作为条件二次查询 a
如果 name 不为空的情况下，就先查 b ，然后返回 order_id(看你描述，ab 是通过 order_id 关联的)
select order_id
from b
where .name = 'xx'

然后
select
a.row1 ,a. row2
from a
<where>
a.age > 18
<if test="name != null and namelength()>0">
and (a.name = 'xx' or a.order_id in
<if test="orderIds != null and orderIds.size() > 0">
<foreach collection="orderIds" item="orderId" separator="," open="(" close=")">
#{orderId,jdbcType=BIGINT}
</foreach>
</if>
)
</if>

</where>

flyfanc

Aug 26, 2024

可能是 mysql 优化的锅，慢的时候执行一下 analyze table 有奇效

flyfanc

Aug 26, 2024

@flyfanc 忽略吧，回复错主题了

Gravitysrainbow

Aug 26, 2024

where exists （ select 1 from B where a.order_id = b.order_id ）

Gravitysrainbow

Aug 26, 2024

where exists （ select 1 from B where a.order_id = b.order_id ）这是个相关子查询，执行慢是因为 B 的查询依赖于 A 的结果，这个时候会先执行 A ，如果 A 的结果剩下 20w ，B 的子查询就要执行 20W 次，可以参考这个文档： https://blog.csdn.net/weixin_43997319/article/details/123713513 ；最简单的优化方案就是使用 join： select
A.*
from A
inner join (
select A.id
from A
left join （ select 1 from B where ） BX ON a.order_id = b.order_id
<where>
<if test="name != null and namelength()>0">

</where>
) AX on AX.id = A.id
where 1=1

Gravitysrainbow

Aug 26, 2024

要确保 select A.id
from A 以及 select 1 from B where 这两个单独的子 sql 都能命中索引，命中后获取到 A 的 ID ，在 inner join 一次手动回表去重，可以避免在 sql 里 distinct 或者 group by 的开销

cccvno1

Aug 26, 2024

建议先分析 sql 的执行计划，再去执行优化。相同的查询在不同版本不同配置下都可能会出现不同的执行计划

wenxueywx

Aug 26, 2024

#15 似乎可行，值得尝试，欢迎 op 测试后回复一下

Vesc

Aug 26, 2024

@Gravitysrainbow #15 感谢大佬我取测试一下速度

Vesc

Aug 26, 2024

@venicid #10 如果 A 的查询条件数据很多和直接连接没区别

Geon97

Aug 26, 2024

@Vesc group by 的字段加索引，或者直接用 join

9dP06m83vIV00l72

Aug 26, 2024

2.5s 对于数据量大、网络状态不好的话，算正常吧。

需要进一步确认时间卡在哪里：
连接？索引？数据汇合？数据传输？客户端显示？

MoYi123

Aug 26, 2024

把 explain 贴出来看看, 如果里面有 hash semi join ,那就说明是机器性能太差了, 否则就是 sql 要改.

nianlifeixing

Aug 26, 2024

exists 不一定最优，A exists B ，A 的数据量大，不要考虑 exists ，大多数情况下一定有条件，不可能查询所有有关联的 A 的信息，可能 A 的数据量有 20w ，不可能统一都给前端把，还是得看场景

Vesc

Aug 26, 2024

@andytao #22 单纯的 sql 执行时间

tbv

Aug 26, 2024

子查询还不如 join 呢

xiaohang427

Aug 26, 2024

可以贴下 SQL ，我现在用 PG ，交易表大概 4000W 条数据，有个终端表大概几 W ，也是用 JOIN 方式。后来想改为 IN 条件，但是条件太多会造成索引失效。还是要具体问题具体对待

KOMA1NIUJUNSHENG

Aug 26, 2024

先查 id 集合，再用 id 查完整数据

zemo

Aug 26, 2024

分页查询，有 limit,关联字段有索引应该不会慢呀，主要就是查处满足条件的 a 数据再做关联，避免全量关联就会快很多，确认走到索引了吗？

aw2350

Aug 26, 2024

先从 B 中查出需要的 A 的 id ，
with queryB as(
select distinct(A.id) from B where xxx)

select * from A INNER JOIN queryB on queryB.AID= A.ID

Xrall

Aug 26, 2024

我也有类似的疑惑。MySQL 中查询

select r.* from t_event_record r inner join t_event_workflow_record wr on r.id = wr.event_id
where wr.user_id = 1 and wr.status= 1
group by r.id order by r.create_time desc limit 10

主表 10w 数据子表 40w+ 就这查询就需要 500ms
count 查询一样的需要 500 ms
explain 一个 index 一个 ref 慢的不得了
就疑惑的很是不是查询就这么慢。