我有一个非常具体的问题,我不完全确定在这里问这个问题是否合适。我们从客户那里获取提交给我们的数据,为了验证该数据的质量,我们运行一组查询来检查数据的一致性。
在我们的数据模型中,我们有路线(线)和结构(点),当结构位于路线上时,应该存在链接(在另一个表中)。为了验证这一点,我运行以下查询:
select s.id as id, r.id as unconnected_route_id
from structure s, route r
WHERE s.batch_number = '%{batch_number}'
and r.batch_number = '%{batch_number}'
and SDO_ANYINTERACT(s.geometry, r.geometry) = 'TRUE'
and not exists (
select * from feature_connectivity
where feature_id = r.id and feature_code=1001
and node1_feature_code = 1003
and (node1_id = s.id or node2_id = s.id)
)
这在测试阶段运行良好,但现在我们正在比较一组约 25000 条结构和一组约 25000 条路线,这实际上需要几个小时。所有索引都已到位,我与我们的 dba 检查了我们如何改进它,但我们无法想出一些东西。
[更新:添加解释计划和数据模型/索引]
数据模型:
- ROUTE 和 STRUCTURE 都有一个 ID、BATCH_NUMBER、GEOMETRY 字段(以及一堆其他不相关的列)
- 两个表都有批号索引和几何空间索引
索引:
CREATE INDEX "INFRA"."ROUTE_IX01" ON "INFRA"."ROUTE" ("BATCH_NUMBER")
CREATE INDEX "INFRA"."ROUTE_SX01" ON "INFRA"."ROUTE" ("GEOMETRY") INDEXTYPE IS "MDSYS"."SPATIAL_INDEX" PARAMETERS ('layer_gtype=curve');
CREATE INDEX "INFRA"."STRUCTURE_IX01" ON "INFRA"."STRUCTURE" ("BATCH_NUMBER")
CREATE INDEX "INFRA"."STRUCTURE_SX01" ON "INFRA"."STRUCTURE" ("GEOMETRY") INDEXTYPE IS "MDSYS"."SPATIAL_INDEX" PARAMETERS ('layer_gtype=point');
FEATURE_CONNECTIVITY 有点不同:
FEATURE_ID, FEATURE_CODE, NODE1_ID, NODE1_FEATURE_CODE, NODE2_ID, NODE2_FEATURE_CODE
具有以下索引:
CREATE INDEX "COMMON"."FEATURE_CONNECTIVITY_IX01" ON "COMMON"."FEATURE_CONNECTIVITY ("FEATURE_ID", "FEATURE_CODE")
CREATE INDEX "COMMON"."FEATURE_CONNECTIVITY_IX02" ON "COMMON"."FEATURE_CONNECTIVITY ("NODE1_ID", "NODE1_FEATURE_CODE")
CREATE INDEX "COMMON"."FEATURE_CONNECTIVITY_IX03" ON "COMMON"."FEATURE_CONNECTIVITY ("NODE2_ID", "NODE2_FEATURE_CODE")
CREATE INDEX "COMMON"."FEATURE_CONNECTIVITY_IX04" ON "COMMON"."FEATURE_CONNECTIVITY" ("BATCH_NUMBER")
create unique index FEATURE_CONNECTIVITY_IX05 on FEATURE_CONNECTIVITY (FEATURE_ID, FEATURE_CODE, NODE1_ID, NODE1_FEATURE_CODE, NODE2_ID, NODE2_FEATURE_CODE)
查询的解释计划如下:
有没有人对我们如何改进它有任何建议?
请您参考如下方法:
尝试强制 Oracle 使用嵌套循环而不是像这样的散列连接:
select /*+ USE_NL(s r)*/s.id as id, r.id as unconnected_route_id
from structure s, route r
WHERE s.batch_number = '%{batch_number}'
and r.batch_number = '%{batch_number}'
and SDO_ANYINTERACT(s.geometry, r.geometry) = 'TRUE'
and not exists (
select /*+ NL_AJ(feature_connectivity)*/ * from feature_connectivity
where feature_id = r.id and feature_code=1001
and node1_feature_code = 1003
and (node1_id = s.id or node2_id = s.id)
)
当服务器没有足够的缓冲区空间时,哈希连接会非常慢。在这种情况下,它会将数据写入磁盘然后再读取,并且文件读写总是很慢。 如果这不起作用,请提供您的表的结构,这可能会有所帮助。