[20221128]再谈防水墙(视图访问性能问题).txt

[20221128]再谈防水墙(视图访问性能问题).txt

1.环境:
SYS> @ ver1
PORT_STRING                    VERSION        BANNER
------------------------------ -------------- --------------------------------------------------------------------------------
x86_64/Linux 2.4.xx            11.2.0.4.0     Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

2.问题分析:
--//上个星期看了生产系统awr报表,发现一些问题.原来防水墙不断调用执行如下:
begin :con := "TASSETACL"."QUERYACL"(:sn, :on); end;
--//该语句已经消失,我问了同事,据说升级过该产品,当时awr的记录如下:
SQL ordered by Executions

    %CPU - CPU Time as a percentage of Elapsed Time
    %IO - User I/O Time as a percentage of Elapsed Time
    Total Executions: 23,199,582
    Captured SQL account for 34.2% of Total

Executions      Rows Processed  Rows per Exec   Elapsed Time (s)%CPU    %IO   SQL Id          SQL Module      SQL Text
6,522,522       6,521,177       1.00            1,660.15        99.4    0     190q1sz3ywrd7   xxxxxx.eee      begin :con := "TASSETACL".QUE...
2,228,287       2,228,115       1.00            107.98          99.5    0     g7ytdh9mxt1s0   xxxxxx.eee      select count ( :"SYS_B_0" ) fr...

--//而现在变成了如下:
SQL ordered by Elapsed Time

    Resources reported for PL/SQL code includes the resources used by all SQL statements called by the code.
    为PL/SQL代码报告的资源包括该代码调用的所有SQL语句所使用的资源。
    % Total DB Time is the Elapsed Time of the SQL statement divided into the Total Database Time multiplied by 100
    %Total - Elapsed Time as a percentage of Total DB time
    %CPU - CPU Time as a percentage of Elapsed Time
    %IO - User I/O Time as a percentage of Elapsed Time
    Captured SQL account for 65.1% of Total DB Time (s): 33,342
    Captured PL/SQL account for 41.7% of Total DB Time (s): 33,342

Elapsed Time (s)  Executions   Elapsed Time per Exec (s)  %Total  %CPU   %IO  SQL Id         SQL Module                                 SQL Text
5,156.39          33,682       0.15                       15.47   98.35  0.01 3pw59r6rkw9gb  oraagent.bin@dm01dbadm01.XXX.com (TNS V1-  declare PRIVS_ERROR exception;...
5,153.22          33,683       0.15                       15.46   98.34  0.01 0xwxau19hznj8  oraagent.bin@dm01dbadm01.XXX.com (TNS V1-  begin dbagent.try_login(); end...
5,089.67          134,713      0.04                       15.26   98.34  0.00 8vmu6k690g87k  oraagent.bin@dm01dbadm01.XXX.com (TNS V1-  SELECT UPPER(NVL(PROGRAM, 'nul...

--//原来的问题消失了,但是消耗的时间却增加了,很明显对方的产品没有经过严格的测试,就给用户使用了.
--//取出sql语句,主要便于查看并且格式化如下:
SYS> @ sql_id 3pw59r6rkw9gb
--SQL_ID = 3pw59r6rkw9gb
declare
  PRIVS_ERROR exception; --raise error,if rule exception,will trigger privs_error
  pragma exception_init(PRIVS_ERROR, -1031);
begin
  execute immediate 'begin dbagent.try_login(); end;';
exception
  when PRIVS_ERROR then
    raise;
  when others then
    rollback;
end dbagent_logon;;
--//注意后面多了1个分号,是我写的脚本无法区分PL/SQL语句与sql语句造成的问题.

SYS> @sql_id 0xwxau19hznj8
--SQL_ID = 0xwxau19hznj8
begin dbagent.try_login(); end;;

SYS> @ sql_id 8vmu6k690g87k
--SQL_ID = 8vmu6k690g87k
SELECT UPPER (NVL (PROGRAM, 'null'))
      ,UPPER (MODULE)
      ,TYPE
      ,DECODE
       (
          NVL (INSTR (PROCESS, ':'), 0)
         ,0, NVL (PROCESS, 1234)
         ,SUBSTR (PROCESS, 1, INSTR (PROCESS, ':') - 1)
       )
      ,OSUSER
      ,MACHINE
      ,SCHEMANAME
      ,USERNAME
      ,SERVICE_NAME
      ,SID
      ,SERIAL#
  FROM SYS.V_$SESSION
 WHERE SID = SYS_CONTEXT ('userenv', 'sid');

--//134713/33682 = 3.99955.平均执行3pw59r6rkw9gb1次,执行8vmu6k690g87k 4次.
--//可以看出调用顺序3pw59r6rkw9gb,0xwxau19hznj8,8vmu6k690g87k 分别执行次数的比例是1:1:4.
--//很明显开发没有写好程序代码.仅仅需要调用1次8vmu6k690g87k就ok了.
--//如果修改代码可以提高效率4倍. 仅仅需要5156/4 = 1289秒.但是我的问题并不再这里,细节很重要,为什么8vmu6k690g87k的执行有点
--//慢.再仔细看sql_id='8vmu6k690g87k'每次执行需要0.04秒=40ms(估计存在四舍五入问题,因为0.04*4=0.16),有点慢,因为where条件
--//查询使用sid,不应该存在这么"慢"的情况.

--//开始以为都是Module=oraagent.bin执行的,后面我发现实际上几乎全部模块都存在相似调用,看代码可以猜出是登陆数据库时触发器
--//调用执行的代码.这时我突然想起我自己写的登陆触发器用于设置cursor_sharing=force也存在类似的调用v$session视图的情况,很
--//明显我写的代码调用次数应该也有33XXX次数.Elapsed Time至少应该 5089/4 = 1272秒,但是并没有出现在awr报表中.另外我想到是
--//否因为select部分消耗cpu资源太多,我马上否定我的猜测,因为仅仅返回1行,无论如何不应该这么"慢".

--//当我看我写的触发器代码马上明白问题在那里.我是分开写的,可以参考链接
--// http://blog.itpub.net/267265/viewspace-2768591/ => [20210418]查询v$视图问题.txt
--// connor-mcdonald.com/2021/04/12/better-performance-when-querying-the-v-views

3.继续:
$ cat a.txt
SELECT UPPER (NVL (PROGRAM, 'null'))
      ,UPPER (MODULE)
      ,TYPE
      ,DECODE
       (
          NVL (INSTR (PROCESS, ':'), 0)
         ,0, NVL (PROCESS, 1234)
         ,SUBSTR (PROCESS, 1, INSTR (PROCESS, ':') - 1)
       )
      ,OSUSER
      ,MACHINE
      ,SCHEMANAME
      ,USERNAME
      ,SERVICE_NAME
      ,SID
      ,SERIAL#
  FROM SYS.V_$SESSION
 WHERE SID = SYS_CONTEXT ('userenv', 'sid');

--//查看起对应的执行计划:
SYS> @ sl all
alter session set statistics_level = all;
Session altered.

SYS> @ a.txt
SYS> @ dpc '' '' ''
PLAN_TABLE_OUTPUT
---------------------------
Plan hash value: 2422122865
---------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name            | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows |   A-Time   |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                 |      1 |        |       |     1 (100)|      1 |00:00:00.02 |       |       |          |
|   1 |  MERGE JOIN CARTESIAN     |                 |      1 |      1 |    62 |     0   (0)|      1 |00:00:00.02 |       |       |          |
|   2 |   NESTED LOOPS            |                 |      1 |      1 |    12 |     0   (0)|      1 |00:00:00.01 |       |       |          |
|*  3 |    FIXED TABLE FULL       | X$KSLWT         |      1 |      1 |     8 |     0   (0)|      1 |00:00:00.01 |       |       |          |
|*  4 |    FIXED TABLE FIXED INDEX| X$KSLED (ind:2) |      1 |      1 |     4 |     0   (0)|      1 |00:00:00.01 |       |       |          |
|   5 |   BUFFER SORT             |                 |      1 |      1 |    50 |     0   (0)|      1 |00:00:00.02 |  2048 |  2048 | 2048  (0)|
|*  6 |    FIXED TABLE FULL       | X$KSUSE         |      1 |      1 |    50 |     0   (0)|      1 |00:00:00.02 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$88122447
   3 - SEL$88122447 / W@SEL$4
   4 - SEL$88122447 / E@SEL$4
   6 - SEL$88122447 / S@SEL$4
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("W"."KSLWTSID"=TO_NUMBER(SYS_CONTEXT('userenv','sid')))
   4 - filter("W"."KSLWTEVT"="E"."INDX")
   6 - filter((BITAND("S"."KSSPAFLG",1)<>0 AND BITAND("S"."KSUSEFLG",1)<>0 AND "S"."INDX"=TO_NUMBER(SYS_CONTEXT('userenv','sid'))
              AND "S"."INST_ID"=USERENV('INSTANCE')))
--//你可以发现ID=3,全表扫描X$KSLWT,而过滤条件是 filter("W"."KSLWTSID"=TO_NUMBER(SYS_CONTEXT('userenv','sid'))),也就是无法把
--//SYS_CONTEXT ('userenv', 'sid')当作常量处理.
--//而生产库X$KSLWT 记录的数量不是一般测试环境的数量.我的测试环境仅仅返回24条.而生产系统返回很多.
SYS> select count(*) from X$KSLWT;
  COUNT(*)
----------
      8793
--//访问X$KSUSE也是出现类似的情况.

--//解决很简单,分开写:
--// Get user SID information
SELECT SYS_CONTEXT('userenv', 'sid')   INTO v_sid  FROM dual;

--// Get Program executable,OSUSER Details,Machine Details for this session
SELECT UPPER(NVL(PROGRAM, 'NULL')),
       UPPER(MODULE),
       DECODE(NVL(INSTR(PROCESS, ':'), 0), 0, NVL(PROCESS, 1234), SUBSTR(PROCESS, 1, INSTR(PROCESS, ':') - 1))
       TYPE, OSUSER, MACHINE, SCHEMANAME, USERNAME, SERVICE_NAME, SID, SERIAL#
  FROM SYS.V_$SESSION
  where sid = v_sid;

SYS> SELECT SYS_CONTEXT('userenv', 'sid')    FROM dual;
SYS_CONTEXT('USERENV','SID')
----------------------------
17017

$ cat a.txt
SELECT UPPER (NVL (PROGRAM, 'null'))
      ,UPPER (MODULE)
      ,TYPE
      ,DECODE
       (
          NVL (INSTR (PROCESS, ':'), 0)
         ,0, NVL (PROCESS, 1234)
         ,SUBSTR (PROCESS, 1, INSTR (PROCESS, ':') - 1)
       )
      ,OSUSER
      ,MACHINE
      ,SCHEMANAME
      ,USERNAME
      ,SERVICE_NAME
      ,SID
      ,SERIAL#
  FROM SYS.V_$SESSION
 WHERE
SID= 17017
--SID = SYS_CONTEXT ('userenv', 'sid')
;

SYS> @a.txt
..

SYS> @ dpc '' '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID  gxdr0gq6nyums, child number 0
-------------------------------------
Plan hash value: 589956979
----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name            | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows |   A-Time   |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                 |      1 |        |       |     1 (100)|      1 |00:00:00.01 |       |       |          |
|   1 |  NESTED LOOPS              |                 |      1 |      1 |    62 |     0   (0)|      1 |00:00:00.01 |       |       |          |
|   2 |   MERGE JOIN CARTESIAN     |                 |      1 |      1 |    58 |     0   (0)|      1 |00:00:00.01 |       |       |          |
|*  3 |    FIXED TABLE FIXED INDEX | X$KSUSE (ind:1) |      1 |      1 |    50 |     0   (0)|      1 |00:00:00.01 |       |       |          |
|   4 |    BUFFER SORT             |                 |      1 |      1 |     8 |     0   (0)|      1 |00:00:00.01 |  2048 |  2048 | 2048  (0)|
|*  5 |     FIXED TABLE FIXED INDEX| X$KSLWT (ind:1) |      1 |      1 |     8 |     0   (0)|      1 |00:00:00.01 |       |       |          |
|*  6 |   FIXED TABLE FIXED INDEX  | X$KSLED (ind:2) |      1 |      1 |     4 |     0   (0)|      1 |00:00:00.01 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$88122447
   3 - SEL$88122447 / S@SEL$4
   5 - SEL$88122447 / W@SEL$4
   6 - SEL$88122447 / E@SEL$4
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("S"."INDX"=17017 AND BITAND("S"."KSSPAFLG",1)<>0 AND BITAND("S"."KSUSEFLG",1)<>0 AND
              "S"."INST_ID"=USERENV('INSTANCE')))
   5 - filter("W"."KSLWTSID"=17017)
   6 - filter("W"."KSLWTEVT"="E"."INDX")

--//connor-mcdonald.com 链接给出了另外的写法,使用with+materialize提示:
$ cat b.txt
with mysid as ( select /*+ materialize */ userenv('SID') n from dual )
SELECT UPPER(NVL(PROGRAM, 'NULL')),
       UPPER(MODULE),
       DECODE(NVL(INSTR(PROCESS, ':'), 0), 0, NVL(PROCESS, 1234), SUBSTR(PROCESS, 1, INSTR(PROCESS, ':') - 1))
       TYPE, OSUSER, MACHINE, SCHEMANAME, USERNAME, SERVICE_NAME, SID, SERIAL#
  FROM SYS.V_$SESSION,mysid
 WHERE SID = mysid.n;

SYS> @ b.txt
..

Plan hash value: 2230424401
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                        | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                             |      1 |        |       |     4 (100)|          |      1 |00:00:00.03 |      15 |      1 |      1 |       |       |          |
|   1 |  TEMP TABLE TRANSFORMATION     |                             |      1 |        |       |            |          |      1 |00:00:00.03 |      15 |      1 |      1 |       |       |          |
|   2 |   LOAD AS SELECT               |                             |      1 |        |       |            |          |      0 |00:00:00.02 |       4 |      0 |      1 |   270K|   270K|  270K (0)|
|   3 |    FAST DUAL                   |                             |      1 |      1 |       |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       0 |      0 |      0 |       |       |          |
|   4 |   NESTED LOOPS                 |                             |      1 |      1 |    75 |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       5 |      1 |      0 |       |       |          |
|   5 |    NESTED LOOPS                |                             |      1 |      1 |    71 |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       5 |      1 |      0 |       |       |          |
|   6 |     NESTED LOOPS               |                             |      1 |      1 |    63 |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       5 |      1 |      0 |       |       |          |
|   7 |      VIEW                      |                             |      1 |      1 |    13 |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       5 |      1 |      0 |       |       |          |
|   8 |       TABLE ACCESS STORAGE FULL| SYS_TEMP_0FD9D6D71_CF54F429 |      1 |      1 |    13 |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       5 |      1 |      0 |  1025K|  1025K|          |
|*  9 |      FIXED TABLE FIXED INDEX   | X$KSUSE (ind:1)             |      1 |      1 |    50 |     0   (0)|          |      1 |00:00:00.01 |       0 |      0 |      0 |       |       |          |
|* 10 |     FIXED TABLE FIXED INDEX    | X$KSLWT (ind:1)             |      1 |      1 |     8 |     0   (0)|          |      1 |00:00:00.01 |       0 |      0 |      0 |       |       |          |
|* 11 |    FIXED TABLE FIXED INDEX     | X$KSLED (ind:2)             |      1 |      1 |     4 |     0   (0)|          |      1 |00:00:00.01 |       0 |      0 |      0 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$71D7A081
   2 - SEL$1
   3 - SEL$1        / DUAL@SEL$1
   7 - SEL$D67CB2D2 / MYSID@SEL$2
   8 - SEL$D67CB2D2 / T1@SEL$D67CB2D2
   9 - SEL$71D7A081 / S@SEL$5
  10 - SEL$71D7A081 / W@SEL$5
  11 - SEL$71D7A081 / E@SEL$5
Predicate Information (identified by operation id):
---------------------------------------------------
   9 - filter((BITAND("S"."KSSPAFLG",1)<>0 AND BITAND("S"."KSUSEFLG",1)<>0 AND "S"."INST_ID"=USERENV('INSTANCE') AND "S"."INDX"="MYSID"."N"))
  10 - filter("S"."INDX"="W"."KSLWTSID")
  11 - filter("W"."KSLWTEVT"="E"."INDX")

4.性能对比:
--//我在toad下测试,多次执行观察
--//调用b.txt 大约需要2Xms.
--//调用a.txt 大约需要5Xms.
--//有时候两者差异不大,感觉这样比较不行,没有考虑网络的往返.另外写一篇测试该语句性能的blog.
--//注:我后面测试with+materialize要产生redo日志,不建议在生产系统应用这样的写法,主要原因是执行频率太高了,累积起来redo很大!!

5.为什么调用1次0xwxau19hznj8,执行4次8vmu6k690g87k.
--//我看了一下源代码,源代码这次没有加密.
--//通过过程collect_app_info调用sql_id=8vmu6k690g87k
--//而try_login代码通过4个函数分别取得对应值

--get login app name
function get_app_name return varchar2 is
begin
  collect_app_info;
  return l_appname;
end;
--get login app module name
function get_app_module_name return varchar2 is
begin
  collect_app_info;
  return l_module;
end;

--get login app type
function get_app_type return varchar2 is
begin
  collect_app_info;
  return l_type;
end;

--get app process number
function get_app_process return number is
begin
  collect_app_info;
  return l_process;
end;
--//这样就很好解析为什么调用1次执行4次sql_id=8vmu6k690g87k.而实际上
  procedure collect_app_info is
  begin
    if not app_info_collect_status then
      select upper(nvl(program, 'null')),
             upper(module),
             type,
             decode(nvl(instr(process, ':'), 0),
                    0,
                    nvl(process, 1234),
                    substr(process, 1, instr(process, ':') - 1)),
             osuser,
             machine,
             SCHEMANAME,
             USERNAME,
             SERVICE_NAME,
             sid,
             serial#
        into l_appname,
             l_module,
             l_type,
             l_process,
             l_osuser,
             l_machine,
             l_SCHEMANAME,
             l_username,
             l_service_name,
             l_sid,
             l_serial#
        from sys.v_$session
       where sid = sys_context('userenv', 'sid');
    end if;

  end;

--//已经赋值给对应变量,根本不需要再通过什么函数返回对应值.直接使用对应变量应该可以,当然我没有测试.
--//最简单修改:
  procedure collect_app_info is
  begin
    if not app_info_collect_status then
      select userenv('SID') into l_sid from dual;
      select upper(nvl(program, 'null')),
             upper(module),
             type,
             decode(nvl(instr(process, ':'), 0),
                    0,
                    nvl(process, 1234),
                    substr(process, 1, instr(process, ':') - 1)),
             osuser,
             machine,
             SCHEMANAME,
             USERNAME,
             SERVICE_NAME,
             sid,
             serial#
        into l_appname,
             l_module,
             l_type,
             l_process,
             l_osuser,
             l_machine,
             l_SCHEMANAME,
             l_username,
             l_service_name,
             l_sid,
             l_serial#
        from sys.v_$session
       where sid = l_sid;
    end if;

  end;
--//我的感觉可能分开写效率更高!!因为使用with+materialize会生成一个临时表,会产生日志,我还给测试看看.

6.总结:
--//很明显对方缺乏蛮力测试就把产品做了升级,我曾经对乙方工程师讲过类似的问题,尽量不要让客户先于你发现产品的问题,这样是很
--//丢脸的事情.你内部测试怎么犯错都可以,客户是看不见的,一旦你犯错,对方先发现,对方会........

--//实际上只要大量用户连接数据库,该语句的问题才明显的暴露出来.

热门相关:斗神战帝   刺客之王   刺客之王   时间都知道(唐嫣、窦骁、杨烁主演)   大神你人设崩了