Tip
2010.05.04 11:05

UDP Buffer Tuning Techniques

Posted by 김준호 · Views 30852 · Likes 0 · Comments 3

Excerpted from the Fall 2007 issue of Oracle Magazine.


The factor most closely tied to RAC interconnect performance is generally the size of the UDP buffer. Oracle officially recommends a UDP buffer size of 256 KB, which delivers adequate performance on most systems. When a large volume of data travels across the interconnect, a buffer of 1–2 MB is sometimes used; if the UDP buffer is small relative to the transaction volume, packet loss can occur. Frequent packet loss shows up as the Oracle wait events "gc cr block lost" and "gc current block lost". Monitor the UDP packet receive error and drop rates using OS monitoring tools such as nmon, topas, or glance, network commands such as ifconfig and netstat, or a dedicated network diagnostic tool.
[oracle@rac1]$ netstat -s
Ip: ………
Tcp: …….
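As a sketch of the monitoring step above, the UDP counters can be pulled out of saved `netstat -s` output like this (the sample text and numbers are illustrative, and the counter wording differs between operating systems):

```shell
# Illustrative Udp: section of `netstat -s` output (numbers are made up).
sample='Udp:
    1820377 packets received
    112 packet receive errors
    1925284 packets sent'

# A receive-error counter that keeps growing between samples is the
# symptom that the UDP receive buffer is too small for the workload.
errors=$(printf '%s\n' "$sample" | awk '/packet receive errors/ {print $1}')
echo "UDP packet receive errors: $errors"
```

On a live system you would run `netstat -s` twice, a few minutes apart, and compare the counters, rather than read a single absolute value.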

 


The network settings above can be changed while the system is running, but they revert to their defaults when the system reboots, so set them in a system startup script or at the kernel level so that they are applied automatically.
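On Linux, one way to make the settings survive a reboot is a sysctl drop-in file; the following is a minimal sketch (the file name is illustrative, and the real file would live under /etc/sysctl.d/; other platforms would use an rc/startup script instead):

```shell
# Write the UDP buffer settings to a sysctl drop-in so they are re-applied
# at boot.  Using /tmp here purely for illustration; the real path would be
# /etc/sysctl.d/99-rac-udp.conf.
CONF=/tmp/99-rac-udp.conf
cat > "$CONF" <<'EOF'
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
EOF

# Applied at boot automatically, or immediately (as root) with:
#   sysctl -p /etc/sysctl.d/99-rac-udp.conf
cat "$CONF"
```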

 

Customer Incident Case

 

Even when attempting the open with the hidden parameter (_imr_max_reconfig_max) that minimizes instance recovery and reconfiguration time, and with instance recovery disabled (_imr_active=false), the database on the second node still took about 20 minutes to open.

 

With the database in mount state, event 10046 at level 12 was set and the database was opened. The trace showed that during the recursive calls used to synchronize across the interconnect before the open, the "global cache cr request" wait event occurred very frequently with long elapsed times, so the interconnect transfer speed was suspected. Checking the OS settings of the gigabit NIC used for the interconnect revealed it was configured as 100 Mb full duplex; after changing it to 1000 Mb full duplex, the database opened normally and the application resumed.

 

 



 

Default UDP buffer size and how to change it, by OS (max: 8 MB)

OS       Kernel parameter     Default    Command
Linux    net.core.rmem_max    131071     sysctl -w net.core.rmem_max=8388608
Solaris  udp_max_buf          262144     ndd -set /dev/udp udp_max_buf 8388608
AIX      sb_max               1048576    no -o sb_max=8388608
                                         (only 1048576, 4194304, or 8388608 are allowed)
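The byte values in the table can be sanity-checked with simple shell arithmetic: 262144 is the 256 KB floor recommended by Oracle, and 8388608 is the 8 MB cap used in the commands above.

```shell
# 256 KB -- Oracle's generally recommended UDP buffer size
recommended=$((256 * 1024))
# 8 MB -- the maximum used in the tuning commands above
max=$((8 * 1024 * 1024))
echo "recommended=$recommended max=$max"
```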



  • 고구마 2010.05.04 12:14

    Important material for RAC tuning... nice

  • 김지훈 2010.05.07 10:12
    Tuning Inter-Instance Performance in RAC and OPS [ID 181489.1]  

      Modified 29-MAR-2009     Type BULLETIN     Status PUBLISHED  
    
    PURPOSE
    -------
    
    This note was written to help DBAs and Support Analysts understand Inter-Instance
    Performance and Tuning in RAC.
    
    
    SCOPE & APPLICATION
    -------------------
    
    Real Application Clusters uses the interconnect to transfer blocks and messages 
    between instances.  If inter-instance performance is bad, almost all database 
    operations can be delayed.  This note describes methods of identifying and 
    resolving inter-instance performance issues.
    
    
    TUNING INTER-INSTANCE PERFORMANCE IN RAC AND OPS
    ------------------------------------------------
    
    
    
    
    SYMPTOMS OF INTER-INSTANCE PERFORMANCE PROBLEMS
    -----------------------------------------------
    
    The best way to monitor inter-instance performance is to take AWR or statspack 
    snaps on each instance (at the same time) at regular intervals.  
    
    If there are severe inter-instance performance issues or hung sessions, you 
    may also want to run the racdiag.sql script from the following note 
    to collect additional RAC specific data:
    
      Note 135714.1 
      Script to Collect RAC Diagnostic Information (racdiag.sql) 
    
    The output of the script has tips for how to read the output.  
    
    Within the AWR, statspack report, or racdiag.sql output, you can use the wait 
    events and global cache statistics to monitor inter-instance performance.  It 
    will be important to look for symptoms of inter-instance performance issues.  
    These symptoms include:
    
    1. The average cr block receive time will be high.  This value is calculated by
    dividing the 'global cache cr block receive time' statistic by the 
    'global cache cr blocks received' statistic:
    
    	global cache cr block receive time
    	----------------------------------
         	 global cache cr blocks received
    
    Multiply this calculation by 10 to find the average number of milliseconds.  In a 
    9.2 statspack report you can also use the following Global Cache Service Workload 
    characteristics:
    
    Ave receive time for CR block (ms):                        4.1
    
    The following query can also be run to monitor the average cr block receive time 
    since the last startup:
    
    set numwidth 20
    column "AVG CR BLOCK RECEIVE TIME (ms)" format 9999999.9
    select b1.inst_id, b2.value "GCS CR BLOCKS RECEIVED", 
    b1.value "GCS CR BLOCK RECEIVE TIME",
    ((b1.value / b2.value) * 10) "AVG CR BLOCK RECEIVE TIME (ms)"
    from gv$sysstat b1, gv$sysstat b2
    where b1.name = 'global cache cr block receive time' and
    b2.name = 'global cache cr blocks received' and b1.inst_id = b2.inst_id ;
    
    The average cr block receive time or current block receive time should typically be 
    less than 15 milliseconds, depending on your system configuration and volume. It is 
    the average latency of a consistent-read request round-trip from the requesting 
    instance to the holding instance and back to the requesting instance. 
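    A worked example of the calculation above, using made-up values for the two statistics (the statistics are reported in centiseconds, hence the factor of 10):

```shell
# Illustrative values for the two gv$sysstat statistics.
receive_time=12500      # 'global cache cr block receive time' (centiseconds)
blocks_received=30500   # 'global cache cr blocks received'

# Average CR block receive time in milliseconds.
avg_ms=$(awk -v t="$receive_time" -v n="$blocks_received" \
             'BEGIN { printf "%.1f", t / n * 10 }')
echo "$avg_ms"
```

    This prints 4.1, matching the sample report line above.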
    
    Please note that if you are on 9i and the global cache current block receive 
    time is abnormally high and the average wait time for the 'global cache null 
    to x' wait event is low (under 15ms) then you are likely hitting bug 2130923 
    (statistics bug).  This is a problem in the way statistics are reported and does 
    not impact performance.
    
    More about that issue is documented in the following note:
    
      Note 243593.1 
      RAC: Ave Receive Time for Current Block is Abnormally High in Statspack 
    
    2. "Global cache" or "gc" events will be the top wait event.  Some of these wait
    events show the amount of time that an instance has requested a data block for a 
    consistent read or current block via the global cache.  
    
    
    
    When a consistent read buffer cannot be found in the local cache, an attempt is 
    made to find a usable version in another instance. There are 3 possible outcomes, 
    depending on whether any instance in the cluster has the requested data block 
    cached or not: 
    
    a) A cr block is received (i.e. another instance found or managed to produce the 
       wanted version).  The "global cache cr blocks received" statistic is incremented. 
    b) No other instance has the block cached and therefore the requesting instance 
       needs to read from disk, but a shared lock will be granted to the requestor. 
       The "global cache gets" statistic is incremented. 
    c) 9i RAC+ Only --> A current block is received (the current block is good enough for 
       the query ).  The " global cache current blocks received" statistic is 
       incremented.
    
    In all three cases, the requesting process may wait for global cache cr request.
    The view X$KCLCRST (CR Statistics) may be helpful in debugging 'global cache cr 
    request' wait issues.  It will return the number of requests that were handled for 
    data or undo header blocks, the number of requests resulting in the shipment of a 
    block (cr or current),  and the number of times a read from disk status is returned.
    
    It should be noted that having 'global cache' or 'gc' waits does not always
    indicate an inter-instance performance issue.  Many times this wait is 
    completely normal if data is read and modified concurrently on multiple
    instances.  Global cache statistics should also be examined to determine if 
    there is an inter-instance performance problem.
    
    3. The GES may run out of tickets.  When viewing the racdiag.sql output 
    (Note 135714.1) or querying the gv$ges_traffic_controller or 
    gv$dlm_traffic_controller views, you may find that the TCKT_AVAIL shows '0'.  To 
    find out the available network buffer space we introduce the concepts of tickets.  
    The maximum number of tickets available is a function of the network send buffer 
    size. In the case of lmd and lmon, they always buffer their messages in case of 
    ticket unavailability.  A node relies on messages to come back from the remote 
    node to release tickets for reuse.
    
    4. The above information should be enough to identify an inter-instance performance
    problem; additional calculations for monitoring inter-instance performance can be 
    found in the documentation.
    
    
    IDENTIFYING AND RESOLVING INTER-INSTANCE PERFORMANCE PROBLEMS
    -------------------------------------------------------------
    
    Inter-Instance performance issues can be caused by:
    
    1. Under configured network settings at the OS.  Check UDP, or other network protocol 
    settings and tune them.  See your OS specific documentation for instructions on how 
    to do this.  If using UDP, make sure the parameters relating to send buffer space, 
    receive buffer space, send highwater, and receive highwater are set well above the 
    OS default.  The alert.log will indicate what protocol is being used.  Example:
    
    	cluster interconnect IPC version:Oracle RDG
    	IPC Vendor 1 proto 2 Version 1.0
    
    Changing network parameters to optimal values:
    
     Sun (UDP Protocol) 
    	UDP related OS parameters can be queried with the following command:
    		ndd /dev/udp udp_xmit_hiwat
    		ndd /dev/udp udp_recv_hiwat 
                    ndd /dev/udp udp_max_buf 
    	Set the udp_xmit_hiwat and udp_recv_hiwat to the OS maximum with:
    		ndd -set /dev/udp udp_xmit_hiwat <value>
    		ndd -set /dev/udp udp_recv_hiwat <value> 
                    ndd -set /dev/udp udp_max_buf <1M or higher>
     IBM AIX (UDP Protocol)
    	UDP related OS parameters can be queried with the following command:
    		no -a
    	Set the udp_sendspace and udp_recvspace to the OS maximum with:
    		no -o <parameter>
     Linux (edit files)
    	/proc/sys/net/core/rmem_default 
    	/proc/sys/net/core/rmem_max
    	/proc/sys/net/core/wmem_default
    	/proc/sys/net/core/wmem_max 
     HP-UX (HMP Protocol):
    	The file /opt/clic/lib/skgxp/skclic.conf contains the Hyper Messaging Protocol (HMP)
            configuration parameters that are relevant for Oracle:
    	- CLIC_ATTR_APPL_MAX_PROCS Maximum number of Oracle processes. This includes
    	  the background and shadow processes. It does not
    	  include non-IPC processes like SQL client processes.
    	- CLIC_ATTR_APPL_MAX_NQS This is a derivative of the first parameter; it will 
              be removed in the next release. For the time being, this should be set to 
              the value of CLIC_ATTR_APPL_MAX_PROCS.
    	- CLIC_ATTR_APPL_MAX_MEM_EPTS Maximum number of Buffer descriptors. Oracle 
    	  seems to require about 1500-5000 of them depending on the block size (8K or 
    	  2K). You can choose the maximum value indicated above.
    	- CLIC_ATTR_APPL_MAX_RECV_EPTS Maximum number of Oracle Ports. Typically, 
    	  Oracle requires as many ports as there are processes. Thus it should be 
    	  identical to CLIC_ATTR_APPL_MAX_PROCS.
    	- CLIC_ATTR_APPL_DEFLT_PROC_SENDS Maximum number of outstanding sends. You 
    	  can leave it at the default value of 1024.
    	- CLIC_ATTR_APPL_DEFLT_NQ_RECVS Maximum number of outstanding receives on a 
    	  port or buffer. You can leave it at the default value of 1024.
     HP-UX (UDP Protocol):
    	Not tunable before HP-UX 11i Version 1.6.
          For HP-UX 11i Version 1.6 or later, use the commands below to set socket_udp_rcvbuf_default and socket_udp_sndbuf_default: 
          ndd -set /dev/udp socket_udp_rcvbuf_default 1048576
          echo $?
          ndd -set /dev/udp socket_udp_sndbuf_default 1048576
          echo $? 
     HP Tru64 (RDG Protocol):
    	RDG related OS parameters are queried with the following command:
    		/sbin/sysconfig -q rdg 
    	The most important parameters and settings are:
    	- rdg_max_auto_msg_wires - MUST be set to zero.
    	- max_objs - Should be set to at least <# of Oracle processes * 5> and up to 
    	  the larger of 10240 or <# of Oracle processes * 70>. Example: 5120
    	- msg_size - Needs to set to at least <db_block_size>, but we recommend 
    	  setting it to 32768, since Oracle9i supports different block sizes for each 
    	  tablespace.
    	- max_async_req - Should be set to at least 100 but 256+ may provide better 
    	  performance.
    	- max_sessions - Should be set to at least <# of Oracle processes + 20>, 
    	  example: 500	
     HP TRU64 (UDP Protocol):
    	UDP related OS parameters are queried with the following command:
    		/sbin/sysconfig -q udp 
    	udp_recvspace 
    	udp_sendspace 
    
    
    2. If the interconnect is slow, busy, or faulty, you can look for dropped packets,
    retransmits, or cyclic redundancy check errors (CRC).  You can use netstat commands
    to check the networks.  On Unix, check the man page for netstat for a list of options.  
    Also check the OS logs for any errors and make sure that inter-instance traffic is 
    not routed through a public network.  
    
    
    With most network protocols, you can use 'oradebug ipc' to see which interconnects 
    the database is using:
    
      SQL> oradebug setmypid
      SQL> oradebug ipc
    
    This will dump a trace file to the user_dump_dest.  The output will look something 
    like this:
    
    SSKGXPT 0x1a2932c flags SSKGXPT_READPENDING     info for network 0
            socket no 10    IP 172.16.193.1         UDP 43749
            sflags SSKGXPT_WRITESSKGXPT_UP  info for network 1
            socket no 0     IP 0.0.0.0      UDP 0...
    
    So you can see that we are using IP 172.16.193.1 with a UDP protocol.
    
    
    3. A large number of processes in the run queue waiting for CPU or scheduling
    delays.  If your CPU has limited idle time and your system typically processes 
    long-running queries, then latency may be higher.  Ensure that LMSx processes get 
    enough CPU.
    
    4. Latency can be influenced by a high value for the DB_FILE_MULTIBLOCK_READ_COUNT 
    parameter. This is because a requesting process can issue more than one request 
    for a block depending on the setting of this parameter.  
    
    
    ADDITIONAL RAC AND OPS PERFORMANCE TIPS
    ---------------------------------------
    
    1. Poor SQL or bad optimization paths can cause additional block gets via the
    interconnect just as it would via I/O.  
    
    2. Tuning normal single instance wait events and statistics is also very 
    important.
    
    3. A poor gc_files_to_locks setting can cause problems.  In almost all cases 
    in RAC, gc_files_to_locks does not need to be set at all.  
    
    
    4. The use of locally managed tablespaces (instead of dictionary managed) with 
    the 'SEGMENT SPACE MANAGEMENT AUTO' option can reduce dictionary and freelist 
    block contention.  Symptoms of this could include 'buffer busy' waits.  See the 
    following notes for more information:
    
      Note 105120.1
      Advantages of Using Locally Managed vs Dictionary Managed Tablespaces 
    
      Note 103020.1 
      Migration from Dictionary Managed to Locally Managed Tablespaces 
    
      Note 180608.1
      Automatic Space Segment Management in RAC Environments
    
    
    Following these recommendations can help you achieve maximum performance in
    your clustered environment.
    
    
    RELATED DOCUMENTS
    -----------------
    Oracle Documentation
    Note 188135.1 - Documentation Index for Real Application Clusters and Parallel Server 
    Note 94224.1 FAQ- STATSPACK COMPLETE REFERENCE 
    Note 135714.1 - Script to Collect RAC or OPS Diagnostic Information 
    Note 157766.1 - Sessions Wait Forever for 'global cache cr request' Wait Event...
    Note 151051.1 - PARAMETER:CLUSTER_INTERCONNECTS
    Note 120650.1 - Init.ora Parameter "OPS_INTERCONNECTS" Reference Note
    
    
  • 승현짱 2010.05.19 12:22
    Thanks for the useful material.
