ORA FAQ 性能調整系列之——當索引第一列由序列產生，一個逆序索引有什么用？

2024-07-21 02:41:03

字體：大中小

來源：轉載

供稿：網友

　　
ORA FAQ 性能調整系列之——
The Oracle (tm) Users' Co-Operative FAQ
Why would a reverse index be useful when the leading column of the index is generated from a sequence ?
當索引第一列由序列產生，一個逆序索引有什么用？
--------------------------------------------------------------------------------

Author's name: Norman Dunbar
Author's Email: Oracle@bountifulsolutions.co.uk
Date written: 18/04/2004
Oracle version(s): 9.2.0.3

--------------------------------------------------------------------------------

When you store data in an indexed table, certain columns of data are copied into the index alongside the rowid of the data row in the table. The data in the table is stored 'randomly', or at least, not necessarily in the order you put them there.
當你在一個索引表中存儲數據時，數據的某些列同表中數據行的ROWID被拷貝進索引。表中的數據是“隨機地”存放的，或者至少不需要按照你放入它們的順序。

The index entries, on the other hand, must be stored in order, otherwise the usability of the index is removed. If you could walk through the entries in an index, you would see that they are in order, usually ascending, but since 8i, descending also works.
而索引項應當按順序存儲，否則其毫無用途。假如你深入一個索引的各項，會看到他們是有序的，通常升序，但從8i開始，也可以降序。

Entries are stored in order of their internal rePResentation, not necessarily the same as what you see on screen when you SELECT the columns from a table.
項目是根據內部表示的順序存儲的，并不需要與你從表中SELECT列時在屏幕上看到的一致。

If the indexed column(s) contain character data (CHAR, NCHAR, VARCHAR2 or NVARCHR2) then the data will appear on screen exactly as it does in the index. For example, if the column contains 'ORACLE' the index entry will also be 'ORACLE'.
若索引列含有字符數據（CHAR, NCHAR, VARCHAR2或NVARCHR2），那么數據在屏幕上的顯示將和在索引中一致。例如，若列中含有“ORACLE”，那么索引項中也是“ORACLE”。

We can use the DUMP command to show us the internal representation of any data type. This command takes four parameters. The first is the data you wish to dump, the second is the base you wish to dump it in. The default is 10 which means that DUMP will display the characters in decimal, or base 10. The other allowed values are 8 (Octal), 16 (Hexadecimal) or 17 (Characters).
我們可以使用DUMP指令來顯示任何數據類型的內部表示。這一命令有四個參數，第一個是要dump的數據，第二個是要dump出的基數，缺省是10，即DUMP將顯示十進制字符。其他的可選參數是8（八進制），16（十六進制），17（字符）。

The third parameter is the start position in the data you wish to dump from and the final parameter is the amount of data you wish to dump. All but the first parameter have sensible defaults.
第三個參數是數據中要dump的開始位置，最后一個參數是要dump的數據量。除了第一個參數外，其他都有合理的缺省值。

Using DUMP, we can see the individual character codes for our 'ORACLE' data :
使用DUMP，我們可以看到“ORACLE”數據的各個字符碼：

SQL> select dump('ORACLE',10) from dual;

DUMP('ORACLE',10)
-------------------------------
Typ=96 Len=6: 79,82,65,67,76,69

We can prove that this is correct by converting back from decimal character codes to actual characters :
我們可以證實，可以正確的將十進制字符碼轉換回真實字符：

SQL> select chr(79),chr(82),chr(65),chr(67),chr(76),chr(69) from dual;

C C C C C C
- - - - - -
O R A C L E

We could have used base 17 to do the same thing :
我們可以使用17作為基數實現同樣的事情：

SQL> select dump('ORACLE',17) from dual;

DUMP('ORACLE',17)
-------------------------
Typ=96 Len=6: O,R,A,C,L,E

Numeric columns are very mUCh different. The internal format of a number is different from that which appears on screen after a SELECT because the internal format is converted to ASCII format so that it can be displayed. We can see this in the following, first in character format :
數字列是不同的。
數的內部格式與SELECT后在屏幕上的顯示是不同的，內部格式轉換為ASCII格式才能正常顯示出來。我們可以如下察看，首先是字符格式：

SQL> select '1234' as "1234",
2 dump('1234', 17)
3 from dual;

1234 DUMP('1234',17)
---- ---------------------
1234 Typ=96 Len=4: 1,2,3,4

Then in internal format :
然后內部格式：

SQL> select 1234 as "a number",
2 dump(1234, 17)
3 from   dual;

a number   DUMP(1234,17)
---------- --------------------
   1234 Typ=2 Len=3: c2,^M,#

The first columns in both examples look identical, but this is only because SQLPlus has converted the internal format of the number 1,234 into the character format so that the display device (the monitor screen) is able to show it. Binary characters have a nasty tendency to disrupt character devices like computer monitors when running in text mode.
兩個例子的第一列看起來一樣，但這只是因為SQLPlus將數1,234的內部格式轉換為了字符格式，從而顯示設備（顯示器屏幕）可以顯示它。二進制字符總是會討厭的擾亂運行于文本模式的類似計算機顯示器這樣的字符設備。

Take a look at the second column in the above examples and notice the difference. In the first example we see the individual characters '1', '2', '3' and '4' while the second example shows only three bytes in the internal format of the number 1,234. Lets change the DUMP calls slightly, and do the whole lot in one command :
看看上面例子的第二列并注重不同點。第一個例子中我們看到一個一個字符'1', '2', '3'和'4'，而第二個例子只顯示數1,234內部格式中的三個字節。讓我們稍微修改一下DUMP的調用，用一個指令完成全部比較：

SQL> select '1234' as "1234",
2 dump('1234', 10),
3 1234 as "a number",
4 dump(1234, 10)
5 from dual;

1234 DUMP('1234',10)     a number   DUMP(1234,10)
---- ------------------------- ---------- ----------------------
1234 Typ=96 Len=4: 49,50,51,52   1234 Typ=2 Len=3: 194,13,35

This time, we see the actual character codes used internally. Once again columns 2 and 4 differ. Column 4 is showing three bytes and these three bytes are the internal binary representation of the number 1,234.
這次，我們看到內部真正的字符碼。列2和4還是不同。列4顯示了三個字節，這三個字節就是數1,234的內部二進制表示。

It is this binary representation that is used in the index entry when a number column is indexed.
當數字列建立索引時，正是這個二進制表示用于索引項中。

Take a few minutes and eXPeriment with dumping a few other numbers - stick to integers for now as those are what sequences generate.
用幾分鐘來實驗一下dump一些其他數字——現在關注整數，正是序列生成的類型。

SQL> create table test (a number);

Table created.

SQL> begin
2    for x in 1 .. 1e6
3    loop
4     insert into test values (x, substr(dump(x,10), 14));
5    end loop;
6 end;
7 /

PL/SQL procedure successfully completed.

If we have a look at the 'b' column of the table, we can see that each entry is ascending in a similar manner to the 'a' column. Here are the first 20 rows :
假如我們看看表的b列，可以看到每個條目是遞增的，正和a列類似。
這里是前20行：

SQL> col b format a20 wrap
SQL> select a,b from test where a < 21;

   A B
---------- ----------
   1 193,2
   2 193,3
   3 193,4
   4 193,5
   5 193,6
   6 193,7
   7 193,8
   8 193,9
   9 193,10
  10 193,11
  11 193,12
  12 193,13
  13 193,14
  14 193,15
  15 193,16
  16 193,17
  17 193,18
  18 193,19
  19 193,20
  20 193,21

The entries are very similar and all have the same leading byte.
各條目非常類似且均有一個前導字節。

How sequences affect indexes.
As mentioned above, index entries have to be stored in order, however, the table data need not be. If your indexed column is fed by a sequence, the data will be similar to the 20 rows shown above.
序列如何影響索引
如前所述，索引項必須按序存儲，但是表中的數據不需要。若你的索引列來自一個序列，數據將類似于上面顯示的20行。

Similar entries will group together in the index, so the index blocks will split as necessary and new entries will end up all hitting the same block until it too fills up and splits.
相似的項將在索引中同一組中，所以索引塊將按需分割，且新的項將填入同一個塊，直至該塊已滿并分割。

If you have one person running the application, this isn't too much of a problem. If the application is multi-user then it means that every user will tend to write into the same index block and buffer busy waits will be the outcome as transactions 'queue' to write data to the hottest index block around.
若單用戶執行應用程序，這不是什么問題。若應用是多用戶的，則意味著每個用戶將寫入同一個索引塊。事務“排隊”來寫數據到最“熱”的索引塊中將造成緩沖忙等待。

Back in our small test, if you select more data from the test table, you will find that in the 1 million rows, there are only 4 different values for the leading byte on the internal numeric format and even worse, most of the entries in the index have the same leading byte value :
回到我們的小實驗，若從測試表選擇更多的數據，會發現在一百萬行中，內部數據格式的前導（/起始）字節僅有4個不同的值，甚至更糟，索引中大多數項有相同的前導（/起始）字節：

SQL> select substr(b,1,3),count(*)
2 from test
3 group by substr(b,1,3);

SUB COUNT(*)
--- ----------
193   99
194    9900
195 990000
196    1

I cheated and discovered that there was a comma in position 4 of every row in the table that's how I knew to use a three character length in my SUBSTR.
我討了個巧并發現表中每行的第4位是一個逗號，這就是為什么我知道在SUBSTR中使用一個三位字符長度。

What the above shows is that in an index of 1 million sequential entries, the vast majority have the same leading byte and so will all be trying to get into the same block in the index.
上面所示的是在一個一百萬序列項的索引中，絕大多數有一個相同的前導（/起始）字節，從而所有（操作）將使用索引中的同一塊。

How reverse indexes cure the problem.
A reverse key index stores the bytes of the indexed column(s) in reverse order, so the data 'ORACLE' is actually stored in the index as 'ELCARO'. Using a reverse index on a column fed by a sequence spreads the location of sequential numbers across a wider range of leaf blocks and the problem of a single hot block is removed because the index entries are stored in reverse order.
逆序索引如何解決這個問題
一個逆序鍵索引按照逆序存儲索引列的字節，所以數據“ORACLE”在索引中保存為“ELCARO”。
在序列填充的列上使用逆序索引將序列值分攤到一個更廣的葉塊上，由于索引項存儲為逆序，一個單獨的“熱”塊的問題就不存在了。

SQL> alter table test add (c varchar2(30));

Table altered.

SQL> update test set c = substr(dump(reverse(a),10),14);

1000000 rows updated.

SQL> select substr(c,1,instr(c,',')-1),count(*)
2 from test
3 group by substr(c,1,instr(c,',')-1)
4 order by to_number(substr(c,1,instr(c,',')-1))

SUB   COUNT(*)
--- ----------
2   10102
3   10101
4   10101
5   10101

All other numbers between 6 and 95 inclusive, have 10,101 entries each.

96   10101
97   10101
98   10101
99   10101
100 10101

99 rows selected.

This time, our 1 million row index entry has it's leading byte value spread across 99 (100 if you include a value for zero) different values, rather than just 4. In addition, the actual reversed bytes are fairly randomly scattered across each of the different values too.
這次，我們一百萬行的索引項將前導（/起始）字節分攤到99（若包括了0則有100個）個不同的值，而不是僅為4。進一步的，事實上逆序字節還相當隨機的分散在每一個不同的值上。

As more entries are added to the index, blocks will be split to accomodate the new entries in their proper location. As the data is arriving almost 'randomly' by means of the reversing of the actual data bytes for the index, the index itself will be extended to accomodate these new values. However, rather than always being stored in the same single 'hot' index block, new entries will be spread across a number of existing blocks (assuming the index has been around for a while) thus reducing contention. Of course, block splits will still occur on these blocks as new values fill up the existing block but it's happening all over the index not just in one place.
隨著更多的項目添加到索引中，各塊將分割以在合適的位置容納新的項目。由于從索引中數據字節逆序處理來說，數據是幾乎“隨機”到來的，索引自身擴展以容納新值。但是，與總是存儲到同一個“熱”塊中不同，新的項目將分攤到多個已有的塊（假定索引已運作了一段時間）中。當然，當新值填滿這些塊時仍會發生塊分割，但這將在索引的各部分發生，而不是一個地方。

This is the reason why reversing the index when its leading column is fed by a sequence reduces buffer contention, removes the hot block problem and by doing so, reduces the potential for buffer busy waits on a multi-user system.
這正是為何當前導（/起始）列由序列生成時索引的逆序處理可以降低緩存爭用，解決熱塊問題，并且這樣做可以在一個多用戶系統中減少緩存忙等待的機會。

Drawbacks to Reverse Key Indexes
Of course, there are drawbacks as well. By setting up a reverse key index you are increasing the clustering factor of the index. The clustering factor (from USER_INDEXES) is used by the optimiser (CBO) to determine how best to access data in an INDEX RANGE SCAN. If the clustering factor is roughly equal to BLOCKS minus FREE_BLOCKS from USER_TABLES then the chances are that a range scan will read one index block, and locate all (or nearly all) of the data rows in needs in one or more adjacent blocks in the table.
逆序鍵索引的不足點
當然，也有不足之處。
設置一個逆序索引，增加了索引的簇因子（Clustering Factor）。優化器（CBO）利用簇因子（來自USER_INDEXES）來決定在一次INDEX RANGE SCAN中如何最好的訪問數據。若簇因子幾乎與BLOCKS - FREE_BLOCKS（來自USER_TABLES）相近，一個區間掃描將可能讀一個索引塊，然后根據需要在一個或更多的臨近塊中定位所有（或接近所有）數據行。

On the other hand, if the clustering factor is close to NUM_ROWS in USER_TABLES then the chances are that the entries stored together in one index block are likely to be scattered throughout a wide range of table blocks - so the index range scan may not be chosen as a good method of access.
另一方面，若簇因子接近USER_TABLES中的NUM_ROWS，一個索引塊中存儲接近的項目就很可能分散到很多的表塊中——所以索引區間掃描就可能不是一個訪問的好方法了。

Obviously the above applies to an analysed table and index.
顯然上面應用于一個分析了的表和索引。

--------------------------------------------------------------------------------

In a quick test on a table with 100,000 rows loaded using a sequence, a normal index was used for most queries returning up to 30 rows, as was the reverse key index, however, when the number of rows went up to 1,000,000 the reverse key index was never used and a full table scan was used every time.
在一個用一個序列載入并含100,000行的表上做一個快速的測試，分別用普通索引和逆序索引查詢并返回30行數據，當行數增長到1,000,000，將不再使用逆序索引而是每次使用全表掃描。

Further reading:
Oracle reference manual for your version of Oracle.
進一步閱讀：
你的Oracle版本的Oracle參考手冊

--------------------------------------------------------------------------------
本文翻譯自http://www.jlcomp.demon.co.uk/faq/autotrace_pruning.Html 譯者僅保留翻譯版權

上一篇：control_files error ORA-01122 ORA-00214(原創)

下一篇：ORA01034/ORA27101的解決方法