使用插值在序列中查找遺漏的值

2024-07-21 02:38:49

字體：大中小

來源：轉載

供稿：網友

　　在有些應用程序中，序列化值的范圍能夠連續是重要的。當在這個序列新增值的時候，它可能會在序列中優先挑選一個“空的（hole）”或者遺漏的值來填充，而不是取序列最大值之后的下一個值，或者使用一個序號產生器。例如電話號碼、社會安全號碼和ip地址都有非常嚴格的范圍限制，所以說最好使用未用的號碼來填充。
　　
　　當數據庫中有很多值的時候經常會出現這個問題。遺漏的值明顯是不被索引的，所以定位遺漏的數字的最明顯的方法是順序查詢所有的值，然后在碰到期望的值的時候將其返回：
　　
　　REM - table in question is "massive" with a sequence "id"
　　set serveroutput on;
　　set timing on;
　　declare
　　　　l_id　　　　integer := null;
　　begin
　　　　for row in (select id from massive order by id) loop
　　　　　　if l_id is null then
　　　　　　　　l_id := row.id;
　　　　　　else
　　　　　　　　l_id := l_id + 1;
　　　　　　end if;
　　　　　　exit when l_id != row.id;
　　　　end loop;
　　　　dbms_output.put_line('lowest missing value = 'l_id);
　　end;
　　/
　　
　　另外一個方法是使用“分而治之”的策略。假如我們已經知道在一個給定的范圍內應該有多少個值，并且表上建有索引的話，我們可以相當快地取得實際的記錄的條數。假如有一個遺漏的值，實際的記錄條數將會比期望的記錄的條數少。
　　
　　我們可以應用相同的方法檢測在較小的一半是否有遺漏的值。假如有的話，就繼續將其對分。假如沒有遺漏的值，我們檢測另外一半。最后，我們將要檢測的記錄集合的大小能夠正好只找出遺漏的數據：
　　
　　下面是這種技術的PL/SQL示例：
　　
　　set serveroutput on
　　declare
　　　　l_min　　　　　　　 integer;
　　　　l_max　　　　　　　 integer;
　　　　actual_count　　　　integer;
　　　　eXPected_count　　　integer;
　　　　half　　　　　　　　integer;
　　begin
　　　　select max(id),min(id),count(*)
　　　　　into l_max,l_min,actual_count
　　　　　from massive;
　　　　expected_count := l_max - l_min + 1;
　　　　if expected_count = actual_count then
　　　　　　dbms_output.put_line('there are no missing values');
　　　　end if;
　　　　while l_max - l_min >= 1 loop
　　　　　　-- try lower half of range
　　　　　　half := trunc(expected_count/2);
　　　　　　expected_count := expected_count - half;
　　　　　　select count(*)
　　　　　　　into actual_count
　　　　　　　from massive
　　　　　　 where id between l_min and l_max - half;
　　　　　　exit when actual_count = 0;
　　　　　　if actual_count = expected_count then
　　　　　　　　-- missing value must be in upper half
　　　　　　　　l_min := l_min + half;
　　　　　　else
　　　　　　　　l_max := l_max - half;
　　　　　　end if;
　　　　end loop;
　　end;
　　/
　　
　　對于具有一百萬條記錄，并且有一個正規索引的表，假如其中漏值只有一個話，有非正式的結果表明第二個腳本的執行速度是第一個的五倍。

上一篇：在已有冗余的表上加唯一約束

下一篇：外部程序使一切變得簡單