A Brief Look at Coroutine Execution Support in C#

2024-07-21 02:19:11

Source: reprinted

Contributed by: site user

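A few months ago I gave a rough analysis of how the iterator block mechanism in C# 2.0 is implemented, in "A Brief Analysis of the Improvements to Iterators in C# 2.0 and How They Are Implemented" (《c# 2.0 中iterators的改進與實現原理淺析》). That article briefly described how the compiler implements the yield keyword inside an iterator block with a finite state machine, without any change to the CLR.
In fact, the ultimate purpose of this mechanism is to provide support for coroutine-style execution of code.
The following is program code: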

using System;
using System.Collections.Generic;

public class Tokens : IEnumerable<string>
{
    public IEnumerator<string> GetEnumerator()
    {
        for (int i = 0; i < elements.Length; i++)
            yield return elements[i];   // hand one element to the caller, then pause here
    }
    ...
}

foreach (string item in new Tokens())
{
    Console.WriteLine(item);
}

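While this code runs, the body of the foreach loop and the body of GetEnumerator actually execute alternately on the same thread. This is a cooperative execution model that sits between threads and plain sequential execution. It is called a coroutine because scheduling among the code blocks that are in progress at the same time is carried out cooperatively, and implicitly, by the program logic itself. Sequential execution has no concurrency to speak of, while threads are usually switched preemptively by the system scheduler. By comparison, the cooperative (non-preemptive) multitasking of Win 3.x is fairly close to the coroutine model.
Functionally, coroutine execution can be divided into two parts, behavior and control, and control can be further divided into control logic and control state. Behavior is what is done with each target object: in the code above, the behavior is printing the object to the console. Control is how the elements array is traversed, and it splits into control logic (sequential traversal) and control state (which element the traversal has reached). The rest of this article follows this breakdown to describe how different languages implement or simulate these parts.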

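Spark Gray's blog has a series of articles that introduces some of the concepts behind coroutine execution.

Iterators in Ruby (Part - 1)
Warming up to using iterators (Part 2)

Parts 1 and 2 use Ruby (whose syntax resembles Python's) to show how the iterator mechanism simplifies traversal code. The central idea is to separate behavior from control and let language-level support reduce the bookkeeping in the control code.
The following is program code: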

def textfiles(dir)
  Dir.chdir(dir)

  Dir["*"].each do |entry|
    # hand each .txt file to the caller's block
    yield dir + "/" + entry if /^.*\.txt$/ =~ entry

    if FileTest.directory?(entry)
      # recurse, re-yielding results with the parent path prepended
      textfiles(entry) { |file| yield dir + "/" + file }
    end
  end
  Dir.chdir("..")
end

textfiles("c:/") { |file|
  puts file
}

For example, the recursive directory walk in Ruby above uses syntax very similar to that of C# 2.0 to get coroutine execution support.

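For languages such as C# 1.0 and C++ that do not support coroutine execution, the state transitions during cooperative execution, or in other words the scheduling of the flow of execution, must be implemented by libraries and their users. For example, an STL iterator must itself store the position information associated with the container being traversed. Here is cooperative execution implemented with the STL:
The following is program code: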

#include <vector>
#include <algorithm>
#include <iostream>

// the function object multiplies an element by a factor
template <class type>
class multvalue
{
private:
    type factor;   // the value to multiply by
public:
    // constructor initializes the value to multiply by
    multvalue ( const type& _val ) : factor ( _val ) {
    }

    // the function call for the element to be multiplied
    void operator ( ) ( type& elem ) const
    {
        elem *= factor;
    }
};

int main( )
{
    using namespace std;

    vector <int> v1;

    //...

    // using for_each to multiply each element by a factor
    for_each ( v1.begin ( ) , v1.end ( ) , multvalue<int> ( -2 ) );
}

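Through iterators, algorithms, and function objects (predicates), the STL separates behavior from control in this cooperative logic fairly successfully: the function object (multvalue<int>) expresses the behavior, the iterators (v1.begin(), v1.end()) express the control state, and the algorithm (for_each) expresses the control logic. Even so, the result is still complicated to write, cumbersome to use, and semantically disjointed.
One way to ease this is to merge the definition of the function object into the control part, which is the approach taken by libraries such as boost::lambda:
The following is program code: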

for_each(v.begin(), v.end(), _1 = 1);

for_each(vp.begin(), vp.end(), cout << *_1 << ' ');

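With some template and macro magic, this reduces to some extent the complexity of writing a separate function object to define the behavior. The state and logic of the control part, however, still have to be implemented separately.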

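C# 1.0, for its part, simply has no built-in support, so the job has to be done in the clumsy way shown by the example in "A Brief Analysis of the Improvements to Iterators in C# 2.0 and How They Are Implemented":
The following is program code: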

using System.Collections;

public class Tokens : IEnumerable
{
    public string[] elements;

    Tokens(string source, char[] delimiters)
    {
        // Parse the string into tokens:
        elements = source.Split(delimiters);
    }

    public IEnumerator GetEnumerator()
    {
        return new TokenEnumerator(this);
    }

    // Inner class implements IEnumerator interface:
    private class TokenEnumerator : IEnumerator
    {
        private int position = -1;
        private Tokens t;

        public TokenEnumerator(Tokens t)
        {
            this.t = t;
        }

        // Declare the MoveNext method required by IEnumerator:
        public bool MoveNext()
        {
            if (position < t.elements.Length - 1)
            {
                position++;
                return true;
            }
            else
            {
                return false;
            }
        }

        // Declare the Reset method required by IEnumerator:
        public void Reset()
        {
            position = -1;
        }

        // Declare the Current property required by IEnumerator:
        public object Current
        {
            get // compiled into the get_Current method
            {
                return t.elements[position];
            }
        }
    }
    ...
}

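This clumsy way of implementing the IEnumerable interface amounts to hand-writing the iterator that, in the STL, supplies the control state. Moreover, the control logic is fixed at the time the IEnumerable implementation is written. Some customization is possible through the Strategy pattern, but the code logic is too scattered: to understand one simple call you have to read four or five separate pieces of code.

Fortunately, there is never a shortage of brilliant people, heh.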

Ajai Shankar wrote an excellent article on MSDN, "Coroutines: Implementing Coroutines for .NET by Wrapping the Unmanaged Fiber API", which uses the Win32 fiber API together with several low-level CLR APIs to build a complete, usable coroutine mechanism.
Spark Gray's fourth article discusses the pros and cons of this implementation in detail:

SICP, Fiber API and Iterators! (Part 4)

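Fibers are a lightweight concurrent execution mechanism that the Win32 subsystem provides to make it easier to port programs written for user-level (pseudo-thread) environments on Unix. The program's own code controls the scheduling.
Using them is simple: call ConvertThreadToFiber(Ex) on a thread to initialize fiber support, then call CreateFiber(Ex) to create any number of fibers. Both the newly created fibers and the default fiber that the current thread was converted into can be scheduled explicitly with SwitchToFiber.
The following is program code: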

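#include <windows.h>
#include <iostream>

static int array[3] = { 0, 1, 2 };

static int cur = 0;

VOID CALLBACK FiberProc(PVOID lpParameter)
{
    for (int i = 0; i < sizeof(array) / sizeof(array[0]); i++)
    {
        cur = array[i];

        SwitchToFiber(lpParameter);   // yield the value to the main fiber
    }

    cur = -1;                         // signal that the traversal is finished
    SwitchToFiber(lpParameter);       // a fiber function must not return: that would exit the thread
}

int main()
{
    LPVOID fiberMain = ConvertThreadToFiber(NULL);

    LPVOID fiberFor = CreateFiber(0, FiberProc, fiberMain);

    for (;;)
    {
        SwitchToFiber(fiberFor);      // resume the producer fiber

        if (cur < 0)
            break;

        std::cout << cur << std::endl;
    }

    DeleteFiber(fiberFor);
    return 0;
}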

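The code above sketches the general flow of using fibers, and it shows that fibers match what the Ruby and C# 2.0 coroutine examples need very closely. In terms of implementation, each fiber has its own stack and its own saved register context, set up by ConvertThreadToFiber/CreateFiber. SwitchToFiber saves the context of the current fiber and resumes execution with the stack and registers of the target fiber, somewhat like setjmp/longjmp in the C library. The State Threads library provided by Netscape simulates similar functionality through longjmp and related mechanisms.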
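To use fibers in .NET 1.0/1.1, you additionally have to set up the managed environment for each fiber and manage state when switching between them, among other things. Interested readers can study the two excellent articles mentioned above.
The following is program code: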

class CorIter : Fiber {
    protected override void Run() {
        object[] array = new object[] {1, 2, 3, 4};
        for (int ndx = 0; ndx < array.Length; ++ndx)
            Yield(array[ndx]);   // hand one element back to the caller
    }
}

Coroutine next = new CorIter();
object o = next();

As you can see, this code is already very close to the C# 2.0 syntax, though it is subject to some restrictions in the details.

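C# 2.0, probably to ensure portability, instead compiles the control logic into a state machine and lets the state machine manage the control state automatically. I gave a rough analysis of how this works in "A Brief Analysis of the Improvements to Iterators in C# 2.0 and How They Are Implemented". Interested readers can go on to the detailed analysis in Spark Gray's fifth article:

Implementation of Iterators in C# 2.0 (Part 5)

and to Matt Pietrek's article analyzing the iterator state machine:

Fun with Iterators and State Machines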

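To bind behavior and control together more tightly, C# 2.0 also provides anonymous methods, which are similar to the boost::lambda mechanism in C++. For a brief analysis, see my earlier article "A Brief Analysis of How Anonymous Functions Are Implemented in the CLR" (《clr 中匿名函數的實現原理淺析》), or Spark Gray's sixth article:

Implementation of Closures (Anonymous Methods) in C# 2.0 (Part 6)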
