Implementing LALR(1) Parsing Based on the Tiger Book, and Generating GLSL Compiler Front-End Code (C#)
To fully parse GLSL source code and extract the information in it (which in/out/uniform variables it declares, and so on), I decided to build a GLSL compiler front end (hereafter "the compiler", or FrontEndParser).
I once wrote a CGCompiler that could automatically generate compiler code (in C#) for LL(1) grammars. So I took a GLSL grammar from *The OpenGL® Shading Language* (hereafter "the PDF") and set out to rewrite it into LL(1) form. After seven rewrites I realized this is impossible: GLSL's grammar is beyond LL(1), and a stronger parsing algorithm is required. Hence the present LALR(1)Compiler.
Theoretical basis
*Modern Compiler Implementation in C* (the "Tiger book") provides thorough material on all of this, so I use it as my theoretical reference.
The figure below from the Tiger book shows the scope of the various grammar classes. Ordinary programming languages generally fall within LALR(1).
Since LR(0) is the basis of SLR, SLR the basis of LR(1), and LR(1) in turn the basis of LALR(1) (which looks a little odd), I had to start from LR(0) and implement my way up to LALR(1), step by step.
Input
Given a grammar, all information about the language it describes is already contained in it: the keywords, the derivation structure, everything. This is what I dislike about YACC and its kin: you already have the grammar, yet you still have to list out the keywords yourself.
Here is an example grammar:
// Grammar 3-10 from the Tiger book
<S> ::= <V> "=" <E> ;
<S> ::= <E> ;
<E> ::= <V> ;
<V> ::= "x" ;
<V> ::= "*" <E> ;
Here are six code snippets that conform to this grammar:
x
*x
x = x
x = * x
*x = x
**x = **x
Output
The output is the compiler code (C#) for that grammar, chiefly two classes: the lexical analyzer LexicalAnalyzer and the syntax parser SyntaxParser.
Afterwards I use C#'s CSharpCodeProvider and reflection to compile, load, and run the generated code, testing it on examples (such as *x = x above). As long as syntax trees are produced correctly, my LALR(1)Compiler implementation is proven correct.
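The compile-then-invoke round trip can be sketched roughly as follows. This is a minimal sketch, not the project's actual test harness; the generated class name and its Parse method are stand-in assumptions, and CSharpCodeProvider.CompileAssemblyFromSource only works on the .NET Framework (it throws PlatformNotSupportedException on .NET Core).

```csharp
using System;
using System.CodeDom.Compiler;
using Microsoft.CSharp;

public class CompileLoadRunDemo
{
    public static string Run()
    {
        // Hypothetical stand-in for the generated parser source.
        string generatedSource = @"
            public class SyntaxParser
            {
                public string Parse(string sourceCode) { return ""tree of: "" + sourceCode; }
            }";

        // Compile the generated C# source in memory.
        using (var provider = new CSharpCodeProvider())
        {
            var options = new CompilerParameters { GenerateInMemory = true };
            CompilerResults results = provider.CompileAssemblyFromSource(options, generatedSource);
            if (results.Errors.HasErrors) { throw new Exception("compile failed"); }

            // Load the type via reflection and invoke its Parse method.
            Type parserType = results.CompiledAssembly.GetType("SyntaxParser");
            object parser = Activator.CreateInstance(parserType);
            return (string)parserType.GetMethod("Parse").Invoke(parser, new object[] { "*x = x" });
        }
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```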
For example, for the six snippets above, LALR(1)Compiler dumps the following syntax trees:

(__S)[S][<S>]
└─(__E)[E][<E>]
  └─(__V)[V][<V>]
    └─(__xLeave__)[x][x]

(__S)[S][<S>]
└─(__E)[E][<E>]
  └─(__V)[V][<V>]
    ├─(__starLeave__)[*]["*"]
    └─(__E)[E][<E>]
      └─(__V)[V][<V>]
        └─(__xLeave__)[x][x]

(__S)[S][<S>]
├─(__V)[V][<V>]
│ └─(__xLeave__)[x][x]
├─(__equalLeave__)[=]["="]
└─(__E)[E][<E>]
  └─(__V)[V][<V>]
    └─(__xLeave__)[x][x]

(__S)[S][<S>]
├─(__V)[V][<V>]
│ └─(__xLeave__)[x][x]
├─(__equalLeave__)[=]["="]
└─(__E)[E][<E>]
  └─(__V)[V][<V>]
    ├─(__starLeave__)[*]["*"]
    └─(__E)[E][<E>]
      └─(__V)[V][<V>]
        └─(__xLeave__)[x][x]

(__S)[S][<S>]
├─(__V)[V][<V>]
│ ├─(__starLeave__)[*]["*"]
│ └─(__E)[E][<E>]
│   └─(__V)[V][<V>]
│     └─(__xLeave__)[x][x]
├─(__equalLeave__)[=]["="]
└─(__E)[E][<E>]
  └─(__V)[V][<V>]
    └─(__xLeave__)[x][x]

(__S)[S][<S>]
├─(__V)[V][<V>]
│ ├─(__starLeave__)[*]["*"]
│ └─(__E)[E][<E>]
│   └─(__V)[V][<V>]
│     ├─(__starLeave__)[*]["*"]
│     └─(__E)[E][<E>]
│       └─(__V)[V][<V>]
│         └─(__xLeave__)[x][x]
├─(__equalLeave__)[=]["="]
└─(__E)[E][<E>]
  └─(__V)[V][<V>]
    ├─(__starLeave__)[*]["*"]
    └─(__E)[E][<E>]
      └─(__V)[V][<V>]
        ├─(__starLeave__)[*]["*"]
        └─(__E)[E][<E>]
          └─(__V)[V][<V>]
            └─(__xLeave__)[x][x]
If these results come out correctly, the whole library is correct. In fact, merely producing them without throwing an Exception is already enough to conclude the results are right.
Plan
So my development steps were as follows:
Worked example
The Tiger book already gives the parse table and a sample parse for grammar 3-1 (below), so I first hand-wrote a parser for grammar 3-1, then extracted from its code the parts common to all LR parsers and made them part of LALR(1)Compiler.
// Grammar 3-1 from the Tiger book
<S> ::= <S> ";" <S> ;
<S> ::= identifier ":=" <E> ;
<S> ::= "print" "(" <L> ")" ;
<E> ::= identifier ;
<E> ::= number ;
<E> ::= <E> "+" <E> ;
<E> ::= "(" <S> "," <E> ")" ;
<L> ::= <E> ;
<L> ::= <L> "," <E> ;
Algorithms
After that exercise I had a clear picture of how a syntax parser is put together. Next came implementing the Tiger book's algorithms for the generator itself.
The nicest part is that even without understanding why these algorithms work, you can still implement them; then, debugging them against test cases makes them easy to understand.
LR(0)
First there are two basic algorithms. Closure completes a state; Goto finds the state a given state transitions to on a given Node. They are called algorithms, but they are actually very simple. Simple as they are, implementing them involves plenty of extra work, e.g. the problem of comparing two LR(0) Items.
Then comes the algorithm that computes the grammar's state set and edge set (the set of Goto actions). This is the core.
With this algorithm, the state diagram of grammar 3-8 can be drawn as follows:
// Grammar 3-8 from the Tiger book
<S> ::= "(" <L> ")" ;
<S> ::= "x" ;
<L> ::= <S> ;
<L> ::= <L> "," <S> ;
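The Closure/Goto/state-collection computation can be sketched for grammar 3-8 as follows. This is a minimal, self-contained sketch with illustrative names (the project's real LR0Item/LR0State types are richer); it augments the grammar with a start rule S' -> S and grows the state list while iterating over it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class LR0Demo
{
    // Rule 0 is the augmented start rule S' -> S.
    static readonly (string Left, string[] Right)[] Rules =
    {
        ("S'", new[] { "S" }),
        ("S",  new[] { "(", "L", ")" }),
        ("S",  new[] { "x" }),
        ("L",  new[] { "S" }),
        ("L",  new[] { "L", ",", "S" }),
    };

    // An LR(0) item is (rule index, dot position); a state is a sorted set of items.
    static SortedSet<(int, int)> Closure(IEnumerable<(int, int)> kernel)
    {
        var state = new SortedSet<(int, int)>(kernel);
        var queue = new Queue<(int, int)>(state); // work queue instead of iterate-to-fixed-point
        while (queue.Count > 0)
        {
            var (rule, dot) = queue.Dequeue();
            if (dot >= Rules[rule].Right.Length) continue;
            string next = Rules[rule].Right[dot];
            for (int r = 0; r < Rules.Length; r++)
                if (Rules[r].Left == next && state.Add((r, 0)))
                    queue.Enqueue((r, 0));
        }
        return state;
    }

    static SortedSet<(int, int)> Goto(SortedSet<(int, int)> state, string symbol)
    {
        var kernel = state
            .Where(i => i.Item2 < Rules[i.Item1].Right.Length && Rules[i.Item1].Right[i.Item2] == symbol)
            .Select(i => (i.Item1, i.Item2 + 1));
        return Closure(kernel);
    }

    public static (int States, int Edges) Build()
    {
        var states = new List<SortedSet<(int, int)>> { Closure(new[] { (0, 0) }) };
        var edges = new List<(int From, string Symbol, int To)>();
        for (int i = 0; i < states.Count; i++) // the list grows while we iterate
        {
            var symbols = states[i]
                .Where(it => it.Item2 < Rules[it.Item1].Right.Length)
                .Select(it => Rules[it.Item1].Right[it.Item2]).Distinct().ToList();
            foreach (string symbol in symbols)
            {
                var target = Goto(states[i], symbol);
                int index = states.FindIndex(s => s.SetEquals(target));
                if (index < 0) { states.Add(target); index = states.Count - 1; }
                edges.Add((i, symbol, index));
            }
        }
        return (states.Count, edges.Count);
    }

    public static void Main()
    {
        var (s, e) = Build();
        Console.WriteLine("{0} states, {1} edges", s, e);
    }
}
```

For grammar 3-8 this yields the 9 states of the Tiger book's state diagram.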
The last step is essentially reading the diagram off into a parse table. Once the table exists, the core of the syntax parser is done.
SLR
When A->α. could be reduced, reduce only if the next token is in Follow(A). Sounds quite reasonable.
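SLR's reduce condition needs FOLLOW sets. Their computation can be sketched for grammar 3-10, which has no nullable productions, so FIRST stays trivial. This is an illustrative sketch, not the project's real code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class FollowDemo
{
    // Grammar 3-10 from the Tiger book.
    static readonly (string Left, string[] Right)[] Rules =
    {
        ("S", new[] { "V", "=", "E" }),
        ("S", new[] { "E" }),
        ("E", new[] { "V" }),
        ("V", new[] { "x" }),
        ("V", new[] { "*", "E" }),
    };

    static bool IsNonterminal(string s) => Rules.Any(r => r.Left == s);

    // FIRST of one symbol; no production here derives the empty string.
    static HashSet<string> First(string symbol)
    {
        if (!IsNonterminal(symbol)) return new HashSet<string> { symbol };
        var result = new HashSet<string>();
        foreach (var r in Rules.Where(r => r.Left == symbol))
            result.UnionWith(First(r.Right[0]));
        return result;
    }

    public static Dictionary<string, HashSet<string>> Follow()
    {
        var follow = Rules.Select(r => r.Left).Distinct()
                          .ToDictionary(n => n, _ => new HashSet<string>());
        follow["S"].Add("$"); // end-of-input follows the start symbol
        bool changed = true;
        while (changed) // iterate to a fixed point
        {
            changed = false;
            foreach (var (left, right) in Rules)
                for (int i = 0; i < right.Length; i++)
                {
                    if (!IsNonterminal(right[i])) continue;
                    var target = follow[right[i]];
                    int before = target.Count;
                    if (i + 1 < right.Length) target.UnionWith(First(right[i + 1]));
                    else target.UnionWith(follow[left]); // A -> ...B implies FOLLOW(A) ⊆ FOLLOW(B)
                    changed |= target.Count != before;
                }
        }
        return follow;
    }

    public static void Main()
    {
        foreach (var kv in Follow())
            Console.WriteLine("FOLLOW({0}) = {{ {1} }}",
                kv.Key, string.Join(", ", kv.Value.OrderBy(x => x)));
    }
}
```

For grammar 3-10 this gives FOLLOW(S) = { $ } and FOLLOW(E) = FOLLOW(V) = { $, = }, so an SLR parser reduces E -> V only on "=" or end of input.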
LR(1)
An LR(1) item (A->α.β, x) says: the sequence α is on top of the stack, and the head of the remaining input can be derived from βx. Sounds even more reasonable.
The state-completion and transition algorithms must be adjusted for LR(1) as well.
Then the parse table is read off the diagram, as before.
LALR(1)
LALR(1) is a simplification of LR(1). It takes less space than LR(1), at the cost of a slightly narrower range of applicable grammars.
To implement LALR(1), and also to improve LR(1)'s efficiency, the LR(1) state had to be optimized; it could no longer simply imitate the LR(0) state.
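The essential LALR(1) step, merging LR(1) states whose cores (item sets ignoring lookaheads) coincide, can be sketched as follows. The item shape and the sample states are illustrative assumptions, not the project's real types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class MergeByCoreDemo
{
    // An LR(1) item: rule index, dot position, lookahead terminal.
    public static List<List<(int Rule, int Dot, string Lookahead)>> MergeByCore(
        IEnumerable<List<(int Rule, int Dot, string Lookahead)>> lr1States)
    {
        return lr1States
            // The core of a state ignores lookaheads.
            .GroupBy(state => string.Join("|",
                state.Select(i => i.Rule + "." + i.Dot).Distinct().OrderBy(s => s)))
            // Union the items of all states sharing a core (duplicates dropped),
            // which is exactly "union the lookahead sets of equal-core items".
            .Select(g => g.SelectMany(s => s).Distinct().ToList())
            .ToList();
    }

    public static void Main()
    {
        // Two hypothetical LR(1) states with the same core but different lookaheads.
        var a = new List<(int, int, string)> { (3, 1, ")"), (4, 0, ",") };
        var b = new List<(int, int, string)> { (3, 1, "$"), (4, 0, ",") };
        var merged = MergeByCore(new[] { a, b });
        Console.WriteLine("{0} state(s), {1} item(s)", merged.Count, merged[0].Count);
    }
}
```

Merging the two sample states yields one LALR(1) state with three items: the lookaheads ")" and "$" now live together on the same core item.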
The grammar of grammars
The input is a grammar and the output is compiler code; this transformation can itself be done by a compiler. The grammar for this special compiler, i.e. the grammar that describes grammars, is shown below. (I named this compiler ContextfreeGrammarCompiler.)
// A grammar is one or more productions
<Grammar> ::= <ProductionList> <Production> ;
// A production list is zero or more productions
<ProductionList> ::= <ProductionList> <Production> | null ;
// A production is a left part + the first candidate + any number of right parts
<Production> ::= <Vn> "::=" <Canditate> <RightPartList> ";" ;
// A candidate is one or more nodes
<Canditate> ::= <VList> <V> ;
// A node list is zero or more nodes
<VList> ::= <VList> <V> | null ;
// A right-part list is zero or more candidates
<RightPartList> ::= "|" <Canditate> <RightPartList> | null ;
// A node is a non-leaf node or a leaf node
<V> ::= <Vn> | <Vt> ;
// A non-leaf node is an identifier wrapped in <>
<Vn> ::= "<" identifier ">" ;
// A leaf node is a string constant quoted in "", or one of the following:
// null, identifier, number, constString, userDefinedType
// (these identifiers are the keywords of ContextfreeGrammar)
<Vt> ::= "null" | "identifier" | "number" | "constString" | "userDefinedType" | constString ;
Design
The algorithms still look simple: you can implement them even without understanding them. Yet quite a few problems surfaced during implementation.
Hash caching
How do you decide that two objects (LR(0) Items) are equal?
This is not a problem to be underestimated.
You must overload the == and != operators and override the Equals and GetHashCode methods; only then will two Items or States that have identical content but are distinct objects compare equal.
LR(0) Items are compared a huge number of times during the computation, which is unacceptable for real-world input (e.g. the GLSL grammar). So the HashCode of such objects must be cached.

/// <summary>
/// Caches an object's hash code to speed up comparisons (==, !=, Equals, GetHashCode, Compare).
/// </summary>
public abstract class HashCache : IComparable<HashCache>
{
    public static bool operator ==(HashCache left, HashCache right)
    {
        object leftObj = left, rightObj = right;
        if (leftObj == null)
        {
            if (rightObj == null) { return true; }
            else { return false; }
        }
        else
        {
            if (rightObj == null) { return false; }
        }

        return left.Equals(right);
    }

    public static bool operator !=(HashCache left, HashCache right)
    {
        return !(left == right);
    }

    public override bool Equals(object obj)
    {
        HashCache p = obj as HashCache;
        if ((System.Object)p == null)
        {
            return false;
        }

        return this.HashCode == p.HashCode;
    }

    public override int GetHashCode()
    {
        return this.HashCode;
    }

    private Func<HashCache, string> GetUniqueString;

    private bool dirty = true;

    /// <summary>
    /// Marks this cache as needing an update before use.
    /// </summary>
    public void SetDirty() { this.dirty = true; }

    private int hashCode;
    /// <summary>
    /// The hash value.
    /// </summary>
    public int HashCode
    {
        get
        {
            if (this.dirty)
            {
                Update();

                this.dirty = false;
            }

            return this.hashCode;
        }
    }

    private void Update()
    {
        string str = GetUniqueString(this);
        int hashCode = str.GetHashCode();
        this.hashCode = hashCode;
#if DEBUG
        this.uniqueString = str;// human-readable info while debugging
#else
        this.uniqueString = string.Format("[{0}]", hashCode);// in release, distinguish this object with minimal memory
#endif
    }

    // TODO: trim this field's content once the feature set is stable.
    /// <summary>
    /// Trim this field's content once the feature set is stable.
    /// </summary>
    private string uniqueString = string.Empty;

    /// <summary>
    /// A string that uniquely identifies this object.
    /// Trim this field's content once the feature set is stable.
    /// </summary>
    public string UniqueString
    {
        get
        {
            if (this.dirty)
            {
                Update();

                this.dirty = false;
            }

            return this.uniqueString;
        }
    }

    /// <summary>
    /// Caches an object's hash code to speed up comparisons (==, !=, Equals, GetHashCode, Compare).
    /// </summary>
    /// <param name="GetUniqueString">Gets a string that uniquely identifies this object.</param>
    public HashCache(Func<HashCache, string> GetUniqueString)
    {
        if (GetUniqueString == null) { throw new ArgumentNullException(); }

        this.GetUniqueString = GetUniqueString;
    }

    public override string ToString()
    {
        return this.UniqueString;
    }

    public int CompareTo(HashCache other)
    {
        if (other == null) { return 1; }

        // Using this.HashCode - other.HashCode < 0 here can overflow; that bug cost me nearly 8 hours.
        if (this.HashCode < other.HashCode)
        { return -1; }
        else if (this.HashCode == other.HashCode)
        { return 0; }
        else
        { return 1; }
    }
}
Ordered collections
How do you decide that two sets (LR(0) States) are equal?
An LR(0) State is a set, and the elements of a set have no inherent order. But to compare two States, their elements must be kept ordered (which enables binary-search insertion and comparison); otherwise comparing two States costs far too much time. To make State comparison as fast as possible, the State's HashCode is cached as well.
Ordered collections are broadly useful, so they became a class of their own.

/// <summary>
/// An optimized list. New elements are inserted by binary search, which is faster,
/// but the caller cannot control element positions.
/// The LALR(1)Compiler project only needs "add", so I did not write update or remove.
/// </summary>
/// <typeparam name="T">Elements must also support fast comparison.</typeparam>
public class OrderedCollection<T> :
    HashCache // fast comparison of two OrderedCollection<T> instances
    , IEnumerable<T> // the collection's elements can be enumerated
    where T : HashCache // elements must also support fast comparison
{
    private List<T> list = new List<T>();
    private string seperator = Environment.NewLine;

    /// <summary>
    /// An add-only collection whose elements are kept ordered by binary insertion.
    /// The caller cannot control the element order.
    /// </summary>
    /// <param name="separator">The separator written between elements when dumping to a stream.</param>
    public OrderedCollection(string separator)
        : base(GetUniqueString)
    {
        this.seperator = separator;
    }

    private static string GetUniqueString(HashCache cache)
    {
        OrderedCollection<T> obj = cache as OrderedCollection<T>;
        return obj.Dump();
    }

    public virtual bool TryInsert(T item)
    {
        if (this.list.TryBinaryInsert(item))
        {
            this.SetDirty();
            return true;
        }
        else
        {
            return false;
        }
    }

    public int IndexOf(T item)
    {
        return this.list.BinarySearch(item);
    }

    public bool Contains(T item)
    {
        int index = this.list.BinarySearch(item);
        return (0 <= index && index < this.list.Count);
    }

    public T this[int index] { get { return this.list[index]; } }

    public IEnumerator<T> GetEnumerator()
    {
        foreach (var item in this.list)
        {
            yield return item;
        }
    }

    public int Count { get { return this.list.Count; } }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return this.GetEnumerator();
    }

    public override void Dump(System.IO.TextWriter stream)
    {
        for (int i = 0; i < this.list.Count; i++)
        {
            this.list[i].Dump(stream);
            if (i + 1 < this.list.Count)
            {
                stream.Write(this.seperator);
            }
        }
    }
}
This relies on a TryBinaryInsert extension method that inserts an element into a List<T>. I have tested this method rigorously; I will pay a ¥100 reward to anyone who shows me a bug in it.

/// <summary>
/// Tries to insert a new element. If an equal element already exists, nothing is
/// inserted and false is returned; otherwise the element is inserted and true is returned.
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="list"></param>
/// <param name="item"></param>
/// <returns></returns>
public static bool TryBinaryInsert<T>(this List<T> list, T item)
    where T : IComparable<T>
{
    bool inserted = false;

    if (list == null || item == null) { return inserted; }

    int left = 0, right = list.Count - 1;
    if (right < 0)
    {
        list.Add(item);
        inserted = true;
    }
    else
    {
        while (left < right)
        {
            int mid = (left + right) / 2;
            T current = list[mid];
            int result = item.CompareTo(current);
            if (result < 0)
            { right = mid; }
            else if (result == 0)
            { left = mid; right = mid; }
            else
            { left = mid + 1; }
        }
        {
            T current = list[left];
            int result = item.CompareTo(current);
            if (result < 0)
            {
                list.Insert(left, item);
                inserted = true;
            }
            else if (result > 0)
            {
                list.Insert(left + 1, item);
                inserted = true;
            }
        }
    }

    return inserted;
}
Iterating to a fixed point
The Tiger book's algorithms make heavy use of the iterate-until-nothing-changes pattern.
The pattern works, but it leaves room for optimization, and since it sits in the core computation, it should be optimized.
The optimization is simple: replace "iterate to a fixed point" with a work Queue. This avoids a great deal of needless recomputation.

/// <summary>
/// The LR(0) Closure operation: completes a state.
/// </summary>
/// <param name="list"></param>
/// <param name="state"></param>
/// <returns></returns>
static LR0State Closure(this RegulationList list, LR0State state)
{
    Queue<LR0Item> queue = new Queue<LR0Item>();
    foreach (var item in state)
    {
        queue.Enqueue(item);
    }
    while (queue.Count > 0)
    {
        LR0Item item = queue.Dequeue();
        TreeNodeType node = item.GetNodeNext2Dot();
        if (node == null) { continue; }

        foreach (var regulation in list)
        {
            if (regulation.Left == node)
            {
                var newItem = new LR0Item(regulation, 0);
                if (state.TryInsert(newItem))
                {
                    queue.Enqueue(newItem);
                }
            }
        }
    }

    return state;
}
Testing
I used to enjoy building an elaborate GUI for testing. I have since realized that is unnecessary; a simple Console app will do.
The detailed test results are dumped to files, where they can be inspected and analyzed at leisure.

=====> Processing .\TestCases\3_8.Grammar\3_8.Grammar
Get grammar from source code...
Dump 3_8.TokenList.log
Dump 3_8.Tree.log
Dump 3_8.FormatedGrammar.log
Dump 3_8.FIRST.log
Dump 3_8.FOLLOW.log
LR(0) parsing...
Dump 3_8.State.log
Dump 3_8.Edge.log
Dump LR(0) Compiler's source code...
SLR parsing...
Dump 3_8.State.log
Dump 3_8.Edge.log
Dump SLR Compiler's source code...
LALR(1) parsing...
Dump 3_8.State.log
Dump 3_8.Edge.log
Dump LALR(1) Compiler's source code...
LR(1) parsing...
Dump 3_8.State.log
Dump 3_8.Edge.log
Dump LR(1) Compiler's source code...
Compiling 3_8 of LR(0) version
Test Code 3_8 of LR(0) version
Compiling 3_8 of SLR version
Test Code 3_8 of SLR version
Compiling 3_8 of LALR(1) version
Test Code 3_8 of LALR(1) version
Compiling 3_8 of LR(1) version
Test Code 3_8 of LR(1) version
=====> Processing .\TestCases\Demo.Grammar\Demo.Grammar
Get grammar from source code...
Dump Demo.TokenList.log
Dump Demo.Tree.log
Dump Demo.FormatedGrammar.log
Dump Demo.FIRST.log
Dump Demo.FOLLOW.log
LR(0) parsing...
Dump Demo.State.log
Dump Demo.Edge.log
Dump LR(0) Compiler's source code...
【Exists 5 Conflicts in Parsingmap】
SLR parsing...
Dump Demo.State.log
Dump Demo.Edge.log
Dump SLR Compiler's source code...
【Exists 2 Conflicts in Parsingmap】
LALR(1) parsing...
Dump Demo.State.log
Dump Demo.Edge.log
Dump LALR(1) Compiler's source code...
【Exists 2 Conflicts in Parsingmap】
LR(1) parsing...
Dump Demo.State.log
Dump Demo.Edge.log
Dump LR(1) Compiler's source code...
【Exists 6 Conflicts in Parsingmap】
Compiling Demo of LR(0) version
Test Code Demo of LR(0) version
No need to Test Code with conflicts in SyntaxParser
Compiling Demo of SLR version
Test Code Demo of SLR version
No need to Test Code with conflicts in SyntaxParser
Compiling Demo of LALR(1) version
Test Code Demo of LALR(1) version
No need to Test Code with conflicts in SyntaxParser
Compiling Demo of LR(1) version
Test Code Demo of LR(1) version
No need to Test Code with conflicts in SyntaxParser
=====> Processing .\TestCases\GLSL.Grammar\GLSL.Grammar
Get grammar from source code...
Dump GLSL.TokenList.log
Dump GLSL.Tree.log
Dump GLSL.FormatedGrammar.log
Dump GLSL.FIRST.log
Dump GLSL.FOLLOW.log
LR(0) parsing...
First attempt at GLSL
With testing done, I could finally turn the knife toward the GLSL grammar. It is far larger than the test grammars: in the first version, the computation took several hours and eventually died with an out-of-memory Exception, which forced another round of optimization.
The GLSL grammar given in the PDF is rather peculiar, too; perhaps there is some trick to it that I failed to see. In any case, the difficulty had to be lowered first.
The idea: split the grammar into several parts and handle each separately.
First, Expression, the foundation of the other parts. The Expression part turns out to be SLR. Very nice.
Then statement, which has the dangling-else problem. Fortunately the Tiger book addresses this problem specifically: the conflict can be tolerated by simply choosing Shift and ignoring the Reduction. Also very nice.
Then function_definition; unexpectedly, this part is SLR too. Nice.
Finally declaration, where real trouble appeared. The GLSL grammar contains a <TYPE_NAME> symbol. I puzzled over it for a long time, and it turned out to mean "a user-defined type discovered dynamically while reading the source code". For example, given struct LightInfo { … }, it stands for the type LightInfo. If you naively replace <TYPE_NAME> with identifier, the grammar acquires unresolvable conflicts.
So I had to stop there and first implement a stronger mode of analysis: synchronized parsing.
Synchronized parsing
Until now, my lexical analysis and syntax analysis ran separately: only after lexing completed in full was the token stream handed to the syntax parser. To recognize user-defined types in time, that arrangement cannot work at all. The phases must be interleaved: read one token -> parse -> run any semantic action -> read the next token. Consider this code:
struct LightInfo {
    vec4 Position; // Light position in eyecoords.
    vec3 La; // Ambient light intensity
    vec3 Ld; // Diffuse light intensity
    vec3 Ls; // Specular light intensity
};
uniform LightInfo Light;
Right after reading the second token, "LightInfo", the type "LightInfo" must immediately be added to the table of user-defined types. Only then, when "LightInfo" appears again in "uniform LightInfo Light", will the lexer know that "LightInfo" is a userDefinedType and not just any identifier. (Compare with the grammar of grammars above to see how much work the innocuous-looking userDefinedType requires.)
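The feedback loop from semantic action to lexer can be sketched as follows. This is a minimal illustration of the mechanism, not the project's real classes; the names are assumptions.

```csharp
using System;
using System.Collections.Generic;

public class UserDefinedTypeDemo
{
    // The table shared between the parser's semantic actions and the lexer.
    private readonly HashSet<string> userDefinedTypes = new HashSet<string>();

    // The semantic action bound to reducing "struct identifier { ... }".
    public void OnStructNameParsed(string name) { userDefinedTypes.Add(name); }

    // What the lexer does for every word that looks like an identifier.
    public string Classify(string word)
    {
        return userDefinedTypes.Contains(word) ? "userDefinedType" : "identifier";
    }

    public static void Main()
    {
        var demo = new UserDefinedTypeDemo();
        Console.WriteLine(demo.Classify("LightInfo")); // still an ordinary identifier
        // Right after "struct LightInfo" is parsed, the semantic action fires:
        demo.OnStructNameParsed("LightInfo");
        // Later, in "uniform LightInfo Light;", the same word lexes differently:
        Console.WriteLine(demo.Classify("LightInfo"));
    }
}
```

The point is that classification of a token depends on parsing progress, which is exactly why lexing and parsing cannot remain separate passes.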
The front-end parser (FrontEndParser)
With parsing synchronized, lexing and parsing are firmly bound into one process, so wrapping them in a FrontEndParser is well worth it. Its UserDefinedTypeCollection records the user-defined types.
/// <summary>
/// The front-end parser.
/// Lexical analysis, syntax analysis and semantic actions run in lock-step.
/// </summary>
public class FrontEndParser
{
    private LexicalAnalyzer lexicalAnalyzer;
    private LRSyntaxParser syntaxParser;

    public FrontEndParser(LexicalAnalyzer lexicalAnalyzer, LRSyntaxParser syntaxParser)
    {
        this.lexicalAnalyzer = lexicalAnalyzer;
        this.syntaxParser = syntaxParser;
    }

    /// <summary>
    /// Lexical analysis, syntax analysis and semantic actions run in lock-step.
    /// </summary>
    /// <param name="sourceCode"></param>
    /// <param name="tokenList"></param>
    /// <returns></returns>
    public SyntaxTree Parse(string sourceCode, out TokenList tokenList)
    {
        tokenList = new TokenList();
        UserDefinedTypeCollection userDefinedTypeTable = new UserDefinedTypeCollection();
        this.lexicalAnalyzer.StartAnalyzing(userDefinedTypeTable);
        this.syntaxParser.StartParsing(userDefinedTypeTable);
        foreach (var token in this.lexicalAnalyzer.AnalyzeStep(sourceCode))
        {
            tokenList.Add(token);
            this.syntaxParser.ParseStep(token);
        }

        SyntaxTree result = this.syntaxParser.StopParsing();
        return result;
    }
}
Synchronized lexical analysis
The lexer must return each token as soon as it is read, then wait for syntax and semantic analysis to finish before continuing. C#'s yield return syntactic sugar is truly sweet here.

public abstract partial class LexicalAnalyzer
{
    protected UserDefinedTypeCollection userDefinedTypeTable;
    private bool inAnalyzingStep = false;

    internal void StartAnalyzing(UserDefinedTypeCollection userDefinedTypeTable)
    {
        if (!inAnalyzingStep)
        {
            this.userDefinedTypeTable = userDefinedTypeTable;
            inAnalyzingStep = true;
        }
    }

    internal void StopAnalyzing()
    {
        if (inAnalyzingStep)
        {
            this.userDefinedTypeTable = null;
            inAnalyzingStep = false;
        }
    }

    /// <summary>
    /// Yields one <see cref="Token"/> per analysis step.
    /// </summary>
    /// <param name="sourceCode"></param>
    /// <returns></returns>
    internal IEnumerable<Token> AnalyzeStep(string sourceCode)
    {
        if (!inAnalyzingStep) { throw new Exception("Must invoke this.StartAnalyzing() first!"); }

        if (!string.IsNullOrEmpty(sourceCode))
        {
            var context = new AnalyzingContext(sourceCode);
            int count = sourceCode.Length;

            while (context.NextLetterIndex < count)
            {
                Token token = NextToken(context);
                if (token != null)
                {
                    yield return token;
                }
            }
        }

        this.StopAnalyzing();
    }
}
Synchronized syntax/semantic analysis
Each step takes exactly one new token and performs whatever parsing actions it enables. If a parsing action has a semantic action bound to it (here, used to discover user-defined types), that is executed too.

public abstract partial class LRSyntaxParser
{
    bool inParsingStep = false;
    ParsingStepContext parsingStepContext;

    internal void StartParsing(UserDefinedTypeCollection userDefinedTypeTable)
    {
        if (!inParsingStep)
        {
            LRParsingMap parsingMap = GetParsingMap();
            RegulationList grammar = GetGrammar();
            var tokenTypeConvertor = new TokenType2TreeNodeType();
            parsingStepContext = new ParsingStepContext(
                grammar, parsingMap, tokenTypeConvertor, userDefinedTypeTable);
            inParsingStep = true;
        }
    }

    internal SyntaxTree StopParsing()
    {
        SyntaxTree result = null;
        if (inParsingStep)
        {
            result = ParseStep(Token.endOfTokenList);
            parsingStepContext.TokenList.RemoveAt(parsingStepContext.TokenList.Count - 1);
            parsingStepContext = null;
            inParsingStep = false;
        }

        return result;
    }

    /// <summary>
    /// Gets the semantic action bound to a parsing (reduction) action, if any.
    /// </summary>
    /// <param name="parsingAction"></param>
    /// <returns></returns>
    protected virtual Action<ParsingStepContext> GetSemanticAction(LRParsingAction parsingAction)
    {
        return null;
    }

    internal SyntaxTree ParseStep(Token token)
    {
        if (!inParsingStep) { throw new Exception("Must invoke this.StartParsing() first!"); }

        parsingStepContext.AddToken(token);

        while (parsingStepContext.CurrentTokenIndex < parsingStepContext.TokenList.Count)
        {
            // syntax analysis
            TreeNodeType nodeType = parsingStepContext.CurrentNodeType();
            int stateId = parsingStepContext.StateIdStack.Peek();
            LRParsingAction action = parsingStepContext.ParsingMap.GetAction(stateId, nodeType);
            int currentTokenIndex = action.Execute(parsingStepContext);
            parsingStepContext.CurrentTokenIndex = currentTokenIndex;
            // semantic analysis
            Action<ParsingStepContext> semanticAction = GetSemanticAction(action);
            if (semanticAction != null)
            {
                semanticAction(parsingStepContext);
            }
        }

        if (parsingStepContext.TreeStack.Count > 0)
        {
            return parsingStepContext.TreeStack.Peek();
        }
        else
        {
            return new SyntaxTree();
        }
    }
}
GLSL, round two
At this point the weapons were finally all in place.
Grammar -> parser
Generating the parser for the GLSL grammar below took my laptop about 10 minutes.

<translation_unit> ::= <external_declaration> ;
<translation_unit> ::= <translation_unit> <external_declaration> ;
<external_declaration> ::= <function_definition> ;
<external_declaration> ::= <declaration> ;
<function_definition> ::= <function_prototype> <compound_statement_no_new_scope> ;
<variable_identifier> ::= identifier ;
<primary_expression> ::= <variable_identifier> ;
<primary_expression> ::= number ;
<primary_expression> ::= number ;
<primary_expression> ::= number ;
<primary_expression> ::= <BOOLCONSTANT> ;
<primary_expression> ::= number ;
<primary_expression> ::= "(" <expression> ")" ;
<BOOLCONSTANT> ::= "true" ;
<BOOLCONSTANT> ::= "false" ;
<postfix_expression> ::= <primary_expression> ;
<postfix_expression> ::= <postfix_expression> "[" <integer_expression> "]" ;
<postfix_expression> ::= <function_call> ;
<postfix_expression> ::= <postfix_expression> "." <FIELD_SELECTION> ;
<postfix_expression> ::= <postfix_expression> "++" ;
<postfix_expression> ::= <postfix_expression> "--" ;
<FIELD_SELECTION> ::= identifier ;
<integer_expression> ::= <expression> ;
<function_call> ::= <function_call_or_method> ;
<function_call_or_method> ::= <function_call_generic> ;
<function_call_generic> ::= <function_call_header_with_parameters> ")" ;
<function_call_generic> ::= <function_call_header_no_parameters> ")" ;
<function_call_header_no_parameters> ::= <function_call_header> "void" ;
<function_call_header_no_parameters> ::= <function_call_header> ;
<function_call_header_with_parameters> ::= <function_call_header> <assignment_expression> ;
<function_call_header_with_parameters> ::= <function_call_header_with_parameters> "," <assignment_expression> ;
<function_call_header> ::= <function_identifier> "(" ;
<function_identifier> ::= <type_specifier> ;
<function_identifier> ::= <postfix_expression> ;
<unary_expression> ::= <postfix_expression> ;
<unary_expression> ::= "++" <unary_expression> ;
<unary_expression> ::= "--" <unary_expression> ;
<unary_expression> ::= <unary_operator> <unary_expression> ;
<unary_operator> ::= "+" ;
<unary_operator> ::= "-" ;
<unary_operator> ::= "!" ;
<unary_operator> ::= "~" ;
<multiplicative_expression> ::= <unary_expression> ;
<multiplicative_expression> ::= <multiplicative_expression> "*" <unary_expression> ;
<multiplicative_expression> ::= <multiplicative_expression> "/" <unary_expression> ;
<multiplicative_expression> ::= <multiplicative_expression> "%" <unary_expression> ;
<additive_expression> ::= <multiplicative_expression> ;
<additive_expression> ::= <additive_expression> "+" <multiplicative_expression> ;
<additive_expression> ::= <additive_expression> "-" <multiplicative_expression> ;
<shift_expression> ::= <additive_expression> ;
<shift_expression> ::= <shift_expression> "<<" <additive_expression> ;
<shift_expression> ::= <shift_expression> ">>" <additive_expression> ;
<relational_expression> ::= <shift_expression> ;
<relational_expression> ::= <relational_expression> "<" <shift_expression> ;
<relational_expression> ::= <relational_expression> ">" <shift_expression> ;
<relational_expression> ::= <relational_expression> "<=" <shift_expression> ;
<relational_expression> ::= <relational_expression> ">=" <shift_expression> ;
<equality_expression> ::= <relational_expression> ;
<equality_expression> ::= <equality_expression> "==" <relational_expression> ;
<equality_expression> ::= <equality_expression> "!=" <relational_expression> ;
<and_expression> ::= <equality_expression> ;
<and_expression> ::= <and_expression> "&" <equality_expression> ;
<exclusive_or_expression> ::= <and_expression> ;
<exclusive_or_expression> ::= <exclusive_or_expression> "^" <and_expression> ;
<inclusive_or_expression> ::= <exclusive_or_expression> ;
<inclusive_or_expression> ::= <inclusive_or_expression> "|" <exclusive_or_expression> ;
<logical_and_expression> ::= <inclusive_or_expression> ;
<logical_and_expression> ::= <logical_and_expression> "&&" <inclusive_or_expression> ;
<logical_xor_expression> ::= <logical_and_expression> ;
<logical_xor_expression> ::= <logical_xor_expression> "^^" <logical_and_expression> ;
<logical_or_expression> ::= <logical_xor_expression> ;
<logical_or_expression> ::= <logical_or_expression> "||" <logical_xor_expression> ;
<conditional_expression> ::= <logical_or_expression> ;
<conditional_expression> ::= <logical_or_expression> "?" <expression> ":" <assignment_expression> ;
<assignment_expression> ::= <conditional_expression> ;
<assignment_expression> ::= <unary_expression> <assignment_operator> <assignment_expression> ;
<assignment_operator> ::= "=" ;
<assignment_operator> ::= "*=" ;
<assignment_operator> ::= "/=" ;
<assignment_operator> ::= "%=" ;
<assignment_operator> ::= "+=" ;
<assignment_operator> ::= "-=" ;
<assignment_operator> ::= "<<=" ;
<assignment_operator> ::= ">>=" ;
<assignment_operator> ::= "&=" ;
<assignment_operator> ::= "^=" ;
<assignment_operator> ::= "|=" ;
<expression> ::= <assignment_expression> ;
<expression> ::= <expression> "," <assignment_expression> ;
<constant_expression> ::= <conditional_expression> ;
<declaration> ::= <function_prototype> ";" ;
<declaration> ::= <init_declarator_list> ";" ;
<declaration> ::= "precision" <precision_qualifier> <type_specifier> ";" ;
<declaration> ::= <type_qualifier> identifier "{" <struct_declaration_list> "}" ";" ;
<declaration> ::= <type_qualifier> identifier "{" <struct_declaration_list> "}" identifier ";" ;
<declaration> ::= <type_qualifier> identifier "{" <struct_declaration_list> "}" identifier <array_specifier> ";" ;
<declaration> ::= <type_qualifier> ";" ;
<declaration> ::= <type_qualifier> identifier ";" ;
<declaration> ::= <type_qualifier> identifier <identifier_list> ";" ;
<identifier_list> ::= "," identifier ;
<identifier_list> ::= <identifier_list> "," identifier ;
<function_prototype> ::= <function_declarator> ")" ;
<function_declarator> ::= <function_header> ;
<function_declarator> ::= <function_header_with_parameters> ;
<function_header_with_parameters> ::= <function_header> <parameter_declaration> ;
<function_header_with_parameters> ::= <function_header_with_parameters> "," <parameter_declaration> ;
<function_header> ::= <fully_specified_type> identifier "(" ;
<parameter_declarator> ::= <type_specifier> identifier ;
<parameter_declarator> ::= <type_specifier> identifier <array_specifier> ;
<parameter_declaration> ::= <type_qualifier> <parameter_declarator> ;
<parameter_declaration> ::= <parameter_declarator> ;
<parameter_declaration> ::= <type_qualifier> <parameter_type_specifier> ;
<parameter_declaration> ::= <parameter_type_specifier> ;
<parameter_type_specifier> ::= <type_specifier> ;
<init_declarator_list> ::= <single_declaration> ;
<init_declarator_list> ::= <init_declarator_list> "," identifier ;
<init_declarator_list> ::= <init_declarator_list> "," identifier <array_specifier> ;
<init_declarator_list> ::= <init_declarator_list> "," identifier <array_specifier> "=" <initializer> ;
<init_declarator_list> ::= <init_declarator_list> "," identifier "=" <initializer> ;
<single_declaration> ::= <fully_specified_type> ;
<single_declaration> ::= <fully_specified_type> identifier ;
<single_declaration> ::= <fully_specified_type> identifier <array_specifier> ;
<single_declaration> ::= <fully_specified_type> identifier <array_specifier> "=" <initializer> ;
<single_declaration> ::= <fully_specified_type> identifier "=" <initializer> ;
<fully_specified_type> ::= <type_specifier> ;
<fully_specified_type> ::= <type_qualifier> <type_specifier> ;
<invariant_qualifier> ::= "invariant" ;
<interpolation_qualifier> ::= "smooth" ;
<interpolation_qualifier> ::= "flat" ;
<interpolation_qualifier> ::= "noperspective" ;
<layout_qualifier> ::= "layout" "(" <layout_qualifier_id_list> ")" ;
<layout_qualifier_id_list> ::= <layout_qualifier_id> ;
<layout_qualifier_id_list> ::= <layout_qualifier_id_list> "," <layout_qualifier_id> ;
<layout_qualifier_id> ::= identifier ;
<layout_qualifier_id> ::= identifier "=" number ;
<precise_qualifier> ::= "precise" ;
<type_qualifier> ::= <single_type_qualifier> ;
<type_qualifier> ::= <type_qualifier> <single_type_qualifier> ;
<single_type_qualifier> ::= <storage_qualifier> ;
<single_type_qualifier> ::= <layout_qualifier> ;
<single_type_qualifier> ::= <precision_qualifier> ;
<single_type_qualifier> ::= <interpolation_qualifier> ;
<single_type_qualifier> ::= <invariant_qualifier> ;
<single_type_qualifier> ::= <precise_qualifier> ;
<storage_qualifier> ::= "const" ;
<storage_qualifier> ::= "inout" ;
<storage_qualifier> ::= "in" ;
<storage_qualifier> ::= "out" ;
<storage_qualifier> ::= "centroid" ;
<storage_qualifier> ::= "patch" ;
<storage_qualifier> ::= "sample" ;
<storage_qualifier> ::= "uniform" ;
<storage_qualifier> ::= "buffer" ;
<storage_qualifier> ::= "shared" ;
<storage_qualifier> ::= "coherent" ;
<storage_qualifier> ::= "volatile" ;
<storage_qualifier> ::= "restrict" ;
<storage_qualifier> ::= "readonly" ;
<storage_qualifier> ::= "writeonly" ;
<storage_qualifier> ::= "subroutine" ;
<storage_qualifier> ::= "subroutine" "(" <type_name_list> ")" ;
<type_name_list> ::= userDefinedType ;
<type_name_list> ::= <type_name_list> "," userDefinedType ;
<type_specifier> ::= <type_specifier_nonarray> ;
<type_specifier> ::= <type_specifier_nonarray> <array_specifier> ;
<array_specifier> ::= "[" "]" ;
<array_specifier> ::= "[" <constant_expression> "]" ;
<array_specifier> ::= <array_specifier> "[" "]" ;
<array_specifier> ::= <array_specifier> "[" <constant_expression> "]" ;
<type_specifier_nonarray> ::= "void" ;
<type_specifier_nonarray> ::= "float" ;
<type_specifier_nonarray> ::= "double" ;
<type_specifier_nonarray> ::= "int" ;
<type_specifier_nonarray> ::= "uint" ;
<type_specifier_nonarray> ::= "bool" ;
<type_specifier_nonarray> ::= "vec2" ;
<type_specifier_nonarray> ::= "vec3" ;
<type_specifier_nonarray> ::= "vec4" ;
<type_specifier_nonarray> ::= "dvec2" ;
<type_specifier_nonarray> ::= "dvec3" ;
<type_specifier_nonarray> ::= "dvec4" ;
<type_specifier_nonarray> ::= "bvec2" ;
<type_specifier_nonarray> ::= "bvec3" ;
<type_specifier_nonarray> ::= "bvec4" ;
<type_specifier_nonarray> ::= "ivec2" ;
<type_specifier_nonarray> ::= "ivec3" ;
<type_specifier_nonarray> ::= "ivec4" ;
<type_specifier_nonarray> ::= "uvec2" ;
<type_specifier_nonarray> ::= "uvec3" ;
<type_specifier_nonarray> ::= "uvec4" ;
<type_specifier_nonarray> ::= "mat2" ;
<type_specifier_nonarray> ::= "mat3" ;
<type_specifier_nonarray> ::= "mat4" ;
<type_specifier_nonarray> ::= "mat2x2" ;
<type_specifier_nonarray> ::= "mat2x3" ;
<type_specifier_nonarray> ::= "mat2x4" ;
<type_specifier_nonarray> ::= "mat3x2" ;
<type_specifier_nonarray> ::= "mat3x3" ;
<type_specifier_nonarray> ::= "mat3x4" ;
<type_specifier_nonarray> ::= "mat4x2" ;
<type_specifier_nonarray> ::= "mat4x3" ;
<type_specifier_nonarray> ::= "mat4x4" ;
<type_specifier_nonarray> ::= "dmat2" ;
<type_specifier_nonarray> ::= "dmat3" ;
<type_specifier_nonarray> ::= "dmat4" ;
<type_specifier_nonarray> ::= "dmat2x2" ;
<type_specifier_nonarray> ::= "dmat2x3" ;
<type_specifier_nonarray> ::= "dmat2x4" ;
<type_specifier_nonarray> ::= "dmat3x2" ;
<type_specifier_nonarray> ::= "dmat3x3" ;
<type_specifier_nonarray> ::= "dmat3x4" ;
<type_specifier_nonarray> ::= "dmat4x2" ;
<type_specifier_nonarray> ::= "dmat4x3" ;
<type_specifier_nonarray> ::= "dmat4x4" ;
<type_specifier_nonarray> ::= "atomic_uint" ;
<type_specifier_nonarray> ::= "sampler1D" ;
<type_specifier_nonarray> ::= "sampler2D" ;
<type_specifier_nonarray> ::= "sampler3D" ;
<type_specifier_nonarray> ::= "samplerCube" ;
<type_specifier_nonarray> ::= "sampler1DShadow" ;
<type_specifier_nonarray> ::= "sampler2DShadow" ;
<type_specifier_nonarray> ::= "samplerCubeShadow" ;
<type_specifier_nonarray> ::= "sampler1DArray" ;
<type_specifier_nonarray> ::= "sampler2DArray" ;
<type_specifier_nonarray> ::= "sampler1DArrayShadow" ;
<type_specifier_nonarray> ::= "sampler2DArrayShadow" ;
<type_specifier_nonarray> ::= "samplerCubeArray" ;
<type_specifier_nonarray> ::= "samplerCubeArrayShadow" ;
<type_specifier_nonarray> ::= "isampler1D" ;
<type_specifier_nonarray> ::= "isampler2D" ;
<type_specifier_nonarray> ::= "isampler3D" ;
<type_specifier_nonarray> ::= "isamplerCube" ;
<type_specifier_nonarray> ::= "isampler1DArray" ;
<type_specifier_nonarray> ::= "isampler2DArray" ;
<type_specifier_nonarray> ::= "isamplerCubeArray" ;
<type_specifier_nonarray> ::= "usampler1D" ;
<type_specifier_nonarray> ::= "usampler2D" ;
<type_specifier_nonarray> ::= "usampler3D" ;
<type_specifier_nonarray> ::= "usamplerCube" ;
<type_specifier_nonarray> ::= "usampler1DArray" ;
<type_specifier_nonarray> ::= "usampler2DArray" ;
<type_specifier_nonarray> ::= "usamplerCubeArray" ;
<type_specifier_nonarray> ::= "sampler2DRect" ;
<type_specifier_nonarray> ::= "sampler2DRectShadow" ;
<type_specifier_nonarray> ::= "isampler2DRect" ;
<type_specifier_nonarray> ::= "usampler2DRect" ;
<type_specifier_nonarray> ::= "samplerBuffer" ;
<type_specifier_nonarray> ::= "isamplerBuffer" ;
<type_specifier_nonarray> ::= "usamplerBuffer" ;
<type_specifier_nonarray> ::= "sampler2DMS" ;
<type_specifier_nonarray> ::= "isampler2DMS" ;
<type_specifier_nonarray> ::= "usampler2DMS" ;
<type_specifier_nonarray> ::= "sampler2DMSArray" ;
<type_specifier_nonarray> ::= "isampler2DMSArray" ;
<type_specifier_nonarray> ::= "usampler2DMSArray" ;
<type_specifier_nonarray> ::= "image1D" ;
<type_specifier_nonarray> ::= "iimage1D" ;
<type_specifier_nonarray> ::= "uimage1D" ;
<type_specifier_nonarray> ::= "image2D" ;
<type_specifier_nonarray> ::= "iimage2D" ;
<type_specifier_nonarray> ::= "uimage2D" ;
<type_specifier_nonarray> ::= "image3D" ;
<type_specifier_nonarray> ::= "iimage3D" ;
<type_specifier_nonarray> ::= "uimage3D" ;
<type_specifier_nonarray> ::= "image2DRect" ;
<type_specifier_nonarray> ::= "iimage2DRect" ;
<type_specifier_nonarray> ::= "uimage2DRect" ;
<type_specifier_nonarray> ::= "imageCube" ;
<type_specifier_nonarray> ::= "iimageCube" ;
<type_specifier_nonarray> ::= "uimageCube" ;
<type_specifier_nonarray> ::= "imageBuffer" ;
<type_specifier_nonarray> ::= "iimageBuffer" ;
<type_specifier_nonarray> ::= "uimageBuffer" ;
<type_specifier_nonarray> ::= "image1DArray" ;
<type_specifier_nonarray> ::= "iimage1DArray" ;
<type_specifier_nonarray> ::= "uimage1DArray" ;
<type_specifier_nonarray> ::= "image2DArray" ;
<type_specifier_nonarray> ::= "iimage2DArray" ;
<type_specifier_nonarray> ::= "uimage2DArray" ;
<type_specifier_nonarray> ::= "imageCubeArray" ;
<type_specifier_nonarray> ::= "iimageCubeArray" ;
<type_specifier_nonarray> ::= "uimageCubeArray" ;
<type_specifier_nonarray> ::= "image2DMS" ;
<type_specifier_nonarray> ::= "iimage2DMS" ;
<type_specifier_nonarray> ::= "uimage2DMS" ;
<type_specifier_nonarray> ::= "image2DMSArray" ;
<type_specifier_nonarray> ::= "iimage2DMSArray" ;
<type_specifier_nonarray> ::= "uimage2DMSArray" ;
<type_specifier_nonarray> ::= <struct_specifier> ;
<type_specifier_nonarray> ::= userDefinedType ;
<precision_qualifier> ::= "high_precision" ;
<precision_qualifier> ::= "medium_precision" ;
<precision_qualifier> ::= "low_precision" ;
// semantic parsing needed
<struct_specifier> ::= "struct" identifier "{" <struct_declaration_list> "}" ;
<struct_specifier> ::= "struct" "{" <struct_declaration_list> "}" ;
<struct_declaration_list> ::= <struct_declaration> ;
<struct_declaration_list> ::= <struct_declaration_list> <struct_declaration> ;
<struct_declaration> ::= <type_specifier> <struct_declarator_list> ";" ;
<struct_declaration> ::= <type_qualifier> <type_specifier> <struct_declarator_list> ";" ;
<struct_declarator_list> ::= <struct_declarator> ;
<struct_declarator_list> ::= <struct_declarator_list> "," <struct_declarator> ;
<struct_declarator> ::= identifier ;
<struct_declarator> ::= identifier <array_specifier> ;
<initializer> ::= <assignment_expression> ;
<initializer> ::= "{" <initializer_list> "}" ;
<initializer> ::= "{" <initializer_list> "," "}" ;
<initializer_list> ::= <initializer> ;
<initializer_list> ::= <initializer_list> "," <initializer> ;
<declaration_statement> ::= <declaration> ;
<statement> ::= <compound_statement> ;
<statement> ::= <simple_statement> ;
<simple_statement> ::= <declaration_statement> ;
<simple_statement> ::= <expression_statement> ;
<simple_statement> ::= <selection_statement> ;
<simple_statement> ::= <switch_statement> ;
<simple_statement> ::= <case_label> ;
<simple_statement> ::= <iteration_statement> ;
<simple_statement> ::= <jump_statement> ;
<compound_statement> ::= "{" "}" ;
<compound_statement> ::= "{" <statement_list> "}" ;
<statement_no_new_scope> ::= <compound_statement_no_new_scope> ;
<statement_no_new_scope> ::= <simple_statement> ;
<compound_statement_no_new_scope> ::= "{" "}" ;
<compound_statement_no_new_scope> ::= "{" <statement_list> "}" ;
<statement_list> ::= <statement> ; 327 <statement_list> ::= <statement_list> <statement> ; 328 <expression_statement> ::= ";" ; 329 <expression_statement> ::= <expression> ";" ; 330 <selection_statement> ::= "if" "(" <expression> ")" <selection_rest_statement> ; 331 <selection_rest_statement> ::= <statement> "else" <statement> ; 332 <selection_rest_statement> ::= <statement> ; 333 <condition> ::= <expression> ; 334 <condition> ::= <fully_specified_type> identifier "=" <initializer> ; 335 <switch_statement> ::= "switch" "(" <expression> ")" "{" <switch_statement_list> "}" ; 336 <switch_statement_list> ::= <statement_list> ; 337 <case_label> ::= "case" <expression> ":" ; 338 <case_label> ::= "default" ":" ; 339 <iteration_statement> ::= "while" "(" <condition> ")" <statement_no_new_scope> ; 340 <iteration_statement> ::= "do" <statement> "while" "(" <expression> ")" ";" ; 341 <iteration_statement> ::= "for" "(" <for_init_statement> <for_rest_statement> ")" <statement_no_new_scope> ; 342 <for_init_statement> ::= <expression_statement> ; 343 <for_init_statement> ::= <declaration_statement> ; 344 <conditionopt> ::= <condition> ; 345 <for_rest_statement> ::= <conditionopt> ";" ; 346 <for_rest_statement> ::= <conditionopt> ";" <expression> ; 347 <jump_statement> ::= "continue" ";" ; 348 <jump_statement> ::= "break" ";" ; 349 <jump_statement> ::= "return" ";" ; 350 <jump_statement> ::= "return" <expression> ";" ; 351 <jump_statement> ::= "discard" ";" ;
Adding a semantic-analysis snippet
Semantic analysis cannot be auto-generated. Fortunately, the only semantic analysis needed here serves a single purpose: recognizing user-defined types.
In the GLSL grammar, the state shown below is the one that needs a semantic action. At this point, the parser has just read the name of a user-defined type (an identifier).
State [172]:
<struct_specifier> ::= "struct" identifier . "{" <struct_declaration_list> "}" ;, identifier "," ")" "(" ";" "["
The semantic action itself is trivial: take the identifier's content as a user-defined type name and insert it into the UserDefinedTypeTable.
// State [172]:
// <struct_specifier> ::= "struct" identifier . "{" <struct_declaration_list> "}" ;, identifier "," ")" "(" ";" "["
static void state172_struct_specifier(ParsingStepContext context)
{
    SyntaxTree tree = context.TreeStack.Peek();
    string name = tree.NodeType.Content;
    context.UserDefinedTypeTable.TryInsert(new UserDefinedType(name));
}
Of course, don't forget to bind this action to its state during initialization.
static GLSLSyntaxParser()
{
    // Bind the semantic action to its state.
    dict.Add(new LR1ShiftInAction(172), state172_struct_specifier);
}
static Dictionary<LRParsingAction, Action<ParsingStepContext>> dict =
    new Dictionary<LRParsingAction, Action<ParsingStepContext>>();

protected override Action<ParsingStepContext> GetSemanticAction(LRParsingAction parsingAction)
{
    Action<ParsingStepContext> semanticAction = null;
    if (dict.TryGetValue(parsingAction, out semanticAction))
    {
        return semanticAction;
    }
    else
    {
        return null;
    }
}
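The flow around this dictionary can be sketched as follows (an illustrative Python model, not the project's C# code; the key shape and context dict are invented for the sketch): after the parser performs an LR action, it looks the action up, and the bound handler, if any, runs against the current parsing context.

```python
# Hypothetical model of GetSemanticAction dispatch: actions are dictionary
# keys; most actions have no semantic handler bound and are simply no-ops.
semantic_actions = {}

def bind(action_key, handler):
    # Mirrors dict.Add(new LR1ShiftInAction(172), state172_struct_specifier).
    semantic_actions[action_key] = handler

def run_action(action_key, context):
    # Mirrors GetSemanticAction: run the handler only if one is bound.
    handler = semantic_actions.get(action_key)
    if handler is not None:
        handler(context)

log = []
bind(("shift", 172), lambda ctx: log.append("registered " + ctx["name"]))
run_action(("shift", 172), {"name": "LightInfo"})  # handler fires
run_action(("shift", 10), {"name": "x"})           # nothing bound: no-op
assert log == ["registered LightInfo"]
```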
userDefinedType
Below is the lexical-analysis result for the LightInfo code snippet shown earlier. Note that while LightInfo is being defined it is an identifier; after the definition, it becomes a token of type userDefinedType.
TokenList[Count: 21]
[[struct](__struct)[struct]]$[Ln:1, Col:1]
[[LightInfo](identifier)[LightInfo]]$[Ln:1, Col:8]
[[{](__left_brace)["{"]]$[Ln:1, Col:18]
[[vec4](__vec4)[vec4]]$[Ln:2, Col:5]
[[Position](identifier)[Position]]$[Ln:2, Col:10]
[[;](__semicolon)[";"]]$[Ln:2, Col:18]
[[vec3](__vec3)[vec3]]$[Ln:3, Col:5]
[[La](identifier)[La]]$[Ln:3, Col:10]
[[;](__semicolon)[";"]]$[Ln:3, Col:12]
[[vec3](__vec3)[vec3]]$[Ln:4, Col:5]
[[Ld](identifier)[Ld]]$[Ln:4, Col:10]
[[;](__semicolon)[";"]]$[Ln:4, Col:12]
[[vec3](__vec3)[vec3]]$[Ln:5, Col:5]
[[Ls](identifier)[Ls]]$[Ln:5, Col:10]
[[;](__semicolon)[";"]]$[Ln:5, Col:12]
[[}](__right_brace)["}"]]$[Ln:6, Col:1]
[[;](__semicolon)[";"]]$[Ln:6, Col:2]
[[uniform](__uniform)[uniform]]$[Ln:7, Col:1]
[[LightInfo](__userDefinedType)[LightInfo]]$[Ln:7, Col:9]
[[Light](identifier)[Light]]$[Ln:7, Col:19]
[[;](__semicolon)[";"]]$[Ln:7, Col:24]
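The switch from identifier to userDefinedType can be modeled like this (a minimal Python sketch, not the generated C# lexer; `classify` and the keyword set are invented names): the lexer checks each word against the keywords first, then against the table that the state-172 semantic action fills.

```python
# Illustrative model: a word becomes __userDefinedType only after the
# semantic action has registered it in the user-defined type table.
class UserDefinedTypeTable:
    def __init__(self):
        self._names = set()

    def try_insert(self, name):
        # True on first insertion, False if the type is already registered.
        if name in self._names:
            return False
        self._names.add(name)
        return True

    def contains(self, name):
        return name in self._names

def classify(word, table, keywords):
    if word in keywords:
        return "__" + word            # keyword token, e.g. __struct
    if table.contains(word):
        return "__userDefinedType"
    return "identifier"

keywords = {"struct", "uniform", "vec3", "vec4"}
table = UserDefinedTypeTable()
# Before the definition completes, LightInfo lexes as an identifier...
assert classify("LightInfo", table, keywords) == "identifier"
table.try_insert("LightInfo")         # done by the state-172 action
# ...and afterwards as a userDefinedType token.
assert classify("LightInfo", table, keywords) == "__userDefinedType"
```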
Below is the syntax tree for the LightInfo snippet. You can see that each token's type matches the type of its leaf node.

(__translation_unit)[<translation_unit>][translation_unit]
├─(__translation_unit)[<translation_unit>][translation_unit]
│ └─(__external_declaration)[<external_declaration>][external_declaration]
│ └─(__declaration)[<declaration>][declaration]
│ ├─(__init_declarator_list)[<init_declarator_list>][init_declarator_list]
│ │ └─(__single_declaration)[<single_declaration>][single_declaration]
│ │ └─(__fully_specified_type)[<fully_specified_type>][fully_specified_type]
│ │ └─(__type_specifier)[<type_specifier>][type_specifier]
│ │ └─(__type_specifier_nonarray)[<type_specifier_nonarray>][type_specifier_nonarray]
│ │ └─(__struct_specifier)[<struct_specifier>][struct_specifier]
│ │ ├─(__structLeave__)[struct][struct]
│ │ ├─(identifierLeave__)[LightInfo][LightInfo]
│ │ ├─(__left_braceLeave__)["{"][{]
│ │ ├─(__struct_declaration_list)[<struct_declaration_list>][struct_declaration_list]
│ │ │ ├─(__struct_declaration_list)[<struct_declaration_list>][struct_declaration_list]
│ │ │ │ ├─(__struct_declaration_list)[<struct_declaration_list>][struct_declaration_list]
│ │ │ │ │ ├─(__struct_declaration_list)[<struct_declaration_list>][struct_declaration_list]
│ │ │ │ │ │ └─(__struct_declaration)[<struct_declaration>][struct_declaration]
│ │ │ │ │ │ ├─(__type_specifier)[<type_specifier>][type_specifier]
│ │ │ │ │ │ │ └─(__type_specifier_nonarray)[<type_specifier_nonarray>][type_specifier_nonarray]
│ │ │ │ │ │ │ └─(__vec4Leave__)[vec4][vec4]
│ │ │ │ │ │ ├─(__struct_declarator_list)[<struct_declarator_list>][struct_declarator_list]
│ │ │ │ │ │ │ └─(__struct_declarator)[<struct_declarator>][struct_declarator]
│ │ │ │ │ │ │ └─(identifierLeave__)[Position][Position]
│ │ │ │ │ │ └─(__semicolonLeave__)[";"][;]
│ │ │ │ │ └─(__struct_declaration)[<struct_declaration>][struct_declaration]
│ │ │ │ │ ├─(__type_specifier)[<type_specifier>][type_specifier]
│ │ │ │ │ │ └─(__type_specifier_nonarray)[<type_specifier_nonarray>][type_specifier_nonarray]
│ │ │ │ │ │ └─(__vec3Leave__)[vec3][vec3]
│ │ │ │ │ ├─(__struct_declarator_list)[<struct_declarator_list>][struct_declarator_list]
│ │ │ │ │ │ └─(__struct_declarator)[<struct_declarator>][struct_declarator]
│ │ │ │ │ │ └─(identifierLeave__)[La][La]
│ │ │ │ │ └─(__semicolonLeave__)[";"][;]
│ │ │ │ └─(__struct_declaration)[<struct_declaration>][struct_declaration]
│ │ │ │ ├─(__type_specifier)[<type_specifier>][type_specifier]
│ │ │ │ │ └─(__type_specifier_nonarray)[<type_specifier_nonarray>][type_specifier_nonarray]
│ │ │ │ │ └─(__vec3Leave__)[vec3][vec3]
│ │ │ │ ├─(__struct_declarator_list)[<struct_declarator_list>][struct_declarator_list]
│ │ │ │ │ └─(__struct_declarator)[<struct_declarator>][struct_declarator]
│ │ │ │ │ └─(identifierLeave__)[Ld][Ld]
│ │ │ │ └─(__semicolonLeave__)[";"][;]
│ │ │ └─(__struct_declaration)[<struct_declaration>][struct_declaration]
│ │ │ ├─(__type_specifier)[<type_specifier>][type_specifier]
│ │ │ │ └─(__type_specifier_nonarray)[<type_specifier_nonarray>][type_specifier_nonarray]
│ │ │ │ └─(__vec3Leave__)[vec3][vec3]
│ │ │ ├─(__struct_declarator_list)[<struct_declarator_list>][struct_declarator_list]
│ │ │ │ └─(__struct_declarator)[<struct_declarator>][struct_declarator]
│ │ │ │ └─(identifierLeave__)[Ls][Ls]
│ │ │ └─(__semicolonLeave__)[";"][;]
│ │ └─(__right_braceLeave__)["}"][}]
│ └─(__semicolonLeave__)[";"][;]
└─(__external_declaration)[<external_declaration>][external_declaration]
└─(__declaration)[<declaration>][declaration]
├─(__init_declarator_list)[<init_declarator_list>][init_declarator_list]
│ └─(__single_declaration)[<single_declaration>][single_declaration]
│ ├─(__fully_specified_type)[<fully_specified_type>][fully_specified_type]
│ │ ├─(__type_qualifier)[<type_qualifier>][type_qualifier]
│ │ │ └─(__single_type_qualifier)[<single_type_qualifier>][single_type_qualifier]
│ │ │ └─(__storage_qualifier)[<storage_qualifier>][storage_qualifier]
│ │ │ └─(__uniformLeave__)[uniform][uniform]
│ │ └─(__type_specifier)[<type_specifier>][type_specifier]
│ │ └─(__type_specifier_nonarray)[<type_specifier_nonarray>][type_specifier_nonarray]
│ │ └─(__userDefinedTypeLeave__)[LightInfo][LightInfo]
│ └─(identifierLeave__)[Light][Light]
└─(__semicolonLeave__)[";"][;]
With the remaining test cases passing as well, the GLSL parser was finally done.
The final goal
The point of parsing GLSL source is to extract its information (which in/out/uniform variables it declares, and so on). Now that the syntax tree is available, all that remains is traversing it, which I won't detail here.
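As a rough illustration of that traversal (a hypothetical Python sketch; the `Node` class and `collect_uniforms` are invented here, while the real tree is the C# SyntaxTree): walk the tree depth-first and, for any declaration subtree whose leaves include a uniform qualifier, record the identifier leaves.

```python
# Hypothetical tree walk: node kinds mirror the dump above, but the data
# structure itself is made up for this sketch.
class Node:
    def __init__(self, kind, content=None, children=()):
        self.kind = kind
        self.content = content
        self.children = list(children)

def leaves(node):
    # All leaf nodes of a subtree, left to right.
    if not node.children:
        return [node]
    out = []
    for child in node.children:
        out.extend(leaves(child))
    return out

def collect_uniforms(node, found=None):
    # Depth-first search for declarations qualified with "uniform".
    if found is None:
        found = []
    if node.kind == "__declaration":
        flat = leaves(node)
        if any(l.kind == "__uniformLeave__" for l in flat):
            found.extend(l.content for l in flat if l.kind == "identifierLeave__")
    else:
        for child in node.children:
            collect_uniforms(child, found)
    return found

# "uniform LightInfo Light;" reduced to the relevant nodes:
tree = Node("__translation_unit", children=[
    Node("__declaration", children=[
        Node("__uniformLeave__", "uniform"),
        Node("__userDefinedTypeLeave__", "LightInfo"),
        Node("identifierLeave__", "Light"),
    ]),
])
assert collect_uniforms(tree) == ["Light"]
```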
The story
The "story" was really an accident. Out of impatience, my first implementation of this project developed bugs that were nearly impossible to fix. So I rewrote it from scratch, step by step this time, and it finally worked.
LALR(1) states
When the LALR(1) state collection tries to insert a new state and a state that is "equal" in the LALR(1) sense already exists, it must still merge the new state's lookahead list into the existing state.
Otherwise, lookahead entries go missing. The example below shows, for grammar 3-8 from the Tiger Book, the difference between the state collection produced when this step is skipped and the correct collection (some lookahead entries are absent).
First, the state collection produced when the merge step is skipped (note the missing lookaheads):
State [1]:
<S> ::= . "(" <L> ")" ;, "$"
<S'> ::= . <S> "$" ;, "$"
<S> ::= . "x" ;, "$"
State [8]:
<S> ::= "(" <L> ")" . ;, "$"
State [4]:
<S> ::= "x" . ;, "$"
State [6]:
<L> ::= <S> . ;, "," ")"
State [9]:
<L> ::= <L> "," <S> . ;, "," ")"
State [5]:
<L> ::= <L> . "," <S> ;, "," ")"
<S> ::= "(" <L> . ")" ;, "$"
State [7]:
<S> ::= . "(" <L> ")" ;, "," ")"
<S> ::= . "x" ;, "," ")"
<L> ::= <L> "," . <S> ;, "," ")"
State [2]:
<S> ::= . "(" <L> ")" ;, "," ")"
<S> ::= . "x" ;, "," ")"
<S> ::= "(" . <L> ")" ;, "$"
<L> ::= . <L> "," <S> ;, "," ")"
<L> ::= . <S> ;, "," ")"
State [3]:
<S'> ::= <S> . "$" ;, "$"
And the correct state collection, with the merged lookaheads:
State [1]:
<S> ::= . "(" <L> ")" ;, "$"
<S'> ::= . <S> "$" ;, "$"
<S> ::= . "x" ;, "$"
State [8]:
<S> ::= "(" <L> ")" . ;, "$" "," ")"
State [4]:
<S> ::= "x" . ;, "$" "," ")"
State [6]:
<L> ::= <S> . ;, "," ")"
State [9]:
<L> ::= <L> "," <S> . ;, "," ")"
State [5]:
<L> ::= <L> . "," <S> ;, "," ")"
<S> ::= "(" <L> . ")" ;, "$" "," ")"
State [7]:
<S> ::= . "(" <L> ")" ;, "," ")"
<S> ::= . "x" ;, "," ")"
<L> ::= <L> "," . <S> ;, "," ")"
State [2]:
<S> ::= . "(" <L> ")" ;, "," ")"
<S> ::= . "x" ;, "," ")"
<S> ::= "(" . <L> ")" ;, "$" "," ")"
<L> ::= . <L> "," <S> ;, "," ")"
<L> ::= . <S> ;, "," ")"
State [3]:
<S'> ::= <S> . "$" ;, "$"
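The merge rule itself can be sketched like this (illustrative Python, not the project's C# code; the item encoding is simplified): two states are LALR(1)-"equal" when their item cores match, and inserting a duplicate unions the lookahead sets instead of adding a second state. That union is exactly where the extra "," ")" lookaheads in the corrected dump come from.

```python
# Simplified LALR(1) state: items maps an LR(0) item core (a production with
# a dot position) to its set of lookahead tokens.
class State:
    def __init__(self, items):
        self.items = items

    def core(self):
        # Two states are LALR(1)-equal iff their cores are equal.
        return frozenset(self.items)

def try_insert(states, new):
    for old in states:
        if old.core() == new.core():
            changed = False
            for item, lookaheads in new.items.items():
                if not lookaheads <= old.items[item]:
                    old.items[item] |= lookaheads
                    changed = True   # lookaheads grew: reprocess this state
            return changed
    states.append(new)               # genuinely new core
    return True

states = []
try_insert(states, State({'<S> ::= "(" <L> ")" .': {"$"}}))
grew = try_insert(states, State({'<S> ::= "(" <L> ")" .': {",", ")"}}))
assert grew and len(states) == 1
assert states[0].items['<S> ::= "(" <L> ")" .'] == {"$", ",", ")"}
```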
CodeDom
CodeDom doesn't support readonly fields, which is a real pity. CodeDom also automatically prepends an @ to variables whose names start with "__", which leaves me speechless.
// private static TreeNodeType NODE__Grammar = new TreeNodeType(ContextfreeGrammarSLRTreeNodeType.NODE__Grammar, "Grammar", "<Grammar>");
CodeMemberField field = new CodeMemberField(typeof(TreeNodeType), GetNodeNameInParser(node));
// field.Attributes does not support readonly; a pity.
field.Attributes = MemberAttributes.Private | MemberAttributes.Static;
var ctor = new CodeObjectCreateExpression(typeof(TreeNodeType),
    new CodeFieldReferenceExpression(
        new CodeTypeReferenceExpression(GetTreeNodeConstTypeName(grammarId, algorithm)),
        GetNodeNameInParser(node)),
    new CodePrimitiveExpression(node.Content),
    new CodePrimitiveExpression(node.Nickname));
field.InitExpression = ctor;
The complicated lexer
Algorithmically, the syntax parser is much harder to understand than the lexer. Yet the structure of an LR parser is far simpler than that of the lexer or of an LL parser. Right now, the most convoluted code is the code that dumps the lexer's source, and handling comments (// and /* */) is the most complex part of it. Now that this code works, I never want to touch it again.
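For the comment problem specifically, the core idea reduces to a small scanner loop (a minimal Python sketch, far simpler than the generated lexer; it ignores string literals and keeps the newline after a // comment so line numbers stay correct):

```python
# Minimal comment-stripping pass over GLSL-like source: skip // to the end
# of the line, skip /* ... */ including newlines, copy everything else.
def strip_comments(src):
    out, i, n = [], 0, len(src)
    while i < n:
        if src.startswith("//", i):
            while i < n and src[i] != "\n":
                i += 1                  # the "\n" itself is copied below
        elif src.startswith("/*", i):
            i += 2
            while i < n and not src.startswith("*/", i):
                i += 1
            i += 2                      # step over the closing */
        else:
            out.append(src[i])
            i += 1
    return "".join(out)

assert strip_comments("int a; // count\n/* x */int b;") == "int a; \nint b;"
```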
LL and LR
LR parsing really is far more powerful than LL. It handles today's programming languages, places very few restrictions on the grammar, and the parser's structure is quite simple. What's remarkable is that a small change to the grammar can reduce the number of LR states and so shrink the generated code.
For example, slightly different formulations of the ContextfreeGrammarCompiler grammar yield different state counts.

====================================================================
135 set action items
<Grammar> ::= <ProductionList> <Production> ;
<ProductionList> ::= <ProductionList> <Production> | null ;
<Production> ::= <Vn> "::=" <Canditate> <RightPartList> ";" ;
<Canditate> ::= <V> <VList> ;
<VList> ::= <V> <VList> | null ;
<RightPartList> ::= "|" <Canditate> <RightPartList> | null ;
<V> ::= <Vn> | <Vt> ;
<Vn> ::= "<" identifier ">" ;
<Vt> ::= "null" | "identifier" | "number" | "constString" | constString ;
====================================================================
143 set action items
<Grammar> ::= <Production> <ProductionList> ;
<ProductionList> ::= <Production> <ProductionList> | null ;
<Production> ::= <Vn> "::=" <Canditate> <RightPartList> ";" ;
<Canditate> ::= <V> <VList> ;
<VList> ::= <V> <VList> | null ;
<RightPartList> ::= "|" <Canditate> <RightPartList> | null ;
<V> ::= <Vn> | <Vt> ;
<Vn> ::= "<" identifier ">" ;
<Vt> ::= "null" | "identifier" | "number" | "constString" | constString ;
====================================================================
139 set action items
<Grammar> ::= <ProductionList> <Production> ;
<ProductionList> ::= <ProductionList> <Production> | null ;
<Production> ::= <Vn> "::=" <LeftPartList> <Canditate> ";" ;
<LeftPartList> ::= <LeftPartList> <LeftPart> | null ;
<LeftPart> ::= <Canditate> "|" ;
<Canditate> ::= <V> <VList> ;
<VList> ::= <V> <VList> | null ;
<V> ::= <Vn> | <Vt> ;
<Vn> ::= "<" identifier ">" ;
<Vt> ::= "null" | "identifier" | "number" | "constString" | constString ;
====================================================================
120 set action items
<Grammar> ::= <ProductionList> <Production> ;
<ProductionList> ::= <ProductionList> <Production> | null ;
<Production> ::= <Vn> "::=" <Canditate> <RightPartList> ";" ;
<Canditate> ::= <VList> <V> ;
<VList> ::= <VList> <V> | null ;
<RightPartList> ::= "|" <Canditate> <RightPartList> | null ;
<V> ::= <Vn> | <Vt> ;
<Vn> ::= "<" identifier ">" ;
<Vt> ::= "null" | "identifier" | "number" | "constString" | constString ;
Summary
I implemented LALR(1) parsing and, on top of it, a GLSL parser.
From now on, I won't have to worry about parsing any language or format.