一個編譯器的實現3——用編譯原理自動化制作文本解析器
PS:本文PDF版在這里。
關于編譯器的概念、工作流程、算法和設計方案,可參考這里(http://www.cnblogs.com/bitzhuwei/archive/2013/06/05/CompilerDesignAndImp4Context-freeGrammar.html)。閱讀本文須理解“上下文無關文法(Context-free Grammar)”是什么。
本文以加減乘除表達式和一個3D坦克游戲模型為例,說明如何自動生成解析器以及如何使用自動生成的代碼。
文末附源代碼。
加減乘除表達式
運行編譯器代碼生成器(bitzhuwei.CGCompiler.Winform.exe),默認配置文件中已經有加減乘除表達式(Expression)的文法了。
![clip_image002[8] clip_image002[8]](https://imageproxy.pixnet.cc/imgproxy?url=https://images.cnitblog.com/blog/383191/201311/20194256-b99eb5dee5c446b3b983dd4dd2355262.jpg&width=557&height=365)
設置好編譯器名字、命名空間和代碼存放的位置,點擊“開始!”。
若文法沒有錯誤,會在指定位置生成Expression解析器的代碼。
![clip_image004[8] clip_image004[8]](https://imageproxy.pixnet.cc/imgproxy?url=https://images.cnitblog.com/blog/383191/201311/20194258-a4d23d599a034146931e29c9b292e610.jpg&width=557&height=364)
![clip_image006[8] clip_image006[8]](https://imageproxy.pixnet.cc/imgproxy?url=https://images.cnitblog.com/blog/383191/201311/20194300-166d722947ff46e8ba005d9a7cca5ea7.jpg&width=558&height=237)
一共生成了10個文件(其中bitzhuwei.CompilerBase.dll和使用說明.txt是直接復制的)。
三個Enum*.cs文件分別是文法的字符類型、單詞類型和語法樹結點類型。
LexicalAnalyzer*.cs文件是詞法分析器。
LL1SyntaxParser*.cs文件是語法分析器。
SyntaxTreeNodeValue*.cs文件是語法樹結點類型,稍候會用到。
使用生存的代碼的方法很簡單:創建一個類庫項目,把生成的10個文件全部加進去,引用bitzhuwei.CompilerBase.dll文件。
![clip_image007[8] clip_image007[8]](https://imageproxy.pixnet.cc/imgproxy?url=https://images.cnitblog.com/blog/383191/201311/20194302-a3a053d627a646bbbe605c2e4476876e.jpg&width=282&height=477)
為了測試,再創建一個Console項目,用下面的代碼測試。

測試Expression的代碼 static void Main(string[] args)
{
var sourceCodes = new string[]
{
"37",
"19 * 19 - 18 * 18",
"(19 + 18) * (19 - 18)",
};
foreach (var sourceCode in sourceCodes)
{
var lex = new bitzhuwei.ExpressionCompiler.LexicalAnalyzerExpression();
lex.SetSourceCode(sourceCode);
var tokens = lex.Analyze();
Console.WriteLine(tokens);
var parser = new bitzhuwei.ExpressionCompiler.LL1SyntaxParserExpression();
parser.SetTokenListSource(tokens);
var tree = parser.Parse();
Console.WriteLine(tree);
var value = tree.GetValue();
Console.WriteLine(value);
}
}
輸入的語法樹如下圖所示。
![clip_image009[8] clip_image009[8]](https://imageproxy.pixnet.cc/imgproxy?url=https://images.cnitblog.com/blog/383191/201311/20194306-99c7f6f3f97e40378bbd5fbd657a95b8.jpg&width=558&height=542)
我們使用解析器,目的是為了得到數據結構后再獲取有價值的結果。Expression的價值在于獲取表達式的值,通過遍歷語法樹獲取這個值是很容易的。(這個代碼只能自己寫,這屬于語義分析階段了,目前還無法自動生成。)

SyntaxTreeExpressionGetValue.cs/// <summary>
/// 提供SyntaxTree<EnumTokenTypeCG, EnumVTypeCG, SyntaxTreeNodeValueCG>的擴展方法
/// </summary>
public static partial class SyntaxTreeExpression
{
/// <summary>
/// 獲取源代碼的規范格式
/// <para>語法分析的副產品</para>
/// </summary>
/// <param name="tree">語法樹</param>
/// <returns></returns>
public static double GetValue(this SyntaxTree<EnumTokenTypeExpression, EnumVTypeExpression, TreeNodeValueExpression> tree)
{
if (tree == null) return double.NaN;
var tmpTree = tree.Clone() as SyntaxTree<EnumTokenTypeExpression, EnumVTypeExpression, TreeNodeValueExpression>;
_GetValue(tmpTree);
return double.Parse(tmpTree.Tag.ToString());
}
private static void _GetValue(SyntaxTree<EnumTokenTypeExpression, EnumVTypeExpression, TreeNodeValueExpression> tree)
{
switch (tree.NodeValue.NodeType)
{
case EnumVTypeExpression.Unknown:
break;
case EnumVTypeExpression.case_Expression://<Expression> ::= <Multiply> <PlusOpt>;
_GetValue(tree.Children[0]);
_GetValue(tree.Children[1]);
tree.Tag = double.Parse(tree.Children[0].Tag.ToString()) + double.Parse(tree.Children[1].Tag.ToString());
break;
case EnumVTypeExpression.case_PlusOpt://<PlusOpt> ::= "+" <Multiply> | "-" <Multiply> | null;
if (tree.CandidateFunc == LL1SyntaxParserExpression.GetFuncParsecase_PlusOpt___tail_plus_Leave())
{
_GetValue(tree.Children[1]);
tree.Tag = tree.Children[1].Tag;
}
else if (tree.CandidateFunc == LL1SyntaxParserExpression.GetFuncParsecase_PlusOpt___tail_minus_Leave())
{
_GetValue(tree.Children[1]);
tree.Tag = -double.Parse(tree.Children[1].Tag.ToString());
}
else if (tree.CandidateFunc == LL1SyntaxParserExpression.GetFuncParsecase_PlusOpt___tail_rightParentheses_Leave()
|| tree.CandidateFunc == LL1SyntaxParserExpression.GetFuncParsecase_PlusOpt___tail_startEndLeave())
{
tree.Tag = 0;
}
break;
case EnumVTypeExpression.case_Multiply://<Multiply> ::= <Unit> <MultiplyOpt>;
_GetValue(tree.Children[0]);
_GetValue(tree.Children[1]);
tree.Tag = double.Parse(tree.Children[0].Tag.ToString()) * double.Parse(tree.Children[1].Tag.ToString());
break;
case EnumVTypeExpression.case_MultiplyOpt://<MultiplyOpt> ::= "*" <Unit> | "/" <Unit> | null;
if (tree.CandidateFunc == LL1SyntaxParserExpression.GetFuncParsecase_MultiplyOpt___tail_multiply_Leave())
{
_GetValue(tree.Children[1]);
tree.Tag = double.Parse(tree.Children[1].Tag.ToString());
}
else if (tree.CandidateFunc == LL1SyntaxParserExpression.GetFuncParsecase_MultiplyOpt___tail_divide_Leave())
{
_GetValue(tree.Children[1]);
tree.Tag = 1 / (double)tree.Children[1].Tag;
}
else if (tree.CandidateFunc == LL1SyntaxParserExpression.GetFuncParsecase_MultiplyOpt___tail_plus_Leave()
|| tree.CandidateFunc == LL1SyntaxParserExpression.GetFuncParsecase_MultiplyOpt___tail_minus_Leave()
|| tree.CandidateFunc == LL1SyntaxParserExpression.GetFuncParsecase_MultiplyOpt___tail_rightParentheses_Leave()
|| tree.CandidateFunc == LL1SyntaxParserExpression.GetFuncParsecase_MultiplyOpt___tail_startEndLeave())
{
tree.Tag = 1;
}
break;
case EnumVTypeExpression.case_Unit://<Unit> ::= number | "(" <Expression> ")";
if (tree.CandidateFunc == LL1SyntaxParserExpression.GetFuncParsecase_Unit___numberLeave())
{
tree.Tag = double.Parse(tree.Children[0].NodeValue.NodeName);
}
else if (tree.CandidateFunc == LL1SyntaxParserExpression.GetFuncParsecase_Unit___tail_leftParentheses_Leave())
{
_GetValue(tree.Children[1]);
tree.Tag = tree.Children[1].Tag;
}
break;
default:
break;
}
}
}
ArmadaTank模型
坦克艦隊(ArmadaTank)是我很喜歡的一款游戲,現在我正在試圖用C#重寫這個游戲。喜歡的同學可以自行搜索“坦克艦隊”。
ArmadaTank的3D模型是用純文本的*.dtm文件標識的。完全可以用自動生成的解析器來加載之。
步驟就不再說了,和Expression的步驟一樣,這里只貼一下DTM文件的文法。

DTM的文法<ArmadaTanksModel> ::= "File" <FileContent> "endfile";
<FileContent> ::= "{" <BlockList> "}";
<BlockList> ::= <Block> <BlockList> | null;
<Block> ::= "FileDesc" <FileDesc> "endfiledesc" | "Faces" <Faces> | "MapChannel" <SignedNumber> <MapChannel> | "Frame" <SignedNumber> <Frame> "endframe";
<FileDesc> ::= "{" <FileDescItemList> "}";
<FileDescItemList> ::= <FileDescItem> <FileDescItemList> | null;
<FileDescItem> ::= "Frames" <SignedNumber> | "Vertices" <SignedNumber> | "Faces" <SignedNumber> | "Map" <SignedNumber> "TVertices" <SignedNumber>;
<Faces> ::= "{" <FaceList> "}";
<FaceList> ::= <Face> <FaceList> | null;
<Face> ::= "Face" <SignedNumber> <SignedNumber> <SignedNumber> <SignedNumber> "MatID" <SignedNumber>;
<MapChannel> ::= "{" <TextureList> "}";
<TextureList> ::= <Texture> <TextureList> | null;
<Texture> ::= "TextureVertices" <TextureVertices> | "TextureFaces" <TextureFaces>;
<TextureVertices> ::= "{" <TVertexList> "}";
<TVertexList> ::= <TVertex> <TVertexList> | null;
<TVertex> ::= "TVertex" <SignedNumber> <SignedNumber> <SignedNumber> <SignedNumber>;
<TextureFaces> ::= "{" <TFaceList> "}";
<TFaceList> ::= <TFace> <TFaceList> | null;
<TFace> ::= "TFace" <SignedNumber> <SignedNumber> <SignedNumber> <SignedNumber>;
<Frame> ::= "{" <FrameContentItemList> "}";
<FrameContentItemList> ::= <FrameContentItem> <FrameContentItemList> | null;
<FrameContentItem> ::= "Vertices" "{" <Vertices> "}";
<Vertices> ::= <Vertex> <Vertices> | null;
<Vertex> ::= "Vertex" <SignedNumber> <SignedNumber> <SignedNumber> <SignedNumber>;
<SignedNumber> ::= "+" number | "-" number | number;
用OpenGL來顯示3D模型(語義分析及其之后的階段),如下圖所示。
![clip_image011[8] clip_image011[8]](https://imageproxy.pixnet.cc/imgproxy?url=https://images.cnitblog.com/blog/383191/201311/20194308-cbf77c9cf7d546dd970df5efa9f971f8.jpg&width=558&height=449)
源代碼在此。
http://files.cnblogs.com/bitzhuwei/bitzhuwei.CGCompiler2013-11-20_19-27-00.rar