Building a Chinese Scripting Engine with AScript: A Practical Guide to Natural Language Programming
Introduction: Bridging the Language Gap in Programming
Programming has long been dominated by English-based syntax. From the earliest days of Fortran and COBOL to modern languages like Python and JavaScript, keywords, operators, and structure have been rooted in English vocabulary. This creates a significant barrier for non-English speakers, particularly in business contexts where domain experts understand the problem deeply but struggle with the syntactic requirements of traditional programming languages.
AScript, an open-source C# dynamic script parsing and execution library, offers an elegant solution to this challenge. With its support for custom syntax parsing, AScript enables developers to create scripting engines that operate in natural languages—including Chinese. This article provides a comprehensive, step-by-step guide to implementing a Chinese scripting engine using AScript, demonstrating how to create a conditional statement structure using native Chinese keywords.
Understanding AScript: The Foundation
Before diving into implementation, it's essential to understand what AScript provides and why it's uniquely suited for this task.
What is AScript?
AScript is a dynamic scripting library built for the .NET ecosystem. Its core capabilities include:
- Dynamic Script Parsing: Interprets script code at runtime without compilation
- Custom Syntax Support: Allows developers to define their own grammatical structures
- Extensible Token Handling: Provides interfaces for creating custom language constructs
- C# Integration: Seamlessly interoperates with existing C# code and libraries
Why AScript for Chinese Scripting?
The key advantage of AScript for natural language scripting lies in its token-based architecture. Rather than requiring a complete language grammar from the start, AScript allows incremental addition of language constructs through token handlers. This modular approach makes it practical to build a Chinese scripting engine piece by piece.
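The incremental idea can be illustrated without AScript itself. The following is a minimal sketch, assuming a hypothetical IKeywordHandler interface and KeywordRegistry class invented for this example; they are not AScript's types, and AScript's real handlers do considerably more than return a description.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a keyword handler; this is NOT AScript's ITokenHandler.
public interface IKeywordHandler
{
    string Describe();
}

public sealed class IfHandler : IKeywordHandler
{
    public string Describe() => "conditional statement";
}

public sealed class ReturnHandler : IKeywordHandler
{
    public string Describe() => "return statement";
}

// Hypothetical registry: maps a keyword to the handler responsible for it.
public sealed class KeywordRegistry
{
    private readonly Dictionary<string, IKeywordHandler> _handlers =
        new Dictionary<string, IKeywordHandler>(StringComparer.Ordinal);

    // Each call adds one construct; existing registrations are untouched.
    public void Add(string keyword, IKeywordHandler handler) => _handlers[keyword] = handler;

    // Returns the handler's description, or a fallback for unregistered tokens.
    public string Lookup(string token) =>
        _handlers.TryGetValue(token, out var handler) ? handler.Describe() : "not a keyword";
}
```

Because each keyword is registered independently, a language built this way can start with a single construct such as 如果 and grow one handler at a time.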
Implementation Walkthrough: Creating Chinese Conditional Statements
Let's implement a Chinese conditional statement with the structure: "如果 ... 则 ... 否则 ..." (If ... Then ... Else ...). This example demonstrates the core principles that can be extended to build a complete Chinese scripting language.
Step 1: Implementing the ITokenHandler Interface
The first step is creating a custom token handler that recognizes and processes the Chinese "如果" (if) keyword. This handler defines how the parser should interpret conditional statements.
public class 如果语法处理器 : ITokenHandler
{
    private static readonly HashSet<string> _StatementEndTokens =
        new HashSet<string> { "则", "否则" };
    public void Build(DefaultSyntaxAnalyzer analyzer, TokenAnalyzingArgs e)
    {
        e.IsHandled = true;
        e.End = true;
        // If there's already a statement, push back the token and return
        if (e.TreeBuilder.Root != null)
        {
            e.TokenReader.Push(e.CurrentToken);
            return;
        }
        // Build the condition expression (everything until "则" or "否则")
        var condition = analyzer.BuildOneStatement(
            e.BuildContext,
            e.ScriptContext,
            e.Options,
            e.TokenReader,
            e.Control,
            e.Ignore,
            endTokens: _StatementEndTokens
        );
        // Validate and consume the "则" (then) keyword
        analyzer.ValidateNextToken(e.TokenReader, "则");
        // Build the main body of the if statement
        var createAllOptions = new BuildOptions(e.Options)
        {
            CreateFullTreeNode = true
        };
        var body = analyzer.BuildOneStatement(
            e.BuildContext,
            e.ScriptContext,
            createAllOptions,
            e.TokenReader,
            e.Control,
            e.Ignore,
            endTokens: _StatementEndTokens
        );
        // Create the if node with condition and body
        var node = new IfNode
        {
            Condition = condition,
            Body = body
        };
        // Check for additional tokens, skipping an optional ";" after the body
        var nextToken = e.TokenReader.Read();
        if (nextToken.HasValue && nextToken.Value.Value == ";")
        {
            nextToken = e.TokenReader.Read();
        }
        // Handle optional "否则" (else) clause
        if (nextToken.HasValue)
        {
            if (nextToken.Value.Value == "否则")
            {
                node.Else = analyzer.BuildOneStatement(
                    e.BuildContext,
                    e.ScriptContext,
                    createAllOptions,
                    e.TokenReader,
                    e.Control,
                    e.Ignore
                );
            }
            else
            {
                // Not an else clause: return the token to the reader
                e.TokenReader.Push(nextToken.Value);
            }
        }
        // Add the completed if node to the syntax tree
        e.TreeBuilder.Add(
            e.BuildContext,
            e.ScriptContext,
            e.Options,
            e.Control,
            node
        );
    }
}
Understanding the Token Handler Logic
Let's break down what this handler accomplishes:
- Token Recognition: The handler identifies "如果" as the start of a conditional statement
- Condition Parsing: It collects tokens until reaching "则" (then) or "否则" (else), forming the condition expression
- Validation: It ensures the "则" keyword is present, maintaining syntactic correctness
- Body Construction: It parses the statement block that executes when the condition is true
- Optional Else Handling: It checks for and processes an optional "否则" clause
- Tree Building: It constructs the syntax tree node representing the complete conditional statement
This approach mirrors how traditional compilers handle if-then-else constructs, but uses Chinese keywords instead of English ones.
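To make the parsing order concrete, here is a small standalone sketch of the same "condition, 则, body, optional 否则" sequence over a plain list of tokens. SketchIfNode and IfParser are hypothetical names introduced for this illustration and do not use AScript's API. As a simplification, conditions and bodies are kept as joined token text instead of being parsed into expressions, and the sketch does not track braces, so a 否则 nested inside a body would end that body early.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical node type for this sketch; AScript's IfNode is a different class.
public sealed class SketchIfNode
{
    public string Condition = "";
    public string Body = "";
    public SketchIfNode ElseIf;   // set when the else branch is another 如果
    public string ElseBody;       // set when the else branch is a plain statement
}

public static class IfParser
{
    // Parses: 如果 <condition> 则 <body> [否则 (<if-statement> | <body>)]
    public static SketchIfNode Parse(IReadOnlyList<string> tokens, ref int pos)
    {
        Expect(tokens, ref pos, "如果");
        var node = new SketchIfNode();
        // The condition runs until one of the end tokens is seen.
        node.Condition = ReadUntil(tokens, ref pos, "则", "否则");
        // "则" is mandatory; a missing keyword is a syntax error.
        Expect(tokens, ref pos, "则");
        node.Body = ReadUntil(tokens, ref pos, "否则");
        if (pos < tokens.Count && tokens[pos] == "否则")
        {
            pos++; // consume 否则
            if (pos < tokens.Count && tokens[pos] == "如果")
                node.ElseIf = Parse(tokens, ref pos);       // else-if chain
            else
                node.ElseBody = ReadUntil(tokens, ref pos); // read to the end
        }
        return node;
    }

    // Collects tokens until an end token (or the end of input) is reached.
    private static string ReadUntil(IReadOnlyList<string> tokens, ref int pos, params string[] endTokens)
    {
        var parts = new List<string>();
        while (pos < tokens.Count && Array.IndexOf(endTokens, tokens[pos]) < 0)
            parts.Add(tokens[pos++]);
        return string.Join(" ", parts);
    }

    private static void Expect(IReadOnlyList<string> tokens, ref int pos, string expected)
    {
        if (pos >= tokens.Count || tokens[pos] != expected)
            throw new FormatException($"Expected '{expected}' at token {pos}.");
        pos++;
    }
}
```

The recursive call for 否则 如果 is what produces a nested node for each else-if, which is the same shape the real handler builds when its else branch is itself a 如果 statement.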
Step 2: Defining the Chinese Language Environment
Next, we create a class that inherits from ScriptLang to define our Chinese language environment. This class registers all the Chinese keywords and their associated handlers.
public class 中文语言 : ScriptLang
{
    // Singleton instance for the Chinese language
    public static readonly 中文语言 实例 = new 中文语言();
    public 中文语言()
    {
        // Register Chinese type names
        AddType<int>("整型");
        AddType<string>("文本");
        // Register token handlers for Chinese keywords
        AddTokenHandler("如果", new 如果语法处理器());
        AddTokenHandler("返回", AScript.TokenHandlers.ReturnTokenHandler.Instance);
    }
}
Key Components Explained
Type Registration:
- 整型 (Integer Type): Maps to C#'s int type
- 文本 (Text Type): Maps to C#'s string type
This allows scripts to declare variables using Chinese type names, making the language feel natural to Chinese speakers.
Token Handler Registration:
- 如果 (If): Associates the Chinese "if" keyword with our custom handler
- 返回 (Return): Uses AScript's built-in return handler for function returns
This registration process is how AScript builds its understanding of the language. Each keyword maps to specific parsing and execution logic.
Step 3: Registering the Chinese Language
Once the language class is defined, it must be registered with the AScript engine:
Script.Langs["中文"] = 中文语言.实例;
This single line makes the Chinese language available to the scripting engine. Scripts can now specify "中文" as their language, and AScript will use the registered handlers and type definitions.
Step 4: Writing and Executing Chinese Scripts
With the language registered, we can write and execute scripts using Chinese syntax. Here's a complete example:
string s = @"
整型 n=10;
文本 s='';
如果 n<5 则 {
    s='小于 5';
} 否则 如果 n<20 则 {
    s='大于等于 5 且小于 20';
} 否则 {
    s='大于等于 20';
}
返回 $'{n},{s}';
";
var script = new Script();
Assert.AreEqual("10,大于等于 5 且小于 20", script.Eval(s));
Script Analysis
Let's examine this script line by line:
- Variable Declaration: 整型 n=10; declares an integer variable named n with value 10
- String Initialization: 文本 s=''; creates an empty string variable
- Conditional Logic: The if-then-else structure evaluates conditions in Chinese
- Nested Conditions: 否则 如果 creates an else-if chain for multiple conditions
- String Interpolation: $'{n},{s}' combines values into a formatted output string
- Return Statement: 返回 sends the result back to the caller
The expected output "10,大于等于 5 且小于 20" confirms that:
- The variable n equals 10
- The condition n<5 is false
- The condition n<20 is true
- The appropriate branch executes and returns the correct message
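For readers more comfortable with C#, the script's control flow corresponds to the following plain C# method. This is only a side-by-side comparison written for this article; EquivalentLogic is a hypothetical name, and the value of n is passed as a parameter so other inputs can be tried.

```csharp
public static class EquivalentLogic
{
    // Plain C# with the same branches as the Chinese script:
    // 如果 -> if, 否则 如果 -> else if, 否则 -> else, 返回 -> return.
    public static string Classify(int n)
    {
        string s = "";
        if (n < 5)
        {
            s = "小于 5";
        }
        else if (n < 20)
        {
            s = "大于等于 5 且小于 20";
        }
        else
        {
            s = "大于等于 20";
        }
        return $"{n},{s}";
    }
}
```

Calling EquivalentLogic.Classify(10) takes the middle branch, matching the string the assertion in the Chinese example expects.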
Technical Deep Dive: How It Works
Tokenization Process
When AScript processes a Chinese script, it goes through several stages:
- Lexical Analysis: The input string is broken into tokens (keywords, identifiers, operators, literals)
- Token Recognition: Each token is matched against registered handlers
- Syntax Building: Handlers construct syntax tree nodes according to language rules
- Execution: The syntax tree is traversed and executed
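The first two stages can be sketched in a few lines of C#. The SimpleLexer class below is a hypothetical illustration, not AScript's tokenizer: it assumes keywords are separated from identifiers by whitespace or symbols, treats any run of letters and digits (including Chinese characters) as one word, and supports only single-quoted strings without escapes.

```csharp
using System;
using System.Collections.Generic;

public static class SimpleLexer
{
    // Keywords used in this article's examples.
    private static readonly HashSet<string> Keywords =
        new HashSet<string> { "如果", "则", "否则", "返回" };

    // Token recognition: is this token a registered keyword?
    public static bool IsKeyword(string token) => Keywords.Contains(token);

    // Lexical analysis: splits a script into words, quoted strings, and symbols.
    public static List<string> Tokenize(string source)
    {
        var tokens = new List<string>();
        int i = 0;
        while (i < source.Length)
        {
            char c = source[i];
            if (char.IsWhiteSpace(c))
            {
                i++;
            }
            else if (char.IsLetterOrDigit(c) || c == '_')
            {
                // Chinese characters are Unicode letters, so they form words too.
                int start = i;
                while (i < source.Length && (char.IsLetterOrDigit(source[i]) || source[i] == '_'))
                    i++;
                tokens.Add(source.Substring(start, i - start));
            }
            else if (c == '\'')
            {
                // Quoted string: keep the quotes and everything between them.
                int start = i++;
                while (i < source.Length && source[i] != '\'')
                    i++;
                if (i < source.Length)
                    i++; // consume the closing quote
                tokens.Add(source.Substring(start, i - start));
            }
            else
            {
                // Any other character is a single-character symbol token.
                tokens.Add(c.ToString());
                i++;
            }
        }
        return tokens;
    }
}
```

Because Chinese text has no spaces between words, this sketch depends on the script author separating keywords with whitespace, as the examples in this article do.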
Custom Token Handling
The ITokenHandler interface is the extension point that makes Chinese scripting possible. By implementing this interface, developers can:
- Define new keywords in any language
- Create custom control structures
- Implement domain-specific operations
- Extend the language with new features
This extensibility is what transforms AScript from a simple scripting engine into a platform for creating domain-specific languages.
Syntax Tree Construction
The syntax tree represents the hierarchical structure of the script. For our conditional statement:
IfNode
├── Condition: n < 5
├── Body: s = '小于 5'
└── Else: IfNode
    ├── Condition: n < 20
    ├── Body: s = '大于等于 5 且小于 20'
    └── Else: s = '大于等于 20'
This tree structure enables efficient execution and provides a foundation for optimization and analysis.
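Executing such a tree amounts to a recursive walk. The sketch below models the tree with hypothetical Node, AssignNode, and SketchIf classes written for this article; they are not AScript's node types, and conditions are represented as C# delegates for brevity where a real engine would use expression nodes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical node types for illustration; they are not AScript's classes.
public abstract class Node
{
    public abstract void Execute(Dictionary<string, object> vars);
}

// Leaf node: assigns a constant value to a variable.
public sealed class AssignNode : Node
{
    private readonly string _name;
    private readonly object _value;

    public AssignNode(string name, object value)
    {
        _name = name;
        _value = value;
    }

    public override void Execute(Dictionary<string, object> vars) => vars[_name] = _value;
}

// Branch node: runs Body when the condition holds, otherwise Else (if any).
public sealed class SketchIf : Node
{
    private readonly Func<Dictionary<string, object>, bool> _condition;
    private readonly Node _body;
    private readonly Node _else; // null when there is no else branch

    public SketchIf(Func<Dictionary<string, object>, bool> condition, Node body, Node elseNode = null)
    {
        _condition = condition;
        _body = body;
        _else = elseNode;
    }

    public override void Execute(Dictionary<string, object> vars)
    {
        if (_condition(vars)) _body.Execute(vars);
        else _else?.Execute(vars);
    }
}

public static class TreeDemo
{
    // Builds the tree drawn above and executes it for a given n.
    public static string Run(int n)
    {
        var vars = new Dictionary<string, object> { ["n"] = n, ["s"] = "" };
        Node tree = new SketchIf(
            v => (int)v["n"] < 5,
            new AssignNode("s", "小于 5"),
            new SketchIf(
                v => (int)v["n"] < 20,
                new AssignNode("s", "大于等于 5 且小于 20"),
                new AssignNode("s", "大于等于 20")));
        tree.Execute(vars);
        return (string)vars["s"];
    }
}
```

The else-if chain needs no special node type: the outer node's else branch is simply another if node, exactly as in the diagram.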
Practical Applications
Chinese scripting engines built with AScript have numerous practical applications:
Domain-Specific Languages (DSLs)
Business domains often have terminology and logic that don't map cleanly to traditional programming constructs. A Chinese scripting engine allows:
- Business Rule Expression: Rules written in business terminology
- Workflow Definition: Processes described in natural language
- Configuration Scripts: Settings specified by domain experts
Business Rule Engines
Organizations can encode business logic in scripts that business analysts can read and modify:
如果 客户.等级 == "VIP" 则 {
    订单.折扣 = 0.2;
} 否则 如果 订单.金额 > 1000 则 {
    订单.折扣 = 0.1;
} 否则 {
    订单.折扣 = 0;
}
This approach bridges the gap between technical implementation and business understanding.
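For comparison, the same rule in plain C# might look like the sketch below. The Customer and Order classes are hypothetical stand-ins for the 客户 and 订单 objects; how host objects are exposed to a script depends on the engine and is not shown here.

```csharp
// Hypothetical host types corresponding to 客户 (customer) and 订单 (order).
public sealed class Customer
{
    public string Level = "";   // 等级
}

public sealed class Order
{
    public decimal Amount;      // 金额
    public decimal Discount;    // 折扣
}

public static class DiscountRule
{
    // Same branch order as the script: VIP first, then large orders, then default.
    public static void Apply(Customer customer, Order order)
    {
        if (customer.Level == "VIP")
        {
            order.Discount = 0.2m;
        }
        else if (order.Amount > 1000m)
        {
            order.Discount = 0.1m;
        }
        else
        {
            order.Discount = 0m;
        }
    }
}
```

The branch order matters: a VIP customer with a large order receives the VIP discount, because that condition is checked first.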
Educational Tools
Chinese scripting can serve as an educational bridge:
- Programming Introduction: Students learn programming concepts in their native language
- Logic Training: Computational thinking developed without language barriers
- Progressive Learning: Transition from Chinese to English syntax as skills advance
Rapid Prototyping
Development teams can prototype logic quickly using natural language scripts before implementing in production code. This accelerates iteration and improves communication between technical and non-technical stakeholders.
Extending the Language
The conditional statement is just the beginning. A complete Chinese scripting language would include:
Additional Control Structures
- Loops: 当 ... 时 (while), 对于每个 (for each)
- Exception Handling: 尝试 (try), 捕获 (catch), 最终 (finally)
- Switch Statements: 选择 (switch), 情况 (case)
Data Structures
- Arrays: 数组 with Chinese indexing syntax
- Objects: 对象 with Chinese property access
- Collections: 列表, 字典, 集合
Function Definitions
- Function Declaration: 函数 名称 (参数) { ... }
- Lambda Expressions: Chinese arrow function syntax
- Anonymous Functions: Inline function definitions
Standard Library
- String Operations: Chinese-named string manipulation functions
- Math Functions: Mathematical operations with Chinese identifiers
- I/O Operations: File and network operations in Chinese
Challenges and Considerations
Ambiguity Resolution
Natural languages are inherently ambiguous. Unlike programming languages designed for precision, natural language contains:
- Contextual Meaning: Words change meaning based on context
- Implicit Relationships: Connections not explicitly stated
- Cultural Nuances: Expressions specific to language culture
The scripting engine must resolve these ambiguities through careful grammar design and validation rules.
Performance Optimization
Interpreted languages trade performance for flexibility. Optimization strategies include:
- Syntax Tree Caching: Reusing parsed structures for repeated scripts
- JIT Compilation: Converting frequently-executed scripts to native code
- Expression Optimization: Simplifying expressions during parsing
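Of these strategies, syntax tree caching is the simplest to sketch. The ParseCache class below is a hypothetical helper, not part of AScript: it is generic over the parsed representation and takes the parse function from the caller, so it makes no assumption about how the engine actually parses a script.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical cache keyed by the exact script text. TTree stands for whatever
// parsed representation the engine produces.
public sealed class ParseCache<TTree>
{
    private readonly ConcurrentDictionary<string, TTree> _cache =
        new ConcurrentDictionary<string, TTree>(StringComparer.Ordinal);
    private readonly Func<string, TTree> _parse;

    public ParseCache(Func<string, TTree> parse)
    {
        _parse = parse ?? throw new ArgumentNullException(nameof(parse));
    }

    // Returns the cached tree for this script text, parsing only on a miss.
    // Under concurrent access the parse function may run more than once for
    // the same key, but only one result is stored and returned to callers.
    public TTree GetOrParse(string script) => _cache.GetOrAdd(script, _parse);

    // Number of distinct scripts currently cached.
    public int Count => _cache.Count;
}
```

A cache like this pays off when the same rule text is evaluated many times with different data; it has no eviction policy, so a long-running host with many distinct scripts would need to add one.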
Tooling and Debugging
A complete language ecosystem requires:
- Syntax Highlighting: Editor support for Chinese keywords
- Debugging Tools: Breakpoints, step-through execution, variable inspection
- Error Messages: Clear, helpful error messages in Chinese
The Future of Natural Language Programming
The Chinese scripting engine example demonstrates a broader trend: programming languages becoming more accessible and expressive. As AI and natural language processing advance, we may see:
- Hybrid Languages: Mixing natural language with formal syntax
- AI-Assisted Coding: Natural language descriptions converted to code
- Domain-Specific Evolution: Languages tailored to specific industries
AScript provides a practical foundation for exploring these possibilities today.
Conclusion: Empowering Developers Through Language
The Chinese scripting engine built with AScript represents more than a technical achievement—it's a step toward democratizing programming. By allowing developers and domain experts to express logic in their native language, we:
- Reduce Barriers: Lower the entry threshold for programming
- Improve Communication: Bridge gaps between technical and business teams
- Enhance Productivity: Enable faster iteration and modification
- Preserve Knowledge: Capture business logic in understandable form
The implementation demonstrated here—conditional statements with "如果 ... 则 ... 否则 ..."—provides a template for building complete natural language scripting engines. The principles extend beyond Chinese to any language, opening possibilities for truly global, accessible programming.
For developers interested in exploring this further, the AScript library provides the tools. The question isn't whether natural language programming has value—it's what you'll build with it.
The future of programming isn't just about more powerful languages—it's about more accessible ones. Chinese scripting with AScript is a concrete step in that direction.