The Token Translation Debate: Examining AI Terminology Through Computational Ontology
Introduction
Recently, the National Committee for Terms in Sciences and Technologies issued a public notice recommending that "Token" in the artificial intelligence field be translated as "词元" (Ci Yuan), and opened the term for public trial use. Subsequently, People's Daily published an article titled "Expert Interpretation: Why Token's Chinese Name is Set as 'Ci Yuan'", which systematically explained this naming decision from a professional perspective.
The article mentions that the term "token" originates from Old English "tācen", meaning "symbol" or "mark". In language models, token represents the smallest discrete unit obtained after text segmentation or byte-level encoding, which can manifest as words, subwords, affixes, or characters in different forms. Models demonstrate certain intelligent capabilities precisely through modeling token sequences.
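As a rough illustration of the "smallest discrete unit" described above, the sketch below splits the same string at character level and at byte level. The function names `char_tokens` and `byte_tokens` are hypothetical and chosen only for this example; production tokenizers typically learn subword vocabularies and are considerably more involved.

```python
def char_tokens(text: str) -> list[str]:
    """Character-level segmentation: one unit per character."""
    return list(text)


def byte_tokens(text: str) -> list[int]:
    """Byte-level encoding: one integer (0-255) per UTF-8 byte."""
    return list(text.encode("utf-8"))


sample = "Token 词元"
print(char_tokens(sample))  # one unit per character
print(byte_tokens(sample))  # one unit per UTF-8 byte

# Byte-level units are lossless: the original text can be rebuilt from them.
assert bytes(byte_tokens(sample)).decode("utf-8") == sample
```

The same text yields a different number of units depending on the segmentation scheme, which is why the unit is defined by the encoding procedure rather than by the notion of a "word".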
In the expert review, this translation was judged to satisfy the principles of unambiguity, scientific accuracy, conciseness, and consistency with related terms, and to already have some foundation of use in Chinese. After reading the relevant interpretations, however, I arrived at a different view of this naming approach.
From a standardization perspective, this naming scheme offers short-term advantages in comprehensibility and communication. Examined along the dimensions of computational ontology, information structure, multimodal evolution, and back-translation consistency, however, its long-term adaptability still needs to be verified. Against this background, an alternative that deserves equal consideration, "符元" (Fu Yuan, meaning Symbol Unit), shows stronger structural consistency and cross-context stability.
The Definition Misalignment: Cannot Replace "Essence" with "Origin"
Expert Viewpoint
According to Academician Chen Xilin from the Institute of Computing Technology, Chinese Academy of Sciences: Token's initial role in artificial intelligence is the "basic semantic unit of language", therefore "Ci Yuan" can better align with its essence.
Critical Analysis
This judgment is reasonable in its historical context, but at a time of major shifts in the technological paradigm it amounts to an academic version of "carving the boat to find the sword" (刻舟求剑), the Chinese idiom for clinging rigidly to an outdated frame of reference.
At the logical level of terminology definition, we must strictly distinguish between "initial application scenarios" and "structural essential attributes".
Token indeed originated from Natural Language Processing (NLP), but in AGI's evolutionary path, it has long transcended language model boundaries, evolving into a fundamental unit for unified processing of text, images, speech, and even physical signals. Within modern computing systems, Token's true structural ontology is "discrete symbol unit", not a single-modality language unit.
Historical Precedents
If we define terms based on "initial roles":
- The computer should still be called an "electronic calculator" (after its initial function of replacing human calculators)
- The Internet should be named the "Cold War military network"
This naming logic's fatal flaw lies in seeing only technology's "temporary job" at specific historical moments while ignoring its "physical ontology" spanning eras.
Key Principle: Historical paths cannot equate to essential attributes. Similarly, we cannot permanently lock Token within the narrow context of "words" simply because it was initially used for text processing.
Defining fundamental concepts using "initial application scenarios" essentially replaces structural ontological truth with historical path dependency. While this definition may provide understanding convenience during technology's early stages, it rapidly becomes ineffective during the paradigm expansion phase of multimodal explosion, becoming shackles hindering cognition.
In contrast, "Fu Yuan" (Symbol Unit) directly aligns with the symbol ontology of cross-modal computation. It defines not Token's "past" but Token's "truth".
The Boundary of Analogy: When Explanation Becomes Definition, Deviation Begins
Expert Viewpoint
According to Associate Professor Dong Yuxiao from Tsinghua University's Computer Science Department: Through analogies like "word cloud" and "bag of words", discrete units in multimodal contexts can be understood as "generalized words".
Critical Perspective
Professor Dong's analogy aids understanding but shouldn't replace definition. This approach holds certain inspirational value at the explanatory level, but if further elevated as naming basis, it may trigger conceptual category misalignment.
Methodological Distinction
From a methodological perspective:
- Analogy's role: Lower understanding thresholds
- Definition's responsibility: Delineate semantic boundaries
When "word" expands to cover image patches, speech segments, vector representations (embeddings), and even broader perceptual signals, its original language attributes become continuously diluted, with semantic boundaries tending toward fuzziness.
This "analogy-driven" expansion path can maintain explanatory consistency in the short term, but it easily causes semantic drift over long-term evolution.
Warning Against Category Confusion
We must remain vigilant about "analogy" sliding into "definition" as the term is stretched across modalities. In terminology review, the boundary between "explanatory metaphor" and "ontological definition" has to be kept clear so that the former does not replace the latter.
Intuitive Comparison
In popular science contexts, we can analogize light bulbs as "artificial suns" to enhance understanding intuitiveness. However, within scientific naming systems, we cannot rename the current unit "Ampere" as "Light Unit" based on this analogy.
- Former: Descriptive expression
- Latter: Strict measurement system and standardized definition
These two cannot be mixed.
Token's Scale and Significance
Similarly, terms like "word cloud" and "bag of words" essentially belong to descriptive or statistical metaphors, functioning to help understand data structures or distribution patterns. Token, as a fundamental measurement unit in large models, has deeply embedded itself into computing billing, model training, and academic measurement systems.
When usage reaches hundreds of billions to trillions of calls per day, the name no longer serves only an explanatory function; it denotes a foundational concept with engineering and standards significance.
At this level, terminology needs to align with its ontological attributes rather than rely on analogy extension.
The Dangerous Premise
Pushing this analogy logic further to the naming level implicitly contains a dangerous premise: Since people have become accustomed to understanding Token as "word", we might as well continue using this analogy.
However, this is really a continuation of path dependency: existing cognitive convenience is used in place of correcting the concept's ontology. In this sense, such naming is closer to "linguistic romanticism" than to strict alignment with computational ontology.
Fundamental Principle: We cannot require discussing "electronic horses" in electric motors simply because "horsepower" contains "horse". Analogy can inspire understanding but cannot define standards.
The Superior Alternative
In contrast, "Fu" (Symbol) as a more neutral concept naturally possesses cross-modal adaptation capabilities, covering text, images, speech, and other information forms without requiring additional explanation.
Therefore, the naming path centered on "symbol unit" at the definition level approaches Token's structural essence more closely. Within this logic, "Fu Yuan" as the corresponding translation name demonstrates higher conceptual consistency and long-term adaptability.
The Cognitive Cost: When Semantic Anchoring Creates Systematic Misunderstanding
Expert Consensus
"Ci Yuan" expression is concise, conforms to Chinese habits, and facilitates dissemination.
Critical Examination
This judgment is reasonable at the level of communication, but it rests on an implicit premise: that the public can accept the cross-modal analogy built on "word".
However, analogy essentially represents an expert thinking tool, not the public's natural cognitive approach. For ordinary users, "word" possesses extremely strong semantic anchoring effects—once hearing "word", their intuition necessarily points to language systems, not images, sounds, actions, or other modalities.
This cognitive path isn't a technical issue but a stable structure at the cognitive psychology level.
The Deviation Mechanism
On this basis, when "word" expands to so-called "generalized words", it actually creates deviation in user cognition. Users first form intuitive understanding of "word = language unit" rather than the abstract concept of "cross-modal symbol unit".
Once this misunderstanding takes hold, every subsequent explanation becomes a correction of existing cognition rather than a natural extension of understanding.
Real-World Example
For instance, when media report "the model was trained using 10 trillion Ci Yuan", the public easily understands this as "read massive amounts of text" while ignoring the large quantities of images, speech, and other modal data included.
This misunderstanding isn't isolated but represents systematic induction produced by the terminology's own semantic anchoring.
Engineering Context Impact
In actual engineering contexts, this naming may also bring friction to interdisciplinary communication. When discrete units in vision models or speech models are called "words", it not only easily triggers semantic misunderstanding but also creates unnecessary language conflicts between different fields.
Multimodal systems require "symbol layer" unification rather than language category expansion.
Long-Term Considerations
In contrast, "Fu" as a more abstract concept, while having slightly higher initial understanding thresholds, possesses more neutral semantic direction, not pre-locking cognition at the language layer.
In long-term use, it is more conducive to establishing a stable, unified cognitive framework, thereby reducing overall explanation costs and providing a firmer cognitive foundation for multimodal unification.
Key Insight: Naming costs don't occur at definition time but during correction time. Once early naming forms semantic anchoring, subsequent cognitive repair costs rise exponentially.
Experts can expand "word"'s boundaries through analogy, but the public won't understand concepts through analogy. Naming doesn't serve experts but is responsible for the entire era's cognitive system.
The Unambiguity Illusion: When One Term Attempts to Carry Two Systems
Terminology Review Principles
"Ci Yuan" conforms to unambiguity principles, helping resolve translation confusion issues.
Further Discussion Space
Regarding terminology unambiguity, we need special attention to systematic risks that "one term, two meanings" may trigger. Within scientific terminology review, "unambiguity" represents one of the foundational principles.
If a term must rely on context or additional explanation to distinguish its meanings, it has already lost its value as a standard component.
However, from existing academic systems, this judgment still has room for further discussion.
The Existing Meaning Conflict
The term "Ci Yuan" has long been "claimed" in linguistics and Natural Language Processing (NLP). In classical linguistics, its long-standing English counterpart is Lemma, which refers to a word's standardized base form (for example, the lemma of "is", "am", and "are" is "be").
This usage has formed stable consensus in linguistics and NLP foundational textbooks and academic papers.
The Catastrophic Scenario
Against this background, translating Token also as "Ci Yuan" easily produces semantic conflicts in specific expressions, creating disastrous situations.
Example: When describing "lemmatize a token in NLP", the Chinese expression becomes "perform 'lemmatization' on 'Ci Yuan'".
This expression not only increases understanding costs but also introduces ambiguity in academic writing and information retrieval, making it difficult for readers to distinguish whether "Ci Yuan" points to segmented discrete units or words' standardized original forms.
Functional Distinction
From a conceptual functionality perspective, the two also have clear distinctions:
- Lemma: Emphasizes "restoration" at the language level, corresponding to standardized expressions after morphological changes
- Token: Emphasizes "segmentation" during computation processes, corresponding to the smallest discrete units when models process information
This "restoration" versus "segmentation" difference precisely corresponds to different dimensions of semantic layer and symbol layer.
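The "segmentation" versus "restoration" distinction above can be made concrete with a toy sketch. The lookup table `LEMMA_TABLE` and the whitespace-based `tokenize` function are hypothetical simplifications introduced for illustration; real lemmatizers rely on morphological analysis and much larger dictionaries.

```python
# Toy lookup table (hypothetical, for illustration only).
LEMMA_TABLE = {"is": "be", "am": "be", "are": "be", "tokens": "token"}


def tokenize(text: str) -> list[str]:
    """Segmentation: cut the input into discrete units (here, on whitespace)."""
    return text.lower().split()


def lemmatize(token: str) -> str:
    """Restoration: map an inflected form back to its dictionary form."""
    return LEMMA_TABLE.get(token, token)


tokens = tokenize("They are tokens")
lemmas = [lemmatize(t) for t in tokens]
print(tokens)  # ['they', 'are', 'tokens']
print(lemmas)  # ['they', 'be', 'token']
```

The two functions operate on different things: `tokenize` produces the units, while `lemmatize` takes a unit as input and returns a different object. If both the input and the output of `lemmatize` carry the same Chinese name, that distinction is lost in the wording.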
The Reality Check
Therefore, when a term has to rely on "generalization" to cover several existing concepts at once, its unambiguity has in fact turned into "explanatory unification" rather than "semantic stability".
When a term needs explanation to hold its meanings together, its stability as a standard term has often already begun to weaken.
The Superior Alternative
In contrast, "Fu Yuan" has no semantic conflicts within existing terminology systems. On one hand, it retains Token's ontological attributes as discrete symbols. On the other hand, it also avoids overlap with Lemma's existing translation names, thereby demonstrating higher stability in semantic clarity and system consistency.
The Ontological Return: Token is Essentially "Symbol", Not "Word"
General Interpretation
Token is the smallest unit used for processing text in language models.
Deeper Analysis
This expression holds at the functional level but still remains at the "how to use" level without touching its ontological attributes in computational theory.
From information theory and computational theory perspectives, the basic objects processed by computing systems aren't "words" but "symbols".
Two-Level Understanding
This point can be further understood from two levels:
Information Theory Perspective
From an information theory viewpoint, information's essence lies in eliminating uncertainty. Its measurement unit is bits, and its carrying entity is discrete symbols.
Symbols don't care about semantic content but relate only to probability distribution and encoding structure.
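A small sketch of this point, using the standard Shannon entropy formula H = -Σ p·log2(p): the result depends only on how often each symbol occurs, not on what the symbols mean. The function name `entropy_bits` and the sample sequences are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy_bits(symbols: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits per symbol) of the empirical distribution."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


words = ["cat", "dog", "cat", "dog"]   # symbols that happen to be words
ids = [17, 42, 17, 42]                 # arbitrary integer symbols

# Same frequency pattern, same entropy, regardless of what the symbols "mean".
print(entropy_bits(words), entropy_bits(ids))
```

Relabeling every symbol leaves the entropy unchanged, which is the sense in which the measurement is indifferent to semantic content.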
Computational Implementation Level
At the level of computational implementation, the underlying layers of a large model do not "recognize characters". What they process are discrete index representations (IDs).
Regardless of whether this ID corresponds to a Chinese character, an image patch, or an audio sampling point, all participate in computation in unified symbol form during the computation process.
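The idea of a unified ID space can be sketched as follows. Everything here is a simplifying assumption: the codebook sizes, the offsets, and the uniform quantization in `image_ids` and `audio_ids` are hypothetical stand-ins, whereas real multimodal systems typically use learned codebooks.

```python
TEXT_VOCAB = 256    # one ID per UTF-8 byte value
IMAGE_CODES = 16    # hypothetical codebook size for image patches
AUDIO_CODES = 16    # hypothetical number of audio quantization levels

IMAGE_OFFSET = TEXT_VOCAB                 # image IDs start after text IDs
AUDIO_OFFSET = TEXT_VOCAB + IMAGE_CODES   # audio IDs start after image IDs


def text_ids(text: str) -> list[int]:
    """Map text to byte-level IDs in [0, 255]."""
    return list(text.encode("utf-8"))


def image_ids(patch_means: list[float]) -> list[int]:
    """Quantize per-patch mean brightness in [0, 1] to a codebook index."""
    return [IMAGE_OFFSET + min(int(v * IMAGE_CODES), IMAGE_CODES - 1)
            for v in patch_means]


def audio_ids(samples: list[float]) -> list[int]:
    """Quantize audio samples in [-1, 1] to one of AUDIO_CODES levels."""
    return [AUDIO_OFFSET + min(int((s + 1) / 2 * AUDIO_CODES), AUDIO_CODES - 1)
            for s in samples]


# One flat sequence of integers; downstream computation sees only the IDs.
sequence = text_ids("hi") + image_ids([0.0, 0.5, 1.0]) + audio_ids([-1.0, 0.0, 1.0])
print(sequence)
```

Once the inputs are mapped into one integer range, nothing in the sequence itself records which IDs came from text, pixels, or sound.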
The Symbol Layer Importance
Within this framework, Token can act as a unified unit across modalities precisely because its essence lies at the "symbol layer" rather than the "semantic layer". Symbols themselves do not carry semantics; they exist as the basic carriers of encoding and computation.
Naming Token as "Ci Yuan" to some extent introduces language semantic layer's implicit direction, pulling this concept originally at the symbol layer back to a language-centered understanding path.
This naming approach may provide intuitiveness at the explanatory level but easily blurs boundaries between "symbol computation" and "semantic understanding" at the theoretical level.
The Alternative Advantage
In contrast, "Fu Yuan" remains within the symbol layer conceptually. On one hand, it accurately reflects Token's computational attributes as discrete symbols. On the other hand, it also avoids introducing semantic features into ontological definitions, thereby conforming more closely to information theory and computational theory's basic frameworks.
From a broader perspective, as artificial intelligence systems continue to evolve toward multimodal and general intelligence, naming basic concepts in a way that aligns directly with their mathematical and computational ontology is more conducive to building stable, scalable cognitive systems.
In this sense, the naming path centered on "symbol unit" isn't just a language selection issue but represents a consistent expression of computational essence, with "Fu Yuan" being the natural correspondence within this framework.
Fundamental Principle: Defining concepts from the symbol layer aligns with computational essence. Naming concepts from the semantic layer approaches explanation rather than definition.
The Language Rupture: Mapping Failure in Back-Translation Mechanisms
Comprehensive Interpretation
"Ci Yuan" has gradually formed usage foundation in the Chinese academic community, possessing certain communication advantages.
Cross-Language Considerations
In cross-language contexts, we need to remain vigilant about systematic impacts brought by terminology "back-translation rupture".
Measuring whether a scientific terminology possesses long-term vitality depends not only on its expressive capability within the Chinese context but more on whether it can achieve stable mapping in international academic systems.
Ideal terminologies should possess "reversibility", achieving consistent round-trip semantics between different languages.
The Back-Translation Challenge
The above judgment reflects "Ci Yuan"'s acceptability within local contexts, but from a cross-language perspective, there's still room for further discussion.
If a terminology only holds within a single language system without forming stable correspondence in international contexts, it may introduce additional understanding costs in academic exchanges.
Specific Analysis
Specifically, "Ci Yuan" lacks clear, unique corresponding paths during the back-translation process. When restored to English, it often creates divergence among multiple approximate concepts:
- "Word unit": Lacks strict academic definition
- "Morpheme": Corresponds to linguistics' morphemes
- "Lexeme": Refers to the abstract lexical unit of linguistics
None of these concepts can accurately cover Token's meaning in computational contexts, instead introducing category offsets.
The Superior Alternative
In contrast, "Fu Yuan" can relatively naturally correspond to "symbolic unit". This concept possesses clear theoretical foundations and stable usage in information theory, discrete mathematics, and multimodal representation fields, maintaining consistent semantic direction across different contexts.
Therefore, it more readily forms a one-to-one mapping between Chinese and English.
Practical Implications
From a practical perspective, once terminologies enter academic papers, technical documents, and international exchange scenarios, their back-translation capabilities directly impact expression efficiency and understanding accuracy.
If a terminology requires additional explanation to complete cross-language conversion, its long-term usage costs will continuously accumulate.
Conclusion
Therefore, within cross-language systems, the main problem "Ci Yuan" faces lies in mapping path instability, while "Fu Yuan" demonstrates higher certainty in semantic correspondence and conceptual consistency.
As artificial intelligence becomes increasingly globalized, choosing terms that back-translate well is more conducive to building open, interoperable academic and technical systems.
Key Insight: Terminology's international reversibility essentially represents the key benchmark for whether it possesses long-term academic vitality.
The Unification Misconception: Formal Consistency Doesn't Equal Structural Consistency
Comprehensive Expert Opinion
"Ci Yuan" maintains consistent expression style with terminologies like "embedding" and "attention", being concise and abstract, conforming to Chinese technical contexts.
Conclusion First
Terminology system unification should be built upon "conceptual isomorphism" rather than "linguistic homology".
The Valid Need
Within "Ci Yuan"'s supporting arguments, a common rationale is: its expression style maintains consistency with terminologies like "embedding" and "attention", being concise and abstract, conforming to Chinese technical contexts.
This reason captures a real need for unification within terminology systems. The problem is that if unification stays at the level of language rather than structure, it slides from "order" into "illusion".
Why Existing Terms Work
"Embedding" and "Attention" became stable terminologies because they correspond to clear computational structures:
- Embedding: Vector mapping
- Attention: Weight mechanism
Their naming directly points to computational essence.
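To make "vector mapping" and "weight mechanism" concrete, here is a minimal pure-Python sketch. The table `EMBEDDINGS` and its values are hypothetical, and `attention_weights` shows only the scaled dot-product weighting step, not a full attention layer with learned projections.

```python
import math

# Hypothetical 2-dimensional embedding table: symbol ID -> vector.
EMBEDDINGS = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}


def embed(ids: list[int]) -> list[list[float]]:
    """Embedding: look up the vector assigned to each symbol ID."""
    return [EMBEDDINGS[i] for i in ids]


def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query: list[float], keys: list[list[float]]) -> list[float]:
    """Attention: weight each key by its scaled dot product with the query."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


vectors = embed([0, 1, 2])
weights = attention_weights(vectors[2], vectors)
print(weights)  # the third key, most similar to the query, gets the largest weight
```

Both names describe an operation that can be written down directly, whereas the inputs to `embed` are simply integer IDs.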
The Critical Difference
"Ci Yuan" belongs to explanatory naming, with its rationality depending on the "generalized word" analogy framework. Once divorced from explanation, this naming itself lacks self-consistent structural direction.
The Key Problem
This difference raises a critical issue: formal consistency can coexist with semantic offset. The two kinds of unification serve different purposes:
- Linguistic homology: Reduces expression costs
- Conceptual isomorphism: Guarantees cognitive stability
If prioritizing pursuit of "linguistic homology", complexity won't disappear but transfers into long-term cognitive burden. Only naming built upon "conceptual isomorphism" can maintain stability during cross-context and multimodal evolution.
The False Impression
When "embedding", "attention", and "Ci Yuan" appear together, they easily form an illusion of "conceptual same layer". But actually:
- Former two: Mechanisms
- Latter: Object
- Former two: Possess strict definitions
- Latter: Depends on contextual explanation
This structural misalignment buries hidden fractures within cognitive systems.
The Broader Impact
More importantly, when a fundamental concept's naming relies on analogy rather than structural definition, its impact won't stay within a single terminology but will diffuse toward the entire terminology system.
When subsequent concepts attempt to expand around this naming, they must continuously maintain consistency through explanation, thereby forming implicit structural misalignment.
The Alternative Path
In this sense, "Fu Yuan" provides an expression path closer to underlying structures. It directly points to basic objects in computing systems—symbols—without requiring analogy explanation to maintain consistency across different contexts.
Profound Truth: Terminology isn't just labels but cognition's entry points. Good terminologies make explanations gradually disappear, while poor terminologies make annotations continuously increase.
When fundamental concepts deviate from structures, terminology systems can only rely on explanation to maintain operation rather than relying on definition for self-consistency.
Conclusion
From an essential perspective, terminology selection isn't merely a language issue but represents early shaping of a field's cognitive structure.
Once naming deviates from its structural ontology at the initial stage, subsequent systems can only operate through continuous explanation, struggling to form self-consistent conceptual networks.
In artificial intelligence's journey toward generalization and multimodal fusion, a terminology capable of aligning with computational ontology and possessing cross-context stability will more likely become a long-term effective cognitive cornerstone.
In this sense, the naming path centered on "symbol unit" strikes a better balance between technical essence and cognitive clarity.
The choice between "Ci Yuan" and "Fu Yuan" transcends mere translation preference—it represents a fundamental decision about how we conceptualize and communicate the building blocks of artificial intelligence across languages, cultures, and technological paradigms.