From "Word-Unit" to "Symbol-Unit": The Cognitive Battle Behind Token's Chinese Translation
Introduction
Recently, the National Committee for Terms in Sciences and Technologies announced its recommendation to translate "Token" in the artificial intelligence field as "词元" (Word-Unit), opening the term for public trial use. Subsequently, People's Daily published an expert interpretation explaining the rationale behind choosing "词元" as Token's Chinese name, providing systematic professional justification for this decision.
The expert commentary noted that "token" originates from Old English "tācen," meaning "symbol" or "sign." In language models, tokens represent the smallest discrete units obtained after text segmentation or byte-level encoding, manifesting as words, subwords, affixes, or characters. Models demonstrate intelligent capabilities precisely through modeling sequences of these tokens.
This translation received endorsement within the expert argumentation framework, deemed compliant with principles of monosemy, scientificity, conciseness, and coordination, while possessing established usage foundations in the current Chinese linguistic context. However, after reviewing the relevant interpretations, I have developed a different understanding of this naming approach.
From a standardization perspective, this naming scheme offers comprehensibility and dissemination advantages in the short term. Yet when examined through dimensions of computational ontology, information structure, multimodal evolution, and back-translation consistency, its long-term adaptability warrants further scrutiny. Against this backdrop, an alternative pathway gradually emerges—"符元" (Symbol-Unit)—demonstrating stronger structural consistency and cross-contextual stability.
Section 1: Definitional Misalignment—Cannot Replace "Essence" with "Origin"
Expert Position (Researcher Chen Xilin, Institute of Computing Technology, Chinese Academy of Sciences): Token's initial role in artificial intelligence was as a "basic semantic unit of language," therefore "词元" aligns more closely with its essence.
This judgment holds within its historical context, but in an era of major technological paradigm shifts, such thinking amounts to an academic version of "carving the boat to find the sword" (刻舟求剑): it clings to a reference point the technology has already moved past, and is therefore a flawed basis for terminology definition.
Logical Distinction in Terminology Definition
Within terminology definition logic, we must rigorously distinguish between "initial application scenarios" and "structural essential attributes."
Token indeed originated in Natural Language Processing (NLP), but throughout AGI's evolutionary pathway, it has long transcended language model boundaries, evolving into a foundational unit uniformly processing text, images, speech, and even physical signals. Within modern computational systems, Token's true structural ontology is "discrete symbolic unit," not a single-modality language unit.
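To make this concrete, here is a minimal sketch (illustrative only, not any production tokenizer) of the byte-level view: text, pixel data, and audio samples all reduce to the same 256-symbol discrete alphabet before a model ever sees them.

```python
# Sketch: the byte-level view of tokenization (illustrative only, not a
# real model's tokenizer). Every modality reduces to one discrete alphabet.

def to_symbol_ids(data: bytes) -> list[int]:
    """Map raw bytes to integer symbol IDs in a shared 256-symbol alphabet."""
    return list(data)

text_ids  = to_symbol_ids("词元".encode("utf-8"))    # text  -> symbol IDs
image_ids = to_symbol_ids(bytes([255, 0, 127, 64]))  # pixel bytes -> symbol IDs
audio_ids = to_symbol_ids(bytes([12, 200, 33]))      # PCM samples -> symbol IDs

# All three streams live in the same ID space; the model never sees
# "words", only sequences of discrete symbols.
assert all(0 <= i < 256 for i in text_ids + image_ids + audio_ids)
```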
Historical Precedents
If we define terms by "initial roles":
- Computer should still be called "Electronic Calculator" (originating from its initial function replacing human calculators)
- Internet should be named "Cold War Military Network"
This naming logic's fatal flaw lies in observing only technology's "temporary job" at specific historical moments while ignoring its "physical ontology" spanning eras.
Historical pathways cannot equate to essential attributes. Similarly, we cannot permanently lock Token within the narrow context of "words" simply because it was initially used for text processing.
Using "initial application scenarios" to define foundational concepts essentially substitutes historical path dependency for structural ontological truth. While such definitions may provide understanding convenience during technology's early stages, they rapidly become ineffective during paradigm expansion phases of multimodal explosion, transforming into cognitive shackles.
In contrast, "符元" directly aligns with cross-modal computational symbolic ontology, defining not Token's "past" but Token's "truth."
Section 2: Analogy Boundaries—When Explanation Becomes Definition, Deviation Begins
Expert Position (Associate Professor Dong Yuxiao, Tsinghua University Department of Computer Science): Through analogies like "word cloud" and "bag of words," discrete units in multimodal contexts can be understood as "generalized words."
Professor Dong's analogy aids understanding but should not replace definition. This approach offers certain heuristic value at the explanatory level, yet when elevated to naming basis, it may trigger conceptual categorical misalignment.
Methodological Considerations
From a methodological perspective:
- Analogy's function: Lower understanding thresholds
- Definition's responsibility: Delineate semantic boundaries
When "word" expands to cover image patches, speech segments, vector representations (embeddings), and broader perceptual signals, its original linguistic attributes become continuously diluted and its semantic boundaries grow increasingly blurred. This "analogy-driven" expansion pathway can maintain explanatory consistency in the short term but easily causes semantic drift over long-term evolution.
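As one concrete instance of this expansion, vision models in the ViT family treat fixed-size image patches as their discrete units. The sketch below (toy shapes, not any specific model's code) shows how an image becomes a sequence of such units with nothing word-like about them:

```python
import numpy as np

# Sketch: ViT-style patch "tokenization" of an image (shapes are
# illustrative). Each 16x16 patch becomes one discrete unit in the
# model's input sequence -- a "token" with nothing word-like about it.

def image_to_patch_tokens(img: np.ndarray, patch: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into a sequence of flattened patches."""
    h, w, c = img.shape
    gh, gw = h // patch, w // patch               # patch-grid dimensions
    x = img[: gh * patch, : gw * patch]           # drop any ragged edge
    x = x.reshape(gh, patch, gw, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)                # (gh, gw, patch, patch, c)
    return x.reshape(gh * gw, patch * patch * c)  # (num_tokens, token_dim)

img = np.zeros((224, 224, 3), dtype=np.uint8)
tokens = image_to_patch_tokens(img)
print(tokens.shape)  # (196, 768): 196 patch tokens, 768 values each
```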
Cross-Modal Expansion Capability
Within cross-modal expansion contexts, we must remain vigilant against "analogy" sliding toward "definition." In terminology standardization contexts, distinguishing boundaries between "explanatory metaphors" and "ontological definitions" becomes essential, preventing the former from replacing the latter.
Intuitive Comparison
Consider this illustrative contrast: In popularization contexts, we might analogize light bulbs as "artificial suns" to enhance intuitive understanding. However, within scientific naming systems, we would never rename the current unit "Ampere" as "Light-Unit" based on this analogy.
- The former belongs to descriptive expression
- The latter involves strict measurement systems and standardized definitions
These two categories cannot be conflated.
Scale Considerations
Similarly, terms like "word cloud" and "bag of words" are essentially descriptive or statistical metaphors, whose function is to help people understand data structures or distribution patterns. Token, however, as a foundational measurement unit in large models, has become deeply embedded in computational billing, model training, and academic measurement systems.
When usage reaches the scale of hundreds of billions to trillions of tokens per day, the name carries not merely an explanatory function; it becomes a foundational concept with engineering and standards significance. At this level, terminology must align with its ontological attributes rather than rely on analogical extension.
Dangerous Premise
Pushing this analogical logic further to the naming level implicitly contains a dangerous premise: Since people have become accustomed to understanding Token through "words," we might as well continue using this analogy. But this actually represents path dependency continuation—substituting existing cognitive convenience for conceptual ontological correction.
In this sense, such naming leans toward "linguistic romanticism" rather than strict alignment with computational ontology.
We wouldn't require discussing "electronic horses" in motors simply because "horsepower" contains "horse." Analogy can inspire understanding but cannot define standards.
Superior Alternative
In contrast, "symbol" as a more neutral concept naturally possesses cross-modal adaptation capability, covering text, images, speech, and other information forms without requiring additional explanation. Therefore, a naming pathway centered on "symbolic unit" demonstrates higher conceptual consistency and long-term adaptability at the definition level.
Section 3: Cognitive Cost—When Semantic Anchors Create Systematic Misunderstanding
Expert Position (Comprehensive Expert Opinion): "词元" offers concise expression, conforms to Chinese habits, and facilitates dissemination.
This judgment holds certain rationality at the dissemination level, but it rests on an implicit premise: that the public can accept the cross-modal analogy of "word." Analogy, however, is essentially an expert's thinking tool, not the public's natural cognitive approach.
Semantic Anchoring Effect
For ordinary users, "word" carries an extremely strong semantic anchoring effect: upon hearing "word," intuition points inevitably toward language systems, not images, sounds, actions, or other modalities. This cognitive pathway is not a technical problem but a stable structure at the level of cognitive psychology.
Cognitive Deviation Creation
Against this backdrop, when "word" expands to so-called "generalized words," it actually creates deviation within user cognition. Users first form intuitive understanding of "word = language unit," not the abstract concept of "cross-modal symbolic unit." Once this misunderstanding establishes itself, all subsequent explanations become corrections to existing cognition rather than natural understanding extension.
Practical Example
For instance, when media report that a "model was trained on 10 trillion word-units," the public easily interprets this as "reading massive amounts of text," overlooking the substantial image, speech, and other modal data included. Such misunderstanding is not an isolated case but a systematic effect induced by the semantic anchoring of the term itself.
Engineering Context Impact
Within actual engineering contexts, this naming may also bring interdisciplinary communication friction. When discrete units in vision models or speech models are called "words," it not only easily triggers semantic misunderstanding but also creates unnecessary linguistic conflicts between different fields.
Multimodal systems require "symbol layer" unification, not language category expansion.
Comparative Advantage
Relatively speaking, "symbol" as a more abstract concept, while having slightly higher initial understanding thresholds, demonstrates more neutral semantic direction, not pre-locking cognition at the language layer. During long-term use, it more readily establishes stable, unified cognitive frameworks, thereby reducing overall explanation costs and providing more stable cognitive foundations for multimodal unification.
Key Insight: Naming costs don't occur at definition time but during correction time. Once early naming forms semantic anchoring, subsequent cognitive repair costs rise exponentially.
Experts can expand "word's" boundaries through analogy, but the public won't understand concepts through analogy. Naming serves not experts but the entire era's cognitive system.
Section 4: Monosemy Illusion—When One Word Attempts Carrying Two Systems
Expert Position (Terminology Standardization Principle): "词元" conforms to monosemy principles, helping resolve translation confusion.
Regarding terminology monosemy, we must pay special attention to systematic risks potentially triggered by "one word, two meanings." Within scientific terminology standardization, "monosemy" represents a foundational principle. When a term requires depending on context or additional explanation to distinguish meanings, its value as a standard component has already been lost.
However, from the perspective of the existing academic system, this judgment still leaves room for further discussion.
Existing Terminology Conflict
The term "词元" has long been "claimed" in linguistics and Natural Language Processing (NLP) fields. Within classical linguistics, its long-term corresponding English concept is Lemma—the word's normalized original form (for example, the lemma of is/am/are is "be"). This usage has formed stable consensus in linguistics and NLP foundational textbooks and academic papers.
Against this backdrop, translating Token also as "词元" easily produces semantic conflict in concrete expressions, leading to genuinely awkward collisions in practice.
Catastrophic Expression Example
For instance, when describing "lemmatize a token" in NLP, the Chinese expression becomes "对'词元'进行'词元化'" (literally, "perform 'word-unit-ization' on a 'word-unit'"). This expression not only increases understanding costs but also introduces ambiguity into academic writing and information retrieval, making it difficult for readers to tell whether "词元" refers to a segmented discrete unit or to a word's normalized base form.
Conceptual Functional Distinction
From a conceptual function perspective, the two also maintain clear distinction:
- Lemma emphasizes "restoration" at the language layer, corresponding to normalized expression after morphological changes
- Token emphasizes "segmentation" during computational processes, corresponding to the smallest discrete unit when models process information
This "restoration" versus "segmentation" difference precisely corresponds to different dimensions of semantic layer and symbolic layer.
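The contrast can be sketched in a few lines; the lemma table below is a toy stand-in for a real lemmatizer, and whitespace splitting stands in for real subword segmentation:

```python
# Sketch contrasting the two operations that a shared "词元" would conflate.
# The lemma table is a toy illustration, not a real lemmatizer.

LEMMAS = {"is": "be", "am": "be", "are": "be", "ran": "run"}

def lemmatize(word: str) -> str:
    """Restoration: map an inflected form back to its normalized lemma."""
    return LEMMAS.get(word, word)

def tokenize(text: str) -> list[str]:
    """Segmentation: split text into discrete processing units.
    (Real models use subword/byte segmentation; whitespace is a stand-in.)"""
    return text.split()

tokens = tokenize("they are running")    # segmentation: ['they', 'are', 'running']
lemmas = [lemmatize(t) for t in tokens]  # restoration:  ['they', 'be', 'running']
```

The two functions answer different questions (how to cut the input stream vs. which dictionary form a word belongs to), which is exactly the semantic-layer vs. symbolic-layer split described above.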
Stability Erosion
Therefore, when a term must rely on "generalization" to cover multiple existing concepts simultaneously, its monosemy degrades from genuine semantic stability into mere unification-by-explanation.
When a term requires explanation to maintain unification, its stability as a standard terminology often begins shaking.
Superior Alternative
In contrast, "符元" possesses no semantic conflict within existing terminology systems. On one hand, it retains Token's discrete symbol ontological attributes; on the other hand, it avoids overlap with Lemma's existing translation, thereby demonstrating higher stability in semantic clarity and system consistency.
Section 5: Ontological Return—Token is Essentially "Symbol," Not "Word"
Expert Position (General Explanation): Token represents the smallest unit used for processing text in language models.
This expression holds validity at the functional level but remains at the "how to use" layer without touching its ontological attributes within computational theory. From information theory and computational theory perspectives, the basic objects processed by computational systems are not "words" but "symbols."
Information Theory Perspective
This can be further understood from two levels:
First, from an information theory perspective, information's essence lies in eliminating uncertainty, with measurement units being bits and carrying entities being discrete symbols. Symbols don't concern semantic content but relate only to probability distributions and encoding structures.
Second, at the level of computational implementation, large models do not "recognize characters" at their lowest layers: their processing objects are discrete index representations (IDs). Whether an ID corresponds to a Chinese character, an image patch, or an audio sample, it participates in computation in a unified symbolic form.
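A small sketch makes the information-theoretic point concrete: Shannon entropy over a symbol stream depends only on the probability distribution of the symbol IDs, never on what the IDs denote. The function below is the standard definition; the example data is invented.

```python
import math
from collections import Counter

# Sketch: Shannon entropy of a symbol stream. The computation depends only
# on the probability distribution over symbols, never on what they "mean".

def entropy_bits(symbols: list[int]) -> float:
    """H(X) = -sum p(x) * log2 p(x), in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# The same IDs could index characters, image patches, or audio frames;
# the information content is identical either way.
uniform = entropy_bits([0, 1, 2, 3])  # 4 equiprobable symbols -> 2.0 bits
skewed  = entropy_bits([0, 0, 0, 1])  # -> about 0.81 bits
```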
Symbolic Layer Positioning
Within this framework, Token's essence lies at the "symbolic layer," not the "semantic layer." Symbols themselves don't carry semantics but exist as encoding and computation's basic carriers.
Naming Token as "词元" introduces implicit direction from the language semantic layer to a certain extent, pulling this originally symbolic-layer concept back to a language-centric understanding pathway. While this naming approach may provide intuitiveness at the explanatory level, it easily blurs boundaries between "symbolic computation" and "semantic understanding" at the theoretical level.
Alternative Advantage
In contrast, "符元" remains conceptually within the symbolic layer. On one hand, it accurately reflects Token's discrete symbol computational attributes; on the other hand, it avoids introducing semantic features into ontological definitions, thereby better conforming to information theory and computational theory's basic frameworks.
Broader Perspective
From a broader perspective, as AI systems continuously evolve toward multimodal and general intelligence, if foundational concept naming can directly align with mathematical and computational ontology, it will more readily build stable, expandable cognitive systems. In this sense, a naming pathway centered on "symbolic unit" represents not merely a language choice problem but a consistent expression of computational essence, with "符元" being the natural correspondence within this framework.
Fundamental Principle: Defining concepts from the symbolic layer aligns with computational essence; naming concepts from the semantic layer approaches explanation rather than definition.
Section 6: Linguistic Fracture—Mapping Failure in Back-Translation Mechanisms
Expert Position (Comprehensive Interpretation): "词元" has gradually formed usage foundations in the Chinese academic community, possessing certain dissemination advantages.
Within cross-language contexts, we must remain vigilant about the systematic impact of "back-translation fracture" in terminology. Whether a scientific term possesses long-term vitality depends not only on its expressive capability within Chinese contexts but, more importantly, on whether it can achieve stable mapping within international academic systems. Ideal terminology should possess "reversibility": the ability to make semantically consistent round-trips between languages.
Back-Translation Instability
The above judgment reflects the acceptability of "词元" within local contexts, but from a cross-language perspective there is still room for further discussion. If a term establishes itself only within a single language system without forming a stable correspondence internationally, it may introduce additional understanding costs in academic exchange.
Specific Manifestation
Specifically, "词元" lacks clear, unique corresponding pathways during back-translation processes. When restored to English, it often diverges among multiple approximate concepts:
- "word unit" lacks strict academic definition
- "morpheme" corresponds to linguistics' morphological units
- "lexeme" points to lexical positions
None of these concepts accurately covers Token's meaning in computational contexts; each instead introduces a categorical shift.
Stable Alternative
In contrast, "符元" can relatively naturally correspond to "symbolic unit." This concept possesses clear theoretical foundations and stable usage in information theory, discrete mathematics, and multimodal representation fields, maintaining consistent semantic direction across different contexts. Therefore, it more readily forms one-to-one mapping relationships between Chinese and English.
Practical Impact
From a practical perspective, once terminology enters academic papers, technical documents, and international exchange scenarios, its back-translation capability directly impacts expression efficiency and understanding accuracy. If a term requires additional explanation to complete cross-language conversion, its long-term usage costs will continuously accumulate.
Therefore, within cross-language systems, "词元's" main problem lies in unstable mapping pathways, while "符元" demonstrates higher certainty in semantic correspondence and conceptual consistency. Against the backdrop of increasingly globalized AI, selecting terminology with good back-translation characteristics will more readily build open, interoperable academic and technical systems.
Key Insight: Terminology's international reversibility essentially becomes the key benchmark for whether it possesses long-term academic vitality.
Section 7: Unification Misconception—Formal Consistency Doesn't Equal Structural Consistency
Expert Position (Comprehensive Expert Opinion): "词元" maintains consistent expressive style with terms like "embedding" and "attention," being concise and abstract, conforming to Chinese technical contexts.
Conclusion First: Terminology system unification should be built upon "conceptual isomorphism," not mere "linguistic homogeneity."
Within the supporting arguments for "词元," a common rationale states: its expressive style is consistent with terms like "embedding" and "attention," being concise and abstract and conforming to Chinese technical contexts. This reason grasps a genuine need for unification in terminology systems, but the problem is this: if unification remains only at the linguistic level rather than the structural level, it slides from "order" into "illusion."
Why "Embedding" and "Attention" Work
"Embedding" and "attention" became stable terminology because they correspond to clear computational structures:
- The former represents vector mapping
- The latter represents weight mechanisms
Their naming directly points to computational essence. "词元," however, belongs to explanatory naming, with its rationality depending on the "generalized word" analogical framework. Once detached from that explanation, the name itself lacks a self-consistent structural direction.
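The mechanism-versus-object distinction can be sketched directly (toy dimensions, illustrative only): embedding names a vector mapping, attention names a weight mechanism, and a token is merely the integer index those mechanisms operate on.

```python
import numpy as np

# Sketch of the structural distinction (toy sizes, illustration only):
# "embedding" and "attention" name mechanisms; a token is merely the
# object -- an integer index -- that those mechanisms operate on.

rng = np.random.default_rng(0)
vocab, dim = 10, 4
E = rng.normal(size=(vocab, dim))      # embedding table

token_ids = np.array([3, 7, 1])        # tokens: objects (indices)
x = E[token_ids]                       # embedding: a vector mapping

scores = x @ x.T / np.sqrt(dim)        # attention: a weight mechanism
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1

assert np.allclose(weights.sum(axis=-1), 1.0)
```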
Critical Problem: Formal Consistency, Semantic Shift
This difference brings a key problem:
- Formal consistency can reduce expression costs
- But it cannot guarantee cognitive stability once semantics begin to shift
If "linguistic homogeneity" is prioritized, complexity does not disappear but transfers into long-term cognitive burden. Only naming built upon "conceptual isomorphism" can remain stable through cross-contextual and multimodal evolution.
Structural Misalignment
When "embedding," "attention," and "词元" appear together, they easily form an illusion of "conceptual same layer." But actually:
- The first two are mechanisms, the latter is an object
- The first two possess strict definitions, the latter depends on contextual explanation
This structural misalignment buries implicit fractures within cognitive systems.
Systemic Impact
More importantly, when a foundational concept's naming depends on analogy rather than structural definition, its impact won't stay within a single term but diffuses to the entire terminology system. When subsequent concepts attempt expanding around this naming, they must continuously maintain consistency through explanation, thereby forming implicit structural misalignment.
Alternative Pathway
In this sense, "符元" provides an expression pathway closer to underlying structures. It directly points to computational systems' basic object—symbol—requiring no analogical explanation to maintain consistency across different contexts.
Profound Truth: Terminology isn't merely labels but cognitive entrances. Good terminology makes explanation gradually disappear; poor terminology makes annotations continuously increase. When foundational concepts deviate from structure, terminology systems can only maintain through explanation, unable to achieve self-consistency through definition.
Conclusion
From an essential perspective, terminology selection isn't merely a language problem but represents early shaping of a field's cognitive structure. Once naming deviates from its structural ontology during initial stages, subsequent systems can only operate through continuous explanation, unable to form self-consistent conceptual networks.
During AI's advancement toward generalization and multimodal fusion, a terminology capable of aligning with computational ontology and possessing cross-contextual stability will more likely become a long-term effective cognitive cornerstone. In this sense, a naming pathway centered on "symbolic unit" demonstrates more balanced adaptability in balancing technical essence and cognitive clarity.
The choice between "词元" and "符元" transcends mere translation preference—it represents a fundamental decision about how we conceptualize and communicate the building blocks of artificial intelligence. As the field continues evolving toward increasingly multimodal and generalized systems, the terminology we choose today will shape understanding and development patterns for years to come.
The debate itself reflects the healthy maturation of AI as a discipline, where foundational concepts receive the scrutiny and careful consideration they deserve. Whether the community ultimately embraces "词元," "符元," or arrives at yet another solution, the important outcome is the thoughtful examination of how our language shapes our understanding of the technology we're building.