From "Word Unit" to "Symbol Unit": The AI Cognitive Debate Behind Token's Chinese Translation
Introduction: The Official Translation Announcement
Recently, the National Committee for Terms in Sciences and Technologies issued a public notice recommending that "Token" in the field of artificial intelligence be translated as "词元" (Word Unit), opening the term for public trial use. People's Daily subsequently published an article titled "Expert Interpretation: Why Token's Chinese Name is Determined as 'Word Unit'," explaining the naming systematically from a professional perspective.
The article notes that the term "token" derives from the Old English "tācen," meaning "symbol" or "mark." In language models, a token is the minimum discrete unit obtained from text segmentation or byte-level encoding; it can take the form of a word, a subword, an affix, or a character. It is precisely by modeling token sequences that models exhibit their intelligent capabilities.
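To make this granularity concrete, here is a toy sketch in plain Python (not the API of any real tokenizer; the subword vocabulary and merge behavior below are invented for illustration) showing the same text segmented at word, subword, and character level, each piece counting as one token.

```python
# Toy illustration: the same text segmented at three granularities.
# The subword vocabulary below is invented for demonstration only;
# real tokenizers (BPE, WordPiece, etc.) learn their vocabularies from data.

def word_tokens(text: str) -> list[str]:
    """Whitespace segmentation: each word is one token."""
    return text.split()

def char_tokens(text: str) -> list[str]:
    """Character segmentation: each non-space character is one token."""
    return [c for c in text if c != " "]

def subword_tokens(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match segmentation against a fixed subword vocabulary."""
    tokens = []
    for word in text.split():
        i = 0
        while i < len(word):
            # take the longest vocabulary piece that matches at position i;
            # single characters are always allowed as a fallback
            for j in range(len(word), i, -1):
                piece = word[i:j]
                if piece in vocab or j == i + 1:
                    tokens.append(piece)
                    i = j
                    break
    return tokens

demo_vocab = {"token", "iz", "ation", "model", "s"}   # invented pieces
text = "tokenization models"

print(word_tokens(text))                  # ['tokenization', 'models']
print(subword_tokens(text, demo_vocab))   # ['token', 'iz', 'ation', 'model', 's']
print(char_tokens(text))                  # ['t', 'o', 'k', ...]
```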
The expert panel judged this translation to satisfy the principles of unambiguity, scientific soundness, conciseness, and coordination, and noted that it already has a certain usage base in Chinese. After reading the relevant interpretations, however, I arrived at a different view of this naming path.
From a standardization perspective, the scheme does offer short-term advantages in comprehensibility and communication. But examined along the dimensions of computational ontology, information structure, multimodal evolution, and back-translation consistency, its long-term adaptability deserves further scrutiny. Against this background, an equally worthy alternative path, "符元" (Symbol Unit), gradually reveals stronger structural consistency and cross-context stability.
Section 1: Definition Dislocation—Cannot Replace "Essence" with "Origin"
Expert Viewpoint
According to Chen Xilin, a researcher at the Institute of Computing Technology, Chinese Academy of Sciences: Token's initial role in artificial intelligence was as the "basic semantic unit of language," so "词元" aligns better with its essence.
Critical Analysis
This judgment is reasonable within its historical context, but in the present era of major technological paradigm shifts, such thinking amounts to the academic equivalent of carving a notch in the boat to find the dropped sword: it marks where the concept once stood while the field has long since moved on.
At the logical level of terminology definition, we must strictly distinguish between "initial application scenarios" and "structural essential attributes."
Token did originate in Natural Language Processing (NLP), but along the evolutionary path toward AGI it has long since transcended the boundaries of language models, becoming a fundamental unit for uniformly processing text, images, speech, and even physical signals. In modern computing systems, Token's true structural ontology is the "discrete symbolic unit," not a single-modality language unit.
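As a minimal sketch of this unified view (the vocabulary sizes, codebook sizes, and ID offsets below are invented for illustration; real multimodal systems use learned tokenizers and codebooks), whatever the modality, what the model ultimately consumes is a single sequence of integer IDs:

```python
# Toy illustration: text pieces, image patches, and audio frames all end up
# as integer IDs in one shared index space. The sizes and offsets below are
# invented; real multimodal systems use learned tokenizers and codebooks.

TEXT_VOCAB_SIZE = 50_000   # e.g. subword pieces
IMAGE_CODEBOOK  = 8_192    # e.g. quantized image-patch codes
AUDIO_CODEBOOK  = 1_024    # e.g. quantized audio-frame codes

# Each modality gets its own contiguous block of IDs.
IMAGE_OFFSET = TEXT_VOCAB_SIZE
AUDIO_OFFSET = TEXT_VOCAB_SIZE + IMAGE_CODEBOOK

def text_id(piece_index: int) -> int:
    return piece_index

def image_id(code_index: int) -> int:
    return IMAGE_OFFSET + code_index

def audio_id(code_index: int) -> int:
    return AUDIO_OFFSET + code_index

# A mixed sequence: the model sees only integers, not "words" or "pixels".
sequence = [text_id(17), text_id(905), image_id(3), image_id(4096), audio_id(12)]
print(sequence)   # [17, 905, 50003, 54096, 58204]
```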
Historical Counterexamples:
If naming followed the "initial role" logic:
- The computer should still be called an "electronic calculator" (its initial function was to replace human calculators)
- The Internet should still be called the "Cold War military network"
The fatal flaw in this naming logic is that it sees only the technology's transient role at a particular historical moment and ignores the underlying ontology that transcends eras.
A historical path cannot be equated with an essential attribute. Likewise, we cannot permanently lock Token into the narrow context of "words" simply because it was first used for text processing.
Defining a fundamental concept by its "initial application scenario" essentially substitutes historical path dependency for structural ontological truth. Such a definition may be convenient in a technology's early stages, but in the paradigm-expansion phase of the multimodal explosion it rapidly loses validity and turns into a shackle on cognition.
In contrast, "符元" directly aligns with the symbolic ontology of cross-modal computation. It defines not Token's "past" but Token's "truth."
Section 2: Analogy Boundaries—Explanation Begins Deviating Once It Becomes Definition
Expert Viewpoint
According to Dong Yuxiao, associate professor at Tsinghua University's Department of Computer Science: Through analogies like "word cloud" and "bag of words," discrete units in multimodal contexts can be understood as "generalized words."
Critical Analysis
Professor Dong's analogy aids understanding, but it should not replace definition. Such thinking has heuristic value at the level of explanation; elevating it into a basis for naming, however, risks dislocating conceptual categories.
Methodological Perspective:
- Analogy's Function: Lower understanding thresholds
- Definition's Responsibility: Delineate semantic boundaries
When "word" expands to cover image patches, speech segments, vector representations (embeddings), and even broader perceptual signals, its original language attributes become continuously diluted, and semantic boundaries tend toward blur. This "analogy-driven" expansion path can maintain explanatory consistency in the short term but容易造成 semantic drift in long-term evolution.
In cross-modal expansion capability, we must guard against "analogy" sliding toward "definition." In terminology standardization contexts, we must distinguish boundaries between "explanatory metaphors" and "ontological definitions," avoiding the former replacing the latter.
Intuitive Comparison:
In popular science, we may liken a light bulb to an "artificial sun" to make it more intuitive. But within a scientific naming system, no one would rename the unit of electric current, the ampere, a "light unit" on that basis. The former is descriptive expression; the latter belongs to a strict system of measurement and standardized definition. The two cannot be mixed.
Similarly, terms like "word cloud" and "bag of words" are essentially descriptive or statistical metaphors whose function is to help people grasp data structures or distribution patterns. Token, by contrast, is a fundamental unit of measurement in large models, deeply embedded in compute billing, model training, and academic metrics. When its usage reaches hundreds of billions to trillions of calls per day, the name carries not merely an explanatory function but a foundational concept with engineering and standards significance.
At this level, the term needs all the more to align with its ontological attributes rather than rely on analogical extension.
Dangerous Premise:
If we push this analogical logic further, up to the level of naming, it carries a dangerous implicit premise: since people are accustomed to understanding Token through "word," let us keep using the analogy. But this is simply the continuation of path dependency, substituting existing cognitive convenience for a correction of the concept's ontology. In this sense, the naming verges on "linguistic romanticism" rather than strict alignment with computational ontology.
We do not insist on discussing "electronic horses" in motor engineering simply because "horsepower" contains the word "horse." Analogy can inspire understanding, but it cannot define a standard.
In contrast, "symbol" is a more neutral concept with natural cross-modal adaptability: it covers text, images, speech, and other forms of information without additional explanation. A naming path centered on the "symbolic unit" therefore comes closer, at the level of definition, to Token's structural essence. Within this logic, "符元" as the corresponding translation has higher conceptual consistency and long-term adaptability.
Section 3: Cognitive Cost—When Semantic Anchors Create Systematic Misunderstanding
Expert Viewpoint
According to comprehensive expert opinions: "词元" expression is concise, conforms to Chinese habits, and facilitates dissemination.
Critical Analysis
This judgment has some validity at the level of communication, but its implicit premise is that the public will accept the cross-modal analogy of "word." Analogy, however, is essentially an expert's thinking tool, not the public's natural mode of cognition.
For ordinary users, "word" has an extremely strong semantic anchoring effect: on hearing "word," intuition inevitably points to the language system, not to images, sounds, actions, or other modalities. This cognitive path is not a technical problem; it is a stable structure at the level of cognitive psychology.
Against this background, expanding "word" into a so-called "generalized word" already deflects users' cognition. Users first form the intuition that "word = language unit," not the abstract notion of a "cross-modal symbolic unit." Once this misunderstanding takes hold, every subsequent explanation becomes a correction of existing cognition rather than a natural extension of understanding.
Practical Example:
When media report that a model was "trained on 10 trillion word units," the public readily takes this to mean "reading massive amounts of text," overlooking the large quantities of image, speech, and other modal data included. This misunderstanding is not an isolated incident; it is systematically induced by the term's own semantic anchoring.
In engineering practice, the naming may also add friction to interdisciplinary communication. Calling the discrete units of vision or speech models "words" not only invites semantic misunderstanding but also creates needless linguistic conflict between fields. Multimodal systems need unification at the "symbol layer," not an expansion of the "language" category.
In contrast, "symbol" is a more abstract concept: its initial understanding threshold is slightly higher, but its semantic direction is more neutral and it does not pre-lock cognition at the language layer. In long-term use it better supports a stable, unified cognitive framework, reducing overall explanation costs and laying a more stable cognitive foundation for multimodal unification.
The cost of naming doesn't occur at definition time but at correction time. Once early naming forms semantic anchoring, the cost of subsequent cognitive repair rises exponentially.
Experts can expand "word's" boundaries through analogy, but the public won't understand concepts through analogy. Naming serves not experts but the entire era's cognitive system.
Section 4: Unambiguity Illusion—When One Word Attempts to Carry Two Systems
Expert Viewpoint
According to terminology standardization principles: "词元" conforms to the unambiguity principle, helping resolve translation chaos problems.
Critical Analysis
With regard to terminological unambiguity, we must pay particular attention to the systemic risk of "one word, two meanings." In scientific terminology standardization, unambiguity is a fundamental principle: if a term must rely on context or additional explanation to distinguish its meanings, its value as a standard component is already lost.
Within the existing academic system, however, this judgment leaves room for discussion. "词元" has long been "claimed" by linguistics and Natural Language Processing (NLP). In classical linguistics it has long corresponded to the English concept of lemma, a word's normative base form (the lemma of is/am/are is be). This usage has formed a stable consensus in linguistics and in basic NLP textbooks and academic papers.
Against this background, if Token is also translated as "词元," semantic conflicts easily arise in concrete expression, producing some disastrous formulations.
Example:
When describing "lemmatize a token" in NLP, the Chinese rendering becomes "perform 'lemmatization' on a 'word unit'." This expression not only increases understanding costs but also introduces ambiguity into academic writing and information retrieval, making it difficult for readers to tell whether "词元" refers to the segmented discrete unit or to the word's normative base form.
From the perspective of conceptual function, the two are also clearly distinct:
- Lemma emphasizes "restoration" at the linguistic level, corresponding to the normative form recovered after morphological change
- Token emphasizes "segmentation" in the computational process, corresponding to the minimum discrete unit a model handles when processing information
This difference between "restoration" and "segmentation" corresponds precisely to the different dimensions of the semantic layer and the symbol layer.
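To make the contrast tangible, here is a toy sketch (the lemma table and subword splits are hand-written examples, not drawn from any real NLP library): lemmatization restores a surface form to its canonical form, while tokenization segments a string into the discrete pieces a model actually consumes.

```python
# Toy illustration of "restoration" vs "segmentation".
# Both mappings below are hand-written examples, not a real lemmatizer or tokenizer.

LEMMA_TABLE = {"is": "be", "am": "be", "are": "be", "running": "run", "better": "good"}

def lemmatize(word: str) -> str:
    """Lemma: restore a surface form to its normative base form."""
    return LEMMA_TABLE.get(word, word)

SUBWORD_SPLITS = {"running": ["runn", "ing"], "better": ["bet", "ter"]}

def tokenize(word: str) -> list[str]:
    """Token: segment a string into the discrete pieces a model actually processes."""
    return SUBWORD_SPLITS.get(word, [word])

print(lemmatize("are"))      # 'be'             -> semantic-layer restoration
print(tokenize("running"))   # ['runn', 'ing']  -> symbol-layer segmentation
```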
Therefore, when one term must cover multiple existing concepts through "generalization," its unambiguity has in fact degenerated into "explanatory unification" rather than "semantic stability."
When a term must maintain its unity through explanation, its stability as a standard term has usually already begun to shake.
In contrast, "符元" conflicts with nothing in the existing terminology system. It retains Token's ontological attribute as a discrete symbol while avoiding overlap with the established translation of lemma, and it therefore shows greater stability in semantic clarity and system consistency.
Section 5: Ontological Return—Token is Essentially "Symbol," Not "Word"
Expert Viewpoint
According to general explanations: Token is the minimum unit used for processing text in language models.
Critical Analysis
This expression holds at the functional level, but it stays at the level of "how it is used" and does not touch the concept's ontological attributes in computational theory. From the perspectives of information theory and the theory of computation, the basic objects a computing system processes are not "words" but "symbols."
This can be understood at two levels:
On one hand, from the information-theoretic perspective:
The essence of information is the elimination of uncertainty. Its unit of measurement is the bit, and its carrier is the discrete symbol. Symbols are not concerned with semantic content, only with probability distributions and encoding structures.
On the other hand, at the level of computational implementation:
Large models do not "recognize characters" at the lowest level; what they process are discrete index representations (IDs). Whether an ID corresponds to a Chinese character, an image patch, or an audio sample, it enters computation in the same symbolic form.
Within this framework, Token's essence lies at the "symbol layer," not the "semantic layer." Symbols themselves carry no semantics; they exist as the fundamental carriers of encoding and computation.
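As a minimal illustration of this symbol-layer view (the probability values below are invented), the entropy of a set of discrete IDs is well defined in bits regardless of whether those IDs stand for characters, image patches, or audio frames:

```python
import math

# Toy illustration: entropy is computed over a distribution of discrete symbols
# (here, arbitrary integer IDs). Nothing in the calculation depends on whether
# an ID denotes a Chinese character, an image patch, or an audio sample.
# The probabilities are invented for demonstration.

distribution = {101: 0.5, 42_007: 0.25, 9: 0.125, 77_310: 0.125}

entropy_bits = -sum(p * math.log2(p) for p in distribution.values())
print(f"{entropy_bits:.3f} bits")   # 1.750 bits
```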
Naming Token "词元" introduces, to some extent, an implicit orientation from the linguistic-semantic layer, pulling a concept that belongs to the symbol layer back onto an understanding path centered on language. The naming may be intuitive at the level of explanation, but at the theoretical level it easily blurs the boundary between "symbolic computation" and "semantic understanding."
In contrast, "符元" stays conceptually within the symbol layer. It accurately reflects Token's computational attribute as a discrete symbol while keeping semantic features out of the ontological definition, and it thus conforms better to the basic frameworks of information theory and the theory of computation.
From a broader perspective, as artificial intelligence systems evolve toward generality and multimodal fusion, naming fundamental concepts in direct alignment with their mathematical and computational ontology will better support a stable, scalable cognitive system. In this sense, a naming path centered on the "symbolic unit" is not merely a linguistic choice but an expression of consistency with computational essence, and "符元" is the natural correspondence within this framework.
Defining concepts from the symbol layer represents alignment with computational essence; naming concepts from the semantic layer approaches explanation rather than definition.
Section 6: Language Fracture—Mapping Failure in Back-Translation Mechanisms
Expert Viewpoint
According to comprehensive interpretations: "词元" has gradually formed usage foundations in Chinese academia, possessing certain communication advantages.
Critical Analysis
In cross-language contexts, we must guard against the systemic impact of "back-translation fracture." Whether a scientific term has long-term vitality depends not only on its expressive power within Chinese, but even more on whether it maps stably into the international academic system.
An ideal term should be "reversible": it should make semantically consistent round trips between languages.
The judgment above reflects the acceptability of "词元" within the local context, but from a cross-language perspective there is still room for discussion. A term that holds only within a single language system, without a stable correspondence in international contexts, introduces additional understanding costs in academic exchange.
Specifically, "词元" lacks a clear, unique path back into English. When back-translated, it scatters among several approximate concepts:
- "word unit" lacks strict academic definition
- "morpheme" corresponds to linguistics' morphemes
- "lexeme" points to lexical positions
None of these concepts can accurately cover Token's meaning in computational contexts, instead introducing category offsets.
In contrast, "符元" can relatively naturally correspond to "symbolic unit." This concept possesses clear theoretical foundations and stable usage in information theory, discrete mathematics, and multimodal representation fields, maintaining consistent semantic direction across different contexts. Therefore, it更容易 form one-to-one mapping relationships between Chinese and English.
From a practical perspective, once terminologies enter academic papers, technical documents, and international exchange scenarios, their back-translation capabilities will directly impact expression efficiency and understanding accuracy. If a terminology requires additional explanation to complete cross-language conversion, its long-term usage costs will continuously accumulate.
Therefore, in cross-language systems, the main problem "词元" faces lies in mapping path instability, while "符元" demonstrates higher certainty in semantic correspondence and conceptual consistency. Against the background of increasingly globalized artificial intelligence, selecting terminologies with good back-translation characteristics will more benefit building open, interoperable academic and technical systems.
Terminology's international reversibility is essentially the key benchmark for whether it possesses long-term academic vitality.
Section 7: Unification Misconception—Formal Consistency Doesn't Equal Structural Consistency
Expert Viewpoint
According to comprehensive expert opinions: "词元" maintains consistent expression style with terminologies like "embedding" and "attention"—concise, abstract, conforming to Chinese technical contexts.
Critical Analysis
Conclusion First: Terminology system unification should be built upon "conceptual isomorphism," not "linguistic homomorphism."
In "词元's" supporting arguments, a common reason is: its expression style maintains consistency with terminologies like "embedding" and "attention"—concise, abstract, conforming to Chinese technical contexts. This reason captures the real demand that terminology systems need unification, but the problem lies in—if unification only stays at the language level rather than the structural level, it will slide from "order" to "illusion."
Why "embedding" and "attention" became stable terminologies:
Because they correspond to clear computational structures:
- The former is a vector mapping
- The latter is a weighting mechanism
Their naming directly points to computational essence.
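A bare-bones NumPy sketch, with toy random values, of why these two names map directly onto computational structures: an embedding is literally a table lookup into vectors, and attention is literally a normalized weighting over value vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Embedding: a vector mapping -- each discrete ID indexes a row of a matrix.
vocab_size, dim = 10, 4
embedding_table = rng.normal(size=(vocab_size, dim))
ids = np.array([3, 7, 1])
x = embedding_table[ids]              # shape (3, 4): one vector per token ID

# Attention: a weighting mechanism -- scores are normalized into weights that
# mix the value vectors. (Single head, no learned projections, for brevity.)
scores = x @ x.T / np.sqrt(dim)       # similarity between positions
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
output = weights @ x                  # weighted combination, shape (3, 4)

print(weights.round(2))               # each row sums to 1.0
print(output.shape)                   # (3, 4)
```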
Whereas "词元" belongs to explanatory naming. Its rationality relies on the "generalized word" analogy framework. Once脱离 explanation, this naming itself doesn't possess self-consistent structural direction.
This difference creates the key problem: consistency of form with offset of meaning.
- Linguistic homomorphism merely reduces expression costs
- Conceptual isomorphism is what guarantees cognitive stability
If "linguistic homomorphism" is pursued first, complexity does not disappear; it is transferred into a long-term cognitive burden. Only naming built on "conceptual isomorphism" can remain stable through cross-context and multimodal evolution.
When "embedding," "attention," and "词元" appear together, it easily forms the illusion of "conceptual same layer." But actually, the former two are mechanisms, the latter is an object; the former two possess strict definitions, the latter relies on contextual explanation. This structural misalignment will bury implicit fractures within cognitive systems.
More importantly, when a fundamental concept's naming relies on analogy rather than structural definition, its impact won't stay within a single terminology but will diffuse to the entire terminology system. When subsequent concepts attempt to unfold around this naming, they will have to continuously maintain consistency through explanation, thereby forming implicit structural dislocation.
In this sense, "符元" provides an expression path closer to underlying structures. It directly points to the basic object in computing systems—symbol. Without relying on analogy explanation, it can maintain consistency across different contexts.
Terminology isn't just labels but cognition's entrance. Good terminologies make explanations gradually disappear; poor terminologies make annotations continuously increase. When fundamental concepts deviate from structures, terminology systems can only maintain through explanation, not achieve self-consistency through definition.
Conclusion
In essence, the choice of terminology is not just a language problem; it is an early shaping of a field's cognitive structure. Once a name deviates from its structural ontology at the outset, the subsequent system can operate only through continuous explanation and cannot form a self-consistent conceptual network.
As artificial intelligence moves toward generality and multimodal fusion, a term that aligns with computational ontology and remains stable across contexts is more likely to become a lasting cognitive cornerstone. In this sense, a naming path centered on the "symbolic unit" achieves a better balance between technical essence and cognitive clarity.
The choice between "词元" and "符元" isn't merely about translation preference—it represents a fundamental decision about how we conceptualize and communicate the building blocks of artificial intelligence across languages and cultures.