Introduction

Recently, the National Committee for Terms in Sciences and Technology issued a public notice recommending that "Token" in the artificial intelligence field be translated as "词元" (Ci Yuan), opening the name for public trial use. People's Daily subsequently published an article titled "Expert Interpretation: Why Token's Chinese Name Is Determined as '词元'", explaining the naming systematically from a professional perspective.

The article notes that the word "token" originates from the Old English "tācen", meaning "symbol" or "mark". In language models, a token is the minimal discrete unit produced by text segmentation or byte-level encoding; it can take the form of a word, subword, affix, or character. It is precisely by modeling token sequences that models exhibit their intelligent capabilities.
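To make that definition concrete, here is a minimal Python sketch of subword segmentation. The greedy longest-match strategy and the toy vocabulary are invented for illustration only; production tokenizers typically learn their vocabularies through BPE or similar merge procedures.

```python
def tokenize(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match segmentation into subword tokens (toy sketch)."""
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest candidate first
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # fall back to a single character
            i += 1
    return tokens

vocab = {"token", "iza", "tion", "un", "break", "able"}
print(tokenize("tokenization", vocab))  # ['token', 'iza', 'tion']
print(tokenize("unbreakable", vocab))   # ['un', 'break', 'able']
print(tokenize("xyz", vocab))           # ['x', 'y', 'z'] -- character fallback
```

The same input surfaces as whole words, subwords, or bare characters depending only on the vocabulary, which is exactly why "word" names just one of a token's possible guises.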

Within the experts' framework of argument, this translation was judged to satisfy the principles of monosemy, scientific accuracy, conciseness, and systemic coordination, and to enjoy a certain base of existing usage in Chinese. After reading the relevant interpretations, however, I arrived at a different understanding of this naming path.

From a standardization perspective, this naming offers short-term advantages in comprehensibility and communication. Examined along dimensions such as computational ontology, information structure, multimodal evolution, and back-translation consistency, however, its long-term adaptability remains to be verified. Against this background, an equally worthy alternative path, "符元" (Fu Yuan), shows stronger structural consistency and cross-context stability.

The Definition Dislocation: "Origin" Cannot Stand In for "Essence"

Expert Viewpoint (Chen Xilin, Researcher at the Institute of Computing Technology, Chinese Academy of Sciences): Token's initial role in artificial intelligence was as the "basic semantic unit of language"; therefore "词元" better aligns with its essence.

This judgment is reasonable in its historical context, but in an era of leaping technological paradigms it amounts to an academic version of "carving the boat to find the sword": seeking a fixed marker in a flowing context.

At the level of definitional logic, a term must strictly distinguish "initial application scenarios" from "essential structural attributes".

Token did originate in natural language processing (NLP), but on the evolutionary path toward AGI it has long since transcended the boundaries of language models, becoming a fundamental unit for uniformly processing text, images, speech, and even physical signals. In modern computing systems, Token's true structural ontology is the "discrete symbol unit", not a single-modality language unit, as the sketch below illustrates.
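As one concrete (and deliberately simplified) illustration, the following sketch, assuming NumPy and arbitrary ViT-style dimensions, turns an image into a sequence of patch "tokens" with no word in sight; in fully discrete pipelines each patch vector would further be quantized to a codebook index.

```python
import numpy as np

# A dummy 224x224 RGB image; the pixel values are irrelevant to the shape logic.
image = np.zeros((224, 224, 3))
P = 16  # patch size (hypothetical, in the style of ViT models)

# Cut the image into a 14x14 grid of 16x16x3 patches,
# then flatten each patch into one vector: one "token" per patch.
grid = image.reshape(224 // P, P, 224 // P, P, 3).swapaxes(1, 2)
patch_tokens = grid.reshape(-1, P * P * 3)

print(patch_tokens.shape)  # (196, 768): a sequence of 196 discrete units
```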

If naming followed "initial role", the computer would still be called the "electronic calculator" (after its original function of replacing human calculators), and the Internet would be called the "Cold War military network". The fatal flaw of this naming logic is that it sees only a technology's "temporary occupation" at one historical moment while ignoring its physical ontology across eras.

A historical path is not the same as an essential attribute. Likewise, we cannot permanently lock Token into the narrow context of "words" simply because it was first used for text processing.

Defining basic concepts by "initial application scenarios" essentially substitutes historical path dependency for structural ontological truth. Such definitions may aid understanding in a technology's early stages, but in the paradigm-expanding phase of the multimodal explosion they rapidly lose effectiveness and harden into cognitive shackles. In contrast, "符元" aligns directly with the symbolic ontology of cross-modal computing: it defines not Token's "past" but Token's "truth".

Analogy Boundaries: When Explanation Becomes Definition, Deviation Begins

Expert Viewpoint (Dong Yuxiao, Associate Professor at Tsinghua University's Computer Science Department): Through analogies like "word cloud" and "bag of words", discrete units in multimodal contexts can be understood as "generalized words".

Professor Dong's analogy aids understanding, but it should not substitute for definition. The approach has genuine heuristic value at the level of explanation; elevated into a basis for naming, however, it risks a dislocation of conceptual categories.

Methodologically, analogy's job is to lower the threshold of understanding, while definition's job is to draw semantic boundaries. When "word" is stretched to cover image patches, speech segments, vector representations (embeddings), and even broader perceptual signals, its original linguistic attributes are steadily diluted and its semantic boundaries blur. This analogy-driven expansion can maintain explanatory consistency in the short term, but it invites semantic drift over the long term.

As cross-modal expansion proceeds, we must guard against "analogy" sliding into "definition". Within terminology standardization, the boundary between explanatory metaphor and ontological definition must be maintained, and the former must never replace the latter.

A more intuitive comparison: in popular science we may call a light bulb an "artificial sun" to aid intuition, but no scientific naming system would, on that basis, rename the ampere, the unit of electric current, a "light unit". The former is descriptive expression; the latter belongs to a strict system of measurement and standardized definition. The two must not be confused.

Similarly, "word cloud" and "bag of words" are essentially descriptive or statistical metaphors whose job is to aid understanding of data structures or distribution patterns, whereas Token, as a fundamental unit of measurement in large models, is deeply embedded in compute billing, model training, and academic metrics. When usage runs to hundreds of billions or trillions of calls per day, the name carries not just an explanatory function but a foundational concept with engineering and standards significance. At this level, a term needs to align with its ontological attributes rather than lean on extended analogy.

Pushing this analogical logic further, to the level of naming, implies a dangerous premise: since people are accustomed to understanding Token through "words", we may as well keep the analogy. But that is simply path dependency carried forward: existing cognitive convenience substituted for ontological correction of the concept. In this sense, the naming tends toward "linguistic romanticism" rather than strict alignment with computational ontology.

We do not demand that motor engineering speak of "electronic horses" simply because "horsepower" contains "horse". Analogy can inspire understanding, but it cannot define a standard.

In contrast, "符" (symbol) as a more neutral concept naturally possesses cross-modal adaptation capability, covering text, images, speech, and other information forms without relying on additional explanation. Therefore, the naming path centered on "symbol unit" at the definition level approaches Token's structural essence more closely. Within this logic, "符元" as the corresponding translation name demonstrates higher conceptual consistency and long-term adaptability.

The Cognitive Cost: When Semantic Anchors Create Systematic Misunderstanding

Expert Viewpoint (synthesis of expert views): "词元" is concise, conforms to Chinese usage habits, and is easy to disseminate.

This judgment has merit at the level of communication, but its implicit premise is that the public can accept the cross-modal analogy of "word". Analogy, however, is essentially an expert's thinking tool, not the public's natural mode of cognition. For ordinary users, "word" exerts an extremely strong semantic anchoring effect: on hearing "word", intuition necessarily points to the language system, not to images, sounds, actions, or other modalities. This cognitive path is not a technical problem but a stable structure of cognitive psychology.

On this basis, expanding "word" into so-called "generalized words" actually skews user cognition. Users first form the intuition "word = language unit" rather than the abstract concept of a "cross-modal symbol unit". Once that misunderstanding takes hold, every subsequent explanation becomes a correction of existing cognition rather than a natural extension of understanding.

For example, when media report that "the model was trained on 10 trillion 词元", the public readily takes this to mean "it read a massive amount of text", overlooking the large quantities of images, speech, and other modal data involved. The misunderstanding is not an isolated slip but a systematic induction produced by the term's own semantic anchoring.

In actual engineering contexts, the naming can also create friction in cross-disciplinary communication. Calling the discrete units of a vision or speech model "words" not only invites semantic misunderstanding but also creates needless linguistic conflict between fields. Multimodal systems need unification at the "symbol layer", not an expansion of the category of language.

By comparison, "符", as the more abstract concept, has a slightly higher initial threshold of understanding but a more neutral semantic orientation that does not pre-lock cognition at the language layer. In long-term use it better supports a stable, unified cognitive framework, reducing overall explanation costs and providing a firmer cognitive foundation for multimodal unification.

The cost of a name is incurred not at definition time but at correction time. Once an early name forms a semantic anchor, the cost of subsequent cognitive repair climbs steeply.

Experts can stretch the boundaries of "word" through analogy, but the public does not understand concepts through analogy. A name does not serve the experts; it is answerable to the cognitive system of an entire era.

The Monosemy Illusion: When One Word Attempts to Carry Two Systems

Expert Viewpoint (terminology standardization principles): "词元" conforms to the principle of monosemy and helps resolve the chaos of competing translations.

On the monosemy of terms, special attention must be paid to the systemic risk of "one word, two meanings". In scientific terminology standardization, monosemy is a foundational principle: if a term must rely on context or extra explanation to disambiguate its meanings, its value as a standard has already been lost.

Yet against the existing academic system, this judgment leaves room for discussion. The term "词元" has long been "claimed" by linguistics and natural language processing (NLP). In classical linguistics its long-standing English counterpart is Lemma, the standardized base form of a word (for example, the Lemma of is/am/are is be). This usage is a stable consensus in linguistics and in foundational NLP textbooks and papers.

Against this background, also translating Token as "词元" readily produces semantic conflicts in concrete expressions, with genuinely damaging results.

For example, to express "lemmatize a token in NLP", the Chinese would have to read "perform lemmatization on a 词元". The expression not only raises comprehension costs but also introduces ambiguity into academic writing and information retrieval: readers cannot tell whether "词元" refers to the segmented discrete unit or to a word's standardized base form.

The two also differ clearly in conceptual function: Lemma emphasizes "restoration" at the language level, the standardized form recovered after morphological variation, while Token emphasizes "segmentation" in the computational process, the minimal discrete unit a model manipulates. This difference between "restoration" and "segmentation" corresponds precisely to the distinct dimensions of the semantic layer and the symbol layer, as the sketch below makes tangible.
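A toy Python contrast of the two operations. The lemma table and the whitespace splitter are deliberate simplifications invented for illustration; real systems use morphological analyzers and learned subword tokenizers.

```python
# Hypothetical lemma table: inflected form -> canonical form.
LEMMAS = {"is": "be", "am": "be", "are": "be", "running": "run"}

def lemmatize(word: str) -> str:
    """Restore an inflected form to its base form (Lemma): "restoration"."""
    return LEMMAS.get(word, word)

def tokenize(text: str) -> list[str]:
    """Split text into discrete units (Tokens): "segmentation"."""
    return text.split()  # naive whitespace split, for illustration only

tokens = tokenize("they are running fast")
print(tokens)                          # ['they', 'are', 'running', 'fast']
print([lemmatize(t) for t in tokens])  # ['they', 'be', 'run', 'fast']
```

One function normalizes a form; the other carves up a stream. Giving both outputs the same Chinese name collapses exactly this distinction.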

Therefore, when a term must "generalize" to cover several existing concepts at once, its monosemy degenerates into "unification by explanation" rather than "semantic stability".

When a term needs explanation to hold itself together, its stability as a standard term has already begun to erode.

In contrast, "符元" has no semantic conflicts in existing terminology systems. On one hand, it retains Token's ontological attributes as discrete symbols; on the other hand, it avoids overlap with Lemma's existing translation name, thereby demonstrating higher stability in semantic clarity and system consistency.

Ontological Return: Token Is Essentially a "Symbol", Not a "Word"

Expert Viewpoint (General explanation): Token is the minimum unit used for processing text in language models.

This statement holds at the functional level, but it stays at "how it is used" without touching the concept's ontological status in the theory of computation. From the perspectives of information theory and computability, the basic objects a computing system processes are not "words" but "symbols".

This can be understood at two levels:

On one hand, from the information-theoretic perspective, the essence of information is the elimination of uncertainty; its unit of measurement is the bit, and its carrier is the discrete symbol. A symbol is indifferent to semantic content; it relates only to probability distributions and encoding structure.

On the other hand, at the level of implementation, large models do not "read characters" at the bottom layer; what they process are discrete index representations (IDs). Whether an ID corresponds to a Chinese character, an image patch, or an audio sample, it enters computation in the same uniform symbolic form, as the sketch below makes concrete.
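A minimal sketch of that point; every vocabulary and codebook size here is invented for illustration. Whatever the modality, what reaches the model is an integer drawn from a single index space.

```python
# Hypothetical sizes for three per-modality inventories of discrete codes.
TEXT_VOCAB_SIZE = 50_000     # e.g. subword pieces
IMAGE_CODEBOOK_SIZE = 8_192  # e.g. VQ-style patch codes
AUDIO_CODEBOOK_SIZE = 1_024  # e.g. quantized audio frames

# Offsets give each modality a disjoint slice of one shared ID space.
IMAGE_OFFSET = TEXT_VOCAB_SIZE
AUDIO_OFFSET = TEXT_VOCAB_SIZE + IMAGE_CODEBOOK_SIZE

def text_id(subword_index: int) -> int:
    return subword_index

def image_id(patch_code: int) -> int:
    return IMAGE_OFFSET + patch_code

def audio_id(frame_code: int) -> int:
    return AUDIO_OFFSET + frame_code

# A mixed sequence: to the model, these are indistinguishable integers.
sequence = [text_id(17), text_id(942), image_id(305), audio_id(88)]
print(sequence)  # [17, 942, 50305, 58280]
```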

Within this framework, Token's essence lies at the "symbol layer", not the "semantic layer": a symbol carries no semantics of its own but serves as the basic vehicle of encoding and computation.

Naming Token "词元" thus introduces an implicit orientation from the linguistic-semantic layer, pulling a concept that belongs to the symbol layer back onto a language-centric path of understanding. The naming may be intuitive as explanation, but at the theoretical level it blurs the boundary between "symbolic computation" and "semantic understanding".

In contrast, "符元" remains conceptually within the symbol layer. On one hand, it accurately reflects Token's computational attributes as discrete symbols; on the other hand, it avoids introducing semantic features into ontological definitions, thereby better conforming to the basic frameworks of information theory and computational theory.

Viewed more broadly, as artificial intelligence evolves toward generality and multimodal fusion, naming basic concepts in direct alignment with their mathematical and computational ontology better supports a stable, extensible cognitive system. In this sense, the naming path centered on "symbol unit" is not merely a linguistic choice but a consistent expression of computational essence, and "符元" is its natural correspondent within that framework.

Defining concepts from the symbol layer aligns with computational essence; naming concepts from the semantic layer approaches explanation rather than definition.

Language Rupture: Mapping Failure in Back-Translation Mechanisms

Expert Viewpoint (synthesis of interpretations): "词元" has gradually built a base of usage in Chinese academia and enjoys certain advantages in communication.

In cross-language contexts, we must stay alert to the systemic impact of "back-translation rupture". Whether a scientific term has long-term vitality depends not only on its expressive power in Chinese but, more importantly, on whether it maps stably into the international academic system. An ideal term is "reversible": its semantics survive the round trip between languages.

The judgment above reflects the acceptability of "词元" in the local context, but from a cross-language perspective there is room for further discussion. A term that holds only within a single language system, without a stable correspondence internationally, introduces extra comprehension costs into academic exchange.

Specifically, "词元" lacks a clear and unique path back into English. Restored to English, it scatters across several approximate concepts: "word unit" has no strict academic definition, "morpheme" belongs to linguistics, and "lexeme" denotes a lexical item. None of these accurately covers Token's meaning in computational contexts; each introduces a category offset.

In contrast, "符元" can relatively naturally correspond to "symbolic unit (symbol unit)". This concept possesses clear theoretical foundations and stable usage in information theory, discrete mathematics, and multimodal representation fields, maintaining consistent semantic direction across different contexts. Therefore, it more easily forms one-to-one mapping relationships between Chinese and English.

Practically, once a term enters academic papers, technical documentation, and international exchange, its back-translatability directly affects efficiency of expression and accuracy of understanding. A term that needs extra explanation to cross languages accumulates usage costs indefinitely.

Within cross-language systems, then, the principal problem facing "词元" is the instability of its mapping path, while "符元" offers higher certainty in semantic correspondence and conceptual consistency. As artificial intelligence grows ever more global, choosing terms with good back-translation properties better serves the construction of open, interoperable academic and technical systems.

A term's international reversibility is, in essence, the key benchmark of its long-term academic vitality.

The Unification Misconception: Formal Consistency Doesn't Equal Structural Consistency

Expert Viewpoint (synthesis of expert opinions): "词元" maintains a consistent expressive style with terms like "embedding" and "attention": concise, abstract, and in keeping with Chinese technical contexts.

Conclusion first: the unification of a terminology system should be built on "conceptual isomorphism", not "linguistic homomorphism".

A common rationale offered in support of "词元" is that its style matches terms like "embedding" and "attention": concise, abstract, suited to Chinese technical contexts. The rationale captures a real demand for terminological unity, but here is the problem: if unification stops at the linguistic level rather than the structural level, it slides from "order" into "illusion".

"Embedding" and "attention" became stable terminologies because they correspond to clear computational structures: the former is vector mapping, the latter is weight mechanisms, their naming directly points to computational essence. While "词元" belongs to explanatory naming, its rationality depends on the "generalized word" analogy framework. Once脱离 explanation, this naming itself lacks self-consistent structural direction.

This difference creates a key problem: consistency in form, offset in meaning.

Formal consistency reduces the cost of expression; structural consistency guarantees cognitive stability. If "linguistic homomorphism" is given priority, complexity does not disappear; it is transferred into long-term cognitive burden. Only naming built on "conceptual isomorphism" stays stable through cross-context and multimodal evolution.

When "embedding", "attention", and "词元" appear together, it easily forms the illusion of "conceptual same layer". But actually, the former two are mechanisms, the latter is an object; the former two possess strict definitions, the latter relies on contextual explanation. This structural misalignment buries implicit fractures within cognitive systems.

More importantly, when a basic concept's name rests on analogy rather than structural definition, the effect does not stay confined to a single term; it spreads through the whole terminology system. Concepts subsequently built around the name must keep maintaining consistency through explanation, producing implicit structural misalignment.

In this sense, "符元" offers an expression path closer to the underlying structure. It points directly at the basic object of computing systems, the symbol, requires no analogical explanation, and keeps its meaning consistent across contexts.

A term is not merely a label but an entrance to cognition. Good terms make explanations gradually disappear; bad terms make annotations steadily accumulate. When a basic concept's name deviates from structure, the terminology system can hold together only through explanation, never through definitional self-consistency.

Conclusion

In essence, the choice of a term is not merely a linguistic question but an early shaping of a field's cognitive structure. Once a name deviates from its structural ontology at the start, the subsequent system can operate only through continuous explanation and never forms a self-consistent conceptual network.

As artificial intelligence advances toward generality and multimodal fusion, a term that aligns with computational ontology and remains stable across contexts is more likely to become a durable cognitive cornerstone. In this sense, the naming path centered on "symbol unit" strikes the better balance between technical essence and cognitive clarity.