(This is a Chinese translation of my previous blog post: Our paroqial fermament, one tide on another, written in mixed cn-tw vocabularies.)
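(This is a translation of my previous blog post. The original title quoted Joyce and really could not be translated, so I borrowed a couplet by 周德偉 instead, one I saw on the wall every week back when I was running 藝立協.)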
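Adina Levin asked me to talk about how the information density of Chinese Twitter differs from that of English. I replied: "Wait until the speed glitches in our SocialCalc (formerly WikiCalc) are all fixed, then we'll talk."
Yesterday I finished squashing those bugs, so now I have no choice but to do my best at coding in English.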
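The first thing I want to say is that the UTF-8 encoding Ken invented (see Rob Pike's first-hand account) already reflects, quite correctly, the difference in information between the Anglo-American alphabet and CJK characters.
Think about it: wasn't the Universal Character Set forced to grow from 16 bits to 21 bits precisely because of Chinese characters, "numerous as the stars, ten thousand codes stampeding" (「多如繁星,萬碼奔騰」)? Even after Han unification was pushed through, eliminating countless duplicate characters, two bytes still turned out not to be enough.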
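The 16-bit and 21-bit figures can be checked directly. A minimal sketch, assuming the current Unicode code space (which ends at U+10FFFF) as the "21-bit" set referred to above; the constant names are mine:

```python
import sys

# Last code point of the Basic Multilingual Plane and of the whole code space.
BMP_LAST = 0xFFFF
UNICODE_LAST = 0x10FFFF

# bit_length() gives the number of bits needed to represent the value.
bmp_bits = BMP_LAST.bit_length()
full_bits = UNICODE_LAST.bit_length()

print(f"BMP needs {bmp_bits} bits; the full code space needs {full_bits} bits")

# Python's own upper bound for chr() is the same code point.
print(sys.maxunicode == UNICODE_LAST)
```

Han characters that did not fit in the 16-bit plane live above U+FFFF, for example U+20000 at the start of CJK Extension B.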
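Suppose Twitter's limit were 140 UTF-8 bytes. Writing a Chinese tweet would then feel much like writing an English one, because each Chinese character takes 3 bytes; when I occasionally use a rare archaic character outside the Basic Multilingual Plane, it takes 4.
But by an accident of history, Twitter counts 140 characters. (Not graphemes (語素); I verified this myself.) A Chinese tweet therefore has 420 bytes to work with, about the size of a short blog post.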
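The byte arithmetic is easy to reproduce. A small sketch, assuming a "character" means one Unicode code point (which is what Python's `len` counts on a `str`); the helper name `utf8_len` and the sample characters are my own choices:

```python
def utf8_len(text: str) -> int:
    """Number of bytes needed to store `text` as UTF-8."""
    return len(text.encode("utf-8"))

samples = {
    "ASCII letter": "a",                     # U+0061
    "BMP Han character": "網",                # U+7DB2
    "non-BMP Han character": "\U00020000",   # U+20000, CJK Extension B
}

for label, ch in samples.items():
    print(f"{label}: {len(ch)} code point, {utf8_len(ch)} byte(s)")

# A tweet made of 140 BMP Han characters.
tweet = "網" * 140
print(f"{len(tweet)} characters, {utf8_len(tweet)} bytes")
```

A tweet that uses characters outside the Basic Multilingual Plane would come out larger still, at 4 bytes for each such character.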
...But wait, there's more!
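Bear in mind that Chinese has two written registers: Vernacular (白話) and Literary (文言). Vernacular typically uses two characters to stand for one English word; 「網絡」, for instance, means "network".
In Literary Chinese, each character stands for a cluster of concepts. 「網」, "net" in English, can mean 「網路」 (network), 「漁網」 (fishing net), 「連網」 (to go online), or 「網羅」 (to ensnare or gather up), depending entirely on context.
Literary Chinese puts no spaces between characters and drops much of the punctuation as well, so 140 characters give you 140 such concepts to work with. Expressing them in English takes about 200 words, or roughly 1 kB of information.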
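So much for the theory. Now let's test it carefully against my three most recent tweets.
The first is entirely plain Vernacular, the second is half Literary and half Vernacular, and the third (the most recent) is almost entirely Literary.
Moving from Vernacular to Literary, information density should increase, and we can measure that by the length of the English translation.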
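The first quotes a recent remark by Chen Ying-chen (陳映真, born 1936, one of the most influential writers and activists in Taiwan in his day):
RT 陳映真:「文學退化,影像、聲音成為『當下世界』的符碼,」他說:「托爾斯泰生在今日,大部頭的作品也會喪失大量讀者。」現在的創作流行輕薄短小,又以自我為中心,他因此形容這一代青年創作者「是脫光了衣服站在鏡子前面,凝視鏡中自己的身體與慾望… 他們讀的不多,不能成就崇高的文學」。(139 characters)
Google translated it as: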
RT Chen Ying-chen: "Literature degradation, images, sounds become codes of present world," he said: "Tolstoy was born in today, voluminous works of the loss of a large number of readers will be."
Now the creation of popular thin and light short, Youyi self-centered, so he described this generation of young artists "is stripped of clothes and stand in front of the mirror, staring in the mirror his own body and desires ... they read much, does not develop high literature."
(475 characters)
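This translation is broadly accurate, but small errors are everywhere (not to mention the gender bias about the young creators' bodies). And the last sentence is flatly wrong: "they don't read much" has somehow become "they read much"!
Still, the translation at least gets the meaning across, so let's do the division: the English runs to 3.4 times as many characters as the Chinese.
Next, the second tweet is my reply to Chen: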
Re 陳映真:但背了整本維基,行遍無數國度,亦不成就崇高的文學。歐巴馬「以父之名」成不了作家,只好化藝術為行動、為現實。上一輩在高壓下,被迫濃粹經驗為符碼,而我們讀遍之後,融合歷史視域,還原此在而為 Hacktivism,竊自以為然。 (116 characters)
This time Google's translation falls apart and is almost impossible to follow. The ratio is 3.8:
Chen Ying-chen Re: But the back of the whole wiki, line countless times a country, nor the noble achievements of literature. Obama, "Name of the Father" can not become a writer had to arts-based action, into reality. The last generation under high pressure, was forced to experience concentrated Intrade codes, and we read times, the integration of history, depending on the domain, restore this in the for Hacktivism, stolen from that it does.
(444 characters)
My own translation follows; its English-to-Chinese ratio is 5.7:
Re Chen Ying-Zhen: But having an entire Wikipedia-backed memory, and having traveled to countless countries, still won't make high literature out of me.
Consider Obama who, having failed to launch a writer's career with "Dreams from my Father", is forced to project his art into activism and into reality-shaping.
My earlier generation, under tremendous Fascist pressure, is forced to distill their subjective experience to highly compressed literary code.
When my generation finished deciphering those codes to achieve a fusion of horizons, we decompressed them into the here-and-now as Hacktivism. This kind of adaption is IMHO natural and quite justified.
(662 characters)
Then comes the third tweet, which elaborates on the view above and is written mostly in Literary Chinese:
舉實例言,《金盾工程》(即「功夫網」、「資訊長城」),無異吾儕之柏林牆。我十年前譯寫自由網,而至其後 Tor、無界等武裝,無非保持對話、以獨促統之意。與 Beijing.pm 嘗言:「功網未散,何以為族?」。凡此種種,生自文事,衍為武功,皆此代之共業。 (127 characters)
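Google produced 509 characters for this one. Long it certainly is, but the meaning is lost entirely ("promote the unification of Italy"??), so I won't quote it here. My translation follows: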
As a concrete example, for our contemporary people, the "Golden Shield Project" (a.k.a., the "Gong-Fu Web", "Great Firewall of China") is no different from the Berlin Wall.
Ten years ago I translated and coded for the Freenet project; along with its follow-ups such as Tor and Wu-Jie, they're nothing but cyberspace armaments designed with the sole purpose of keeping the conversation flowing.
This way we maintain our independent identity, in the hope of accelerating a fusion of horizons with the Chinese government.
I've told Beijing.pm: "Without the dissipation of the Great Firewall, how can we make one people out of us?"
All these circumstances, born out of the literary world, has derived into the cyberspace hacktivism, and affected us in creating a software-shaped reality. This is a factor in the shared karmic setting of this generation.
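(848 characters)
As expected, the density ratio reaches about 6.7 (848 / 127). Keep in mind that these examples were not contrived: people writing Chinese tweets really do live at 2 to 8 times the information density of people writing English ones.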
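The ratios quoted in this post can be recomputed from the character counts given after each sample. A minimal sketch; the counts are copied from the text above, while the labels and the `ratio` helper are only illustrative names:

```python
# (label, Chinese character count, English character count), as stated in the post.
samples = [
    ("Tweet 1 (vernacular), Google translation", 139, 475),
    ("Tweet 2 (mixed), Google translation", 116, 444),
    ("Tweet 2 (mixed), my translation", 116, 662),
    ("Tweet 3 (literary), my translation", 127, 848),
]

def ratio(chinese_chars: int, english_chars: int) -> float:
    """English characters needed per Chinese character."""
    return english_chars / chinese_chars

for label, zh, en in samples:
    print(f"{label}: {en} / {zh} = {ratio(zh, en):.2f}")
```

The printed ratios rise from the vernacular sample to the literary one, which is the trend the three tweets were chosen to illustrate.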