Wired, June 6, 2020, pp. 40-49, “Flying Fish Is Listening”. “China’s voice-computing giant iFlytek built similar chatty assistants beloved by users. But its technology is also enabling the surveillance state to identify citizens by the sound of their voices”. By Mara Hvistendahl
Learning to decipher, speak, write and read Mandarin is especially difficult for non-native speakers. Early efforts to transcribe spoken conversations were slow and depended on learned masters capable of rendering as many as 50,000 characters. In the 1980s, personal computer software lowered the threshold of expertise and reduced the time needed to transcribe. This was only the beginning; improvements were needed. Enter Julian Chen, a Peking physicist by training. After the Cultural Revolution, in 1979, Chen made his way to New York City and enrolled in graduate studies at Columbia University. In 1985, after earning a Ph.D. in physics, he joined IBM at the T.J. Watson Research Center. In 1994, while working as a Research Staff Member, he volunteered for a project outside his specialty that aimed to simplify the interface for transcribing spoken Mandarin.
Julian Chen, Ph.D. (Photo Columbia.edu)
Mandarin has countless homophones (words that sound the same but have different meanings). As a native speaker, Chen realized that the most efficient approach would be to base the software on phonemes, the most basic elements of speech: distinct units of sound that distinguish one word from another (in English, think pad/pat/bad/bat). Chen and his team hired Chinese speakers based in NYC and Beijing to read Mandarin documents aloud, including the “People’s Daily”, and recorded them digitally. Fast forward to 1996: Dr. Chen was invited to a Chinese technology conference to publicly demonstrate the new IBM transcription tool, known as ViaVoice. The live presentation stunned the expert audience by transcribing spoken Mandarin in real time into Mandarin characters projected on screen. Chen spoke first, and then a female attendee volunteered to test ViaVoice; the challenge was met. In 1997 IBM launched ViaVoice in China: “the computer understands Mandarin”.
The Chinese were mortified that an American company had found the first real, though not perfect, transcription solution. In 1999, wanting to own and improve the process, Qingfeng Liu, a 26-year-old doctoral student at the University of Science and Technology in Hefei, started what was to become iFlytek (IFK). While working with Huawei, IFK developed voice-activated menus for call centers. IFK went public in 2008 and has many applications that draw on IT and AI. One app lets users dictate Mandarin into email, web searches and WeChat. Another product is a virtual assistant that works in real time, using artificial intelligence to understand speakers of other dialects and languages. IFK also has a tool that translates face-to-face dialog, creating closed captioning in real time. Using IFK, WeChat has 6B voice texts daily. These voice texts are considered superior to voice messages or texting in Mandarin, possibly because of a quirk: they are limited to one minute, so users urgently “dash them off in one long string”. IFK also has a tablet that automatically transcribes business meetings, a digital recorder that generates instantaneous transcripts, and a voice-activated virtual assistant for vehicles dubbed “Flying Fish”. Unlike Siri or Alexa, IFK boasts, “Flying Fish” is always on [listening]! Liu says of these voice recognition systems: “It will be everywhere, as common as water or electricity”. “Empower the world with artificial intelligence” is an IFK motto.
IFK is also applying other capabilities to collect and interrogate massive amounts of dialog data; no consent is required in China. The systems are primarily used by large organizations and businesses, but there is a sense that all this can, or does, go beyond benign institutional uses. IFK, which trades on the Shenzhen Stock Exchange (SZSE) and is partially state-owned, has a market capitalization of more than $10B, owns 70% of the Chinese voice market and has 700M users. It receives 60% of its revenue from Chinese government-sponsored projects, including the “Intelligent Criminal Investigation Assisted System”. Another project, launched in 2017, is the “Dialect Protection Plan”, said to protect the history of other languages and societies in the Uighur and Tibetan regions. With over 1M Uighurs interned in Chinese concentration and labor camps, the project further “aligns with China’s vision for a surveillance state”. With runaway technical capabilities, China is calling for “public security organs to closely cooperate with technology companies ‘to create’ prevention and control systems” and for a “military-civil” fusion. As one basic capability, IFK technology can rapidly scan up to 200 voice sound bites and identify a speaker within two seconds based on unique voice characteristics. Interestingly, voice-identification technology of this kind is also used in America by the NSA, the FBI and the Bureau of Prisons. Reportedly, such systems were used to positively identify Saddam Hussein and Osama bin Laden.
In 2017, Human Rights Watch claimed that IFK was part of China’s plan “to build a digital totalitarian system”, a claim IFK called “baseless and absurd”. In the Uighur region, residents are assigned “Big Sisters and Big Brothers” who collect biometric data, which residents are then required to confirm at checkpoints. The fears center on the coordination of photos, fingerprints and DNA. Last year IFK was placed on the U.S. government’s Entity List of companies subject to export restrictions. Shortly thereafter, MIT, which had previously received funding from IFK for collaborative projects, terminated the IFK/MIT relationship.
See text: MIT terminated its relationship with iFlytek in February 2020.
Following the novel “City of Silence” by Ma Boyong, there is a sense that street-level resistance, if required, will take the form of “…homonyms and slang to circumvent [listening devices and] censors”. Some argue that AI voice systems are not yet reliable enough, or that the computing power is not yet available, for large-scale “Orwellian” surveillance. Others feel that once the idea of being listened to has taken hold, people begin to censor themselves.