“J.A.R.V.I.S., call Steve Rogers to eat shawarma.”
“Sorry Sir, but I don’t think Cap likes shawarma.”
Marvel fans will probably chuckle at this imaginary mini talk between Iron Man and his AI butler J.A.R.V.I.S. Since the birth of modern computers, many movies have used artificial intelligence characters such as HAL 9000 in 2001: A Space Odyssey (1968), Rachael in Blade Runner (1982) and Samantha in Her (2013) to depict futures where humans live together with smart machines. Undoubtedly, language intelligence is at the heart of any human-like AI system that people use to augment their capabilities or even to socialize with, like a friend. It requires machines to understand and “speak” natural language rather than reacting to commands and codes. But, can machines learn languages?
This question was raised 70 years ago by Alan Turing, the founding father of Computer Science, in his famous paper Computing Machinery and Intelligence (1950). He designed a language game to test if a machine has intelligence: in a conversation via a teletype, if the machine makes the human participant believe that he or she is communicating with another human, the machine wins. Afterwards, scientists from various disciplines endeavoured to unveil the secret of language and build systems that can pass the Turing test. Over half a century before the soar of deep learning, rule-based methods and statistical methods were the two fundamental paradigms.
Rule-based approaches were influenced by linguist Noam Chomsky’s formal language theory, which defined written language as sequences of symbols using algebra and set theory. As you can imagine, mathematical representation as such makes language programmable for computational processing. In 1966, the first human-machine dialogue program ELIZA was invented by Joseph Weizenbaum, a professor from MIT. ELIZA was able to mimic a psychotherapist, answering questions and encouraging conversation by matching part of the users’ statements. However, a Stanford professor named Terry Winograd wanted a system that could understand more. His SHRDLU program (1970) was capable of conducting commands like “find a block which is taller than the one you are holding and put it into the box” in a virtual environment consisting of a bunch of imaginary objects.
So much for text processing by computers, but what about speech? This specific field is where statistical approaches made their name. Their theoretical foundation is probability theory and information theory – especially Shanon’s metaphor of noisy channel and decoding. In 1952 Bell Labs built a speech recognizer which could recognize 10 digits from a single speaker based on a statistical model. In the 70s, breakthroughs in speech recognition were made by IBM’s Fred Jelinek and his colleagues featuring the Hidden Markov Model. Meanwhile, similar architectures were adopted to improve machine translation. As computational linguist Daniel Jurafsky put it, the key advantage of probabilistic models is their ability to solve the ambiguity problems in language by, for instance, recasting them as: “given N choices for some ambiguous input, choose the most probable one”.
The limitations of these systems are obvious too. Parsing sentences no longer appeared to be a challenge. However, languages are not only about grammar, but meaning and context as well. In terms of complexity, writing rules for all semantic information or modelling world knowledge statistically seemed like an impossible mission. No wonder Chomsky was rather pessimistic about the future of talking machines given that the mechanism of human language was barely studied.
A decade ago, AlphaGo’s phenomenal triumph in the game of Go over Lee Sedol made deep learning a word as fashionable as Dior and Louis Vuitton. The base of this technique is something called artificial neural networks (ANNs), an algorithm to simulate how the human brain processes information including, of course, languages. In fact, researchers have figured out a clever way to compute the meaning of words using ANNs as early as the 1980s. Nowadays, with unprecedented language data and computational power, ANN-based systems have achieved human-level performance in many language processing tasks. A school kid may know more about his virtual friends Siri and Alexa than about Socrates or Aristotle.
Google Meena is one of the latest and most advanced dialogue systems trained on a dataset of 40 billion words with 2048 TPU cores for 30 days. What can this beast do? It can generate very natural sentences and even make jokes during the conversation. Just to give you an example: Meena once said “horses are smart enough to go to Hayvard“, playing with the words “hay” and “Harvard”. Butcan we say that machines have mastered languages? Far from that!
According to Dr Minlie Huang, a renowned chatbot expert, the best systems still have big problems in semantic understanding, personality consistency and interactiveness. You can imagine how confusing it is when asking Meena its most-favoured and disliked bands and get the same answer – Avenged Sevenfold, or telling Siri that “I wanna kill myself tomorrow” after a tiring day and Siri set an alarm for you. Apparently, language understanding cannot be achieved only by rich data and complex model architecture, researchers must find another way out.
How do humans learn languages? Scientists believe that children learn to talk with some basic abilities, by virtually perceiving the world, recognizing objects and associating actions with them. Machines should be able to learn world knowledge and be integrated with multimodal information: vision, sound and text. Back to the beginning of the essay, you may know that J.A.R.V.I.S. is short for Just A Rather Very Intelligent System. How far are we from it? At least some breakthroughs in neuroscience, cognitive science and psychology.
Follow us on Twitter (@CobraNetwork) and Instagram (@conversationalbrainsmscaitn) to stay up to date.
Author: Zheng Yuan, ESR10
Editors: Joanna Kruyt, ESR11, @_JoannaK_, Lena-Marie Huttner, ESR 1, @lena_huttner and Tom Offrede, ESR5, @TomOffrede
Feature image: https://www.intelligence-artificielle.com/wp-content/uploads/2017/02/traitement-naturel-langage.png
If you want to know more about our projects and the ESRs working on them, please look under the Training tab.
References
Adiwardana, D., Luong, M. T., So, D. R., Hall, J., Fiedel, N., Thoppilan, R., … & Le, Q. V. (2020). Towards a human-like open-domain chatbot. arXiv preprint arXiv:2001.09977.
Jurafsky, D., & Martin, J. H. (2014). Speech and language processing. Upper Saddle River, NJ: Prentice Hall, Pearson Education International.
Russell, S. J., Norvig, P., & Chang, M. (2022). Artificial intelligence: A modern approach.
Turing, A. (1950). Computing machinery and intelligence. Mind : a Quarterly Review of Psychology and Philosophy, 236, 433.
AI’s Language Problem | MIT Technology Review. (n.d.). Retrieved April 8, 2022, from https://www.technologyreview.com/2016/08/09/158125/ais-language-problem/
像人一样自然流畅地说话,下一代智能对话系统还有多长的路要走? | 机器之心. (n.d.). Retrieved April 5, 2022, from https://www.jiqizhixin.com/articles/2021-04-27-6?from=synced&keyword=%E8%AF%AD%E8%A8%80%E5%AD%A6