A Brief History of AI
For centuries, philosophers and scientists debated whether machines could mimic human thought, the central question being whether it is possible to build a machine that acts like a human. That philosophical debate became a scientific problem only in the 20th century, due in large part to Alan Turing's 1950 paper, “Computing Machinery and Intelligence,” which reframed the question “What is thinking?” as something that could be tested and evaluated empirically through the Imitation Game (now known as the Turing Test). The Imitation Game assessed whether a machine could exhibit human-like behaviour through conversation: for the first time, a computer was asked to pass a test by producing dialogue that mimicked how a human would respond. With that, artificial intelligence became an empirical science rather than a metaphysical belief system. During the same period, wartime advances in electronic computing and cryptography laid the groundwork for machines that could be programmed for more than one task. Turing and his contemporaries also recognized that if any form of reasoning could be given a symbolic representation, then reasoning processes could be expressed as computation, allowing a computer to simulate any mechanical form of reasoning or cognitive task. The stored-program architecture reinforced this idea: because a program was held in memory as data, a computer could not only execute it but also modify it, greatly expanding the range of cognitive tasks computers could take on.
John McCarthy (Father of AI)
The Origins (1940s–1950s)
From its beginnings, AI was shaped by efforts to formalize logical, mathematical, and computational reasoning, a project many researchers pursued after World War II. Among the earliest and most influential contributions was Claude Shannon's demonstration that Boolean algebra could be implemented with electrical switching circuits. This result meant that machines could not only perform numerical operations but also manipulate symbolic representations (letters, words, and so on). The symbolic viewpoint strongly influenced emerging theories of how machines might learn, and it was reinforced by the first programmable electronic computers, including ENIAC and EDVAC, followed by more advanced machines such as the IAS computer. These machines demonstrated that complex procedures could be encoded as instructions, suggesting that cognition might eventually be modeled algorithmically in computer programs. In the mid-to-late 1950s, several strands of this research converged into a single field of study. The most influential event of the period was the Dartmouth Summer Research Project on Artificial Intelligence (1956), organized by John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon. Its proposal asserted that the mechanisms of learning and the underlying processes of intelligence could eventually be described precisely enough to be simulated by computer programs. The Dartmouth project is recognized as the official beginning of artificial intelligence as an academic discipline, and it laid the foundation for interdisciplinary collaboration among researchers in logic, linguistics, cybernetics, and cognitive science. Most researchers involved in these early efforts believed that intelligence arises from manipulating representations of the world (objects, rules, and relationships) through the formal application of logical operations.
Early beginnings and symbolic AI (1950s–1960s)
The extraordinary optimism about what machines could accomplish ran through the 1950s and into the 1960s. Researchers believed that replicating human-like reasoning was primarily a matter of developing the right symbolic representations and algorithms. The Logic Theorist and the General Problem Solver (GPS), both created by Allen Newell and Herbert A. Simon, were among the first attempts to write software that reproduced human problem-solving behaviour using rules and heuristics. GPS was designed to model human reasoning across multiple domains by breaking problems into subgoals and applying operators to reduce the difference between the current state and the goal. It served as an impetus for further research into automated reasoning, planning, and natural-language processing. Laboratories founded at the Massachusetts Institute of Technology (MIT) by Marvin Minsky and at Stanford University by John McCarthy became centres of AI research, attracting teams from many disciplines, including mathematics, psychology, linguistics, and computer science. Government funding agencies in the United States provided substantial support under the assumption that general-purpose, human-level artificial intelligence could be achieved within a generation. Early natural-language programs such as ELIZA and SHRDLU demonstrated that computers could take part in structured conversational exchanges and interpret and manipulate linguistic data symbolically. Researchers also explored robotics and perception, producing early experiments with machine vision and mobile robots operating within tightly controlled environments.
Herbert A. Simon
Perceptrons, neural networks, and the first backlash (1957–1970s)
Alongside symbolic AI, a second approach known as connectionism drew its inspiration from how the human brain works. In 1957, Frank Rosenblatt introduced the perceptron, a model that learned to classify data by adjusting the weights on its connections through repeated exposure to labeled examples. Unlike symbolic AI systems, which required explicit rules, perceptrons learned from examples and could therefore generalize to inputs they had not seen. Rosenblatt built both software implementations and dedicated perceptron hardware, generating considerable public and academic interest. The U.S. Navy invested heavily in this line of research, hoping that perceptron-style machines could recognize patterns automatically. While perceptrons could solve simple, linearly separable classification problems, their limitations soon became evident. In 1969, Marvin Minsky and Seymour Papert published Perceptrons, which proved mathematically that single-layer perceptrons cannot compute certain functions (such as XOR) and cannot represent hierarchical relationships between features. Even so, the perceptron era contributed ideas that proved significant: learning from data, distributed representations in neural networks, and adjusting parameters based on examples. These concepts became critical to the later revival of multilayer networks, and the backlash exposed larger problems within AI, including unrealistic expectations and methodological fragmentation. The resulting debate between symbolic and connectionist models (rules versus learning) has recurred throughout the history of AI.
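To make the learning rule concrete, here is a minimal sketch in Python/NumPy (an illustrative toy, not Rosenblatt's original implementation; the function name, learning rate, and epoch count are arbitrary choices). It nudges the weights toward the correct label whenever a prediction is wrong, learns the linearly separable AND function, and never manages to fit XOR.

```python
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=20):
    """Rosenblatt-style rule: adjust weights only when a prediction is wrong."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            error = target - pred          # -1, 0, or +1
            w += lr * error * xi           # nudge weights toward the correct label
            b += lr * error
    return w, b

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# AND is linearly separable, so the perceptron learns it...
w, b = train_perceptron(X, np.array([0, 0, 0, 1]))
print([1 if x @ w + b > 0 else 0 for x in X])   # [0, 0, 0, 1]

# ...but XOR is not, and no single-layer perceptron can represent it
# (Minsky and Papert, 1969): whatever this prints, it is never [0, 1, 1, 0].
w, b = train_perceptron(X, np.array([0, 1, 1, 0]))
print([1 if x @ w + b > 0 else 0 for x in X])
```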
Statistical learning and deep-learning precursors (1980s–2006)
As the limits of symbolic AI and expert systems became apparent, the AI community shifted towards approaches based on statistical learning and probabilistic modelling, which were data-driven and mathematically grounded. Examples include Bayesian inference, hidden Markov models (HMMs), support vector machines (SVMs), and decision trees. These techniques allowed a model to represent the uncertainty around its predictions, to learn from examples, and to perform reliably in noisy environments. As a result, they found early success in perception tasks such as speech recognition and handwriting recognition. Neural networks also underwent a renaissance: researchers in the 1980s rediscovered and popularized the backpropagation algorithm, which made training multilayer networks practical by computing the gradient of the network's error with respect to its weights. Researchers such as Geoffrey Hinton and Yann LeCun explored new forms of neural networks, including convolutional neural networks (CNNs), recurrent networks, and distributed representations. In the 1990s, Yann LeCun's convolutional networks achieved commercial success in postal code recognition systems. However, deep networks remained difficult to train because of their computational requirements, the vanishing gradient problem, and the scarcity of training data, so many researchers preferred shallow statistical models for their interpretability and their effectiveness on small datasets.
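To illustrate what backpropagation computes, the toy Python/NumPy sketch below (a generic illustration, not any particular historical system; the layer sizes, learning rate, and iteration count are arbitrary) trains a small two-layer network on XOR, the very function a single-layer perceptron cannot represent, by propagating the error gradient from the output back to every weight.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

# XOR: the function a single-layer perceptron cannot represent.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0.], [1.], [1.], [0.]])

# One hidden layer of 8 sigmoid units, one sigmoid output unit.
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

lr = 1.0
for _ in range(10_000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: gradient of the squared error with respect to every weight
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(out.round(2))   # values move toward [[0], [1], [1], [0]]
```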
In 2006, a significant milestone for deep learning was reached when Geoffrey Hinton and his group published their work on deep belief networks and on greedy, layer-by-layer unsupervised pretraining. This pretraining strategy gave deep networks a far better initialization than earlier methods, which had made deep models extremely difficult and computationally expensive to train. The research therefore marked a substantial shift in how people viewed the feasibility of building and training deep models, and the methods it introduced are among the building blocks of modern deep learning systems.
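As a rough sketch of the layer-wise idea, the Python/NumPy code below greedily pretrains one layer at a time. It uses small untied autoencoders as a simplified stand-in for the restricted Boltzmann machines of the original deep-belief-network work, and the data, layer sizes, and hyperparameters are placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

def pretrain_layer(data, hidden_units, lr=0.5, epochs=500):
    """Train one layer to reconstruct its own input (an untied autoencoder),
    used here as a simplified stand-in for the RBMs in the 2006 work."""
    n_in = data.shape[1]
    W = rng.normal(scale=0.1, size=(n_in, hidden_units))   # encoder weights
    V = rng.normal(scale=0.1, size=(hidden_units, n_in))   # decoder weights
    for _ in range(epochs):
        h = sigmoid(data @ W)                    # encode
        recon = sigmoid(h @ V)                   # decode
        d_recon = (recon - data) * recon * (1 - recon)
        d_h = (d_recon @ V.T) * h * (1 - h)
        V -= lr * h.T @ d_recon / len(data)      # gradient step on reconstruction error
        W -= lr * data.T @ d_h / len(data)
    return W

# Greedy layer-wise pretraining: each layer learns features of the layer below;
# the stacked weights then initialize a deep network for supervised fine-tuning.
X = rng.random((256, 32))                        # placeholder data
weights, layer_input = [], X
for hidden_units in (16, 8):
    W = pretrain_layer(layer_input, hidden_units)
    weights.append(W)
    layer_input = sigmoid(layer_input @ W)       # this layer's output feeds the next
```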
These developments also signalled that machine learning had become the primary paradigm of artificial intelligence, as the emphasis shifted away from hand-crafted rules towards building systems from data. The advances of 2006 laid the foundation for the rapid progress that followed from the convergence of high-performance graphics processing units (GPUs), large volumes of training data, and deep learning architectures.
Big data, GPUs, and the deep-learning revolution (2012–2016)
The current generation of artificial intelligence began when advances in algorithm design came together with unprecedented computational power and the availability of huge amounts of data. GPU acceleration, originally developed for graphics processing, turned out to be very well suited to training neural networks, while access to web-scale datasets meant that networks could be trained on millions of labeled examples. In 2012, Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton entered AlexNet, a deep convolutional neural network, in the ImageNet competition. By outperforming every other entry by a wide margin, AlexNet established that deep learning could beat traditional machine learning on large-scale vision problems. In the authors' account, its success rested on four main engineering advances: ReLU activation functions sped up training; GPU-based parallelization allowed larger models to be trained in less time; data augmentation improved generalization; and dropout regularization, which randomly deactivates units during training, limited overfitting and improved robustness. The immediate effect was that deep convolutional neural networks became the dominant approach in computer vision and a foundation for many other problems, such as image classification, object detection, image segmentation, and video analysis.
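To give a sense of how those ingredients fit together, here is a drastically scaled-down, AlexNet-flavoured sketch written in PyTorch (a modern framework chosen for brevity; the original 2012 model was a much larger custom GPU implementation, and the class name and layer sizes here are illustrative).

```python
import torch
import torch.nn as nn

class TinyConvNet(nn.Module):
    """Toy convolutional network showing ReLU, pooling, and dropout."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),                      # ReLU: trains faster than tanh/sigmoid
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(0.5),                # randomly deactivate units to limit overfitting
            nn.Linear(32 * 8 * 8, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Data augmentation (random crops, flips) would be applied in the input pipeline,
# and training runs on a GPU when one is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = TinyConvNet().to(device)
logits = model(torch.randn(4, 3, 32, 32, device=device))   # a batch of 32x32 RGB images
print(logits.shape)                                         # torch.Size([4, 10])
```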
Foundation models and the modern era (late 2010s–2020s)
In the late 2010s, foundation models (large neural networks trained on large, heterogeneous datasets) became mainstream. Their rise followed the introduction of the Transformer architecture in 2017, which showed that sequence models could be trained with massive parallelism while still capturing long-range dependencies. Models such as BERT, the GPT series, and PaLM demonstrated that scaling up model size and training data substantially improved generalization. Rather than building dedicated systems for specific tasks, a single pre-trained model could now be adapted to many applications with minimal fine-tuning, a discovery that dramatically changed the economics and methods of AI development. Foundation models have shown strong capabilities in reasoning, text generation, code generation, translation, summarization, and multimodal tasks. They were followed by diffusion models and vision-language systems, which enabled high-fidelity image generation, reasoning across modalities, and innovative products and services across many domains. With the growing availability of these models, researchers are focusing more closely than ever on safety, fairness, and interpretability as the models are integrated into business, creative, and social systems. The rapid rise of generative AI has produced many new products and services, along with new debates about authenticity, labor dynamics, knowledge production, and ethics. Today's era is defined both by technical scale and by the extensive use of these models across industries including healthcare, education, entertainment, and scientific research.
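To illustrate the parallel, long-range mechanism the Transformer introduced, here is a minimal Python/NumPy sketch of scaled dot-product attention (learned query/key/value projections, multiple heads, and masking are omitted, and the dimensions are arbitrary).

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Core Transformer operation: every position attends to every other
    position in parallel, so long-range dependencies need no recurrence."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise similarity of all positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the sequence
    return weights @ V                              # weighted mix of value vectors

seq_len, d_model = 5, 8
x = np.random.default_rng(0).normal(size=(seq_len, d_model))
# In a real Transformer, Q, K, and V come from learned linear projections of x.
out = scaled_dot_product_attention(x, x, x)
print(out.shape)   # (5, 8)
```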
