The History of Artificial Intelligence: From Turing’s Dream to Modern Deep Learning and Language Models

“Can machines think?” In 1950, British mathematician Alan Turing posed this question in the academic journal Mind, and more than 75 years later it still captivates us. Living in the 21st century — chatting with ChatGPT, marveling at AI-generated artwork, and riding in self-driving cars — we are slowly constructing an answer.

The history of artificial intelligence is not merely a chronicle of technological progress. It is the history of an intellectual quest in which humanity has wrestled with the most fundamental question of all: what is intelligence? Through dazzling ages of optimism, brutal winters of stagnation, and unexpected revolutions, AI has grown into what it is today.

1. Seeds of a Dream: Philosophical and Mathematical Foundations (1930s–1940s)

The roots of artificial intelligence stretch back well before the computer existed.

In 1936, Alan Turing proposed the concept of the Turing machine in his paper “On Computable Numbers.”^[1] This imaginary device operates by reading and writing symbols on a tape according to simple rules, yet in theory it can carry out any computable function. The paper was an attempt to answer mathematically the question of “what can be computed,” and it became the bedrock of modern computer science.
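To make the tape-and-rules idea concrete, here is a minimal sketch of a Turing machine simulator in Python. The rule table, the state names, and the `run` helper are illustrative assumptions rather than notation from Turing's paper; the example machine simply adds one to a binary number.

```python
from collections import defaultdict

BLANK = "_"

# Hypothetical rule table: (state, symbol read) -> (symbol to write, head move, next state).
# This machine increments a binary number, starting with the head on the rightmost digit.
INCREMENT_RULES = {
    ("carry", "1"): ("0", -1, "carry"),   # 1 plus carry gives 0; keep carrying to the left
    ("carry", "0"): ("1", 0, "halt"),     # 0 plus carry gives 1; finished
    ("carry", BLANK): ("1", 0, "halt"),   # moved past the left edge: write a new leading 1
}

def run(rules, tape_text, state="carry", max_steps=1000):
    """Apply the rules until the machine reaches 'halt', then return the tape contents."""
    tape = defaultdict(lambda: BLANK, enumerate(tape_text))
    head = len(tape_text) - 1
    for _ in range(max_steps):
        if state == "halt":
            break
        write, move, state = rules[(state, tape[head])]
        tape[head] = write
        head += move
    cells = [tape[i] for i in range(min(tape), max(tape) + 1)]
    return "".join(cells).strip(BLANK)

print(run(INCREMENT_RULES, "1011"))  # binary 11 plus 1
```

The whole "program" lives in the rule table; swapping in a different table gives a different machine, which is the sense in which one simple mechanism can express many computations.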

World War II, paradoxically, accelerated the development of artificial intelligence. Turing contributed to designing the Bombe machine at Bletchley Park in England, which helped crack Nazi Germany’s Enigma cipher. This was powerful evidence that algorithmic thinking could solve real-world problems.^[2]

In 1943, neurophysiologist Warren McCulloch and logician Walter Pitts published the paper “A Logical Calculus of the Ideas Immanent in Nervous Activity.”^[3] They presented a mathematical model showing that neurons in the brain could function like binary logic gates. This McCulloch–Pitts neuron is the conceptual ancestor of artificial neural networks, and it would become the seed of the deep learning revolution to come.
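The idea of a neuron acting as a logic gate can be sketched in a few lines of Python. This is a simplified threshold unit, assuming weighted binary inputs (a negative weight stands in for the inhibitory inputs of the original model); the function names are hypothetical.

```python
def mp_neuron(inputs, weights, threshold):
    """Fire (return 1) when the weighted sum of binary inputs reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

def gate_and(a, b):
    return mp_neuron([a, b], [1, 1], threshold=2)   # both inputs must fire

def gate_or(a, b):
    return mp_neuron([a, b], [1, 1], threshold=1)   # one firing input is enough

def gate_not(a):
    return mp_neuron([a], [-1], threshold=0)        # an active input suppresses firing

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", gate_and(a, b), "OR:", gate_or(a, b))
```

Note that the weights and thresholds here are fixed by hand; nothing is learned, which is the gap later models set out to close.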

Then in 1950, Turing presented the world with the most famous question in the history of artificial intelligence.

Alan Turing (aged 16) — *Alan Turing, age 16 (c. 1928) — the mathematician who asked the world “Can machines think?” and the father of computer science* Source: Wikimedia Commons (Public Domain)

2. The Imitation Game: The Turing Test (1950)

The paper Alan Turing published in 1950, “Computing Machinery and Intelligence,” is one of the most important documents in the history of artificial intelligence.^[4] Turing decided that “Can machines think?” was too vague a question and reframed it as a more concrete game.

The rules of the Imitation Game he proposed — known today as the Turing Test — are simple. A judge communicates via text with both a human and a machine. If the machine’s responses are indistinguishable from the human’s, that machine can be considered “intelligent.”

In this paper, Turing systematically rebutted various objections to the idea that machines could think. He also predicted that by the year 2000 a machine would be able to fool 30% of judges during a five-minute conversation. The paper was a historic turning point that brought artificial intelligence out of the realm of philosophical speculation and into the domain of scientific inquiry.^[4]

3. The Birth of AI: The Dartmouth Conference (1956)

If historians are asked to name the date when artificial intelligence was born as a formal discipline, most point to the summer of 1956.

On September 2, 1955, four scientists — John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon — jointly drafted a proposal.^[5] The document was titled “A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence,” and it marked the first official use of the term “Artificial Intelligence.”

The historic workshop was held at Dartmouth College for roughly two months beginning on June 18, 1956.^[5] Participants included some of the finest mathematicians and computer scientists of the day: Claude Shannon, Allen Newell, Herbert Simon, and others. They gathered under the optimistic assumption that “every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.”

Although no breakthrough emerged from the Dartmouth conference itself, the event served as the christening of artificial intelligence as a formal academic field. McCarthy went on to develop the LISP programming language, Minsky co-founded MIT’s AI laboratory, and the other participants each shaped the history of AI in their own way.

4. The Golden Age: Early AI Programs and Symbolic AI (1950s–1960s)

Following the Dartmouth conference, AI research advanced rapidly amid remarkable optimism. This era is often called the age of Symbolic AI (or classical AI). Symbolic AI rests on the assumption that human intelligence can in essence be described as a rule-based system of symbol manipulation.

The Logic Theorist, written in 1955–56 by Allen Newell, Herbert Simon, and Cliff Shaw, is often considered the first AI program. It proved 38 of the first 52 theorems in chapter 2 of Bertrand Russell and Alfred North Whitehead’s Principia Mathematica.^[6] Researchers were astonished that a machine could perform mathematical reasoning.

Newell and Simon followed this with the General Problem Solver (GPS) in 1957.^[6] GPS was designed not to solve any specific problem but to tackle “general problem solving,” using a means-ends analysis that imitated the way humans approach problems.

In 1966, MIT’s Joseph Weizenbaum developed ELIZA.^[7] ELIZA analyzed user input through pattern matching and kept conversations going by responding with questions, much like a psychotherapist. Remarkably, many people came to believe ELIZA genuinely understood them. Weizenbaum himself was disturbed by this phenomenon and later wrote Computer Power and Human Reason, a book warning against the human tendency to over-anthropomorphize computers.
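The pattern-matching approach can be illustrated with a very small Python sketch. The rules below are made up for illustration and are not Weizenbaum's original script; the real program also did things this sketch omits, such as swapping pronouns in the echoed text.

```python
import re

# Hypothetical rules: (pattern to look for, reply template that echoes the captured text).
RULES = [
    (re.compile(r"\bi need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]
FALLBACK = "Please go on."

def respond(text):
    """Return the reply for the first rule whose pattern matches the input."""
    cleaned = text.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(cleaned)
        if match:
            return template.format(*match.groups())
    return FALLBACK

print(respond("I am feeling sad."))
print(respond("The weather is nice"))
```

Nothing in the program models meaning: it reflects the user's own words back as a question, which is why the strong reactions it provoked troubled its author.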

During this period, a stream of AI programs appeared — chess-playing programs, theorem provers, and natural-language processing systems. Researchers were optimistic that human-level AI was just around the corner, and enormous government funding poured into AI research.

5. The First AI Winter: The Cost of Overpromising (1970s)

The age of optimism, however, did not last.

In 1966, the ALPAC (Automatic Language Processing Advisory Committee), commissioned by the U.S. government, published a damning report on machine translation research.^[8] The report concluded that machine translation was slower, less accurate, and more expensive than human translation. As a result, U.S. government funding for machine translation research was cut by more than 90%.

A more decisive blow came in 1973. The Lighthill Report, written by British mathematician James Lighthill at the request of Britain’s Science Research Council, declared that the “grandiose objectives” of AI research had been “wholly unrealised.”^[9] The report highlighted the problem of “combinatorial explosion”: techniques that worked on small toy problems could not scale to problems of realistic size.

The shock of the Lighthill Report spread beyond Britain. The British Science Research Council (SRC) slashed funding for AI research at British universities, and DARPA in the United States also began reducing its AI research support from 1974 onward.^[9]

Researchers began avoiding the phrase “artificial intelligence” and adopted euphemisms such as “informatics” or “computational intelligence.” The period from roughly 1973 to the early 1980s became known as the First AI Winter.

6. The Rise and Fall of Expert Systems (1980s)

As the first AI winter drew to a close, AI made a comeback with a new approach: Expert Systems.

Expert systems explicitly encoded the knowledge of a domain specialist as a set of IF-THEN rules. Rather than pursuing general AI, the goal was to achieve expert-level performance within a narrow field.
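A rule base of this kind can be sketched as data plus a small inference loop. The example below is a toy forward-chaining engine with invented rules that carry no medical authority; real systems differed in important ways (MYCIN, for instance, reasoned backward from hypotheses and attached certainty factors to its rules).

```python
# Hypothetical toy rules: IF every condition is known THEN add the conclusion.
RULES = [
    ({"fever", "cough"}, "respiratory_infection"),
    ({"respiratory_infection", "chest_pain"}, "suspect_pneumonia"),
    ({"suspect_pneumonia"}, "recommend_chest_xray"),
]

def forward_chain(facts, rules):
    """Fire rules whose conditions are satisfied until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

print(sorted(forward_chain({"fever", "cough", "chest_pain"}, RULES)))
```

Every piece of knowledge has to be written down as a rule by hand, which hints at why maintaining large rule bases later became so expensive.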

The best-known early expert system was MYCIN (1974), developed at Stanford University.^[10] MYCIN diagnosed blood infections and recommended antibiotic treatments. In an evaluation at Stanford, its proposed therapy was judged acceptable in about 69% of cases, a figure that compared favorably with human specialists assessed by the same criteria.

XCON (1980), adopted by DEC (Digital Equipment Corporation), demonstrated the commercial success of expert systems.^[11] Designed to automatically translate customer requirements into VAX computer system configurations, XCON grew to contain more than 15,000 rules by 1989 and saved DEC an estimated $25 million per year.

The success of expert systems drove an AI boom throughout the 1980s. The Japanese government announced in 1982 that it would invest $850 million in its Fifth Generation Computer Project, and the United States and United Kingdom also launched large-scale AI research programs.^[12]

But the golden age proved short-lived. By the late 1980s the limitations of expert systems had become apparent. Maintaining the knowledge base was extremely costly, the systems could not adapt to new situations, and commonsense reasoning was beyond them. In 1987, inexpensive desktop computers from Apple and IBM overtook the costly specialized hardware (notably Lisp machines) on which many expert systems ran, and that market collapsed.^[11] The Second AI Winter had arrived, and the slump persisted through much of the 1990s.

7. The Return of Connectionism: Neural Networks and Backpropagation (1980s–1990s)

While expert systems dominated the mainstream of the 1980s, a different current was quietly gathering strength: the revival of connectionism.

The story begins decades earlier. In 1958, Frank Rosenblatt of the Cornell Aeronautical Laboratory unveiled the Perceptron.^[13] Building on the McCulloch–Pitts neuron, the Perceptron was one of the first artificial neural networks that could learn from examples; it was publicly demonstrated in Washington, D.C. on July 7 of that year. The New York Times excitedly reported that the machine was expected to “walk, talk, see, write, reproduce itself and be conscious of its existence.”

In 1969, however, Marvin Minsky and Seymour Papert published the book Perceptrons, which demonstrated mathematically that a single-layer perceptron cannot represent functions that are not linearly separable, including one as simple as XOR.^[13] The book dealt a heavy blow to neural network research, and much research funding shifted elsewhere.
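The XOR limitation is easy to reproduce. The following Python sketch trains a single threshold unit with a perceptron-style learning rule; the function names, the learning rate, and the epoch count are illustrative choices, not details taken from Rosenblatt's hardware.

```python
def predict(weights, x1, x2):
    w1, w2, bias = weights
    return 1 if w1 * x1 + w2 * x2 + bias > 0 else 0

def train_perceptron(samples, epochs=25, lr=1):
    """Nudge the weights toward the target whenever the unit misclassifies a sample."""
    w1 = w2 = bias = 0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            error = target - predict((w1, w2, bias), x1, x2)
            w1 += lr * error * x1
            w2 += lr * error * x2
            bias += lr * error
    return w1, w2, bias

def accuracy(samples, weights):
    hits = sum(predict(weights, x1, x2) == target for (x1, x2), target in samples)
    return hits / len(samples)

AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print("AND accuracy:", accuracy(AND_DATA, train_perceptron(AND_DATA)))
print("XOR accuracy:", accuracy(XOR_DATA, train_perceptron(XOR_DATA)))
```

AND is learned perfectly because a single straight line separates its outputs. No such line exists for XOR, so the same unit can never classify all four cases correctly, however long it trains.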

The pivotal reversal came in 1986. David Rumelhart, Geoffrey Hinton, and Ronald Williams published “Learning Representations by Back-propagating Errors.”^[14] The paper popularized the backpropagation algorithm, which allows multi-layer neural networks to be trained efficiently.

The principle of backpropagation is as follows: when the network produces a wrong answer, the error is propagated backward from the output layer toward the input layer, and the weight of each connection is adjusted slightly. Repeating this process thousands or tens of thousands of times gradually teaches the network to produce correct answers. Neural network research was revived by this paper, and the age of connectionism in the 1980s and 1990s unfolded.
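The backward flow of error can be written out for a tiny network. The sketch below assumes a network with two inputs, two sigmoid hidden units, and one sigmoid output, with no bias terms and a squared-error loss; these simplifications and all function names are choices made for illustration, not details from the 1986 paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Forward pass: inputs -> hidden activations -> single output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output

def loss(x, target, w_hidden, w_out):
    _, output = forward(x, w_hidden, w_out)
    return 0.5 * (output - target) ** 2

def backprop(x, target, w_hidden, w_out):
    """Return the gradient of the loss for every weight, using the chain rule."""
    hidden, output = forward(x, w_hidden, w_out)
    # Error signal at the output unit (derivative of the loss w.r.t. its pre-activation).
    delta_out = (output - target) * output * (1.0 - output)
    grad_out = [delta_out * h for h in hidden]
    # Propagate that error backward through the output weights to each hidden unit.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]
    grad_hidden = [[d * xi for xi in x] for d in delta_hidden]
    return grad_hidden, grad_out

def sgd_step(x, target, w_hidden, w_out, lr=0.5):
    """Adjust each weight slightly against its gradient and return the new weights."""
    grad_hidden, grad_out = backprop(x, target, w_hidden, w_out)
    new_hidden = [[w - lr * g for w, g in zip(row, grads)]
                  for row, grads in zip(w_hidden, grad_hidden)]
    new_out = [w - lr * g for w, g in zip(w_out, grad_out)]
    return new_hidden, new_out

# Arbitrary starting weights and a single training example.
w_hidden, w_out = [[0.1, -0.2], [0.4, 0.3]], [0.5, -0.6]
x, target = [1.0, 0.5], 1.0
print("loss before:", loss(x, target, w_hidden, w_out))
w_hidden, w_out = sgd_step(x, target, w_hidden, w_out)
print("loss after: ", loss(x, target, w_hidden, w_out))
```

One call to `sgd_step` corresponds to one of the small adjustments described above; training repeats it over many examples.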

Artificial Neural Network Diagram — *The basic structure of an artificial neural network (ANN) — composed of an input layer, hidden layers, and an output layer, it mimics the way neurons connect in a biological brain* Source: Wikimedia Commons (Public Domain)

8. A Quiet Revolution: Machine Learning and Statistical Methods (1990s–2000s)

Coming through the second AI winter, AI researchers changed strategy. Instead of the grand dream of “general AI at human level,” practical approaches that solved specific problems well came into favor. Machine learning became the new mainstream of AI.

In 1997, IBM’s Deep Blue achieved a historic victory over world chess champion Garry Kasparov.^[15] Deep Blue relied not on neural networks but on brute-force search, evaluating some 200 million chess positions per second; nevertheless, the victory powerfully announced AI’s potential to the world.

From the late 1990s into the 2000s, various statistical and probabilistic methods matured. Algorithms such as the Support Vector Machine (SVM), Random Forest, and Naive Bayes achieved results in practical domains like text classification, spam filtering, and medical diagnosis.^[16] These algorithms differed fundamentally from symbolic AI in that they learned rules from data rather than having rules explicitly programmed by humans.

In 2006, Geoffrey Hinton, Simon Osindero, and Yee-Whye Teh published a fast training method for Deep Belief Networks (DBNs).^[17] The paper showed that it was feasible to train deep, multi-layer neural networks, heralding a new paradigm called deep learning.

9. The Deep Learning Revolution: Images Changed Everything (2012)

In September 2012, the deep learning model AlexNet — built by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton of the University of Toronto — won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC).^[18]

Before AlexNet, the error rate in this competition had been improving by small increments each year. AlexNet, however, recorded a top-5 error rate of 15.3%, 10.8 percentage points lower than the second-place entry. This was not a mere performance improvement; it was a paradigm shift.

Three key factors converged to make AlexNet’s success possible. First, big data: the ImageNet dataset, led by Professor Fei-Fei Li and first presented in 2009, contained millions of labeled images. Second, GPU computing: NVIDIA’s CUDA enabled parallel computation, making it feasible to train a network with tens of millions of parameters (AlexNet had about 60 million) within a realistic timeframe. Third, algorithmic maturity: techniques that stabilized neural network training, such as the ReLU activation function and dropout regularization, had advanced considerably.^[18]

The convergence of these three factors ignited the deep learning revolution. Google, Facebook, Microsoft, and other tech giants rushed to establish AI research labs and recruited deep learning researchers at astronomical salaries. Problems that had been considered long-standing challenges in AI — speech recognition, image classification, machine translation — began falling to deep learning in rapid succession.

In 2014, Ian Goodfellow and his colleagues proposed the Generative Adversarial Network (GAN), demonstrating that AI could “create” new images.^[19] In this technique, two neural networks compete: a generator produces candidate images and a discriminator tries to tell them apart from real ones, which pushes the generator toward increasingly realistic output. GANs became one of the early foundations of generative AI.

10. The Transformer Revolution: AI That Understands Language (2017–Present)

If deep learning sparked a revolution in the world of images, natural language processing (NLP) required a different path. Unlike images, text is fundamentally about sequence and context.

In 2017, eight researchers working at Google, with Ashish Vaswani listed as first author, published a paper titled “Attention Is All You Need.”^[20] The Transformer architecture proposed in that paper changed the history of text processing.

The core idea is the self-attention mechanism. Every word in a sentence simultaneously computes its relationship with every other word, learning which words deserve more “attention.” In the sentence “I ate an apple by the river,” the model learns on its own how strongly the verb “ate” connects to the object “apple.”
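As a rough illustration of that computation, here is scaled dot-product attention in plain Python. The token vectors are made up, and the same vectors are reused as queries, keys, and values; an actual Transformer derives the three from learned projection matrices and runs many attention heads in parallel.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention(queries, keys, values):
    """Each position scores every position, then takes a weighted mix of the values."""
    scale = math.sqrt(len(keys[0]))
    weights = [softmax([dot(q, k) / scale for k in keys]) for q in queries]
    outputs = [
        [sum(w * v[d] for w, v in zip(row, values)) for d in range(len(values[0]))]
        for row in weights
    ]
    return weights, outputs

# Hypothetical 2-dimensional vectors for three tokens; real models learn these.
tokens = ["I", "ate", "apple"]
vectors = [[1.0, 0.0], [0.0, 2.0], [0.0, 1.5]]
weights, outputs = self_attention(vectors, vectors, vectors)
for token, row in zip(tokens, weights):
    print(token, [round(w, 3) for w in row])
```

With these invented vectors, the row for “ate” puts more weight on “apple” than on “I” because their vectors point in the same direction. In a trained model such alignments emerge from data rather than being set by hand.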

Transformer Architecture — *The Transformer model architecture — composed of an Encoder and a Decoder, with the Attention mechanism at its core* Source: Wikimedia Commons (CC BY-SA 4.0)

The Transformer was designed for machine translation but quickly surpassed earlier methods across nearly every NLP task. More importantly, when combined with large-scale pre-training, the Transformer exhibited unexpected capabilities.

In 2018, Google released BERT (Bidirectional Encoder Representations from Transformers).^[21] After pre-training on Wikipedia and vast amounts of text data, BERT set new state-of-the-art results across diverse tasks including question answering and sentiment analysis.

That same year, OpenAI released the first version of GPT (Generative Pre-trained Transformer).^[22] Unlike BERT, GPT was trained to generate text. In 2020, GPT-3 appeared: a massive model with 175 billion parameters that astonished the world by generating text that was often difficult to distinguish from human writing.^[23]

11. ChatGPT and the Democratization of AI (2022–Present)

On November 30, 2022, OpenAI launched ChatGPT. This was more than a simple product release — it was a cultural event.^[24]

ChatGPT reached one million users within five days of launch and an estimated one hundred million monthly active users within two months. At the time, that made it the fastest-growing consumer application in history. People used ChatGPT to write essays, debug code, compose poetry, and explore philosophical questions.

In March 2023, OpenAI announced GPT-4.^[25] A multimodal model capable of processing both text and images, GPT-4 scored around the top 10% of test takers on a simulated bar exam and achieved high scores on the SAT.

During this period, alongside ChatGPT, a variety of large language models (LLMs) launched in fierce competition: Google’s Gemini, Meta’s LLaMA, and Anthropic’s Claude, among others.^[26] The race to develop AI became as intense as the Space Race of the 1960s.

In 2024–2025, AI development moved beyond simple text generation toward strengthening reasoning capabilities. Reasoning-specialized models such as OpenAI’s o1 and o3 series began significantly outperforming previous models on tasks requiring complex thought — mathematics, scientific research, and coding.^[27]

12. AI Today and Tomorrow: Opportunities and Challenges

Today, artificial intelligence is not a single technology but a foundational layer woven into every level of life.

In medicine, AI detects cancer early in CT scans, searches for new drug candidates, and predicts protein structures. In 2020, DeepMind’s AlphaFold 2 predicted protein structures accurately enough at the CASP14 assessment that many researchers regarded the decades-old protein-folding problem as largely solved. In education, AI tutors personalize learning to the level of each individual student. In the arts, generative AI tools like Midjourney and DALL-E have established themselves as new instruments of creativity.

At the same time, AI poses unprecedented challenges. Concerns are growing about job displacement, misinformation through deepfakes, AI bias, privacy violations, and, in the long run, AI that escapes human control.^[28] In 2023, Geoffrey Hinton left Google, where he had worked for about a decade, so that he could speak freely about the dangers of AI. It was a symbolic moment: the so-called “godfather” of AI expressing concern about the very technology he had helped create.

Conclusion: A 75-Year Journey, and What Comes Next

When Alan Turing asked “Can machines think?” in 1950, he imagined an answer — but he surely could not have anticipated how quickly the question would become a practical reality.

The history of artificial intelligence has been a cycle of optimism and disappointment. Excessive promises brought brutal winters, and the winters became time for quiet researchers to prepare a new spring. The journey from symbolic AI to connectionism, from expert systems to machine learning, and onward to deep learning and large language models is a history of technology and an intellectual drama in which human arrogance and humility have intersected.

We now stand before a new question. Not whether AI can think, but how we should think alongside AI, and how we should live with it. The answer lies not in the technology, but in the choices made by the humans who create and use it.

References

[1]: Wikipedia, “Turing machine” (CC BY-SA 4.0; https://en.wikipedia.org/wiki/Turing_machine)

[2]: Wikipedia, “Bombe” (CC BY-SA 4.0; https://en.wikipedia.org/wiki/Bombe)

[3]: Wikipedia, “McCulloch–Pitts neuron” (CC BY-SA 4.0; https://en.wikipedia.org/wiki/McCulloch–Pitts_neuron)

[4]: Wikipedia, “Computing Machinery and Intelligence” (CC BY-SA 4.0; https://en.wikipedia.org/wiki/Computing_Machinery_and_Intelligence)

[5]: Wikipedia, “Dartmouth workshop” (CC BY-SA 4.0; https://en.wikipedia.org/wiki/Dartmouth_workshop)

[6]: Wikipedia, “Logic Theorist” (CC BY-SA 4.0; https://en.wikipedia.org/wiki/Logic_Theorist)

[7]: Wikipedia, “ELIZA” (CC BY-SA 4.0; https://en.wikipedia.org/wiki/ELIZA)

[8]: Wikipedia, “ALPAC” (CC BY-SA 4.0; https://en.wikipedia.org/wiki/ALPAC)

[9]: Wikipedia, “Lighthill report” (CC BY-SA 4.0; https://en.wikipedia.org/wiki/Lighthill_report)

[10]: Wikipedia, “Mycin” (CC BY-SA 4.0; https://en.wikipedia.org/wiki/Mycin)

[11]: Wikipedia, “Xcon” (CC BY-SA 4.0; https://en.wikipedia.org/wiki/Xcon)

[12]: Wikipedia, “Fifth generation computer” (CC BY-SA 4.0; https://en.wikipedia.org/wiki/Fifth_generation_computer)

[13]: Wikipedia, “Perceptron” (CC BY-SA 4.0; https://en.wikipedia.org/wiki/Perceptron)

[14]: Wikipedia, “Backpropagation” (CC BY-SA 4.0; https://en.wikipedia.org/wiki/Backpropagation)

[15]: Wikipedia, “Deep Blue (chess computer)” (CC BY-SA 4.0; https://en.wikipedia.org/wiki/Deep_Blue_(chess_computer))

[16]: Wikipedia, “Machine learning” (CC BY-SA 4.0; https://en.wikipedia.org/wiki/Machine_learning)

[17]: Wikipedia, “Deep learning” (CC BY-SA 4.0; https://en.wikipedia.org/wiki/Deep_learning)

[18]: Wikipedia, “AlexNet” (CC BY-SA 4.0; https://en.wikipedia.org/wiki/AlexNet)

[19]: Wikipedia, “Generative adversarial network” (CC BY-SA 4.0; https://en.wikipedia.org/wiki/Generative_adversarial_network)

[20]: Wikipedia, “Attention Is All You Need” (CC BY-SA 4.0; https://en.wikipedia.org/wiki/Attention_Is_All_You_Need)

[21]: Wikipedia, “BERT (language model)” (CC BY-SA 4.0; https://en.wikipedia.org/wiki/BERT_(language_model))

[22]: Wikipedia, “GPT-1” (CC BY-SA 4.0; https://en.wikipedia.org/wiki/GPT-1)

[23]: Wikipedia, “GPT-3” (CC BY-SA 4.0; https://en.wikipedia.org/wiki/GPT-3)

[24]: Wikipedia, “ChatGPT” (CC BY-SA 4.0; https://en.wikipedia.org/wiki/ChatGPT)

[25]: Wikipedia, “GPT-4” (CC BY-SA 4.0; https://en.wikipedia.org/wiki/GPT-4)

[26]: Wikipedia, “Large language model” (CC BY-SA 4.0; https://en.wikipedia.org/wiki/Large_language_model)

[27]: Wikipedia, “OpenAI o1” (CC BY-SA 4.0; https://en.wikipedia.org/wiki/OpenAI_o1)

[28]: Wikipedia, “Existential risk from artificial general intelligence” (CC BY-SA 4.0; https://en.wikipedia.org/wiki/Existential_risk_from_artificial_general_intelligence)