What Led to the AI Boom? A Short Timeline

I was recently wondering (again) about how we got to where we are: this period of rapid change in which AI keeps developing astonishing capabilities. I remember playing around with AI writers (like Jarvis or Copy.AI) as little as two years ago and being fundamentally underwhelmed.

Then DALL-E happened, which was the first 🤯 moment.

And then came ChatGPT. And GPT-4. And Midjourney. And and and.

But let’s go back a bit:

1950s:
- 1950: Alan Turing proposes the Turing Test to evaluate a machine’s ability to exhibit intelligent behavior.
- 1956: The Dartmouth Conference marks the birth of AI as a field. John McCarthy coined the term “artificial intelligence” for it.
1960s:
- Early successes in AI, such as programs that solve basic algebra problems, prove logical theorems, and converse in English.
1970s:
- 1973: The Lighthill Report in the UK and similar sentiment in the US lead to reduced funding, starting the first “AI Winter.”
1980s:
- Expert Systems become popular.
- 1986: Backpropagation algorithm revitalizes the field of neural networks.
1990s:
- Work on machine learning and data mining progresses.
- 1997: IBM’s Deep Blue defeats World Chess Champion Garry Kasparov.
2000s:
- Significant advancements in machine learning, speech recognition, and robotics.
2010s:
- 2011: IBM’s Watson wins Jeopardy!

2012: AlexNet

The 2012 breakthrough in image recognition was the success of a deep learning model called “AlexNet” in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC).

AlexNet, developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, was a deep convolutional neural network (CNN). It significantly outperformed other models in the competition, achieving a top-5 error rate of 15.3% compared to the second-place error rate of 26.2%. This marked a pivotal moment for the AI community, as it demonstrated the power of deep learning, and specifically CNNs, for image recognition tasks.
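To make the “top-5 error rate” concrete, here is a minimal sketch of how such a metric can be computed. The function name `top_k_error` and the toy scores are made up for illustration and are not taken from the ILSVRC evaluation code. A prediction counts as correct if the true label is among the model’s k highest-scoring classes.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of examples whose true label is not among the k top-scoring classes."""
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score, truncated to k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Toy data (hypothetical): 3 examples, 6 classes, one row of scores per example.
scores = [
    [0.10, 0.50, 0.20, 0.05, 0.10, 0.05],
    [0.30, 0.25, 0.20, 0.15, 0.07, 0.03],
    [0.05, 0.10, 0.12, 0.25, 0.18, 0.30],
]
labels = [1, 5, 3]

print("top-5 error:", top_k_error(scores, labels, k=5))
print("top-1 error:", top_k_error(scores, labels, k=1))

# Relative error reduction implied by the two figures quoted above.
relative_reduction = (26.2 - 15.3) / 26.2
print(f"relative reduction: {relative_reduction:.1%}")
```

Read this way, the gap between 26.2% and 15.3% means AlexNet made roughly 40% fewer top-5 mistakes than the runner-up.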

After the success of AlexNet, the research and development in deep learning exploded, leading to rapid advancements in various AI domains, especially computer vision. The 2012 breakthrough paved the way for many modern AI applications and systems that leverage deep learning.

2015-2016: AlphaGo defeats human Go champions

AlphaGo’s victories over European champion Fan Hui in October 2015 and over Lee Sedol, one of the world’s top players, in March 2016 were a monumental moment in the field of Artificial Intelligence (AI). Here are some of the key points regarding their significance:

Demonstration of Superior Strategy:
- The game of Go is known for its complexity, with more possible board positions than there are atoms in the observable universe. AlphaGo’s win demonstrated that AI could master such a complex game and develop superior strategies to defeat human champions.
Breakthrough in Machine Learning:
- AlphaGo combined deep neural networks with Monte Carlo tree search. It was trained first on games played by human experts and then through reinforcement learning on games it played against itself. Its success showcased the potential of combining deep learning with reinforcement learning, which was seen as a breakthrough in the field.
Earlier than Expected Achievement:
- Many experts had believed that it would take at least another decade before a machine could defeat a human champion at Go. AlphaGo’s wins in 2015 and 2016 came as a surprise to many, showcasing rapid advancements in AI technology.
Inspiration for Further Research:
- The success of AlphaGo inspired further research in the field of AI. It demonstrated the potential of deep reinforcement learning and sparked interest among researchers to explore new applications of AI in other complex domains.
Public and Media Attention:
- AlphaGo’s win garnered substantial public and media attention, bringing AI into the spotlight. It helped in broadening the conversation about the capabilities and future of AI, making more people aware of its potential and the pace of advancement in the field.
Practical Implications:
- The techniques used in AlphaGo have practical implications beyond board games. The advancements in machine learning and AI demonstrated by AlphaGo’s victory have the potential to be applied in various fields including healthcare, science, and engineering to solve complex real-world problems.
Ethical and Societal Implications:
- The event also brought forth discussions on the ethical and societal implications of AI, including concerns about automation, job displacement, and the future interaction between humans and intelligent machines.
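As a rough check on the complexity claim in the list above, the comparison can be done with plain integer arithmetic. This is only a back-of-the-envelope sketch: `3**361` is a simple upper bound that also counts illegal board configurations, and the figure of about 10^80 atoms is a commonly cited rough estimate, treated here as an assumption.

```python
# Each of the 19 x 19 = 361 points is empty, black, or white, so 3**361
# is an upper bound on board configurations (it also counts illegal ones).
BOARD_POINTS = 19 * 19
upper_bound = 3 ** BOARD_POINTS

# Assumption: about 10**80 atoms in the observable universe (rough estimate).
ATOMS_ESTIMATE = 10 ** 80

digits = len(str(upper_bound))
print(f"3**{BOARD_POINTS} has {digits} digits")
print("Exceeds the atom estimate:", upper_bound > ATOMS_ESTIMATE)
```

The number of legal positions is smaller than this bound, but still far above the atom estimate.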

The triumph of AlphaGo was a landmark event that showcased the potential of AI, stimulated further research and development in the field, and initiated broader discussions on the impact of AI on society and various industries.

2019: GPT-2

The release of GPT-2 (Generative Pre-trained Transformer 2) by OpenAI in February 2019 was a significant event in the AI community for several reasons:

Language Understanding and Generation:

GPT-2 demonstrated a remarkable ability to understand and generate human-like text. It could respond to prompts with coherent paragraphs, write essays, summarize text, and even create poetry. This showed substantial progress in natural language processing (NLP) and generation (NLG).

Pre-training and Zero-Shot Use:

GPT-2 was pre-trained on a large corpus of web text and then evaluated on many NLP tasks without any task-specific fine-tuning (the “zero-shot” setting). It performed well across a range of those tasks without task-specific architecture modifications, and it could also be fine-tuned on smaller, more specific datasets.

Scale of the Model:

With 1.5 billion parameters, GPT-2 was one of the largest language models ever created at the time. This demonstrated the benefits of scaling up neural networks and training them on vast amounts of data.

Transfer Learning:

GPT-2 showcased the potential of transfer learning in NLP. It showed that a model trained only to predict the next word could handle numerous other tasks with little or no fine-tuning, underscoring the versatility and efficiency of transfer learning.

Concerns Over Misuse:

Due to its ability to generate realistic, coherent text, there were concerns about GPT-2 being used for malicious purposes such as generating fake news or phishing emails. Because of these concerns, OpenAI initially released only smaller versions of the model and did not publish the full 1.5-billion-parameter version until November 2019, sparking a wide discussion on the ethical implications of such powerful language models.

Open-Source Model and Research:

Despite concerns over misuse, the release of GPT-2 contributed to the broader AI community by providing a robust model for research. The model, along with the publication detailing its architecture and training methodology, helped propel further advancements in NLP.

Public Awareness and Ethical Dialogue:

GPT-2’s release brought more public awareness to the capabilities and potential risks associated with advanced AI, especially in the realm of misinformation and content authenticity. This helped foster a broader dialogue about ethical AI development and deployment.

Inspiration for Subsequent Models:

GPT-2 laid the groundwork for later models like GPT-3. The successes and challenges encountered with GPT-2 informed the development of more advanced and capable models, further pushing the boundaries of what AI can achieve in language understanding and generation.

GPT-2’s release was a key moment in AI history, demonstrating the potential of large-scale language models, advancing the field of NLP, and igniting important discussions on the ethical implications of AI.

However, the text GPT-2 generated was still far below the quality we have seen since 2022.

2022: DALL-E 2

The release of DALL-E 2 in April 2022, following the original DALL-E from January 2021, and its rollout over the rest of the year marked significant strides in the field of AI, particularly in generating images from textual descriptions.

It captured the imagination of the public in unprecedented ways.

Here’s a breakdown of the milestones and their significance:

Extended Creativity (July 14, 2022):
- On July 14, 2022, OpenAI published an update on extending creativity with DALL-E 2, highlighting its use for generating realistic images and art from text descriptions.
Commercial Availability (July 20, 2022):
- OpenAI moved DALL-E 2 into a paid beta and began offering it to the one million people on its waiting list. This commercial availability allowed a broader user base to create images from text descriptions.
Public Accessibility (September 28, 2022):
- On September 28, 2022, OpenAI removed the waitlist for DALL-E, making it accessible to more users. Although it was still in its beta phase, this move signaled a significant step towards broader public accessibility.
API Release (November 3, 2022):
- OpenAI released DALL-E 2 as an API in early November, allowing developers to integrate the model into their own applications and platforms. Microsoft also built DALL-E 2 into its Designer app and into the Image Creator tool in Bing and Microsoft Edge.

These milestones not only showcased the progressive development and commercial availability of DALL-E but also highlighted its potential in extending the boundaries of creativity and practical utility in the digital domain. Through these developments, DALL-E became more accessible to both individual users and developers, fostering a broader scope of applications and integrations.

2022, November: ChatGPT

The release of ChatGPT by OpenAI on November 30, 2022, was a significant milestone in the development of language models and chatbots. Here are some key aspects and implications of this release:

Product Launch:

ChatGPT, standing for Chat Generative Pre-trained Transformer, was launched as a large language model-based chatbot that allows users to steer conversations towards a desired length, format, style, level of detail, and language.

Public Engagement:

Upon its release as a web app, ChatGPT quickly gained attention on social media. Remarkably, it amassed over one million users within just five days of its launch, indicating a strong public interest and engagement with this new AI tool.

Research Preview:

The release was described as a “research preview,” meaning that although it was publicly available, OpenAI was still gathering user feedback and refining it. At launch, there were no announced plans to take it offline or charge for access, making it a freely available tool for users to experience and interact with.

Technological Foundation:

ChatGPT launched on a model fine-tuned from the GPT-3.5 series, showcasing the continued evolution and application of Generative Pre-trained Transformer models by OpenAI. This reflects the iterative progress in AI development, leveraging previous advancements to create more sophisticated and user-friendly AI tools.

The launch of ChatGPT not only demonstrated a significant technical achievement but also represented a step towards making advanced AI language models more accessible and interactive for the public. Through this release, users could experience firsthand the capabilities of a state-of-the-art language model in generating coherent and contextually relevant text in a conversational setting. This release further underscored the potential of AI in enhancing human-computer interaction and providing valuable tools for a wide range of applications.