Why is the release of GPT-4 a big deal?

Abhinav Jain
3 min read · Mar 14


It’s actually NOT the pictured chart. OpenAI CEO Sam Altman called it “complete bullshit.”

Adding more parameters doesn’t necessarily make LLMs better. That’s a form of data ‘maximalism’ that only 14% of AI experts believe in. But there is one huge reason to be excited. Microsoft Germany’s CTO, Andreas Braun, spilled the news at a recent AI event.

“We will introduce GPT-4 next week, there we will have multimodal models that will offer completely different possibilities — for example, videos”

That’s a huge unlock for the system:

1. Images

Image generation has traditionally been the realm of tools like DALL-E, the first generative AI tool hyped last year. Adding images represents GPT’s progress toward becoming an “everything” AI.

It’s going to be much more than DALL-E.

Based on a paper Microsoft published, GPT-4 is expected to be able to READ images. So you will be able to ask it to OCR an image, or decipher the attitude of a painted face. Think of the power of a transform prompt like “age this person 10 years.”

2. Audio

GPT-4 is expected to support voice interaction as another modality. It will be quite fun to compare it with Alexa and Siri. I expect it will fare very well. And unlike them, you’ll be able to ask it to write a song or transcribe a podcast.

3. Video

As Braun mentioned, GPT-4 is expected to be able to read videos as input and generate them as output. One could imagine an entire show being made in GPT-4. Or YouTubers quickly editing their videos. Tons of visual and audio jobs are going to be affected by GPT-4. It won’t be just the writers anymore.

The really crazy thought is: what about GPT-5? Or 10? GPT-3.5 has already passed the Turing test, the Lovelace test… you name it, among our AI criteria. Who knows how powerful the next versions will be?

