OpenAI has announced the launch of GPT-4o, a new flagship AI model and voice assistant that’s set to give other AI models and voice assistants a run for their money.
The model, announced on Monday as part of OpenAI’s spring update, shows off a new level of real-time conversational fluency, including the ability to understand context and shift gears when people talk over it, and a host of new reasoning capabilities that make it more human-like than any other model to date.
OpenAI showed off the capabilities of GPT-4o in a 15-minute live-streamed demo, with OpenAI leaders on stage in front of an in-person audience of the company's employees.
In one demo, the model spoke in a sprightly female voice, responding far faster to queries than previous generations of voice bots, displaying nuanced human-like language and emotion.
It was able to analyse the video feed in real time, describing the demo room when the user asked where it thought he was, and correctly answering that he was likely recording a video, judging by the setup.
GPT-4o even noticed that the user was wearing a hoodie with an OpenAI logo, and correctly assumed that this meant the video must have had something to do with OpenAI. And it was able to do all of this in real-time as if it were having a normal conversation with the user.
In another demo, the company showed the latest model reading a story it had drafted, first with increasing levels of dramatic excitement and then in a robot voice when asked.
That's a huge improvement over other voice assistants released in the past few years, which have been criticized for sounding too robotic and being unable to convey human emotion.
At another point in the demo, OpenAI’s head of frontiers research Mark Chen asked ChatGPT for tips to calm his nerves, and the chatbot suggested deep breaths.
When Chen responded by hyperventilating, ChatGPT replied, "Whoa, slow down a little bit there Mark – you're not a vacuum cleaner!", demonstrating a sense of humour not seen in previous AI models.
What is GPT-4o?
GPT-4o is a new multimodal large language model (LLM) released by OpenAI. It's faster and more powerful than the previous version, GPT-4, and can process information from text, voice, and images. It can also understand and respond to spoken language in real time, much as a human would, opening up the potential for real-time translation and other applications.
GPT-4o can respond to audio prompts much faster than previous models, with a response time close to that of a human, and it's better at understanding and discussing images. For example, you can share an image and GPT-4o can discuss it with you.
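To make that concrete, here is a minimal sketch of what an image-understanding request could look like through OpenAI's Chat Completions API; it assumes the `openai` Python package (v1.x), an API key in the OPENAI_API_KEY environment variable, and a placeholder image URL.

```python
# Minimal sketch: asking GPT-4o to discuss an image via the Chat Completions API.
# Assumes the `openai` Python package (v1.x) and OPENAI_API_KEY in the environment;
# the image URL is a placeholder, not a real asset.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's going on in this picture?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```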
OpenAI believes GPT-4o is a step towards much more natural human-computer interaction. It accepts as input any combination of text, audio, and images, and generates any combination of text, audio, and image outputs.
It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation.
GPT-4o still isn’t perfect, though. At one point during the demo, it mistook the smiling man for a wooden surface, and it started to solve an equation it hadn’t yet been shown. There’s still some way to go before the glitches and hallucinations that make chatbots unreliable, and potentially unsafe, can be ironed out.
But what it does show is the direction of travel for OpenAI, which sees GPT-4o becoming the next generation of AI digital assistant, outsmarting the likes of Siri or Amazon’s Alexa by remembering what it’s been told in the past and interacting beyond voice or text.
What can GPT-4o do?
GPT-4o is multimodal, meaning it can process text, images, and audio, and generate responses using any combination of these formats. It can analyze images and videos, understand audio, and translate languages in real time. This lets it better understand the information you provide and give better responses.
Not only can GPT-4o generate different creative text formats, but it can also understand and analyze the content it receives. This means it can answer your questions in an informative way, even if they are open-ended or challenging, thanks to its natural language processing abilities.
GPT-4o is also incredibly fast, responding to audio prompts at close to real-time conversation speeds. This opens up the potential for real-time translation and other applications.
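As a rough illustration of the translation use case, a text-only call could look like the sketch below; the model name and prompt wording are illustrative rather than an official recipe, and the same API-key assumptions apply.

```python
# Minimal sketch: using GPT-4o as a translator through the Chat Completions API.
# Assumes the `openai` Python package (v1.x) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Translate the user's message into English."},
        {"role": "user", "content": "Bonjour, pouvez-vous me décrire GPT-4o ?"},
    ],
)

print(response.choices[0].message.content)  # e.g. "Hello, can you describe GPT-4o for me?"
```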
GPT-4o vs ChatGPT-4: Which is better?
OpenAI says GPT-4o matches GPT-4 Turbo performance on English text and code, with significant improvement on text in non-English languages. It’s also much faster than GPT-4 and better at vision and audio understanding than existing models.
Prior to GPT-4o, you could use Voice Mode to talk to ChatGPT with latencies of 2.8 seconds (GPT-3.5) and 5.4 seconds (GPT-4) on average. To achieve this, Voice Mode used a pipeline of three separate models: one simple model transcribes audio to text, GPT-3.5 or GPT-4 takes in the text and outputs text, and a third simple model converts that text back to audio.
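For context, that three-model pipeline could be sketched roughly as follows with OpenAI's Python SDK; whisper-1 and tts-1 stand in here for the transcription and speech models, and the file paths are placeholders.

```python
# Rough sketch of the pre-GPT-4o Voice Mode pipeline: three separate models chained together.
# Assumes the `openai` Python package (v1.x) and OPENAI_API_KEY in the environment;
# "question.mp3" and "reply.mp3" are placeholder file paths.
from openai import OpenAI

client = OpenAI()

# 1. Transcribe the spoken question to text.
with open("question.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio_file)

# 2. Send the text to the language model. By this point tone, background noise
#    and speaker identity have already been stripped out.
chat = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)
reply_text = chat.choices[0].message.content

# 3. Convert the text reply back into speech.
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply_text)
speech.write_to_file("reply.mp3")
```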
This process means GPT-4 loses a lot of crucial information: it can’t directly observe tone, multiple speakers, or background noise, and it can’t output laughter, singing, or expressions of emotion.
But with GPT-4o – which is trained across text, vision, and audio – all inputs and outputs are processed by the same neural network. And since this is OpenAI’s first model combining all of these modalities, we are still just scratching the surface of what the model can do compared to ChatGPT and GPT-4.
GPT-4o vs other models: Comparing the benchmarks
GPT-4o is marginally better than many of the other models on the market today, beating models including Claude 3, Gemini, and Llama 3 on multiple benchmarks tested by OpenAI. These benchmarks covered reasoning, text evaluation, audio ASR performance, and the M3Exam.
One of the most notable improvements with GPT-4o was in reasoning. The model set a new high score of 88.7% on 0-shot CoT MMLU general-knowledge questions, the highest score recorded by any AI model on the market.
GPT-4o also outperformed all other models on visual perception benchmarks, including MMMU and ChartQA, evaluated 0-shot with CoT. It also broke the record for AI models on the MathVista exam, scoring an impressive 63.8 compared to GPT-4 Turbo’s 58.1 and Gemini’s 53.0.
How do you access GPT-4o?
You can use GPT-4o for free in the OpenAI Playground, which lets you experiment with the model through a web interface. You’ll need an OpenAI API account to access it, but creating one is free.
If you want to integrate GPT-4o into your own application, you'll need an OpenAI API account. GPT-4o is available through several OpenAI APIs, including Chat Completions, Assistants, and Batch APIs [1]. This option may require some programming knowledge.
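As a rough starting point, a basic text call from Python might look like the sketch below, which streams the reply token by token; it assumes the `openai` package (v1.x) and an OPENAI_API_KEY environment variable, and the prompt is just an example.

```python
# Minimal sketch: calling GPT-4o via the Chat Completions API with streaming output.
# Assumes the `openai` Python package (v1.x) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY automatically

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarise what GPT-4o can do in two sentences."}],
    stream=True,  # receive the reply incrementally instead of waiting for the full response
)

for chunk in stream:
    # Each chunk carries a fragment of the reply; the final chunk's content can be None.
    print(chunk.choices[0].delta.content or "", end="", flush=True)
print()
```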