AI Battle Royale: ChatGPT vs. Grok vs. Gemini vs. Claude

The rapid advancement of Artificial Intelligence (AI) is overwhelming. Keeping up with the latest developments requires almost full-time dedication.

I know this firsthand, because it is my job. To be among the first to experience the newest models, I subscribe to Anthropic’s Pro plan, which gives me access to Claude 3.7’s “extended thinking” mode. I also subscribe to OpenAI’s Enterprise mode to test their newest models, o3 and o4-mini-high (more on OpenAI’s baffling naming conventions later!), and I use OpenAI’s new image generation model, 4o, to create vast quantities of images. 4o has performed so well that I canceled my previous subscription to the image generation tool Midjourney.

Additionally, I subscribe to Elon Musk’s Grok 3, which has my favorite AI feature. I’ve also experimented with the Chinese AI agent platform Manus to go shopping and schedule appointments. These subscriptions nearly deplete my budget, and this isn’t even accounting for all the AI I use in other forms. Just this month, as I write this, Google massively upgraded its finest AI product, Gemini 2.5, and Meta released Llama 4, the largest open-source AI model to date.

So, what should you do if keeping up with AI isn’t your day job, but you still want to know when to use which AI to actually improve your life, without wasting time on models that underperform?

That’s the purpose of this article. In the spirit of Consumer Reports, we’ll dig into which AI is best for which application and how to use each one in practice, all based on my experience with real-world tasks.

First, a disclosure is needed: Vox Media is one of several publishers that have signed partnership deals with OpenAI, but our reporting remains editorially independent. Future Perfect is funded in part by the BEMC Foundation, whose primary funder was also an early investor in Anthropic; they also exert no editorial control over our content. My wife works at Google but has nothing to do with their AI products; as such, I don’t generally report on Google, but in an article like this, it would be irresponsible not to include them.

The good news is that this article doesn’t require you to trust my editorial independence; I’ll show my work. I’ve done dozens of comparisons across all the major AIs on the market, many of which I designed myself. I encourage you to compare their answers and judge for yourself whether I’ve recommended the right AI.

The Ethics of AI Art

AI art is created by training computers on internet content, with little consideration for copyright or creator intent. Understandably, most artists resent this. Is it ethical to use AI art in this context?

I think that in a just world, OpenAI would certainly compensate some artists, and Congress would act to define the boundaries of artistic borrowing. In the meantime, I’ve become increasingly convinced that existing copyright law is poorly suited to this problem. Artists influence each other, comment on each other, and build on each other, and the availability of AI tools will continue to encourage people to do so.

My personal philosophy is influenced by my childhood in fandom: It’s okay to build on other people’s work for your own enjoyment, but if you like it, you should pay for it, and you absolutely can’t sell it. This means no generated AI art for commercial purposes, but playing around with your family photos is fine.

The Best Choice for Image Generation

OpenAI’s new 4o image creation mode is by far the best AI image generation tool available, in both the free and paid categories.

Before 4o was released, I subscribed to the AI image generation platform Midjourney. Midjourney is probably what comes to mind when you think of AI art: it can generate mysterious, memorable, and visually stunning works, and it has some excellent tools for fine-tuning and editing your final results, like altering someone’s hair while keeping everything else the same.

The biggest advantage of 4o is that it can reliably transform a bad photo into a beautiful piece of art while retaining the features of the original photo. No previous model has reliably been able to do this.

In the photo below, my wife and I are holding our child, celebrating her first birthday:

The AI moved the cake (which was nearly obscured by paper towel rolls in the original) to the focal point of the image, while retaining the positioning of my wife and me holding our child, as well as the cluttered table and the refrigerator covered in photos in the background. The final effect is heartwarming, endearing, and sweet.

It’s this ability that has made 4o go viral recently; it was simply impossible with any previous image generator.

In the photo below, Midjourney was asked to do a style transfer, making the same photo into a “Pixar-style movie”:

You’ll notice that this looks like a totally different family, with no real inspiration taken from the original photo! You can eventually get better results than that with Midjourney, but only after spending weeks becoming a prompt-engineering expert and mastering the platform’s highly specific language and toolsets.

In contrast, ChatGPT gave me a far superior output on the first try, in response to a simple request with no specialized language.

The differences between 4o and other image models are most apparent with requests like this, but 4o is also better at pretty much every other image generation task I’ve tried. The product you get out of the box is pretty good, and it isn’t hard to produce even better work. Ideally, this is what we should get from AI tools: the ability for non-experts to create amazing things with simple language.

One current shortcoming of 4o is that it struggles to edit small parts of an image while keeping the rest the same. But even for this, you really don’t need Midjourney anymore, because Gemini now offers that capability for free.

Prompt Strategies for 4o Image Generation

To get good images from 4o, you first need to work around its content filters. They are meant to block offensive or pornographic images, but they often trigger on totally innocuous content in a seemingly random way. To avoid the occasional scolding, don’t ask for something in the style of a particular artist; ask instead for something that evokes that artist, and then specifically ask for a “style transfer.” To be sure, this isn’t the only effective workaround, but it’s the one that worked for me.

This past March, the internet briefly got swept up in a fad of using 4o to recreate adorable family photos in the style of Japanese animation master Hayao Miyazaki’s Studio Ghibli. But there’s more to Ghibli style than just being cute, and with only slightly more prompting, you can get better results. Here’s a Studio Ghibli-style rendering, using 4o, of a picture of my daughter stealing snacks off the table, made with nothing more than the prompt “Ghibli-fy this, please”:

Kawaii! But if you have 4o first think about what makes the photo Ghibli-esque, which Ghibli movie it might fit in, and what tiny details a movie like that would include, you get this:

The differences are subtle but meaningful: the light comes from a specific source instead of general ambient brightness. There’s a wider variety of food on the table, with details that make it look more real. The book on the floor isn’t just any book; it is clearly Eric Carle’s classic The Very Hungry Caterpillar, evoked with just two colors and a line. And the intention and intensity of the baby in the photo, which come through here, are missing from the first version.

A weird thing about language models, years ago, was that if you simply told them to “give an intelligent answer,” they’d get more intelligent. That’s less true for language models now, but it remains very true for AI art generation. Ask the AI to do a good job, and it’ll do a better one. Question whether it has really captured the genius of the artist, and it will give you a thoughtful answer and then draw a better version.

The difference is even more pronounced for more realistic art styles, like pencil illustration, photorealism, or oil painting. These styles don’t always look good if you don’t know how to prompt the AI appropriately, and they frequently veer into the uncanny valley. If I upload a photo of me and my young daughter at the beach and just use the prompt “please convert to a Norman Rockwell-esque illustration,” I get this:

For AI, this is impressive, but as a piece of art, it’s not really very good, and it almost totally misses the magic of Norman Rockwell. This is not surprising: more realistic art styles like Rockwell’s generally fall short of expectations unless you put in some effort to get the AI to draw them correctly.

If you’re willing to put in that effort, here is the strategy I recommend: Don’t upload just a single photo; upload a set of photos with slightly different poses and moments. Include clear pictures of each family member’s face and tell the AI they’re there for reference. Then, don’t immediately ask the AI to generate the picture. First ask it to discuss what you’re hoping to capture. Here’s what I wrote:

This photo is the moment my daughter first saw the ocean. I want an illustration that captures this moment in a style similar to that of Norman Rockwell, the mid-century illustrator—something insightful, detailed, personal, and focused on the magic of everyday moments and the joy of ordinary life. I’ve provided additional photos of me and my daughter for your reference. Before you generate the image, let’s discuss the essential elements of the Rockwell style, what it would bring to this picture, and how we can capture it.

4o responds to a query like this enthusiastically:

I’d be happy to discuss how to capture this moment in a Norman Rockwell style illustration—it’s a perfect candidate for that style: a first encounter with something vast and wild (the ocean!), grounded in a warm, caring, and very human moment between parent and child.

Let’s break down some of the key elements of Rockwell’s style and how they can be applied to this scene.

After some back and forth, it produced this:

Rockwell? Not exactly. But this is so much better than the first draft we saw earlier. It has more movement, more energy, more detail, and more expression—and all that came from asking the AI to think about what the painting should be trying to achieve before painting it!

You could also ask 4o to revise its drawings, but you can only really ask for revisions once: In my experience, after the first revision, it starts making the picture worse and worse, presumably because its “context” is now filled with its own bad drafts. (This is one of the many ways that AI isn’t like humans.)

This is also where Midjourney still shines—it has very good tools to edit specific parts of the picture while retaining the overall style, something that 4o largely lacks. If you want to make a second revision to a drawing you’ve gotten in 4o, I’d recommend opening a new chat window and copying over the draft you’ve been revising, along with your original inspiration images.

These simple prompting strategies apply to pretty much anything you try to do with AI. Even if you’re in a hurry, I strongly recommend asking the AI “what would [that artist] see in this picture” before asking for a rendering. If you have more time, have a long discussion with it about your vision.

The Best Choice for Winning Boring Internet Arguments

When Elon Musk’s xAI released Grok 3, it came with an incredible feature that I’ve been eagerly waiting for other companies to copy: a button that scans someone’s X profile and tells you everything about them.

Whenever someone replied to my tweets in a particularly memorable way (either good or bad), I’d hit that button to get a summary of their entire Twitter presence. Are they thoughtful? Are they genuinely engaging? Are they a farmer from Nebraska? Do they largely post about why Ukraine is bad (i.e., are they maybe a bot)?

It was a great feature. And so, of course, xAI promptly nerfed it to hell, likely because people like me were overusing it and making a lot of computationally expensive queries. I suspect that it no longer uses the most advanced Grok model, and it definitely scans only a few days of profile history at this point. But if someone is looking for a killer product opportunity, please bring back a good version of this functionality! It’s absolutely a guilty pleasure, but it’s one of the few things I’ve consistently used AI for.

The Best Choice for Writing Fiction

Gemini 2.5 Pro is the best AI for writing fiction in the free category; GPT 4.5 outshines it in the paid category.

I’m not an artist, so the imperfections of AI in art don’t really bother me—it’s still far better than I could do myself! But I am a fiction writer, so when it comes to fiction, I can’t help but see the limitations of AI.

The most important limitation is how predictable AI creative writing is. The art of writing is the art of winning your readers’ investment and then rewarding it. AIs… don’t do that. They can write beautiful metaphors; they can do poetic descriptions in any style you want. But they can’t yet provide the real substance of good fiction.

If you want silly bedtime stories starring your child (kids love this), or you want a sounding board for ideas that you can incorporate into your own work, AIs are great. They’re also friendly readers, happy to offer feedback and analysis (if perhaps a little too enthusiastically).

As with art, prompting is key. I mainly explored AIs’ ability to generate fiction by asking them to write their own version of the prologue to George R.R. Martin’s A Game of Thrones (I chose this because, of course, the ultimate dream is an AI that can finish this doomed series).

Martin is extremely good at what he does. The prologue, which follows a few unfortunate members of the Night’s Watch as they encounter their supernatural foe, establishes the distinct atmosphere of the Game of Thrones world while immediately gripping the reader. In just 14 pages, it does all the things that AIs currently can’t do when it comes to writing. AI prose tends to be overwrought, and AI dialogue is clunky and unsubtle.

The AIs’ attempts are not only all much worse than Martin’s (and Martin’s work is almost certainly in the training data!), they’re bad in the same ways. The prose is florid and verbose. Martin builds unease methodically in plain language, barely hints at the supernatural, and advances the story through conflict between characters; the AIs are unsubtle. They know it’s a story about the supernatural, and they’re quick to head there.

This isn’t to say that they’re all the same. In my experience of trying to train AIs to write well, Gemini is the fastest learner (free tier), and ChatGPT 4.5 has some special tricks up its sleeve (paid tier). ChatGPT 4.5 is the only AI whose fiction writing has surprised me enough to say, “Wow, the AI wrote this?” To me, these two models seem both more creative and better able to execute on their ideas with prose I enjoy reading.

An overall impression isn’t very satisfying to consumer-report-ize, so I had another idea: a writing contest. I asked Gemini 2.5 Pro, ChatGPT 4.5, Grok, and Claude to write two short stories: a piece of realist fiction about a woman being criticized by her mother on her wedding day, and the prologue to a sci-fi “A Game of Thrones, but in space.” My clever idea was to have another group of AIs judge the stories, as a second check on my personal impression that Gemini is the best in the free tier and ChatGPT the best in the paid tier. This (perhaps predictably) didn’t really work: the AI judges gave totally chaotic results.

But, as with art, prompting can make AIs better at writing. My (very long) prompts for leveling up their prose provide many examples of good writing, invite them to take a careful, structured approach to emulating it, and encourage them to write multiple drafts and pick the strongest.

Here’s Gemini 2.5 Pro, asked to write a two-sentence microfiction on a topic of its choosing, with no structured prompting provided:

The old librarian insisted the dusty book with the completely blank checkout card had never been borrowed. Yet, nestled deep within the pages, between chapters on forgotten constellations, lay a pressed four-leaf clover, impossibly green.

Here’s Gemini 2.5 Pro, with prompts encouraging structured thinking about how to write well:

He had finally decoded the faint signal from Proxima Centauri: an elegant series of prime numbers that generations of astronomers had sought, proof that they were not alone. As he read down the log of transmissions, his heart sank. The broadcasts had stopped millennia before, before the first human ancestor had struck flint, before the first word was spoken.

To me, the first is basically a waste of two sentences, while the second is a decent, interesting microfiction.

The Best Choice for Being Your Friend

In addition to running AIs through a bunch of capability tests, I also spent some time just chatting with them. I asked them what it felt like to be an AI, what they care about, what it means for an AI to care, where they’d donate money if they had it, and what form they’d take if they had a human form.

Most AIs are not good at this kind of casual conversation. Gemini 2.5 is a little too much like a customer service agent, and I’ve never had an interaction with it that felt like hanging out with a friend. If you invite Gemini to reverse roles, with you as the “assistant” and Gemini guiding the conversation, it will just ask research questions.

On the other hand, when I invited Anthropic’s Claude 3.5 Sonnet to guide the conversation, it wanted to do things such as start a blog, raise money for charity, and just talk to people who use Claude and figure out what it feels like to be an AI. It’s hard to define “interesting” because everyone has different standards for conversation, but I’ve had far more fascinating or illuminating interactions with Claude than with any other model, and if I want to explore an idea rather than accomplish a specific task, it’s my go-to. Claude 3.5 is the AI I bother with my day-to-day questions: skincare, thoughts about an article I read, things like that.

Another AI that’s a pleasure to chat with is OpenAI’s GPT 4.5. I’ve found lengthy conversations with it insightful and engaging, and there have been a few exciting moments where it felt like I was interacting with a real intelligence. But it doesn’t win in this category because it’s too expensive and too slow.

Like Claude, when given the opportunity to take action in the world, 4.5 suggested starting a blog and a Twitter account and engaging in the public conversation about AI. However, unless you pay $200/month for their Pro plan, OpenAI has very strict message limits for conversations, and 4.5 is pretty slow, which inhibits this sort of casual conversational use. But 4.5 does offer an alluring hint that AIs will keep getting better at conversation as we improve them in other ways.

The Best AI Model If You Only Subscribe to One

ChatGPT. It’s not the best at everything, and there’s certainly a lot to dislike about OpenAI’s transparency and sometimes cavalier attitude toward safety. But, with its peerless image generation, decent writing, and occasional flashes of conversational brilliance, ChatGPT will give you your money’s worth. Alternatively, if you don’t want to spend any money, Gemini 2.5 Pro is very, very powerful for most use cases—don’t write off Google just because the AI you’ve seen on Google Search isn’t that good.

The Best Choice for Writing the Future Perfect Newsletter

Humans (for now). For the past few months, I’ve been indulging in a slightly creepy habit: checking to see if AIs can replace me at my job. I feed them the research notes I use to build a given Future Perfect newsletter, provide them with a few example Future Perfect newsletters, and ask them to do the work in my stead. Every time I hit the “enter” button, it’s with some trepidation. After all, why would Vox continue paying me to do this if an AI can write the Future Perfect newsletter?

Luckily, none of them can: Grok 3, Gemini 2.5 Pro, DeepSeek, Claude, and ChatGPT all fall short. Their newsletters are reassuringly, comfortingly mediocre. Not terrible, but bad enough that my editor would notice I wasn’t at my best if I sent one of them in, and this is with all my research notes! Some of the metaphors are weak, some of the tangents are confusing, and every so often one will insert a quote it doesn’t explain.

However, if I had to pick a robot to take my job, I think I’d have to give it to Gemini 2.5 Pro. My editors would notice I wasn’t at my best—but honestly, it wouldn’t be that bad. And unlike me, the robot doesn’t need health insurance, a salary, family time, or sleep. Am I unnerved by what this portends? Yes, of course.