Tested Nvidia RTX 5090 vs 4090 GPUs for AI: You Won’t Believe the Winner!

Speed testing NVIDIA GPUs for LLM inference and fine-tuning

Photo by Nana Dua on Unsplash

Hey there, GPU geeks and AI tinkerers. Today, we’re stepping away from models and stepping right into silicon showdown territory — we’re talking about NVIDIA’s RTX 5090 vs the reigning champion, RTX 4090.

Now if you’ve been dreaming about squeezing every millisecond out of your model training, this blog’s for you. The 5090 just dropped with NVIDIA’s Blackwell architecture, promising firepower that could melt your desk. But is it really an upgrade for AI workloads?

Let’s break it all down — benchmarks, real-world tests, gotchas, and the final verdict.

Quick Specs Showdown: 5090 vs 4090

On paper, the 5090 is a monster: the new Blackwell architecture instead of Ada Lovelace, 32 GB of GDDR7 versus the 4090's 24 GB of GDDR6X, roughly a third more CUDA cores, more TFLOPs, and a slimmer two-slot Founders Edition. Newer everything.
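
Those are the on-paper claims. If you want to see what your own card reports, a couple of lines of PyTorch will print the basics (a minimal sketch, assuming a CUDA-enabled build of torch):

import torch

# Query the card PyTorch sees as device 0
props = torch.cuda.get_device_properties(0)
print("GPU:", props.name)
print("VRAM:", round(props.total_memory / 1024**3, 1), "GB")
print("Streaming multiprocessors:", props.multi_processor_count)
print("Compute capability:", f"{props.major}.{props.minor}")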

To actually test whether the 5090 is better than the 4090, I conducted 3 experiments. Shockingly, in all 3 the 4090 beat the 5090 by a wide margin. Let's check them out.

The Experiments: 3 AI Tasks, One Goal — Speed

All experiments were run with identical code and setups — just the GPU changed. Here’s what went down:
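
Quick sanity check first: since only the GPU is supposed to change, it helps to print the software stack on both machines before timing anything. A minimal sketch (assuming torch, transformers, and diffusers are installed; your exact versions will differ):

import torch
import transformers
import diffusers

# Record the software stack so the two runs are actually comparable
print("torch:", torch.__version__)
print("CUDA (build):", torch.version.cuda)
print("transformers:", transformers.__version__)
print("diffusers:", diffusers.__version__)
print("GPU:", torch.cuda.get_device_name(0))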

1. Summarizing 100 Articles with T5-Large

T5-Large, a Google-born text summarization model with ~770M parameters, was used to summarize 100 dummy articles.

Code used

from transformers import pipeline
import torch
import time

# Detect device
device = 0 if torch.cuda.is_available() else -1
print("Device:", "GPU" if device == 0 else "CPU")

# Load summarizer
summarizer = pipeline("summarization", model="t5-large", device=device)

# Create dummy "articles" (100 repetitive samples)
fake_article = "The quick brown fox jumps over the lazy dog. " * 30
articles = [fake_article for _ in range(100)]

# Run summarization in batches
batch_size = 32
summaries = []
start = time.time()

print("Starting summarization")
for i in range(0, len(articles), batch_size):
    batch = articles[i:i+batch_size]
    result = summarizer(batch, do_sample=False)
    summaries.extend(result)
end = time.time()

print(f"Summarized {len(articles)} articles in {end - start:.2f} seconds using {torch.cuda.get_device_name(0)}")

Wait… what? Yep. The older 4090 won by 6 seconds. Not a huge deal, but it’s like losing a sprint to your grandpa in new sneakers.

2. Fine-Tuning DistilBERT on 7.5K Rows

Next up: fine-tuning DistilBERT for sentiment classification. Small model, small dataset — just 5 epochs.

The result? The 4090 was 2x faster. Yes, you read that right.

Code used

import time
import numpy as np
import pandas as pd
import torch
import evaluate
from datasets import Dataset, DatasetDict
from sklearn.model_selection import train_test_split
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    DataCollatorWithPadding,
    TrainingArguments,
    Trainer,
)

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"  # Pre-trained model we will be using
classifier = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)  # Get the classifier
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Load the training data
train_path = 'train.csv'
df = pd.read_csv(train_path)
print("dataset size", len(df))

df = df.loc[:, ["text", "target"]]
df_train, df_eval = train_test_split(df, train_size=0.8, stratify=df.target, random_state=42)  # Stratified splitting

raw_datasets = DatasetDict({
    "train": Dataset.from_pandas(df_train),
    "eval": Dataset.from_pandas(df_eval)
})

tokenized_datasets = raw_datasets.map(lambda dataset: tokenizer(dataset['text'], truncation=True), batched=True)
tokenized_datasets = tokenized_datasets.remove_columns(["text", "__index_level_0__"])
tokenized_datasets = tokenized_datasets.rename_column("target", "labels")

# Padding for each batch of data that will be fed into the model during training
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

# Training args
training_args = TrainingArguments("test-trainer", num_train_epochs=5,
                                  weight_decay=5e-4, save_strategy="no", report_to="none")

# Metric for validation error
def compute_metrics(eval_preds):
    metric = evaluate.load("glue", "mrpc")  # F1 and Accuracy
    logits, labels = eval_preds
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

# Define trainer
trainer = Trainer(
    classifier,
    training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["eval"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

# Start the fine-tuning
start = time.time()
trainer.train()
end = time.time()
print("Training time using {}".format(torch.cuda.get_device_name(0)), end - start)

Here’s where things start smelling funny. How is the newer, pricier card falling behind this badly?

3. Image Generation with Stable Diffusion Turbo

Now for something more GPU-intensive — generating 100 images with Stable Diffusion Turbo.

Code used

import time
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/sd-turbo",
    torch_dtype=torch.float16,
    use_safetensors=True,
)

pipe.to("cuda" if torch.cuda.is_available() else "cpu")
pipe.enable_attention_slicing()

start = time.time()

prompt = "a photo of an astronaut riding a horse on mars"
for _ in range(100):
    image = pipe(prompt).images[0]  # image.show() or save the image if needed

end = time.time()
print("Inference time using {}".format(torch.cuda.get_device_name(0)), end - start)

Again, the 4090 dusted the 5090. Over 2x faster. So what gives?

Why Is 4090 Still the King of AI?

Let’s address the $2,000 elephant in the room:

1. Library Optimization

The software stack matters more than you think. Transformers, Diffusers, Torch — these libraries have been battle-tested on the 4090. But they're still catching up to the Blackwell-based 5090.

You need both the hardware and the software stack to be up to date to get the most out of any GPU.
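
As a familiar illustration of that software side (not specific to Blackwell): whether PyTorch uses TF32 tensor-core math for float32 matmuls is a pure software toggle sitting on top of the same silicon. A minimal sketch:

import torch

# TF32 matmul is a software switch; it is off by default for matmuls in recent PyTorch releases
print("TF32 for matmul:", torch.backends.cuda.matmul.allow_tf32)
print("TF32 for cuDNN:", torch.backends.cudnn.allow_tf32)

# Opting in trades a little float32 precision for tensor-core speed on Ampere and newer cards
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True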

2. CUDA Compute Compatibility

The RTX 5090 introduces a new CUDA compute capability (sm_120, Blackwell). Many older versions of PyTorch and Hugging Face tools either don't support it, or need very specific builds to even run.

In the wild: RTX 5090 needs bleeding-edge library versions — but those versions aren’t fully tuned for it yet. Classic chicken-and-egg problem.
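
One way to see the problem on your own machine is to check which GPU architectures your installed PyTorch build actually ships kernels for (a minimal sketch; the exact output depends on your install):

import torch

# Compute capability of the card in this machine,
# e.g. (8, 9) for the 4090 (Ada) or (12, 0) for the 5090 (Blackwell)
major, minor = torch.cuda.get_device_capability(0)
print("GPU:", torch.cuda.get_device_name(0), f"(sm_{major}{minor})")

# Architectures this PyTorch build was compiled for. Pre-built kernels for older
# architectures generally won't run on a brand-new one like sm_120, which is why
# the 5090 needs very recent builds.
print("Build supports:", torch.cuda.get_arch_list())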

3. Gaming vs AI Prioritization

NVIDIA’s own marketing has been pushing 5090 as a gaming/rendering beast, showing up to 30x performance gains in real-time rendering.

But for AI? No official benchmarks yet. And that speaks volumes.

Verdict: Should You Buy RTX 5090 for AI in 2025?

If your primary job is gaming, rendering, or bragging on Reddit, the RTX 5090 is shiny, sleek, and sexy.

But if you’re doing LLM inference, fine-tuning, or generative AI, here’s your reality check:

  • 4090 wins in real AI workloads
  • Library support is mature
  • Stable, fast, and half the price (used market)

So unless you’re a future-proofing fanatic or you’re building for workflows that’ll benefit from 5090 once the libraries catch up, stick to the 4090 for now.

Let’s give the 5090 a few months to grow into its shoes.

