Nvidia GeForce RTX 3090 and GA102: Everything We Know – Tom’s Hardware

The Nvidia GeForce RTX 3090 is now confirmed as the next halo graphics card from Team Green, thanks to Micron’s inadvertent posting of memory details (the PDF is now removed). With that piece of knowledge, we’ve dissected the rest of what we expect to find in the RTX 3090. Nvidia has a countdown to the 21st anniversary of its first GPU, the GeForce 256, slated for September 1. The battle for the best graphics cards and top of the GPU hierarchy is about to get heated.

We’ve talked about the Nvidia Ampere and RTX 30-series as a whole elsewhere, so this discussion is focused purely on the GeForce RTX 3090. Let’s dig into the details of what we know about the GeForce RTX 3090, including the expected GPU and memory specifications, release date, price, features, and more.

First, the GeForce RTX 3090 branding is the first 90-series suffix we’ve seen since the GTX 690 back in 2012. That was a dual-GPU variant of the GTX 680, but based on the Micron documentation, RTX 3090 will still be a single GPU. Spoiler: multi-GPU support in games is practically dead, or at least on life support. Why bring back the 90 branding? Simple: It opens the door for a new tier of performance and pricing. That’s not good news for our wallets.

We discussed Micron’s inadvertent posting of details and more in a recent Tom’s Hardware show, which you can view below.

Nvidia GeForce RTX 3090 At A Glance: 

  • 12GB GDDR6X at 21Gbps (1TBps)
  • More SMs, cores, and 21 TFLOPS (e.g., could be 84 SMs at 2GHz or 118 SMs at 1.4GHz)
  • 7nm part should be much more efficient than Turing
  • Release Date: RTX 3090 unveiling expected on September 1, 2020
  • Price: RTX 3090 could cost a lot, but no confirmed price yet

(Image credit: Micron)
Nvidia GeForce RTX 3090 Potential Specifications
GPU: GA102
Graphics Card: GeForce RTX 3090
Process (nm): 7
Transistors (billion): 30-40?
Die Size (mm^2): 500~700?
SMs: 84~118?
CUDA Cores: 5376~7552?
RT Cores: 84~118?
Tensor Cores: 336~472?
Boost Clock (MHz): 1400~2000?
VRAM Speed (Gbps): 21 (GDDR6X)
VRAM (GB): 12
Bus Width (bits): 384
ROPs: 128?
TMUs: 672?
TFLOPS FP32: 19~21?
RT Gigarays: ~40?
Tensor TFLOPS (FP16): 600?
Bandwidth (GB/s): 1008
TBP (watts): 250~350??
Launch Date: September 2020
Launch Price: $1,500~$2,000??

Nvidia GeForce RTX 3090 GDDR6X Memory

The Micron posting gives us one extremely concrete set of data. Unless Nvidia changes something between now and the unveiling, the GeForce RTX 3090 will have 12GB of GDDR6X memory clocked at somewhere between 19 and 21 Gbps per pin. Let’s be clear: It’s 21Gbps.

Nvidia’s GTX 1080 Ti was the first 11GB GPU, and it was a surprise. Nvidia had multiple references to build off: Turning the dial to 11, 11GB, 11Gbps clocks. The same applies to 21Gbps. This is the 21st anniversary of the GeForce 256, the “world’s first GPU” according to Nvidia, who coined the GPU acronym for the occasion. There’s also a 21-day countdown going on right now. Add that to the specs from Micron and 21Gbps is effectively confirmed. If I’m wrong, I’ll eat my GPU hat.

[Disclaimer: I don’t have a GPU hat. Someone will need to send me one to munch on if Nvidia uses something other than 21Gbps.]

This is a big deal, as it’s the first time a GPU will have over 1TBps of memory bandwidth while using something other than HBM2 memory. (AMD’s Radeon VII has 1TBps as well, via 16GB of HBM2.) We don’t have exact details on how much companies pay for HBM2 vs. GDDR6X, but there’s a big premium with HBM2 — you need a silicon interposer, plus the memory itself costs more.

To put this in perspective, the RTX 2080 Ti ‘only’ has 616GBps, so this is effectively a 64% boost in memory performance. That leads into the rest of the GPU specs, but let’s first point out that the RTX 2080 Ti has 27% more memory bandwidth than the GTX 1080 Ti. It also has 20% more theoretical computational performance (TFLOPS), and architectural updates mean it makes better use of those resources. In short, GPU TFLOPS is often scaled similarly to bandwidth.
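The bandwidth figures above follow from bus width and per-pin data rate. Here is a minimal sketch of that arithmetic. The 384-bit bus and 21Gbps rate for the RTX 3090 come from the spec table, while the 352-bit bus widths and the 14Gbps rate for the two older cards are assumptions taken from their public spec sheets rather than from this article.

```python
def memory_bandwidth_gbs(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak memory bandwidth in GB/s: one pin per bus bit, 8 bits per byte."""
    return bus_width_bits * gbps_per_pin / 8


# RTX 3090 figures are from the Micron-derived specs in this article.
rtx_3090 = memory_bandwidth_gbs(384, 21)
# Assumed from public spec sheets: both older cards use a 352-bit bus.
rtx_2080_ti = memory_bandwidth_gbs(352, 14)
gtx_1080_ti = memory_bandwidth_gbs(352, 11)


def uplift_percent(new: float, old: float) -> float:
    """Percentage increase of `new` over `old`."""
    return (new / old - 1) * 100


print(f"RTX 3090:    {rtx_3090:.0f} GB/s")
print(f"RTX 2080 Ti: {rtx_2080_ti:.0f} GB/s")
print(f"GTX 1080 Ti: {gtx_1080_ti:.0f} GB/s")
print(f"3090 vs 2080 Ti:    +{uplift_percent(rtx_3090, rtx_2080_ti):.0f}%")
print(f"2080 Ti vs 1080 Ti: +{uplift_percent(rtx_2080_ti, gtx_1080_ti):.0f}%")
```

Under those assumptions the script reproduces the 1008GBps, 616GBps, 64%, and 27% figures quoted in the text.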

The A100 proves that Nvidia can go very big if it wants to, but consumer chips will be much smaller. (Image credit: Nvidia)

Nvidia GeForce RTX 3090 Performance

As we’ve already pointed out, the move to 21Gbps GDDR6X increases raw memory bandwidth by 64% relative to the RTX 2080 Ti. That means we also expect the RTX 3090 to deliver around 50-75% more computational performance. You know what would make for a nice target? 21 TFLOPS. Yeah, baby! How it gets there isn’t critical, but there are a few options.

We know from the Nvidia A100 that Ampere can reach massive sizes on TSMC’s 7nm process. The GA100 is an 826mm² die, which is relatively close to the maximum reticle size (you can’t make a chip physically larger than the reticle). The GA100 at the heart of the A100 also supports FP64 (64-bit floating-point) computation, which is necessary for the target market of scientific research. GeForce cards don’t need FP64 and typically only have 1/32 the performance in FP64 vs. FP32 instead of the 1/2 performance found in the bigger GP100, GV100, and GA100 chips.

Option one is that Nvidia strips out all the FP64 functionality, adds ray tracing RT cores in its place, and still ends up with a big chip that has up to 128 SMs. This is more or less what happened with the Pascal generation: GP100 used HBM2, GP102 used GDDR5/GDDR5X, but both had a maximum configuration of 3840 FP32 CUDA cores. Some of these would end up disabled to improve yields via binning, but if Nvidia goes with 118 SMs and 7,552 CUDA cores, then clocks the chip at 1.4GHz (boost), it would have a theoretical performance of 21.1 TFLOPS. That’s a 57% improvement over the RTX 2080 Ti, so well within the expected range.

Option two would result in a smaller chip that might end up using more power, but end up with the same level of performance. We could fudge the numbers in a variety of ways, but let’s go with six clusters of 15 SMs each, or 90 SMs total, and then disable six SMs. That leaves 84 functional SMs and 5376 CUDA cores. Clock those at 2.0GHz and Nvidia would have a 21.5 TFLOPS chip, and this could be tuned via clock speed adjustments.

Option three (and there are more, but we won’t belabor the point): GA102 could have a maximum configuration of eight clusters of 10 SMs, with eight SMs disabled to end up at 72 SMs and 4608 CUDA cores. This is the same number as Titan RTX, if you’re keeping track. If Nvidia can clock these cores at 2.3GHz, it would end up at the same ballpark target of 21.2 TFLOPS.
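All three options rest on the same formula: CUDA cores times two FLOPs per clock (one fused multiply-add) times the boost clock. The sketch below runs that arithmetic for each speculative configuration. The 64 cores per SM is the ratio implied by the SM and core counts quoted above, and the option names are just labels for this example.

```python
# Assumption: 64 FP32 CUDA cores per SM, the ratio implied by the
# article's own pairings (118 SMs -> 7,552 cores, 84 -> 5,376, 72 -> 4,608).
CORES_PER_SM = 64


def fp32_tflops(sms: int, boost_ghz: float) -> float:
    """Theoretical FP32 throughput: cores x 2 FLOPs (one FMA) x clock."""
    cuda_cores = sms * CORES_PER_SM
    return cuda_cores * 2 * boost_ghz / 1000  # GFLOPS -> TFLOPS


# (SM count, boost clock in GHz) for the three speculative options.
options = {
    "option one": (118, 1.4),
    "option two": (84, 2.0),
    "option three": (72, 2.3),
}

results = {name: fp32_tflops(sms, ghz) for name, (sms, ghz) in options.items()}

for name, tflops in results.items():
    print(f"{name}: {tflops:.1f} TFLOPS")
```

Plugging in the RTX 2080 Ti’s reference figures (68 SMs at a 1.545GHz boost, taken from public spec sheets rather than this article) gives about 13.4 TFLOPS, which is where the 57% improvement for option one comes from.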

The point isn’t the exact SM and shader core counts. It’s the combination of core counts and clockspeeds and architectural efficiency that determines the final performance. It’s entirely possible that clocks will be lower but architecture changes still end up yielding higher performance. Just look at AMD’s Navi 10: the Radeon RX 5700 XT has 9.8 TFLOPS of compute and 448GBps of bandwidth, while the Radeon VII (Vega 20) does 13.8 TFLOPS and has 1024GBps of bandwidth. In practice, the Radeon VII across our complete test suite is 4% faster than the 5700 XT … and uses 50W more power.
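To make the Navi 10 comparison concrete, here is a rough sketch that divides relative gaming performance by theoretical TFLOPS. It uses only the figures quoted above, with the RX 5700 XT normalized to 1.00. Performance per TFLOP is a crude illustrative metric for this example, not something measured directly.

```python
# Figures quoted in the article; relative performance is normalized
# so that the RX 5700 XT = 1.00 and the Radeon VII is 4% faster.
cards = {
    "RX 5700 XT": {"tflops": 9.8, "relative_perf": 1.00},
    "Radeon VII": {"tflops": 13.8, "relative_perf": 1.04},
}

# Crude efficiency metric: delivered performance per theoretical TFLOP.
perf_per_tflop = {
    name: spec["relative_perf"] / spec["tflops"] for name, spec in cards.items()
}

# How much more the newer architecture extracts from each TFLOP.
navi_advantage = perf_per_tflop["RX 5700 XT"] / perf_per_tflop["Radeon VII"]

print(f"Navi 10 delivers {navi_advantage:.2f}x the performance per TFLOP")
```

By this rough measure Navi 10 gets about 1.35 times as much gaming performance out of each theoretical TFLOP, which is why raw TFLOPS alone won’t tell us how fast the RTX 3090 is.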

The Ampere GA100 dwarfs Nvidia’s previous GPUs, with 2.5X as many transistors as GV100. (Image credit: Nvidia)

GeForce RTX 3090 Ray Tracing and DLSS

The GeForce RTX 3090 will of course support ray tracing and DLSS as well, via RT cores and Tensor cores. These will be 2nd gen RT and 3rd gen Tensor (most likely), and exact core counts and performance aren’t known.

Nvidia did reveal that the 3rd gen Tensor cores in the A100 are four times as fast as the gen1 and gen2 Tensor cores in Volta and Turing. Instead of eight Tensor cores per SM, the A100 has four Tensor cores per SM, but each SM still delivers twice the performance (or more, with sparsity).
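A quick sanity check on those Tensor core numbers, assuming the 4x figure applies per Tensor core: with half as many cores in each SM, the net gain per SM works out to 2x. The sketch uses relative units only (an older Tensor core = 1.0) and ignores sparsity.

```python
# Relative units: one Volta/Turing Tensor core = 1.0 throughput.
OLD_CORES_PER_SM = 8      # Volta and Turing
NEW_CORES_PER_SM = 4      # A100 (3rd gen)
PER_CORE_SPEEDUP = 4.0    # assumption: the quoted 4x applies per core

old_sm_throughput = OLD_CORES_PER_SM * 1.0
new_sm_throughput = NEW_CORES_PER_SM * PER_CORE_SPEEDUP

per_sm_speedup = new_sm_throughput / old_sm_throughput
print(f"Per-SM Tensor speedup (dense): {per_sm_speedup:.1f}x")
```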

The RT cores could certainly see a similar boost in performance, and in fact it’s hard to imagine a world in which Nvidia’s 2nd gen RT implementation doesn’t ‘fix’ some of the design from the 1st gen parts. However, while rumors have been floated about a 4x increase in RT performance, none of that is confirmed, and the similarity to the Tensor core improvement makes me skeptical.

The black box design of the ray/triangle BVH hardware used in Turing is quite different from the Tensor clusters. Tensor cores are basically lots of matrix multiplication hardware, while RT cores are far more specialized. Again, we fully expect Ampere to have superior ray tracing performance, and it might even be four times (or more!) faster per RT core. Or it might be twice as fast or some other result.

We’d bet on it landing at the higher end of the spectrum, though, because long-term we want games that can do ray tracing for everything, like in Minecraft RTX (or sort of like Control), rather than just using ray tracing for shadows or reflections. That means game developers need more RT performance, at least double what we’ve got in Turing. Or maybe not; increased use of DLSS, and maybe DLSS 3.0, could make up for a lot of deficits in raw processing power.

(Image credit: Shutterstock)

GeForce RTX 3090 Power Requirements

Rumors have surfaced, for example from Igor’s Lab, that the RTX 3090 will have a TDP of 350W. Again, that seems very excessive. GA102 will move to 7nm from 12nm, which should allow for improved performance and less power. Just look at AMD’s Zen+ vs. Zen 2 CPUs as an example: Performance improved and power requirements dropped by 30% for the 8-core parts. Also note that that “very good but secret source” said the RTX 3090 would have 24GB of GDDR6X, and we now know that’s incorrect.

Nvidia currently has two variants of its A100 product: the HGX version for scaling out, and the PCIe version that works in a standard PCIe x16 slot. The HGX models support NVSwitch, allowing up to eight GPUs to work together almost as a single massive GPU. The PCIe cards have to make do with NVLink connecting at most two GPUs, still at the same 600 GBps for the two GPUs, or else drop to much lower interlink speeds of 64GBps (PCIe Gen4) for the PCIe slot.

All of those NVSwitch links contribute to the higher 400W TDP for the HGX parts, while the PCIe part only has a 250W TDP. Nvidia claims that for apps that run on a single GPU (like games, for example), the PCIe card delivers 90% of the performance of the HGX card. It’s only in multi-GPU workloads that it starts to fall behind.

There are also rumors of a new 12-pin PCIe power connector. The idea would be to cut the number of connectors down, so instead of three 8-pin PEG connectors the RTX 3090 could use two of some newfangled 12-pin PEG connector. This would be even worse in our opinion. Lots of people have PSUs with three or more 8-pin PEG connectors available. Moving to a 12-pin connector would require new PSUs or, even worse, converters that might take two 8-pin PEG and spit out a 12-pin. Yuck. We really hope the power and connector rumors end up being fake, but stranger things have happened.

GeForce RTX 3090 Release Date

This is perhaps the easiest one to answer. Nvidia CEO Jensen Huang is slated to deliver a keynote at the virtual Gamescom on September 1, 2020. There’s an “Ultimate Countdown” currently ticking away. It started at 21 days, and mentions 21 years, the anniversary of the GeForce 256. Again, 21Gbps GDDR6X seems inevitable.

Maybe Jensen will reveal the RTX 3080 instead. Maybe it will be called the GeForce RTX 3090 Ultimate, because “ultimate” is for sure better than “super.” All will be made clear, or at least clearer, on September 1.

Will the cards actually launch on that date? Probably not. Nvidia revealed the GeForce RTX 20-series on August 20, 2018 at Gamescom. The cards actually went on sale on September 20, 2018 for the RTX 2080, September 27 for the 2080 Ti, and October 17 for the RTX 2070. As much as we hate soft launches, that appears to be the way Nvidia is going. Maybe we’ll be surprised, but we expect products to go on sale several weeks after September 1.
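For a sense of what “several weeks” meant last time, the sketch below measures the gap between the RTX 20-series reveal and each on-sale date listed above. The projection onto September 1, 2020 is purely illustrative: it assumes Nvidia repeats the 2018 gaps, which nothing in the available information confirms.

```python
from datetime import date, timedelta

# Dates quoted in the article for the RTX 20-series.
reveal_2018 = date(2018, 8, 20)
on_sale_2018 = {
    "RTX 2080": date(2018, 9, 20),
    "RTX 2080 Ti": date(2018, 9, 27),
    "RTX 2070": date(2018, 10, 17),
}

# Days between the reveal and each card's retail availability.
gaps = {card: (day - reveal_2018).days for card, day in on_sale_2018.items()}

# Hypothetical projection: apply the same gaps to the expected 2020 reveal.
reveal_2020 = date(2020, 9, 1)
projected = {card: reveal_2020 + timedelta(days=g) for card, g in gaps.items()}

for card, g in gaps.items():
    print(f"{card}: {g} days after reveal (same gap in 2020: {projected[card]})")
```

The 2018 gaps come out to 31, 38, and 58 days, so roughly four to eight weeks between the reveal and retail availability.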

GeForce RTX 3090 coming in September (we're pretty certain)

(Image credit: Shutterstock)

How Much Will GeForce RTX 3090 Cost? 

You can listen to our above speculation on the pricing of the GeForce RTX 3090, and the fact is no one knows for sure what it will be right now. Here’s my argument for why I think it’s going to set a new record for a GeForce card. I’d really love to be proven wrong, but I’m cynical about GPU prices these days after seeing the RTX 2080 Ti never really sell at $1,000 in any reasonable quantities, even two years after the launch.

First, the RTX 2080 Ti is a $1,200 part, and RTX 3090 is reintroducing the -90 model number. The last time we had such a part was the GTX 690 back in 2012, and it was priced at $1,000, exactly double the price of the GTX 680. Nvidia drew a lot of criticism for the generational increase in pricing over the past couple of launches. The GTX 970 launched at $330, the GTX 1070 was $380-$450, and the RTX 2070 was $500-$600. What do you compare an RTX 3090 with?

Answer: Nothing. GTX 690 is too far in the past. But the RTX 3090 is logically a tier above the RTX 2080 and RTX 2080 Ti, and that means it can be priced higher than the RTX 2080 Ti as well. That would be terrible for anyone on a reasonable salary hoping to buy one, but Nvidia’s halo cards have typically been a small fraction (less than one percent) of the total market. It’s all about bragging rights.

Depending on how big Nvidia chooses to go — in terms of die size, performance, and TDP — it’s not a stretch to imagine launch pricing of anywhere from $1,500 to $2,000. Maybe it could be a bit lower, but that will depend on what AMD’s Big Navi brings to the table. Intel’s Xe HPG isn’t coming until 2021, though, so there’s no pressure on that front.

What’s more, $2,000 would be a step below the current Titan RTX, which sits at $2,500. And since the RTX 3090 is ‘only’ a 12GB card, there’s still potential for Nvidia to do a new Titan RTX 3000 card with 24GB of GDDR6X and a fully enabled GA102 GPU. Or if AMD is really competitive, Nvidia might even put GA100 into a consumer model, but that seems quite the stretch since it lacks RT cores. Maybe a Titan V2?

How much are enthusiasts willing to pay for bragging rights? We’ve seen $2,000 CPUs from Intel in the past, and Nvidia’s Titan line has inflated from $1,000 to as high as $3,000. We maintain that no one should buy a Titan for gaming or bragging rights, though there are potentially some professional workloads where it could (maybe) make sense. Maybe the RTX 3090 will be $2,100? There’s that darn 21 again!

(Image credit: Shutterstock)

Nvidia GeForce RTX 3090: The Bottom Line

There are plenty of unknowns, and since pre-orders aren’t available right now, no one can actually buy the GeForce RTX 3090. We’ll keep an eye on things, and once the September 1 ultimate countdown ends, we’ll update the question marks and blank spaces.

Until then, given what we do know, specifically the 12GB of GDDR6X memory clocked at 21Gbps, we expect the GeForce RTX 3090 will deliver on the performance front if nothing else. The move to 7nm, GDDR6X, and Ampere architecture enhancements could easily make this the biggest jump in performance we’ve seen since the days of Maxwell (GTX 900 series), maybe even bigger.

Is that enough to warrant a jump in pricing? Let’s hope not. Let’s hope AMD and maybe even Intel can respond with enough competition to keep things in check. Otherwise, PC gamers are going to be sorely tempted by the next-generation consoles. If a PlayStation 5 costs $500 and an Xbox Series X costs $600, conceivably you could buy two, three, or maybe even four of those to share with friends and family for the price of a single RTX 3090.
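The console comparison is simple division, sketched here with the hypothetical prices from the paragraph above and the speculated $1,500 to $2,000 range for the RTX 3090. None of these prices are confirmed.

```python
# Hypothetical prices from the article; none are confirmed.
console_prices = {"PlayStation 5": 500, "Xbox Series X": 600}
rtx_3090_price_range = (1500, 2000)  # speculated low and high launch price

# Whole consoles you could buy for the price of one RTX 3090.
consoles_per_gpu = {
    name: tuple(gpu_price // price for gpu_price in rtx_3090_price_range)
    for name, price in console_prices.items()
}

for name, (low, high) in consoles_per_gpu.items():
    print(f"{name}: {low} to {high} consoles per RTX 3090")
```

With those numbers the range runs from two Xbox Series X consoles at the low end to four PlayStation 5 consoles at the high end.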

21 Gbps GDDR6X. 21 TFLOPS. 21st anniversary of the first GPU. It’s almost like it was destined to be. Tune in early next month and we’ll see what happens.
