Nvidia GeForce RTX 4060 Ti Review: 1080p Gaming for $399

The Nvidia GeForce RTX 4060 Ti brings true mainstream pricing to the Ada Lovelace architecture and RTX 40-series GPUs, starting at $399 for the Founders Edition and reference-clocked models. Unfortunately, it also brings a lot of potential compromises into play, chief among them being the 128-bit memory interface and 8GB of VRAM. Nvidia has a potential solution for the capacity problem with a 16GB model planned for release in July, but it won’t address any concerns with the memory interface.

Is the RTX 4060 Ti one of the best graphics cards? That largely depends on how many of the games you play support DLSS 3 and whether you’re willing to trade latency for AI-interpolated FPS. Looking at native performance in our GPU benchmarks hierarchy (which will be updated later today), the RTX 4060 Ti comes in just ahead of the RTX 3070 at 1080p, but falls behind the RTX 3060 Ti at 1440p and 4K.

So the good news is that the RTX 4060 Ti is generally faster than the previous generation RTX 3060 Ti at the same price while using less power. It also supports new Ada features like DLSS 3 Frame Generation, Shader Execution Reordering (SER), Displaced Micro-Meshes (DMM), and Opacity Micro-Maps (OMM). The bad news is that it barely surpasses its predecessor overall, and design decisions made years ago are certainly at play.

Let’s dive into the spec sheet to see what the Nvidia RTX 4060 Ti offers.

Nvidia RTX 4060 Ti and Other GPU Specifications
| Graphics Card | RTX 4060 Ti | RTX 4060 Ti 16GB | RTX 4060 | RTX 4070 | RTX 3070 | RTX 3060 Ti | RX 6750 XT | RX 6700 | Arc A770 16GB |
|---|---|---|---|---|---|---|---|---|---|
| Architecture | AD106 | AD106 | AD107 | AD104 | GA104 | GA104 | Navi 22 | Navi 22 | ACM-G10 |
| Process Technology | TSMC 4N | TSMC 4N | TSMC 4N | TSMC 4N | Samsung 8N | Samsung 8N | TSMC N7 | TSMC N7 | TSMC N6 |
| Transistors (Billion) | 22.9 | 22.9 | 18.9 | 32 | 17.4 | 17.4 | 17.2 | 17.2 | 21.7 |
| Die size (mm^2) | 187.8 | 187.8 | 158.7 | 294.5 | 392.5 | 392.5 | 336 | 336 | 406 |
| SMs / CUs / Xe-Cores | 34 | 34 | 24 | 46 | 46 | 38 | 40 | 36 | 32 |
| GPU Cores (Shaders) | 4352 | 4352 | 3072 | 5888 | 5888 | 4864 | 2560 | 2304 | 4096 |
| Tensor Cores | 136 | 136 | 96 | 184 | 184 | 152 | N/A | N/A | 512 |
| Ray Tracing “Cores” | 34 | 34 | 24 | 46 | 46 | 38 | 40 | 36 | 32 |
| Boost Clock (MHz) | 2535 | 2535 | 2460 | 2475 | 1725 | 1665 | 2600 | 2450 | 2100 |
| VRAM Speed (Gbps) | 18 | 18 | 17 | 21 | 14 | 14 | 18 | 16 | 17.5 |
| VRAM (GB) | 8 | 16 | 8 | 12 | 8 | 8 | 12 | 10 | 16 |
| VRAM Bus Width (bits) | 128 | 128 | 128 | 192 | 256 | 256 | 192 | 160 | 256 |
| L2 / Infinity Cache (MB) | 32 | 32 | 24 | 36 | 4 | 4 | 96 | 80 | 16 |
| ROPs | 48 | 48 | 48 | 64 | 96 | 80 | 64 | 64 | 128 |
| TMUs | 136 | 136 | 96 | 184 | 184 | 152 | 160 | 144 | 256 |
| TFLOPS FP32 (Boost) | 22.1 | 22.1 | 15.1 | 29.1 | 20.3 | 16.2 | 13.3 | 11.3 | 17.2 |
| TFLOPS FP16 (FP8) | 177 (353) | 177 (353) | 121 (242) | 233 (466) | 163 | 130 | 26.6 | 22.6 | 138 |
| Bandwidth (GBps) | 288 | 288 | 272 | 504 | 448 | 448 | 432 | 320 | 560 |
| TDP (watts) | 160 | 160 | 115 | 200 | 220 | 200 | 250 | 175 | 225 |
| Launch Date | May 2023 | Jul 2023 | Jul 2023 | Apr 2023 | Oct 2020 | Dec 2020 | May 2022 | Mar 2021 | Sep 2022 |
| Launch Price | $399 | $499 | $299 | $599 | $499 | $399 | $549 | $479 | $349 |
| Current Price | $399 | N/A | N/A | $599 | $442 | $377 | $379 | $269 | $349 |

That’s a crowded table, but the first column is the most pertinent. The RTX 4060 Ti uses Nvidia’s new AD106 GPU — the same chip found in the RTX 4070 Laptop GPU, incidentally. You can also see the block diagram for the AD106 chip and the 4060 Ti below, which we’ll get to in a moment.

The RTX 4060 Ti has fewer GPU cores than the RTX 3060 Ti (4352 versus 4864), but Nvidia makes up for that with significantly higher core clocks: 2535 MHz boost versus 1665 MHz. As usual, real-world clocks will exceed those values, but in terms of theoretical compute from the shaders and tensor cores, the 4060 Ti delivers 22.1 teraflops versus 16.2 teraflops for FP32, and 177 teraflops versus 130 teraflops for FP16 (with sparsity). The 4060 Ti also supports FP8 mode on its tensor cores, so if/when AI applications add support for that, it can deliver a potential 353 teraflops.
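For anyone who wants to check the FP32 figures, here is a minimal Python sketch of the usual calculation: shader count times two floating-point operations per clock (one fused multiply-add) times the boost clock. The function name `fp32_tflops` is ours for illustration, and the inputs are the shader counts and boost clocks from the spec table.

```python
def fp32_tflops(shaders: int, boost_mhz: float) -> float:
    """Theoretical FP32 throughput in teraflops.

    Each shader is counted as 2 FLOPs per clock (one fused multiply-add).
    """
    return shaders * 2 * boost_mhz * 1e6 / 1e12


# Shader counts and boost clocks (MHz) taken from the spec table.
cards = {
    "RTX 4060 Ti": (4352, 2535),
    "RTX 3060 Ti": (4864, 1665),
}

for name, (shaders, boost_mhz) in cards.items():
    print(f"{name}: {fp32_tflops(shaders, boost_mhz):.1f} TFLOPS")
```

These are paper numbers at the rated boost clock. Real-world clocks, and therefore real-world throughput, will differ.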

Looking at the competition based on relatively similar pricing, we have AMD’s RX 6750 XT with 12GB and Intel’s Arc A770 16GB. It’s a safe bet that Nvidia can match or exceed those cards when it comes to ray tracing performance and AI workloads — winning the latter by default since it’s often the only GPU option supported. On the other hand, rasterization performance will be a lot closer and a more interesting comparison point.

Memory capacity and bandwidth are going to be major factors in performance. 8GB of VRAM shouldn’t be a problem for most games running at 1080p, but 1440p and especially 4K could prove problematic. We talked recently about why 4K gaming requires so much more VRAM, and that applies here. Nvidia isn’t marketing the RTX 4060 Ti as a 1440p or 4K gaming solution, most likely because of its limited VRAM capacity and bandwidth. That’s interesting, as the 3060 Ti and 3070 two years ago were both targeting 1440p.
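The raw bandwidth gap is easy to quantify. As a rough sketch, peak memory bandwidth is the bus width in bytes multiplied by the per-pin data rate. The helper name `bandwidth_gbps` is ours, and the inputs are the bus widths and VRAM speeds from the spec table.

```python
def bandwidth_gbps(bus_width_bits: int, speed_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: (bus width in bytes) x (per-pin data rate in Gbps)."""
    return bus_width_bits / 8 * speed_gbps


# (bus width in bits, VRAM speed in Gbps) from the spec table.
cards = {
    "RTX 4060 Ti": (128, 18),
    "RTX 3060 Ti": (256, 14),
    "RTX 3070": (256, 14),
}

for name, (width, speed) in cards.items():
    print(f"{name}: {bandwidth_gbps(width, speed):.0f} GB/s")
```

Even with faster memory chips, halving the bus width leaves the 4060 Ti at 288 GB/s versus 448 GB/s for the two Ampere cards. This ignores any benefit from the larger L2 cache.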

Fundamentally, Nvidia had a design decision to make several years back when the Ada Lovelace architecture and chips were in the planning phase. There were plenty of other factors to consider, but the two we’re talking about here are simple: How wide should the memory interface be, and how much L2 cache should there be?

For the RTX 30-series, Nvidia had up to a 384-bit width on GA102 (RTX 3090 Ti down to RTX 3080), up to 256-bit on GA104 (RTX 3070 Ti down to RTX 3060 Ti), up to 192-bit on GA106 (RTX 3060 and 3050), and up to 128-bit on GA107 (mobile RTX 3050 / 3050 Ti and later a desktop 3050), all with 1MB of L2 cache per 64 bits of interface width. For the RTX 40-series, rather than sticking with similar widths, Nvidia opted for 384-bit on AD102, 256-bit on AD103, 192-bit on AD104, and 128-bit on AD106 and AD107. The L2 cache meanwhile received a big upgrade, up to 16MB per 64 bits of interface width.
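To make the cache ratio concrete, here is a small sketch that divides each card's L2 size by its number of 64-bit interface slices. The helper name `l2_mb_per_64_bits` is ours, and the inputs are the cache sizes and bus widths from the spec table.

```python
def l2_mb_per_64_bits(l2_mb: float, bus_width_bits: int) -> float:
    """L2 cache (MB) available per 64 bits of memory interface width."""
    return l2_mb / (bus_width_bits / 64)


# (L2 cache in MB, bus width in bits) from the spec table.
cards = {
    "RTX 3060 Ti (GA104)": (4, 256),
    "RTX 4060 Ti (AD106)": (32, 128),
    "RTX 4070 (AD104)": (36, 192),
}

for name, (l2_mb, width) in cards.items():
    print(f"{name}: {l2_mb_per_64_bits(l2_mb, width):.0f} MB per 64 bits")
```

The 3060 Ti works out to 1MB per 64 bits and the 4060 Ti to 16MB, while the RTX 4070 lands at 12MB, which is consistent with the "up to" wording.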

The bigger caches certainly pay off in effective bandwidth. There’s no question about that. AMD proved that larger caches were a viable tradeoff with the RDNA 2 architecture and Infinity Cache, and Nvidia is doing something similar with Ada. But bigger caches only alleviate the memory bandwidth issue. There’s still a capacity constraint with using a narrower interface, and while using “clamshell” memory on both sides of the PCB can at least partly overcome that, it’s a more costly approach and doesn’t address the raw bandwidth aspect.
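The capacity constraint follows directly from the interface width. The sketch below assumes one memory chip per 32-bit channel and 2GB per chip. That chip density is our assumption, inferred from the 8GB on a 128-bit bus, and clamshell mode is modeled as simply doubling the chip count. The function name `vram_gb` is ours.

```python
def vram_gb(bus_width_bits: int, gb_per_chip: int = 2, clamshell: bool = False) -> int:
    """Estimate VRAM capacity from bus width.

    Assumes one chip per 32-bit channel and 2GB chips by default (an assumption,
    not a figure from Nvidia). Clamshell mode puts two chips on each channel.
    """
    chips = bus_width_bits // 32
    if clamshell:
        chips *= 2
    return chips * gb_per_chip


print(vram_gb(128))                  # RTX 4060 Ti
print(vram_gb(128, clamshell=True))  # RTX 4060 Ti 16GB
print(vram_gb(192))                  # a hypothetical 192-bit card
```

Under those assumptions, a 128-bit bus gives 8GB, clamshell doubles it to 16GB without adding any bandwidth, and a 192-bit bus would give 12GB.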

Let’s be blunt: Nvidia made a cost-saving architectural design decision, and I think many gamers and enthusiasts fundamentally disagree with Nvidia’s choices on everything except perhaps AD102 and AD107. If the RTX 4080 had come with a 320-bit interface and 20GB, then 256-bit and 16GB for AD104 and the RTX 4070 Ti/4070, we’d be looking at a 192-bit interface and 12GB on AD106 and the RTX 4060 Ti. And looking at prices right now, we can’t even argue that the added cost of more VRAM and a wider memory interface would have been too high.

All indications are that the RTX 40-series GPUs aren’t selling particularly well. Part of that comes from the changing economic environment, and part of it certainly comes from the end of GPU mining. But it’s also fair to say that a big factor for many gamers and even AI researchers is the lack of tangible performance increases at various tiers, coupled with limited memory options and higher prices. Yes, there’s an RTX 4060 Ti 16GB card coming in July. It will still have a 128-bit interface and it will cost $100 extra.

AMD hasn’t been doing much better, but the 7900 XTX comes with 24GB, and the 7900 XT offers 20GB. Presumably, we’ll see a 16GB 7800 XT and a 12GB 7700 XT at some point in the future. That will be more palatable on some levels than Nvidia’s offerings, but then you’ll lose out on AI performance and support and ray tracing performance. As the market leader, Nvidia can and really should have done better than this. We had GTX 1070 with 8GB for $379 back in 2016 — 12GB for a $399 graphics card should be the bare minimum we can expect in 2023.

Here are the block diagrams for the RTX 4060 Ti, the full AD106 chip, and the upcoming RTX 4060 / AD107. RTX 4060 Ti is nearly the fully enabled chip, with the only disabled portions being one NVDEC (Nvidia Decoder) block and two SMs (Streaming Multiprocessors). In addition, there’s up to 8MB of L2 cache per 32-bit memory channel, for a total of 32MB on the 128-bit interface.

As with other Ada Lovelace chips, AD106 includes Nvidia’s 4th-gen Tensor cores, 3rd-gen RT cores, new and improved NVENC/NVDEC units for video encoding and decoding (now with AV1 support), and a significantly more powerful Optical Flow Accelerator (OFA). The latter is used for DLSS 3, and while it’s “theoretically” possible to do Frame Generation with the Ampere OFA (or using some other alternative), so far, only RTX 40-series cards can provide that feature. Given the marketing hype around Frame Generation, we also suspect we will never see it on previous-generation GPUs — it’s one of the few selling points for the 4060 Ti.

As noted already, the tensor cores now support FP8 with sparsity. It’s not clear how useful that is in all workloads, but AI and deep learning have certainly leveraged lower precision number formats to boost performance without significantly altering the quality of the results — at least in some workloads. It will ultimately depend on the work being done, and figuring out just what uses FP8 versus FP16, plus sparsity, can be tricky.

But the VRAM capacity comes up again with AI workloads. Many large language models (LLMs) benefit from lots of memory, and 8GB isn’t enough for even some “medium” sized models. So the 4060 Ti 16GB will probably find some uptake by AI researchers just because of its memory capacity — the same as the RTX 4080, for less than half the price.
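As a back-of-the-envelope illustration, the memory needed just to hold a model's weights is roughly the parameter count times the bytes per parameter. The model sizes below are hypothetical examples, not specific models we tested, and the estimate ignores activations, context caches, and framework overhead, so real requirements will be higher.

```python
def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate memory (GB) for model weights only: parameters x bytes per parameter."""
    return params_billion * bytes_per_param


def fits(params_billion: float, bytes_per_param: float, vram_gb: float) -> bool:
    """True if the weights alone fit in VRAM (ignores activations and other overhead)."""
    return weights_gb(params_billion, bytes_per_param) < vram_gb


# Hypothetical model sizes (billions of parameters) and number formats.
for params in (7, 13):
    for label, nbytes in (("FP16", 2), ("8-bit", 1)):
        need = weights_gb(params, nbytes)
        print(f"{params}B {label}: ~{need:.0f}GB weights, "
              f"8GB card: {fits(params, nbytes, 8)}, 16GB card: {fits(params, nbytes, 16)}")
```

By this estimate, a 7-billion-parameter model in FP16 needs about 14GB for weights alone, which rules out an 8GB card but would squeeze onto a 16GB one.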