The Magic Number For Making Virtual Reality Feel Like Reality

Motion controls. VR goggles. Whatever. Let’s get down to brass tacks: How fast do these need to be to feel real? Twenty milliseconds or less.

Kinect can be fun, but the experience rarely feels real. You’re always cognizant of your body, not just as your body but as a controller--a once-removed human input device. The same has been true of every natural motion system to date--the PS Move, the Wiimote, all those VR headsets that were popular in the '90s, and the AR apps on our iPhones today.

So why don’t they work? Graphics? UI? Maybe. But the elephant in the room is horsepower. It’s speed. (Or, better put, it’s latency.) What’s the delay between when you move and when you see that you’ve moved on the screen?

Valve’s Michael Abrash recently wrote an epic breakdown of latency and the hurdle it poses for realistic UI. Despite running a few thousand words, it’s a must-read for anyone in the space. He starts with the obvious: “if you don’t have low enough latency, it’s impossible to deliver good experiences.”

Moving your mouse and seeing the cursor respond on screen, he points out, involves about a 50-millisecond delay. So most of us would probably conclude: 50 milliseconds is great! 50 milliseconds feels perfect! That’s the sweet spot. The problem, he explains, is that human perception holds virtual and augmented reality applications to entirely different standards.

“I can tell you from personal experience that more than 20 milliseconds is too much for VR and especially AR, but research indicates that 15 milliseconds might be the threshold, or even 7 milliseconds,” Abrash writes on his blog.

Seven milliseconds. Think about that number for a moment. That’s .007 seconds--no time at all in its own right--and an impossibly short window in practice, since an LCD alone can take that long to refresh an image at the pixel level (before any processing is even counted). Indeed, if the true threshold of human perception is 7 milliseconds, we’re in trouble for augmented and virtual reality. If it’s 15 milliseconds, we’re a lot better off.

But why do we need so much speed? Why do we have grander expectations for these reality-based experiences than for our gamepad- or mouse-based first-person shooters? The short answer is that we’re all experts on what being human feels like. But that’s a cop-out, isn’t it? The better, far more technical answer is that natural human motions--something as simple as turning your head--are incredibly rich with information.

Suppose you rotate your head at 60 degrees/second. That sounds fast, but in fact it’s just a slow turn; you are capable of moving your head at hundreds of degrees/second. Also suppose that latency is 50 milliseconds and resolution is 1K x 1K over a 100-degree FOV. Then as your head turns, the virtual images being displayed are based on 50 milliseconds-old data, which means that their positions are off by three degrees, which is wider than your thumb held at arm’s length. Put another way, the object positions are wrong by 30 pixels. Either way, the error is very noticeable.
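To make that arithmetic concrete, here’s a minimal sketch in Python--the function names are mine, the figures come from the example above--that turns a head speed, a latency, and a display spec into an on-screen error:

```python
# Back-of-the-envelope latency error, using the figures from the example above.

def angular_error_deg(head_speed_deg_per_s: float, latency_s: float) -> float:
    """Degrees the displayed scene lags behind your actual head position."""
    return head_speed_deg_per_s * latency_s

def pixel_error(error_deg: float, resolution_px: int, fov_deg: float) -> float:
    """Convert an angular error to pixels, assuming uniform pixel density across the FOV."""
    return error_deg * (resolution_px / fov_deg)

err = angular_error_deg(60.0, 0.050)    # a slow 60 deg/s head turn, 50 ms of latency
print(err)                              # 3.0 degrees
print(pixel_error(err, 1000, 100.0))    # 30.0 pixels on a 1K display spanning 100 degrees
```

Run the same numbers at 20 milliseconds and the error shrinks to 1.2 degrees (12 pixels); at 7 milliseconds, to roughly 4 pixels.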

Was that technical enough for you? Abrash goes even deeper. He breaks down the processing, rendering, and display pipeline that video-game companies worry about for every frame they put on your TV. He shows in simple math just how far we are from these “natural” experiences that really feel natural.

Eventually, Abrash settles on the magic number of 20 milliseconds as a goal for virtual and augmented reality experiences. He estimates that, under the best conditions available today, we can reach about 36 milliseconds (though in reality, motion-control rigs like Kinect may hover closer to 150 milliseconds!). That means we’re 16 milliseconds away from true, simulated experiences--or as he puts it, “a long way from 20 milliseconds, and light-years away from 7 milliseconds.”
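To give a flavor of that math, here’s a hedged sketch of a frame pipeline. The stage names and timings below are illustrative placeholders, not Abrash’s actual figures; the point is only how quickly the stages sum past the 20-millisecond goal:

```python
# Hypothetical frame pipeline. Stage names and timings are illustrative
# placeholders, not Abrash's actual numbers.
pipeline_ms = {
    "input sampling":  2,
    "simulation":      8,
    "rendering":      10,
    "display refresh": 16,
}

total = sum(pipeline_ms.values())
print(f"motion-to-photon latency: {total} ms")  # 36 ms in this made-up budget
print(f"over the 20 ms goal by: {total - 20} ms")
```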

So what can we do until then--until the technical breakthroughs arrive that make displays and processing fast enough for the feel we want? I think we’re entering the era of almost-virtual. Look at the way Harmonix handled Kinect in their Dance Central series: rather than display a dancer’s movements 1:1 on screen, they used a pre-animated avatar that merely glows in the areas where the real dancer is off.
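A toy version of that trick might look like the sketch below. Everything here--joint names, coordinates, the threshold--is my own hypothetical, since Harmonix hasn’t published its implementation; the idea is just that the avatar always plays the canned choreography, and the laggy tracking data only decides which limbs should glow:

```python
# Toy take on the "pre-animated avatar" trick: the avatar plays canned
# choreography no matter what; tracking data only picks which joints to
# flag as off. All names, coordinates, and thresholds are hypothetical.

THRESHOLD_M = 0.15  # how far (in meters) a joint may stray before it glows

def joints_to_highlight(tracked: dict, choreography: dict) -> list:
    """Return the joints where the player deviates from the canned animation."""
    off = []
    for joint, (cx, cy) in choreography.items():
        tx, ty = tracked.get(joint, (cx, cy))  # missing data counts as on-target
        if ((tx - cx) ** 2 + (ty - cy) ** 2) ** 0.5 > THRESHOLD_M:
            off.append(joint)
    return off

tracked = {"left_hand": (0.10, 1.20), "right_hand": (0.90, 1.50)}
choreo = {"left_hand": (0.10, 1.20), "right_hand": (0.60, 1.50)}
print(joints_to_highlight(tracked, choreo))  # ['right_hand'] glows on the avatar
```

The beauty of the pattern is that even 150-millisecond-stale tracking data works, because the avatar’s motion never depends on it.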

Because Harmonix designed around this inherent latency, Dance Central continues to be Kinect’s crown jewel. We need such clever design to bridge the gap to augmented realities, to fool us into thinking that we’re dancing or flicking in true, real time. And according to Abrash, we may need these solutions for a long while to come.

Read more here.

[Hat tip: Opposable Thumbs]

[Correction: Due to an editing error, a previous version of this story measured the delay in microseconds, rather than milliseconds. We apologize for the mistake.]

[Illustration via Shutterstock]

