VR’s Biggest Problem and How We Could (Maybe) Solve It
When people see themselves move somewhere in VR, while their inner ear doesn’t feel any movement, they often get motion sickness.
And this is a huge problem, because it doesn’t matter how good the gameplay or narrative or art is…if you feel like throwing up the entire time.
Yes, it’s true that you can get used to motion sickness and many VR players have.
But good luck asking most people to go through weeks of genuine discomfort just to play games that may or may not be good.
Why not just play Baldur’s Gate 3 instead? Or any of the thousand other great PC, console, or mobile games?
So, unless we want to immediately lose half of our potential audience, we end up having to either (a) not have any movement or (b) spend a lot of time designing a custom, restricted movement system that doesn’t cause motion sickness.
Either way, movement is one of the most important actions in all of videogames, so we’ll have to spend a lot of effort just to make up for what we lost by not being able to do proper movement.
Just imagine trying to make an FPS game without flanking, a MOBA without the ability to sidestep abilities, or an action-adventure game without running and jumping.
Yes, there’s always a solution. But coming up with new ideas and then getting them to work takes a lot of development time away from other things, like progression and combat systems.
So, without proper movement, it ends up being very difficult to create games with anywhere near the depth of a PC or console game.
But it gets even worse because, even if we avoid movement entirely, we’ll still have to fight a never-ending battle against motion sickness.
As it turns out, motion sickness can be triggered by all sorts of seemingly random things, like a UI popping up in the wrong way or an enemy sword bouncing around your peripheral vision.
And even small amounts of motion sickness can stack up over 30 minutes or more to create an inexplicable feeling of discomfort that players won’t necessarily know is motion sickness. They’ll just come away not enjoying your game “for some reason”, which makes testing a pain.
So, every time a developer makes a VR game, they’ll have to spend months of time just on testing for motion sickness and coming up with ways to mitigate it.
If we sum up all of the above problems and consider that VR is already a multi-billion dollar market, I don’t think it’s radical to propose that motion sickness is a billion dollar problem.
And that’s not even taking into account how much bigger the VR market could be if motion sickness wasn’t a thing and we’d be able to make much better games.
Ok, it’s clearly a problem. But what do we do about it?
A GENERAL approach
Remember that motion sickness was all about the mismatch between seeing yourself move with your eyes, but not feeling it with your inner ear.
So there are three important systems that matter when it comes to motion sickness:
The visual motion processing system that takes in inputs from your eyes
The vestibular motion processing system that takes in inputs from your inner ear
And the system that compares outputs from the two and detects a mismatch
Now, while the latter two are very important for motion sickness, we don’t really have easy access to them as VR developers:
To access the vestibular system, we’d have to exert influence inside your inner ear. That’s not impossible to do and there are already pills and vibrating devices that can do that. But we’d have to convince players to buy hardware or pop some pills, which wouldn’t be easy.
And, to access the comparison system, we’d have to affect something deep inside the brain, likely the medial superior temporal cortex (MST). And that seems way harder to access than your inner ear, especially since we don’t even know much about that system yet.
So we believe it’s best to focus on the visual motion processing system, since there’s already a screen on the player’s face and we won’t have to try and sell hardware to anyone. We already have free access to their eyes.
Our goal, then, is to trick that system into NOT seeing any movement when you’re moving around in VR, which should prevent the mismatch from occurring in the first place.
And we think that messing with this system should work, since we’re accidentally already doing that in VR — you’re not actually moving around in a 3D world, you’re just seeing pixels move on a 2D screen.
Plus, it’s also just very easy to fake movement with optical illusions, so it should (at least theoretically) be possible to cheat the system with even relatively simple approaches.
That’s why we think it’s worth putting our eggs into this basket.
If you put your cursor over one of the dots, you’ll realize that nothing is actually moving in this illusion by Dr. Kitaoka.
Visual motion processing
To trick the visual motion processing system, we’ll have to understand EXACTLY what that system sees.
It turns out that, according to the neuroscience literature, motion processing happens mostly through the magnocellular pathway to the visual cortex.
And, like any other biological system, this pathway doesn’t actually use the full, raw data that reality provides us. That would be exceptionally inefficient.
Instead, it only uses what’s needed to reliably process motion:
It only sees fast movement (high temporal frequency)
It won’t see things that are moving very slowly. We can intuitively experience this in how animations stop feeling fluid once we go below around 10 frames per second. Movement below a certain speed doesn’t register as strongly.
But, it’ll also see much faster things than most other systems involved in vision. That’s why you’re able to recognize that something moved, even if you didn’t recognize, say, its form or color.
We ran some unofficial tests in VR and, yeah, very slow movement doesn’t really cause motion sickness. Neither does insanely fast movement or movement over a very short period of time.
Studies have found that around 10 Hz (cycles of stripes passing a given point in your visual field per second) is its preferred frequency (Mikellidou et al., 2018). Above 30 Hz it starts dropping off hard (Derrington and Lennie, 1986), and above 20 Hz most other components of your vision stop being able to keep up (Skottun, 2013; Pawar et al., 2019; Chow et al., 2021; Edwards et al., 2021).
For some evidence of how this affects VR: a study on driving in VR found that driving at 120 mph caused around 1.4x more motion sickness than at 60 mph (Hughes et al., 2024).
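To make those numbers a bit more concrete, here’s a quick back-of-the-envelope sketch (the movement speed and wall distance are made-up example values, not measurements). For a drifting pattern, temporal frequency is just how densely striped the scene is (its spatial frequency, covered more below) times how fast it sweeps past, so you can estimate which stripe sizes land in that ~10 Hz sweet spot at a given movement speed:

```python
# Back-of-the-envelope: which spatial frequency lands in the ~10 Hz
# "sweet spot" at a given movement speed?
# For a drifting pattern:
#   temporal frequency (Hz) = stripes per degree (cpd) * angular speed (deg/s)
import math

def angular_speed_dps(linear_speed_m_s: float, distance_m: float) -> float:
    """Peak angular speed of scenery at `distance_m` as you move past it."""
    return math.degrees(linear_speed_m_s / distance_m)

def sweet_spot_spatial_freq(angular_speed: float, preferred_hz: float = 10.0) -> float:
    """Stripe density (cycles per degree) whose temporal frequency hits `preferred_hz`."""
    return preferred_hz / angular_speed

# Assumed example: gliding at 3 m/s past a wall 2 meters away.
speed = angular_speed_dps(3.0, 2.0)              # ~86 deg/s
print(round(speed, 1))
print(round(sweet_spot_spatial_freq(speed), 3))  # ~0.12 cpd, i.e. big, coarse features
```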
It only sees big things (low spatial frequency)
It won’t see small details, since small details that move are noisy/blurry and likely not as important as bigger objects. Specifically, the magnocellular pathway stops responding to details above roughly 1.5 cycles per degree (cpd). That’s about the size of a small tree that’s 10 meters away (Skottun and Skoyles, 2010; Edwards et al., 2021).
Our unofficial tests in VR showed that the stripes that caused the most motion sickness were indeed around the 1-1.5 cpd range. Stripes smaller than that didn’t do much.
An approximation of what the motion processing system would see if it had 1.5 cpd vision.
An approximation of what the motion processing system would see if it filtered for <1.5 cpd spatial frequency (Edwards et al., 2021)
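For some intuition about what that 1.5 cpd threshold means on an actual headset, here’s a quick conversion sketch. The ~20 pixels-per-degree figure is just an assumed, roughly Quest-class value, not an official spec:

```python
# How big is a 1.5 cpd feature on the headset's display?
# One cycle at 1.5 cpd spans 1/1.5 ≈ 0.67 degrees of visual angle.
def cycle_size_pixels(spatial_freq_cpd: float, pixels_per_degree: float = 20.0) -> float:
    degrees_per_cycle = 1.0 / spatial_freq_cpd
    return degrees_per_cycle * pixels_per_degree

print(round(cycle_size_pixels(1.5), 1))  # ~13 px per light/dark cycle
print(round(cycle_size_pixels(1.0), 1))  # ~20 px per cycle
# Detail much finer than this should barely register with motion processing.
```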
It mostly sees brightness (luminance contrast)
It (mostly) does not see color, which you can experience when an object moves past you super fast and appears as nothing more than a shadow.
Color is still somewhat involved in motion processing, but brightness just seems much more important (Derrington, 2000; Aleci and Belcastro, 2016; Edwards et al., 2021).
It can apparently see brightness contrasts as low as 0.5% (Aleci and Belcastro, 2016) and responds clearly to contrasts of around 4-8% (Butler et al., 2007).
There’s also a fascinating “motion standstill” illusion, where high spatial frequency dots with high brightness contrast can completely hide the motion of color-only shapes with smooth edges (Dürsteler, 2014).
Our own experiments revealed that there was a pretty substantial drop-off in motion sickness when we went below 20% brightness contrast (in whatever units Unity uses) and an even bigger one below around 5%. Motion sickness also didn’t get much worse going from 40% to 100%.
Keep in mind that it isn’t tricked by global brightness changes, like what happens when the sun pops up from behind the clouds.
This is because the system has cells that look for brightness changes in a particular direction, as well as cells that look for brightness changes in all directions. If the latter cells spot a brightness change in all directions, they cancel out the signal from the directional cells and there’s no perception of movement (Im and Fried, 2016).
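If you want to sanity-check your own scenes against numbers like these, one standard way to quantify brightness contrast is Michelson contrast (I’m not claiming it’s the exact measure those studies used, and the example luminance values below are made up):

```python
# Michelson contrast: (L_max - L_min) / (L_max + L_min), ranging from 0 to 1.
# Handy for checking whether a moving pattern sits above or below the rough
# sensitivity numbers above (~0.5% detection, ~4-8% clear response).
def michelson_contrast(l_max: float, l_min: float) -> float:
    if l_max + l_min == 0:
        return 0.0
    return (l_max - l_min) / (l_max + l_min)

# Made-up example luminances (relative units):
print(michelson_contrast(0.55, 0.45))    # 0.10 -> 10%, comfortably visible to the system
print(michelson_contrast(0.505, 0.495))  # 0.01 -> 1%, close to the detection floor
```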
What it sees depends on speed, spatial frequency, and temporal frequency
It sees fast motion at low spatial frequencies just as well as slow motion at high spatial frequencies (Mikellidou et al., 2018)
More reading: Wichmann and Henning, 1998; O’Carroll and Wiederman, 2014
Contrast sensitivity = how easily people can tell that something is moving (Mikellidou et al., 2018)
It’s more sure of what it sees if there are lots of edges in different orientations that move together
Intuitively, there’s a pretty low chance of a bunch of random stuff all moving together in the same way, unless you’re the one moving. So the system seems to rely on a lot of stuff in different orientations moving together in a correlated way.
More on this: Diels and Howarth, 2009; Palmisano et al., 2015
Not directly related, but it probably feels much worse to move diagonally than forward in VR because you’re seeing things move along both the z- and x-axes, rather than just one. That’s double the visual information and therefore double (or more) the mismatch with what your inner ear is feeling.
Isotropic noise should theoretically be the most effective at causing motion sickness, since it’s noise “in all orientations” and therefore makes for the most reliable signal. Image from 3Delight Cloud
It sees things in peripheral vision
It’s much better at responding to stuff that’s happening in your peripheral vision than other systems involved in vision (Baseler and Sutter, 1997).
You can experience this by testing how hard it is to see an object far in your peripheral vision when it’s not moving versus when it is. Moving objects are much more noticeable in peripheral vision, since this system is taking care of that.
An interesting example of this is that people with vision loss in their central vision appear to feel more motion sickness than people with normal vision, while people with peripheral vision loss seem to feel less of it (Luu et al., 2021).
It gets used to things after they move at a constant speed for a while
Basically, it’ll stop responding as strongly to the constant movement and start looking for smaller relative speed differences between objects in a scene (Clifford and Wenderoth, 1999).
This is probably because relaying the same “you’re going this fast” information over and over again isn’t particularly useful, so it’s mostly communicating important changes (i.e., accelerations).
One study found that this adaptation started to happen after 3 seconds at the neuron level (Wezel and Britten, 2002). Our (non-rigorous) tests found that it started to happen around 4-10 seconds of being at the same constant speed in VR.
This effect is why games where you constantly run forward, like Pistol Whip, don’t cause much motion sickness.
It sees nothing during eye movements and is most sensitive right after them
Our eyes do small, subconscious movements multiple times a second to correct for drift. Plus, they’ll also do bigger movements as you focus on new things in your environment. These eye movements are called saccades.
To prevent us from confusing our saccades with actual movements in the scene, our motion processing turns off during that small period of time (Binda and Morrone, 2018).
There’s also a moment right after a saccade where the system tries to catch up to what it missed by being extra sensitive to motion (Frost and Niemeier, 2016).
The system also obviously turns off when you’re blinking. So, if we do eye tracking to track blinks and saccades, we can use those periods to hide all sorts of stuff.
One well-known technique shifts the entire scene around during these moments, so that you can do what seems like walking forward, but actually be walking around in a circle perpetually (Sun et al., 2018).
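Here’s a minimal sketch of how that gating could look in practice, assuming you get per-frame gaze data from the headset’s eye tracker. The interface, thresholds, and rotation amounts below are all illustrative assumptions, not values from Sun et al.:

```python
# Sketch: apply small "hidden" scene rotations only while the player is
# blinking or mid-saccade, in the spirit of Sun et al. (2018).
from dataclasses import dataclass

SACCADE_VELOCITY_THRESHOLD_DPS = 100.0  # gaze speed above which we assume a saccade
MAX_HIDDEN_ROTATION_DEG = 0.5           # per-frame scene rotation we try to hide

@dataclass
class GazeSample:
    velocity_dps: float   # angular speed of the gaze direction this frame
    eye_openness: float   # 0.0 = fully closed, 1.0 = fully open

def hidden_rotation_this_frame(gaze: GazeSample, desired_rotation_deg: float) -> float:
    """Return how much of the desired redirection we can sneak in this frame."""
    blinking = gaze.eye_openness < 0.1
    saccading = gaze.velocity_dps > SACCADE_VELOCITY_THRESHOLD_DPS
    if blinking or saccading:
        # Motion processing is suppressed: apply a capped chunk of the rotation.
        return max(-MAX_HIDDEN_ROTATION_DEG,
                   min(MAX_HIDDEN_ROTATION_DEG, desired_rotation_deg))
    return 0.0  # eyes are stable and extra sensitive: leave the camera alone
```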
It is, at least in part, a subconscious process that’s distinct from the rest of our vision.
A study by Levulis et al. (2025) found that even imperceptible levels of camera jitter caused motion sickness in a VR game.
We’ve also managed to cause it with camera movements (or screenshake) so small that you couldn’t tell that anything was happening.
It’s even possible to notice motion despite otherwise being blind, when the blindness is caused by damage to a specific region of the brain. This is called Riddoch syndrome.
These findings are very important because it means we might be able to invisibly attack motion processing.
It seems to care more about things happening in the perceptual background.
If the motion occurs in what we perceive to be the foreground against a stationary background, it might cause less motion sickness (Nakamura, 2006).
The system probably does this separation because we don’t want to falsely think we’re moving if just some objects in the foreground move. But the entire background is unlikely to move unless we’re the ones moving.
The background is probably determined with a combination of depth and motion cues, like background objects being blocked by foreground objects or being far in the distance and therefore all moving at the same or slower speed than things in the foreground (motion parallax).
To what extent motion processing uses the full spectrum of depth cues is something I don’t know, since it makes sense that it would only use the depth cues that are necessary for reliable motion processing.
And here are some other things we found:
Depth and form information might be used more when other inputs are otherwise ambiguous. We can theoretically determine motion purely from depth or form, through cues like the ones we just mentioned (objects increasing/decreasing in size as we get closer/further away from them). However, these mostly seem to be used when we can’t use simpler cues instead (Kim et al., 2022). I suspect that this is because these depth and form cues aren’t super accurate and/or because they might involve costlier, higher-order processing, so it’s best to use them only when needed.
Horizons seem to provide important, stable reference points. Namely, at least one study has shown that having a stable horizon line reduces motion sickness (Hemmerich et al., 2020). I’m not sure if this is because the system is explicitly tracking a horizon using things like vanishing points or if the horizon is just treated as an obvious brightness contrast on a very large background object. I suspect the latter, based on how much more costly it would be to do extra processing on all that depth information for basically the same effect.
The system seems very good at dealing with visual noise. As the theory predicts, noise of roughly a 1 cpd spatial resolution felt the most disorienting in our brief tests. But no amount of noise seemed to significantly prevent motion sickness. We only tried noise on a two-dimensional layer, though. So maybe three-dimensional noise with actual depth would work better, since our motion processing system may simply discount the noise as irrelevant foreground movement.
Having no brightness contrast at all might make motion sickness worse (Bonato et al., 2004).
It might be turned off if there’s a diffuse red background and rapid flickering (Lee et al., 1989; Hugrass et al., 2018; Edwards et al., 2021).
It might struggle when there are lots of objects right next to each other (Millin et al., 2013; Atilgan et al., 2020).
It should shift to relying purely on low spatial frequency luminance contrasts when the scene is dark. This is because cones (that provide color input) don’t work in low-light conditions (Zele and Cao, 2015). It’s why colors become less saturated when you’re walking around in darkness.
It might discount movement when we expect it to happen or, conversely, enhance movement that we don’t expect to happen. There are some studies that found motion sickness to be worse when it was unexpected (Kuiper et al., 2019; Teixeira et al., 2022). We haven’t tested it ourselves, but many of the VR developers I’ve talked to have noticed this as well. One logical explanation is that our brains are primed to notice unexpected changes to ensure that we respond quickly. So unexpected signals may have higher weights in motion processing too.
It seems to discount movement if you move your body, perhaps even in ways that don’t really match the movement. For example, just moving your hands around during any movement seems to help prevent motion sickness. This could be because moving your hands with any kind of force also causes your head to move, which means less mismatch between your inner ear and what you see. But it might also be that your moving hands keep “telling” your brain to ignore them, through something called efference copies. Otherwise, if your hands passed right by your face, your brain might mistakenly think you moved somewhere (since your whole view is moving in that situation). And those “ignore me” messages might not be perfectly specific, so any big body movements might cause you to ignore some random motion cues. But that’s just a wild theory.
The system seems to struggle when there are harsh changes between several contrast levels, especially on small objects in our peripheral vision. For example, smooth luminance transitions aren’t as problematic as step changes and long edges aren’t as problematic as fragmented or curved edges (Kitaoka, 2003). It likely has something to do with how motion processing in our peripheral vision is updated after a saccade, given that some motion illusions trigger whenever you look somewhere else and stop working if you make the image small enough (see Dr. Kitaoka’s illusions or these illusions by Michael Bach).
It gets fooled by pure contrast changes, even if no movement happens. The illusion at the start of this post was one example, but here are some more detailed explanations of how these illusions work (Kitaoka, 2014; Rogers et al., 2019; Bach et al., 2020).
There are mathematical models of neurons in this system that could open up more avenues of attack, but we haven’t looked into them much (Adelson and Bergen, 1984; Simoncelli and Heeger, 1998; Mather, 2013; Soto et al., 2020; Eskikand et al., 2023; Clark and Fitzgerald, 2024; Su, 2025)
How to trick the system
Ok, so we have a motion processing system that only responds to the most important image features for that task. These are called motion cues.
And, critically, these cues are different from those used in other sub-systems within human vision, especially those responsible for our central vision.
Why is that? Well, because our motion processing system isn’t uniquely efficient. Every other visual processing task will also only process what it needs for the given task.
So, as long as two tasks differ to the extent that the same data can’t be efficiently reused, the input data (or cues) will be different between them.
And the processing tasks that constitute our central vision are quite different from those required to see ourselves move, so we should expect cues to be quite different too.
This means, at least theoretically, we should be able to selectively target ONLY motion cues and prevent motion sickness without significant visual disruption — at least to the extent that motion cues are different from other visual cues.
This line of reasoning gives us two general options for countering motion sickness:
Remove motion cues from the image you see, in a way that minimally affects the rest of your vision
Add motion cues from another, stationary environment, in a way that minimally affects the rest of your vision
Now, the importance of the first option is obvious, but why would we have to do the second one? Well, there are a few reasons.
The first is that completely removing all motion cues may paradoxically cause more disorientation or motion sickness, since we’re used to seeing stuff move when we move our heads.
If we were to remove all motion cues, we effectively wouldn’t see any of the cues we’d expect from moving our heads.
This does naturally happen when we’re in complete darkness, but it’s uncertain what’ll happen in this case.
And the second reason is that it probably gets harder and harder to remove the last little motion cues in an image.
So, to truly prevent all feeling of motion, we might have to use stronger and stronger removal techniques, which will cause more visual disruption.
If we bring in cues from a stationary environment instead, those might help counteract the motion cues you’re seeing on the screen. And that would let us get away with less removing and therefore less visual disruption.
But, to get back to the important part, here’s an example of what this process of adding and removing motion cues might actually look like: we already know that motion processing is mostly about brightness contrast in your peripheral vision.
So, to reduce motion sickness, we could remove ONLY BRIGHTNESS CONTRAST in your PERIPHERAL VISION.
We wouldn’t touch color or your central vision, so you’d still be able to see stuff. Objects in your peripheral vision would just all be at the same brightness level.
Then we could also add brightness blobs from a stationary environment, so that you’d get the motion cues you’d expect if you were just chilling in your room and moving your head around.
This combination of techniques would theoretically remove motion sickness in a way that isn’t very perceptible.
From top to bottom: a color gradient with all brightness differences removed > a more normal gradient > greyscale version of gradient with no brightness differences > greyscale version of normal gradient. Note how we just removed brightness without touching color. Image source: Ottoson, 2020
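As a very rough illustration of the idea, here’s a sketch that flattens the lightness channel of a 2D frame towards a single value outside a central radius while leaving the color channels alone. The radius and the use of CIELAB are arbitrary choices on my part, and a real implementation would live in a shader rather than on the CPU:

```python
# Sketch: flatten brightness (not color) outside the player's central vision.
# Uses CIELAB so that the L channel approximates perceived lightness.
import numpy as np
from skimage import color

def flatten_peripheral_brightness(rgb: np.ndarray, center_radius_frac: float = 0.35) -> np.ndarray:
    """rgb: float image in [0, 1] with shape (H, W, 3). Returns the same shape."""
    lab = color.rgb2lab(rgb)
    h, w = lab.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    dist = np.sqrt(((yy - h / 2.0) / (h / 2.0)) ** 2 + ((xx - w / 2.0) / (w / 2.0)) ** 2)

    # 0 in the center (leave it alone), ramping up to 1 in the far periphery.
    periphery = np.clip((dist - center_radius_frac) / (1.0 - center_radius_frac), 0.0, 1.0)

    # Blend the L channel towards one flat value in the periphery, killing
    # peripheral luminance contrast while keeping a and b (the color) intact.
    flat_l = lab[..., 0].mean()
    lab[..., 0] = lab[..., 0] * (1.0 - periphery) + flat_l * periphery
    return np.clip(color.lab2rgb(lab), 0.0, 1.0)
```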
So the general idea is to:
Do further research into what exactly counts as a motion cue
Then go through each cue and figure out ways to add or remove it
While doing so in a minimally noticeable way.
And here are a few more examples to illustrate what that could look like:
The system “sees” low spatial frequency data = use a high-pass filter to remove those low frequency motion cues from the image. And then add low frequency cues from a stationary room to maximally communicate that you’re not actually moving anywhere (a rough sketch of such a filter follows after the image below).
Sees high temporal frequency data = remove stuff that’s moving fast from the image and then add fast-moving stuff from your stationary room (can be very, very fast)
Sees directional brightness changes = make brightness blobs move in the exact opposite direction of where the objects in the scene are moving (or use the earlier optical illusion to counteract the direction of movement)
Left = applying a high-pass filter to remove low frequency information. Right = if we now apply a blur that represents how your motion processing system sees things, it no longer sees anything from the filtered image.
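And here’s a rough sketch of the first example above: subtract a heavily blurred copy of the frame from itself so that only fine detail (the high spatial frequencies motion processing largely ignores) remains. The cutoff is an arbitrary assumption; tying it to the ~1.5 cpd figure would need the headset’s pixels-per-degree, as in the earlier conversion:

```python
# Sketch: crude high-pass filter via Gaussian blur subtraction.
# The blurred copy holds the low spatial frequencies (what motion processing
# mostly "sees"); subtracting it leaves only the fine detail.
import numpy as np
from scipy.ndimage import gaussian_filter

def high_pass(frame: np.ndarray, cutoff_sigma_px: float = 15.0) -> np.ndarray:
    """frame: float grayscale or RGB image in [0, 1]."""
    sigma = cutoff_sigma_px if frame.ndim == 2 else (cutoff_sigma_px, cutoff_sigma_px, 0)
    low = gaussian_filter(frame, sigma=sigma)
    # Re-center around mid-grey so the result is still displayable.
    return np.clip(frame - low + 0.5, 0.0, 1.0)
```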
If we generalize even further, our goal is simply to make sure that the following basic equation holds true:

S_mov − S_stat < mismatch threshold

Where S_mov is the total strength of motion cues from the moving game scene, and S_stat is the total strength of motion cues implying that you’re stationary.

Depending on how the comparison between visual and vestibular inputs actually works, it’s also possible that we’ll need S_mov to stay below some absolute threshold of its own, even if there are enough stationary cues to theoretically counterbalance it.

Regardless, if we satisfy these conditions, we should no longer have a mismatch strong enough to cause motion sickness.
How to trick the system V2
But, all that being said, machine learning methods have proven to be way, way better at finding hidden patterns than we humans are.
So, if we do manage to find some set of parameters that reduce motion sickness without too much visual disruption, it’s almost certain that machine learning will be able to do it much better — and it might be the only path to a completely imperceptible and 100% effective solution to motion sickness.
But there’s a problem. For a machine learning model to learn the right patterns, we’ll need a lot of video footage of:
Scenarios that cause motion sickness vs. scenarios that don’t
Scenarios that have obvious visual disruptions vs. scenarios that don’t
And it’ll be a huge pain to get that data, especially data on which scenarios cause motion sickness.
The reason is that motion sickness stacks up slowly and people can get used to it, so it’s hard to study, especially when you can’t properly control conditions.
And then it’s also just hard to get a lot of people that both have headsets and are willing to feel like throwing up for the sake of a study.
So the best option is likely having headset manufacturers continually record video of what the player sees in a game and then save, say, the last 10 seconds when a player experiences motion sickness.
The question then becomes: how do we tell when players felt motion sickness?
Well, here are a few options I briefly thought of:
Add new sensors to devices or use existing sensors to track physiological signs of motion sickness (like heart rate variability or abnormal eye or head movements). Although, there’s a strong chance that these aren’t correlated enough with motion sickness to provide a proper signal for training a model.
Add a “panic button” that players can press the moment they start feeling motion sickness to escape into a re-orienting scene (and thereby giving us an example scenario that caused motion sickness). But, having observed a lot of people get motion sickness, I suspect that players will just take their headset off instead.
Or add a simple “did you get motion sickness?” question the first time a player quits a new game (or suddenly removes their headset), which is probably the easiest and most reliable option.
If we can’t get data from a headset manufacturer, we might still be able to record our own game footage and simply label each clip according to whether the game as a whole causes motion sickness for most players.
A decent approximation for this is the comfort labels found on the Meta store. At least in my experience, they match reality reasonably well.
Of course, labelling clips like this would make for some very noisy data, but there’s a small chance it might be good enough.
Then, regardless of how we get the motion sickness label, we’ll also have to attach further data on controller and headset positions to make sure the model learns which types of visual-vestibular mismatch unexpectedly don’t cause problems.
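Whatever the labelling route, each training example would probably end up looking something like this (a hypothetical record layout for illustration, not any existing dataset format):

```python
# Hypothetical layout for one training example: the clip the player saw,
# what their head and hands were actually doing, and the comfort label.
from dataclasses import dataclass
import numpy as np

@dataclass
class ComfortSample:
    frames: np.ndarray            # (time, height, width, 3), what was rendered to the headset
    head_poses: np.ndarray        # (time, 7): position xyz + orientation quaternion
    controller_poses: np.ndarray  # (time, 2, 7): left/right controller pose per frame
    caused_sickness: bool         # the label, however we end up collecting it
```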
But, even with high quality labels and controller data, we still might not get a good enough signal for training a model.
That’s because, as mentioned earlier, subtle levels of motion sickness usually aren’t noticeable until they stack up past some threshold after tens of minutes, after which the sickness quickly becomes overwhelming.
This delay between the onset of motion sickness and the player actually noticing it will probably cause a lot of noise, since a large portion of the footage that isn’t labelled as causing motion sickness actually will, eventually.
That being said, training should still be able to extract a pattern if there’s enough signal to outweigh the noise and biases in the dataset — which will probably require a lot of footage from a lot of different games.
And, just to make things clear, the exact signal we’re looking for is the voodoo magic behind anomalies that developers have discovered through trial and error.
One example is Until You Fall’s miraculous ability to use noticeable screenshake without causing motion sickness. Nobody knows how they did it, since screenshake is usually a guaranteed path to instant motion sickness.
Another anomaly is the finishers in Batman Arkham Shadow, which move the camera a crazy amount without triggering as much motion sickness as you’d expect.
Cases like these (and many more we don’t know about) are where ML might find hidden patterns that we have no hope of finding manually.
As for data on which artefacts are noticeable to players, that’s comparatively easy. We can just pay people to watch videos and label them based on whether they noticed anything off (or rely on existing models trained to spot visual artefacts).
But it’s possible that 2D videos don’t adequately capture everything you’d see in VR, so testing might have to be done using VR headsets instead. That’d be more costly, but still a solvable problem.
Regardless, once we eventually do have all that data, we have a lot of options.
The one that first comes to mind is to train a model to determine whether a scene causes motion sickness (a motion sickness discriminator) and then another that can spot things looking “off” (a visual artefact discriminator).
Then we can train a third model to generate noise on top of videos (a motion noise generator) and reward it when this noise successfully convinces the motion sickness and artefact discriminators.
What we’d end up with, at least theoretically, is a noise generator that generates the exact right type of noise to counter motion cues in a minimally perceptible way.
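To make that setup a bit more concrete, here’s a heavily simplified sketch of the adversarial arrangement in PyTorch. The architecture, loss weights, and perturbation scale are placeholder assumptions, and the two discriminators are assumed to already be trained on the labelled data described above:

```python
# Sketch: train a noise generator against two (assumed pre-trained) discriminators:
# one predicting "will this clip cause motion sickness?" and one predicting
# "does this clip look visually off?".
import torch
import torch.nn as nn

class NoiseGenerator(nn.Module):
    def __init__(self, channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(16, channels, kernel_size=3, padding=1), nn.Tanh(),
        )

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, channels, time, height, width); output is a small perturbation.
        return 0.05 * self.net(clip)

def generator_step(generator, sickness_disc, artefact_disc, clips, optimizer,
                   artefact_weight: float = 1.0) -> float:
    """One update: push clips towards 'comfortable' and 'visually clean'."""
    optimizer.zero_grad()
    perturbed = torch.clamp(clips + generator(clips), 0.0, 1.0)
    sickness_score = sickness_disc(perturbed)   # higher = more likely to cause sickness
    artefact_score = artefact_disc(perturbed)   # higher = more visibly "off"
    loss = sickness_score.mean() + artefact_weight * artefact_score.mean()
    loss.backward()
    optimizer.step()
    return loss.item()
```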
After that, we have a few options, one of which is adding the noise on top of the player’s screen in real-time. This will likely yield the best effect, but we’ll probably have to spend a lot of time optimizing the noise generator to run on VR hardware.
Alternatively, we can also try to decipher roughly which cues the noise generator found most effective, and then manually filter those out from the image. This might be less computationally expensive, but also less effective.
Now, you might’ve noticed that this approach is very similar to how AI researchers have attacked image recognition neural networks in the past.
Basically, they can get a neural network to, say, falsely “see” a cat when shown an image of a dog. And this is done by adding some noise that’s imperceptible to humans, but completely messes with the network.
So the idea is to do exactly that, but imperceptibly mess with the human motion processing network instead.
All that being said, the data that we need for this approach doesn’t yet exist (as far as I’m aware) and the only organizations capable of getting high quality data are likely hardware providers like Meta and Valve.
Since we aren’t either of those companies, we’ll probably have to instead stick to manually understanding and selectively removing every type of motion cue.
Current Progress
But do these approaches work? Well, we’re still at the very start of our research journey.
To summarize our current progress: since we don’t have the data for the machine learning approach, we’ve started going down the list of motion cues and figuring out ways to selectively filter them from the image you see in VR.
There are some promising early signs and I’d say we’re on track to beat the existing state of the art (not that you should believe me).
However, existing techniques aren’t exactly great, so beating them isn’t anywhere near enough.
They solve (very roughly) 50% of the issue and we’ll likely have to get to over 90% to truly enable proper movement in VR games.
Regardless, we won’t know if the approach works for a while, since it’ll take a long time to test everything with our limited resources.
So, if you’d like to help us with this research, feel free to send me an email.
Our first iteration of a “brightness vignette” technique that removes ONLY brightness contrast from the player’s peripheral vision. It (mostly) leaves color and detail alone, unlike a traditional vignette. Top shows the technique at work (it’s much less obvious in VR) and bottom shows what it looks like in greyscale.