You see things one way, and your camera sees things differently.
So, what is the resolution of the human eye? Put another way: at what resolution would a camera truly capture an image the way the eye sees it?

The short answer is 576MP. The technical breakdown is as follows:

Consider a view in front of you that spans 90 degrees by 90 degrees, like looking through an open window at a scene. Assuming a visual acuity of 0.3 arc-minutes per pixel, the number of pixels would be (90 degrees * 60 arc-minutes/degree / 0.3) * (90 * 60 / 0.3) = 324,000,000 pixels (324 megapixels).

At any one moment, you don't actually perceive that many pixels; instead, your eye moves around the scene to take in all the detail you want. And the human eye really sees a larger field of view, close to 180 degrees. Let's be conservative and use 120 degrees for the field of view.

Then we would see (120 * 60 / 0.3) * (120 * 60 / 0.3) = 576,000,000 pixels, or 576 megapixels.
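To make the arithmetic explicit, here's a quick sketch of the calculation above in Python, assuming the same 0.3 arc-minutes-per-pixel acuity figure:

```python
# Megapixels needed to tile a field of view at a given visual acuity.
# Assumes 0.3 arc-minutes per "pixel" of detail, the figure used above.

ARCMIN_PER_DEGREE = 60
ACUITY_ARCMIN = 0.3  # angular size of one resolvable "pixel"

def megapixels(fov_h_deg, fov_v_deg, acuity=ACUITY_ARCMIN):
    px_h = fov_h_deg * ARCMIN_PER_DEGREE / acuity
    px_v = fov_v_deg * ARCMIN_PER_DEGREE / acuity
    return px_h * px_v / 1e6

print(round(megapixels(90, 90)))    # 324 -> the "open window" case
print(round(megapixels(120, 120)))  # 576 -> the conservative field of view
```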

The full angle of human vision would require even more megapixels; this level of image detail would require a large-format camera to record.

Therefore, once our cameras reach 576MP, the pictures you take should look exactly the way you see the scene.

It wouldn’t directly match a real-world camera… but read on.

On most digital cameras, you have orthogonal pixels: they're distributed uniformly across the sensor (in fact, in a nearly perfect grid), and there's a filter (usually the "Bayer" filter, named after Bryce Bayer, the Kodak scientist who came up with the usual color array) that delivers red, green, and blue pixels.
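The Bayer pattern itself is simple: a 2×2 tile of one red, two green, and one blue filter, repeated across the whole grid (green is doubled because human vision is most sensitive to it). A minimal sketch, assuming the common RGGB arrangement:

```python
# Which color a photosite senses under an RGGB Bayer filter:
# the 2x2 tile [R G / G B] repeats across the entire sensor grid.

def bayer_color(row, col):
    tile = [["R", "G"],
            ["G", "B"]]
    return tile[row % 2][col % 2]

# The top-left 4x4 corner of the filter pattern:
for r in range(4):
    print(" ".join(bayer_color(r, c) for c in range(4)))
```

Interpolating full RGB values at every photosite from this mosaic (demosaicing) is what turns the raw sensor data into a normal color image.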

So, for the eye, imagine a sensor with a huge number of pixels, about 130 million. There's a higher density of pixels in the center of the sensor, and only about 6 million of those photosites are filtered for color sensitivity. Somewhat surprisingly, only about 100,000 of those sense blue! Oh, and by the way, this sensor isn't flat but semi-spherical, so a very simple lens can be used without distortion. Real camera lenses have to project onto a flat surface, which is less natural given the spherical projection of a simple lens (in fact, better lenses usually contain a few aspherical elements).

This is about 22mm diagonal on average, just a bit larger than a Micro Four Thirds sensor… but the spherical shape means the surface area is around 1100mm^2, a bit larger than a full-frame 35mm camera sensor. The highest pixel count on a 35mm sensor is on the Canon 5DS, which stuffs 50.6 megapixels into about 860mm^2.

So that's the hardware. But hardware isn't the limiting factor on effective resolution. The eye seems to see "continuously," but its response is cyclical; there's effectively a very fast frame rate… and that's not the important part. The eye is in constant motion from ocular microtremors that occur at around 70-110Hz. Your brain constantly integrates the output of your moving eye into the image you actually perceive, and the result is that, unless something's moving too fast, you get an effective resolution boost from 130 megapixels to something more like 520 megapixels, as the image is constructed from multiple samples.

Except you don't. For one thing, your luminance-only rod cells, tuned for low light, actually saturate in bright light. So in full daylight or bright room light, they're completely switched off, which leaves your 6 million or so cone cells as your only visual input. With microtremors, you may have about 24 million effective inputs at best… not exactly the same thing as 24 megapixels. And that's per eye, of course, so call it 48 megapixels if you want to draw that equivalence.

In the dark, the cones don't detect much; it's all rods at that point. Technically that's more "pixels," but your eye and brain are dealing with a low photon flux density, the same thing that causes ugly "shot noise" in low-light photographs. So your brain is only getting input from the rods that actually detect something.
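Shot noise, incidentally, is just Poisson statistics: if a pixel expects N photons during an exposure, the random fluctuation is about the square root of N, so the signal-to-noise ratio also only grows as the square root of N. A quick illustration (my numbers, not the article's):

```python
# Shot noise follows Poisson statistics: a pixel expecting N photons sees
# a standard deviation of sqrt(N), so SNR = N / sqrt(N) = sqrt(N).
import math

for n_photons in (4, 100, 10_000):
    snr = n_photons / math.sqrt(n_photons)
    print(f"{n_photons:>6} photons -> SNR about {snr:.0f}")
```

This is why dim scenes look grainy no matter how good the sensor is: with few photons arriving, the SNR is inherently low.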

And all of the 130 million sensors are "wired" down to about 1.2 million axons of the ganglion cells that connect the eye to the brain. So your visual data is already being processed and crunched before it ever gets to the brain.

Which makes perfect sense: our brains handle this kind of problem as a parallel processor with performance comparable to the fastest supercomputers we have today. When we perceive an image, there's this low-level image processing, plus specialized processes that work on higher-level abstractions. For example, we humans are really good at recognizing horizontal and vertical lines, while our friendly frog neighbors have specialized processing in their relatively simple brains that looks for a small object flying across the visual field: that fly he just ate. We also do constant pattern matching of what we see against our memories of things. So we don't just see an object; we instantly recognize it and call up a whole library of information on the thing we just saw.

Another interesting aspect of our in-brain image processing is that we don't demand any particular resolution. As our eyes age and we can't see as well, our effective resolution drops, and yet we adapt. Over a relatively short period, we adjust to what the eye can actually see… and you can experience this at home. If you're old enough to have spent lots of time in front of standard-definition television, you already have. Your brain adapted to the fairly terrible quality of NTSC television (or the slightly less terrible but still bad quality of PAL), and then perhaps jumped to VHS, which was even worse than what you could get via broadcast. When digital started, between VideoCD and early DVRs like the TiVo, the quality was really terrible… but if you watched lots of it, you stopped noticing the quality over time, as long as you didn't dwell on it. An HDTV viewer of today, going back to those old media, will be really disappointed, mostly because their brain moved on to the better video experience and dropped those bad-TV adaptations over time.

Back to the multi-sampled image for a second… cameras do this, too. In low light, many cameras today can average several different photos on the fly, which boosts the signal and cuts down on noise; your brain does this in the dark as well. And we're even doing the "microtremor" thing in cameras. The recent Olympus OM-D E-M5 Mark II has a "hi-res" mode that takes eight shots with half-pixel sensor shifts, delivering what's essentially two 16-megapixel images in full RGB (because the full-pixel steps ensure every pixel is sampled at R, G, B, and G), one offset from the other by half a pixel. Interpolating these interstitial images onto a normal pixel grid delivers a 64-megapixel file, but the effective resolution is more like 40 megapixels… still a big jump up from 16. Hasselblad showed a similar multi-shot system in 2013 that delivered a 200-megapixel capture, and Pentax is also releasing a camera with something like this built in.
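The frame-averaging trick is easy to demonstrate. A toy sketch (hypothetical numbers, with per-frame noise simulated as Gaussian): averaging k frames of the same scene cuts the noise by roughly the square root of k.

```python
# Averaging k noisy exposures of the same scene reduces noise by ~sqrt(k),
# which is what multi-shot camera modes (and, loosely, your brain) exploit.
import random

random.seed(42)            # fixed seed so the demo is repeatable
TRUE_VALUE = 100.0         # the "real" brightness of one pixel
NOISE_SIGMA = 10.0         # per-frame noise level
FRAMES = 16

samples = [random.gauss(TRUE_VALUE, NOISE_SIGMA) for _ in range(FRAMES)]
averaged = sum(samples) / FRAMES

print(f"typical single-frame error: ~{NOISE_SIGMA}")
print(f"error after averaging {FRAMES} frames: {abs(averaged - TRUE_VALUE):.2f}")
```

With 16 frames, the residual error is typically around a quarter of the single-frame noise, which is the square-root-of-k improvement in action.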

We’re doing simple versions of the higher-level brain functions, too, in our cameras. All kinds of current-model cameras can do face recognition and tracking, follow-focus, etc. They’re nowhere near as good at it as our eye/brain combination, but they do ok for such weak hardware.
