The Future Of Photography Is About Computation
Throwing processing power at raw images lets smartphones and cameras do some amazing things—and the best is yet to come.
Since the birth of photography almost 180 years ago, the
relationship between a photographer and a camera has remained mostly unchanged.
You open a shutter and capture an image. Though you might manipulate lenses,
exposures, and chemicals—or, in recent years, bits—there was a nearly
one-to-one relationship between what the lens saw and what you captured. But
you've likely taken thousands, if not tens of thousands, of pictures in recent
years that break that relationship without knowing it.
Computational photography takes a swarm of data from images or
image sensors and combines it algorithmically to produce a photo that would be
impossible to capture with film photography or digital photography in its more
conventional form. Image data can be assembled across time and space, producing
super-real high-dynamic range (HDR) photos—or just ones that capture both light
and dark areas well. Multiple cameras' inputs can be fused into a single image,
as on some Android phones and the iPhone 7 Plus, allowing for crisper or richer
images in a single shot and a synthetic zoom that looks nearly as good as one
produced via optical means.
But as much as computational photography has insinuated itself
into all major smartphone models and some standalone digital cameras, we're
still at the beginning. Google, Facebook, and others are pushing the concept
further, and researchers in the field say there are plenty of new ideas
circulating that will make their way into hardware—mostly as part of
smartphones, the biggest platform for taking pictures and leveraging innovative
imaging techniques.
The coming developments will allow 3D object capture, video
capture and analysis for virtual reality and augmented reality, better
real-time AR interaction, and even selfies that resemble you more closely.
In recent years, we've watched the just-good-enough cameras in
smartphones become better-than-good-enough, eating the heart out of what was
once a fast-growing market for point-and-shoot digital cameras. While
smartphones can't beat the combination of lens, high-count sensors, and other
factors that make digital single-lens reflex (DSLR) cameras the pinnacle of the
market, they continue to creep up the curve, with computational photography
providing some of the tricks.
When HDR first appeared in the iPhone's iOS 4.1 release in 2010,
it followed a typical practice by professional and serious photographers of
bracketing shots: taking multiple images, manually or automatically, at
different exposures or other settings. Before image-editing software,
photographers would pick among their photos and sometimes use darkroom techniques
to combine them. Photoshop and other apps could mix multiple exposures of the
same space to great effect, and some iOS apps were already offering this as a
feature when iOS 4.1 shipped.
Having HDR built directly into a smartphone OS transformed it from
a trick into a mainstream technique, even though the early versions weren't
great. (Android followed the iPhone's lead and added it as a core feature.)
Apple gradually shifted from capturing three bracketed images to what photo app
developers tell me is a much more elaborate set of captures and adjustments
that are analyzed and fused in software to produce the HDR result.
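To make the bracketing-and-fusion idea concrete, here is a minimal sketch in Python, assuming hypothetical, pre-aligned exposure brackets and a simple well-exposedness weighting; it illustrates the general technique, not Apple's actual HDR pipeline.

```python
import numpy as np

def fuse_exposures(exposures, sigma=0.2):
    """Blend pre-aligned bracketed exposures into one well-exposed image.

    exposures: list of float arrays in [0, 1] with identical shapes, shot
    at different exposure settings. Each pixel is weighted by how close it
    sits to mid-gray in that frame, so darker frames contribute highlights
    and brighter frames contribute shadows.
    """
    stack = np.stack(exposures)                                 # (n, H, W, C)
    weights = np.exp(-((stack - 0.5) ** 2) / (2 * sigma ** 2))  # well-exposedness
    weights /= weights.sum(axis=0, keepdims=True) + 1e-8        # normalize per pixel
    return (weights * stack).sum(axis=0)

# Toy usage: simulate under-, normal-, and over-exposed captures of one scene.
scene = np.random.rand(4, 4, 3)
brackets = [np.clip(scene * s, 0.0, 1.0) for s in (0.4, 1.0, 2.5)]
fused = fuse_exposures(brackets)
```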
And that's where things mostly stalled for years, despite a
proliferation of academic investigation. Gordon Wetzstein, a professor who
leads the Stanford Computational Imaging Group, an interdisciplinary research
group at Stanford University, says that of the hundreds of papers published
on computational photography, the work "boils back down to one, two, maybe three
different incarnations that end up being simple enough that they're actually
useful." This is partly because of power constraints, phones' and cameras'
form factors, and other elements that limit practical use.
Adding multiple rear-facing
cameras was an idea that kicked around for quite a while. The first
dual-camera phone to ship was the HTC One (M8) in early 2014, but its
abilities were ahead of the software and image-processing hardware of the
time. The potential started to be realized with the Huawei P9 (April 2016),
which combined color and grayscale cameras, and Apple's iPhone 7 Plus (October
2016), which has a wide-angle and nominally telephoto pair. In both cases, the
multiple cameras' images capture different aspects simultaneously, which
software combines for an arguably better result.
With two cameras combined with software that performs object recognition in a
scene, a system can extract depth. The iPhone 7 Plus uses this with Apple's
still-in-beta Portrait mode,
which fillets a subject in the foreground from all the background layers,
allowing it to pleasingly blur the background and thereby create the effect
known as bokeh.
This look simulates the one a photographer would previously get by pairing a
DSLR with a lens that has a very shallow depth of field.
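As a rough illustration of the idea, and not Apple's actual Portrait mode, the sketch below blurs everything whose estimated depth differs from the subject's; the function and parameter names are hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def synthetic_bokeh(image, depth, subject_depth, tolerance=0.1, blur_sigma=6.0):
    """Keep pixels near the subject's depth sharp and blur the rest.

    image: (H, W, 3) float array; depth: (H, W) relative depth map, e.g.
    estimated from a two-camera disparity. Real pipelines also feather the
    mask and vary blur strength with distance; this is the bare idea.
    """
    blurred = np.stack(
        [gaussian_filter(image[..., c], blur_sigma) for c in range(3)], axis=-1
    )
    foreground = (np.abs(depth - subject_depth) < tolerance)[..., None]
    return np.where(foreground, image, blurred)
```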
Wetzstein notes the potential for depth recognition to have an
impact beyond photographic effects. By analyzing objects in a scene by depth, a
two-camera system could automatically produce better pictures, building on the
face, smile, and blink recognition features that are standard in cameras and
smartphones today.
But if two lens/camera
combos are good, surely more are better? Researchers have tested
cobbled-together multi-input systems, sometimes quite elaborate, as with the Stanford Multi-Camera Array, which
sported 128 separate cameras. These were fixed installations and not practical
for commercial (or amateur) use.
The low cost of smartphone-size lenses may change that. Instead of using a
single large, expensive lens, as on a DSLR, a camera could collect photos from
many smaller lenses and fuse them computationally to achieve high-quality
results. This is the thinking behind the L16, cited by Wetzstein as an example.
It's a camera made by a company simply called Light, with 16 camera elements
across three focal lengths. (The $1,700 device isn't shipping yet and its
preorder allotment sold out.)
Depending on lighting and zoom factor, the L16 fires off a different
combination of 10 of those lenses across three focal lengths to fuse a
52-megapixel image using a package not much bigger than a smartphone or typical
digital snapshot camera. It may be a gimmick or it may be a way to pack a
wallop in one's pocket; we'll know when it hits photographers' hands.
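A toy sketch, under the assumption of already-aligned frames, of why many small camera modules can rival one big lens: sensor noise is roughly independent across modules, so averaging their captures cleans up the image. This illustrates the principle only, not the L16's actual, far more elaborate pipeline.

```python
import numpy as np

def merge_aligned_captures(captures):
    """Average already-aligned captures from multiple small camera modules.

    Noise that is uncorrelated across modules shrinks by roughly 1/sqrt(N)
    in the mean, so the merged frame is cleaner than any single input.
    Real systems also register, demosaic, and upsample the frames.
    """
    return np.mean(np.stack(captures), axis=0)

# Toy usage: ten noisy views of the same flat gray patch.
truth = np.full((8, 8), 0.5)
views = [truth + np.random.normal(0.0, 0.1, truth.shape) for _ in range(10)]
merged = merge_aligned_captures(views)   # noise std drops to roughly 0.03
```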
A different hardware approach brought Lytro to market: a single-lens camera
that could refocus a photograph after it was taken and produce 3D images.
Lytro's technology relied on a large image sensor whose elements were grouped
into super-pixels, allowing its software to capture a light field, effectively
recording the incoming direction of light as it hit the sensor, so the image
could be reconstructed and refocused later. The system never caught on in
either its original prosumer or later professional model, and the
company adapted its approach to VR capture hardware.
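The refocusing trick can be approximated with a shift-and-sum over the sub-aperture views a plenoptic sensor records. The sketch below is a simplified stand-in for Lytro's proprietary processing; the data layout and the alpha parameter are assumptions for illustration.

```python
import numpy as np
from scipy.ndimage import shift as nd_shift

def refocus(subaperture_views, alpha):
    """Shift-and-sum refocusing over a grid of sub-aperture images.

    subaperture_views: dict mapping a lenslet offset (u, v) to a (H, W)
    image. Shifting each view in proportion to its offset and averaging
    moves the synthetic focal plane; alpha picks how far it moves.
    """
    total = None
    for (u, v), view in subaperture_views.items():
        shifted = nd_shift(view, shift=(alpha * v, alpha * u), order=1, mode="nearest")
        total = shifted if total is None else total + shifted
    return total / len(subaperture_views)
```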
Rather than capturing light
fields or combining image data, some experimental efforts in the hands of
developers rely on a synchronized infrared (IR) sensor that captures depth
information. Google's Tango is a practical testbed for this approach,
supporting depth capture via both structured light and time of flight.
Structured light relies on projecting a pattern onto a scene that
a sensor then reads and uses to estimate distance and surface displacement.
Time of flight, by contrast, measures the time between projecting a signal and
its reflection, omitting a grid and providing a more direct measurement. IR,
invisible to the naked eye, is the light most commonly used for both techniques.
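Simplified heavily, the two depth measurements boil down to a pair of one-line formulas: time of flight converts a round-trip delay into distance, and structured light reduces, roughly, to triangulation between the projector and the sensor. The numbers below are illustrative only.

```python
SPEED_OF_LIGHT = 299_792_458.0  # meters per second

def tof_depth(round_trip_seconds):
    """Time of flight: the pulse travels out and back, so depth is half
    the round-trip distance."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0

def structured_light_depth(disparity_px, baseline_m, focal_px):
    """Structured light, treated here as simple triangulation: the projected
    pattern's apparent shift (disparity) between the emitter and the sensor
    falls off with distance."""
    return baseline_m * focal_px / disparity_px

print(tof_depth(10e-9))                         # a 10 ns round trip ≈ 1.5 m
print(structured_light_depth(20.0, 0.08, 600))  # 8 cm baseline, 600 px focal ≈ 2.4 m
```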
Microsoft's Kinect sensor add-on for the Xbox started with structured
light and shifted to time of flight; both versions were the first mainstream
uses of these techniques, but in a fixed location and for a single purpose:
capturing motion for gaming and other inputs. Tango, while still a work in progress
relevant to developers rather than the masses, brings the technology to mobile
devices in a practical form. It's already available in Lenovo's Phab 2 Pro smartphone.
At first glance, these types of depth-finding may not seem to meet
the definition of computational photography. But in effect, an IR sensor (plus
an emitter) is itself a camera, one that's paired with a standard photographic
camera to build a depth and object map.
Any method of obtaining depth plays right into advancing the capability and
practicality of augmented and virtual reality systems by allowing a mobile
device to better identify what's in its visual field. The more immediate benefit is for
AR: Overlaying an existing scene with information requires vastly less
computational power than generating VR's full-blown 3D graphics and letting
people interact with that world.
Wetzstein says that structured light is a power-hungry technique
because it requires the constant projection of a grid. Time of flight should
have greater impact, but he says it will require years more development to make
it fully capable.
3D VR photographic capture
could come at some point from a combination of multiple lenses and depth
perception, but probably not any time soon. Wetzstein says that although phones
can capture panoramas easily enough, creating both video and stereo panoramas
that can be stitched together and remain synchronized currently requires gear
in the $15,000 to $30,000 range, such as that used with Facebook 360 and Google Jump, relying
on more than a dozen cameras and huge apparatuses.
Besides its role in AR and VR, computational photography could
help solve much more routine problems by marrying itself with computer vision
(the study of machine-based perception) and machine learning (teaching machines
to recognize what they perceive).
Irfan Essa, a professor at the Georgia Institute of Technology,
heads the school's Interdisciplinary Research Center for Machine Learning. He
says that an ever-stronger connection among those areas "has grown into
more object-centric thinking." Computational photography moves beyond just
capturing pixels, he says, into capturing light, which allows it to extract the
geometry of a scene. "If you know where the object is and what surface
it's on, you can do more with it," he says.
This helps with depth, as noted above, but also with one of the
most common problems facing average smartphone owners: It's easier to take
photos than manage them. "We're just capturing too many pictures,"
Essa says. "I take pictures at the dinner table with my family and I end
up having 40 to 50 pictures." By better analyzing the contents of a scene,
photo software could automatically identify the best pictures.
Some third-party apps already do this, and Apple's burst mode in
its Camera app tries to detect the "best" pictures of a set taken in
fast succession. But these early stabs at the idea rely on a handful of cues
instead of full-blown recognition. As the photographic tech in smartphones gets
better, researchers will be able to take the idea further, Essa says.
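A sketch of what a cue-based "best shot" picker looks like, using only one cue (focus sharpness measured as Laplacian variance); this is a hypothetical stand-in for the handful-of-cues approaches described above, not Apple's burst-mode logic.

```python
import numpy as np
from scipy.ndimage import laplace

def pick_sharpest(frames):
    """Return the burst frame with the most edge detail, plus all scores.

    frames: list of 2D grayscale arrays. Variance of the Laplacian is a
    classic, cheap sharpness cue; a real picker would add exposure, face,
    blink, and motion cues on top of it.
    """
    scores = [float(np.var(laplace(frame.astype(float)))) for frame in frames]
    return frames[int(np.argmax(scores))], scores
```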
Essa also expects to see
improvements in color matching, tone adjustment, and selfie correction. He
notes that despite the decades of work that Adobe and Kodak have put into
technologies to allow the same color to appear in the same way everywhere, it's
only recently that these ideas have hit the mass market. Apple's 9.7-inch iPad Pro, for
instance, introduced what Apple calls "True Tone," which uses a sensor that
measures ambient light color and conditions and adjusts the display to provide
a consistent set of colors to the viewer, no matter the temperature of the
light in which they're using the tablet.
Better color management relies on better cameras as well as better
displays, and Essa says it will ultimately produce a pipeline that
computational photography will aid by integrating similar sampling technologies
into the image-creation chain. He notes that skin tone is an area where the
most improvement could come. "Most selfies look like crap, but they're
getting better," he says.
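A minimal sketch of the underlying color math, assuming a measured ambient white point and a von Kries-style diagonal scaling; it illustrates the general idea of ambient-aware color adjustment rather than Apple's True Tone implementation.

```python
import numpy as np

def adapt_to_ambient(image, ambient_white_rgb):
    """Scale each channel so the measured ambient white renders as neutral
    gray (a von Kries-style diagonal adaptation).

    image: (H, W, 3) float array in [0, 1]; ambient_white_rgb: the RGB
    response a white surface produces under the current lighting.
    """
    white = np.asarray(ambient_white_rgb, dtype=float)
    gains = white.mean() / white            # boost channels the light under-drives
    return np.clip(image * gains, 0.0, 1.0)
```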
One of the pioneering
academics of computational photography, Marc Levoy, taught at Stanford,
inspired and advised the founders of Lytro, and released an early iPhone app
that created faux bokeh. He's now at Google, and deferred my
questions to the firm's press relations department, which didn't respond to a
request for an interview. This isn't unusual: Many researchers in this field
have founded or joined startups or become part of teams at computer companies
and dotcoms. That's a reminder that there's likely a fair amount more happening
behind the scenes at smartphone makers, some of which will find its way into
our hands.
GLENN FLEISHMAN
https://www.fastcompany.com/3067252/the-future-of-photography-is-about-computation