Deeper and Cheaper Machine Learning




Last March, Google’s computers roundly beat the world-class Go champion Lee Sedol, marking a milestone in artificial intelligence. The winning computer program, created by researchers at Google DeepMind in London, used an artificial neural network that took advantage of what’s known as deep learning, a strategy by which neural networks involving many layers of processing are configured in an automated fashion to solve the problem at hand.

Unknown to the public at the time was that Google had an ace up its sleeve. You see, the computers Google used to defeat Sedol contained special-purpose hardware: a computer card Google calls its Tensor Processing Unit.

Norm Jouppi, a hardware engineer at Google, announced the existence of the Tensor Processing Unit two months after the Go match, explaining in a blog post that Google had been outfitting its data centers with these new accelerator cards for more than a year. Google has not shared exactly what is on these boards, but it’s clear that they represent an increasingly popular strategy for speeding up deep-learning calculations: using an application-specific integrated circuit, or ASIC.
Another tactic being pursued (primarily by Microsoft) is to use field-programmable gate arrays (FPGAs), which provide the benefit of being reconfigurable if the computing requirements change. The more common approach, though, has been to use graphics processing units, or GPUs, which can perform many mathematical operations in parallel. The foremost proponent of this approach is GPU maker Nvidia.

Indeed, advances in GPUs kick-started artificial neural networks back in 2009, when researchers at Stanford showed that such hardware made it possible to train deep neural networks in reasonable amounts of time.
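To see why GPUs fit this work so well, note that the bulk of a neural network’s arithmetic is dense matrix multiplication: enormous numbers of independent multiply-and-add operations that can run side by side. Here is a rough sketch in Python; NumPy is only a CPU stand-in and the sizes are arbitrary, but a GPU framework would dispatch the very same kind of call across thousands of parallel cores.

```python
import numpy as np
import time

# A neural-network layer boils down to a dense matrix multiply: every
# output value is an independent sum of products, which is exactly the
# kind of work parallel hardware handles well. NumPy serves as a CPU
# stand-in here; a GPU library would spread the same call across
# thousands of cores.
batch, n_in, n_out = 256, 2048, 2048
x = np.random.randn(batch, n_in).astype(np.float32)   # a batch of inputs
w = np.random.randn(n_in, n_out).astype(np.float32)   # layer weights

start = time.perf_counter()
y = x @ w          # roughly 2 * batch * n_in * n_out arithmetic operations
elapsed = time.perf_counter() - start
print(f"{2 * batch * n_in * n_out / elapsed / 1e9:.1f} GFLOP/s on this machine")
```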
“Everybody is doing deep learning today,” says William Dally, who leads the Concurrent VLSI Architecture group at Stanford and is also chief scientist for Nvidia. And for that, he says, perhaps not surprisingly given his position, “GPUs are close to being as good as you can get.”

Dally explains that there are three separate realms to consider. The first is what he calls “training in the data center.” He’s referring to the first step for any deep-learning system: adjusting perhaps many millions of connections between neurons so that the network can carry out its assigned task.
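In code, that adjustment amounts to a loop: measure the network’s error on known examples, work out how each connection contributed to that error, and nudge every weight slightly in the direction that reduces it. The sketch below uses a single made-up linear layer and synthetic data, not any vendor’s actual training stack, but the shape of the computation is the same.

```python
import numpy as np

# Toy "training in the data center": repeatedly nudge a layer's weights
# so that its outputs match known targets. Production systems do the
# same thing across millions or billions of weights, on many machines.
rng = np.random.default_rng(0)
inputs = rng.standard_normal((512, 8)).astype(np.float32)
true_w = rng.standard_normal((8, 1)).astype(np.float32)
targets = inputs @ true_w                      # pretend labeled data

w = np.zeros((8, 1), dtype=np.float32)         # the "connections" to adjust
learning_rate = 0.01
for step in range(1000):
    predictions = inputs @ w
    grad = inputs.T @ (predictions - targets) / len(inputs)  # gradient of squared error
    w -= learning_rate * grad                  # small adjustment each step

print("mean squared error after training:",
      float(np.mean((inputs @ w - targets) ** 2)))
```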
In building hardware for that, a company called Nervana Systems, which was recently acquired by Intel, has been leading the charge. According to Scott Leishman, a computer scientist at Nervana, the Nervana Engine, an ASIC deep-learning accelerator, will go into production in early to mid-2017. Leishman notes that another computationally intensive task, bitcoin mining, went from being run on CPUs to GPUs to FPGAs and, finally, on ASICs because of the gains in power efficiency from such customization. “I see the same thing happening for deep learning,” he says.

A second and quite distinct job for deep-learning hardware, explains Dally, is “inference at the data center.” The word inference here refers to the ongoing operation of cloud-based artificial neural networks that have previously been trained to carry out some job. Every day, Google’s neural networks are making an astronomical number of such inference calculations to categorize images, translate between languages, and recognize spoken words, for example. Although it’s hard to say for sure, Google’s Tensor Processing Unit is presumably tailored for performing such computations.
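Inference, by contrast, leaves the weights untouched: a frozen, already-trained network is simply run forward on a stream of fresh inputs, over and over, which is why it can be specialized so aggressively in hardware. The toy example below uses random numbers as stand-ins for trained weights, just to show how little beyond multiplies, adds, and comparisons an inference chip has to do.

```python
import numpy as np

# Inference sketch: a frozen two-layer network classifying a batch of
# inputs. The weights here are random stand-ins for a trained model.
rng = np.random.default_rng(1)
w1, b1 = rng.standard_normal((784, 128)), np.zeros(128)
w2, b2 = rng.standard_normal((128, 10)), np.zeros(10)

def infer(batch):
    hidden = np.maximum(batch @ w1 + b1, 0.0)   # ReLU layer
    logits = hidden @ w2 + b2
    return logits.argmax(axis=1)                # predicted class per input

images = rng.standard_normal((64, 784))         # stand-ins for camera frames
print(infer(images)[:10])
```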
Training and inference place very different demands on hardware. Typically for training, the computer must be able to calculate with relatively high precision, often using 32-bit floating-point operations. For inference, precision can be sacrificed in favor of greater speed or lower power consumption. “This is an active area of research,” says Leishman. “How low can you go?”
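One common way to go low, shown here as a generic illustration rather than a description of Nervana’s or Google’s actual scheme, is to store a trained layer’s weights as 8-bit integers plus a single scale factor and dequantize on the fly. A quick check of how much accuracy a single layer loses:

```python
import numpy as np

# Quantize float32 weights to 8-bit integers plus one scale factor,
# then compare the layer's output against the full-precision result.
rng = np.random.default_rng(2)
w = rng.standard_normal((256, 256)).astype(np.float32)
x = rng.standard_normal((32, 256)).astype(np.float32)

scale = np.abs(w).max() / 127.0
w_int8 = np.round(w / scale).astype(np.int8)        # 4x smaller than float32

y_full = x @ w
y_quant = (x @ w_int8.astype(np.float32)) * scale   # dequantize on the fly

rel_err = np.abs(y_full - y_quant).max() / np.abs(y_full).max()
print(f"worst-case relative error from 8-bit weights: {rel_err:.4f}")
```

Production schemes are more elaborate, with per-channel scales and calibration on real data, but the basic trade is the same: smaller, cheaper numbers in exchange for a small, measurable loss of fidelity.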
Although Dally declines to divulge Nvidia’s specific plans, he points out that the company’s GPUs have been evolving. Nvidia’s earlier Maxwell architecture could perform double- (64-bit) and single- (32-bit) precision operations, whereas its current Pascal architecture adds the capability to do 16-bit operations at twice the throughput and efficiency of its single-precision calculations. So it’s easy to imagine that Nvidia will eventually be releasing GPUs able to perform 8-bit operations, which could be ideal for inference calculations done in the cloud, where power efficiency is critical to keeping costs down.
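The arithmetic behind those throughput gains is simple: halving the width of each number doubles how many values fit in the same memory, caches, and bandwidth, and lets a chip pack more arithmetic units into the same silicon and power budget. The storage side of that trade-off is easy to see:

```python
import numpy as np

# Storage for the same million-weight layer at different precisions.
n = 1_000_000
for dtype in (np.float64, np.float32, np.float16, np.int8):
    weights = np.ones(n, dtype=dtype)
    print(f"{np.dtype(dtype).name:>8}: {weights.nbytes / 1e6:.1f} MB")
```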
Dally adds that “the final leg of the tripod for deep learning is inference in embedded devices,” such as smartphones, cameras, and tablets. For those applications, the key will be low-power ASICs. Over the coming year, deep-learning software will increasingly find its way into applications for smartphones, where it is already used, for example, to detect malware or translate text in images. And the drone manufacturer DJI is already using something akin to a deep-learning ASIC in its Phantom 4 drone, which uses a special visual-processing chip made by California-based Movidius to recognize obstructions. (Movidius is yet another neural-network company recently acquired by Intel.) Qualcomm, meanwhile, built special circuitry into its Snapdragon 820 processors to help carry out deep-learning calculations.
Although there is plenty of incentive these days to design hardware to accelerate the operation of deep neural networks, there’s also a huge risk: If the state of the art shifts far enough, chips designed to run yesterday’s neural nets will be outdated by the time they are manufactured. “The algorithms are changing at an enormous rate,” says Dally. “Everybody who is building these things is trying to cover their bets.”
