Deeper and Cheaper Machine Learning

LAST MARCH, GOOGLE’S COMPUTERS roundly beat the world-class Go champion Lee Sedol, marking a milestone in artificial intelligence. The winning computer program, created by researchers at Google DeepMind in London, used an artificial neural network that took advantage of what’s known as deep learning, a strategy by which neural networks involving many layers of processing are configured in an automated fashion to solve the problem at hand. Unknown to the public at the time was that Google had an ace up its sleeve. You see, the computers Google used to defeat Sedol contained special-purpose hardware—a computer card Google calls its Tensor Processing Unit.

Norm Jouppi, a hardware engineer at Google, announced the existence of the Tensor Processing Unit two months after the Go match, explaining in a blog post that Google had been outfitting its data centers with these new accelerator cards for more than a year. Google has not shared exactly what is on these boards, but it’s clear that they represent an increasingly popular strategy to speed up deep-learning calculations: using an application-specific integrated circuit, or ASIC.

Another tactic being pursued (primarily by Microsoft) is to use field-programmable gate arrays (FPGAs), which provide the benefit of being reconfigurable if the computing requirements change. The more common approach, though, has been to use graphics processing units, or GPUs, which can perform many mathematical operations in parallel. The foremost proponent of this approach is GPU maker Nvidia.

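To see why GPUs and chips like the TPU are such a natural fit, it helps to remember that most of the arithmetic in a neural network boils down to large matrix multiplications, and the multiply-add operations inside each one are independent of one another. The minimal sketch below, in plain NumPy with invented layer sizes, shows the kind of layered matrix math that all of these accelerators are built to speed up.

```python
import numpy as np

def relu(x):
    # Elementwise nonlinearity; each element is independent, so it parallelizes trivially.
    return np.maximum(x, 0.0)

def forward(x, weights, biases):
    """Run one batch of inputs through a stack of fully connected layers.

    Each layer is a matrix multiply followed by a nonlinearity. The many
    multiply-adds inside np.dot are independent of one another, which is
    exactly the kind of work GPUs, FPGAs, and deep-learning ASICs do in parallel.
    """
    for W, b in zip(weights, biases):
        x = relu(x.dot(W) + b)
    return x

# Invented layer sizes for illustration: 784 inputs -> 256 -> 256 -> 10 outputs.
rng = np.random.default_rng(0)
sizes = [784, 256, 256, 10]
weights = [rng.standard_normal((m, n)) * 0.01 for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

batch = rng.standard_normal((32, 784))        # 32 example inputs at once
print(forward(batch, weights, biases).shape)  # (32, 10)
```
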
Indeed, advances in GPUs kick-started artificial neural networks back in 2009, when researchers at Stanford showed that such hardware made it possible to train deep neural networks in reasonable amounts of time.

“Everybody is doing deep learning today,” says William Dally, who leads the Concurrent VLSI Architecture group at Stanford and is also chief scientist for Nvidia. And for that, he says, perhaps not surprisingly given his position, “GPUs are close to being as good as you can get.”

Dally explains that there are three separate realms to consider. The first is what he calls “training in the data center.” He’s referring to the first step for any deep-learning system: adjusting perhaps many millions of connections between neurons so that the network can carry out its assigned task.

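As a rough illustration of what adjusting those connections involves, here is a minimal gradient-descent training step for a single layer, written in plain NumPy with invented dimensions; it is a sketch of the general idea, not of any particular company's system. Real training repeats updates like this billions of times, which is why throughput matters so much.

```python
import numpy as np

rng = np.random.default_rng(0)

# A single linear layer with invented dimensions: 512 inputs, 10 outputs.
W = rng.standard_normal((512, 10)) * 0.01    # the "connections" being adjusted
b = np.zeros(10)

def train_step(W, b, x, y_true, lr=0.01):
    """One gradient-descent update on a mean-squared-error loss.

    Training is nothing more than repeating this: run the data forward,
    measure the error, and nudge every connection slightly in the
    direction that reduces it.
    """
    y_pred = x @ W + b                       # forward pass
    err = y_pred - y_true                    # prediction error
    grad_W = x.T @ err / len(x)              # gradient for every connection
    grad_b = err.mean(axis=0)
    loss = float((err ** 2).mean())
    return W - lr * grad_W, b - lr * grad_b, loss

x = rng.standard_normal((64, 512))           # a batch of 64 made-up examples
y = rng.standard_normal((64, 10))
for _ in range(5):
    W, b, loss = train_step(W, b, x, y)
    print(loss)                              # should drift downward
```
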
In building hardware for that, a company called Nervana Systems, which was recently acquired by Intel, has been leading the charge. According to Scott Leishman, a computer scientist at Nervana, the Nervana Engine, an ASIC deep-learning accelerator, will go into production in early to mid-2017. Leishman notes that another computationally intensive task—bitcoin mining—went from being run on CPUs to GPUs to FPGAs and, finally, to ASICs because of the gains in power efficiency from such customization. “I see the same thing happening for deep learning,” he says.

A second and quite distinct job for deep-learning hardware, explains Dally, is “inference at the data center.” The word inference here refers to the ongoing operation of cloud-based artificial neural networks that have previously been trained to carry out some job. Every day, Google’s neural networks are making an astronomical number of such inference calculations to categorize images, translate between languages, and recognize spoken words, for example. Although it’s hard to say for sure, Google’s Tensor Processing Unit is presumably tailored for performing such computations.

Training and inference often call for very different skill sets. Typically for training, the computer must be able to calculate with relatively high precision, often using 32-bit floating-point operations. For inference, precision can be sacrificed in favor of greater speed or lower power consumption. “This is an active area of research,” says Leishman. “How low can you go?”

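A simple way to explore that question is post-training quantization: take weights learned in 32-bit floating point, squeeze them into 8-bit integers, and measure how far the outputs drift. The sketch below illustrates the idea in plain NumPy with made-up shapes; it is not how any particular accelerator implements reduced precision.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these weights came out of 32-bit floating-point training.
W_fp32 = rng.standard_normal((256, 10)).astype(np.float32) * 0.1
x = rng.standard_normal((8, 256)).astype(np.float32)

def quantize_int8(w):
    """Map float weights onto the signed 8-bit range [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

W_q, scale = quantize_int8(W_fp32)

# Inference with the original weights versus the dequantized 8-bit ones.
y_fp32 = x @ W_fp32
y_int8 = x @ (W_q.astype(np.float32) * scale)

# How much accuracy did 8 bits cost? Often surprisingly little.
print(np.abs(y_fp32 - y_int8).max())
```
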
Although Dally declines to divulge Nvidia’s specific plans, he points out that the company’s GPUs have been evolving. Nvidia’s earlier Maxwell architecture could perform double- (64-bit) and single- (32-bit) precision operations, whereas its current Pascal architecture adds the capability to do 16-bit operations at twice the throughput and efficiency of its single-precision calculations. So it’s easy to imagine that Nvidia will eventually release GPUs able to perform 8-bit operations, which could be ideal for inference calculations done in the cloud, where power efficiency is critical to keeping costs down.

Dally adds that “the final leg of the tripod for deep learning is inference in embedded devices,” such as smartphones, cameras, and tablets. For those applications, the key will be low-power ASICs. Over the coming year, deep-learning software will increasingly find its way into applications for smartphones, where it is already used, for example, to detect malware or translate text in images.

And the drone manufacturer DJI is already using something akin to a deep-learning ASIC in its Phantom 4 drone, which relies on a special visual-processing chip made by California-based Movidius to recognize obstructions. (Movidius is yet another neural-network company recently acquired by Intel.) Qualcomm, meanwhile, built special circuitry into its Snapdragon 820 processors to help carry out deep-learning calculations.

Although there is plenty of incentive these days to design hardware to accelerate the operation of deep neural networks, there’s also a huge risk: If the state of the art shifts far enough, chips designed to run yesterday’s neural nets will be outdated by the time they are manufactured. “The algorithms are changing at an enormous rate,” says Dally. “Everybody who is building these things is trying to cover their bets.”