Waymo has long touted its ties to Google’s DeepMind and its decades of AI research as a strategic advantage over its rivals in the autonomous driving space. Now, the Alphabet-owned company is taking it a step further by developing a new training model for its robotaxis built on Google’s multimodal large language model (MLLM) Gemini.

Waymo released a new research paper today that introduces an “End-to-End Multimodal Model for Autonomous Driving,” also known as EMMA. This new end-to-end training model processes sensor data to generate “future trajectories for autonomous vehicles,” helping Waymo’s driverless vehicles make decisions about where to go and how to avoid obstacles.

But more importantly, this is one of the first indications that the leader in autonomous driving has designs to use MLLMs in its operations. And it’s a sign that these models could break free of their current use as chatbots, email organizers, and image generators and find application in an entirely new environment: the road.

In its research paper, Waymo is proposing “to develop an autonomous driving system in which the MLLM is a first class citizen.”

The paper outlines how, historically, autonomous driving systems have developed specific “modules” for various functions, including perception, mapping, prediction, and planning. This approach has proven useful for many years but has problems scaling “due to the accumulated errors among modules and limited inter-module communication.” Moreover, these modules could struggle to respond to “novel environments” because, by nature, they are “pre-defined,” which can make them hard to adapt.

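To make the “accumulated errors” point concrete, here is a minimal Python sketch that is not taken from Waymo’s paper. It assumes, purely for illustration, that each of the four modules produces a usable output with a fixed probability and that failures are independent. The reliability figures and the `pipeline_reliability` helper are hypothetical.

```python
from math import prod

# Hypothetical per-module reliabilities. These are illustrative numbers only,
# not measurements from Waymo or any real driving stack.
MODULE_RELIABILITY = {
    "perception": 0.99,
    "mapping": 0.99,
    "prediction": 0.98,
    "planning": 0.99,
}


def pipeline_reliability(reliabilities):
    """Probability that every stage succeeds, assuming independent failures."""
    return prod(reliabilities.values())


if __name__ == "__main__":
    overall = pipeline_reliability(MODULE_RELIABILITY)
    weakest = min(MODULE_RELIABILITY.values())
    print(f"Weakest single module: {weakest:.4f}")
    print(f"Whole pipeline:        {overall:.4f}")
```

Under these assumptions, the chained pipeline is less reliable than its weakest module, which is one simple way to read the scaling concern the paper describes. Real modules do not fail independently, so this is a rough intuition rather than a model of any actual system.
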
Waymo says that MLLMs like Gemini present an interesting solution to some of these challenges for two reasons: each model is a “generalist” trained on vast sets of data scraped from the internet “that provide rich ‘world knowledge’ beyond what is contained in common driving logs”; and these models demonstrate “superior” reasoning capabilities through techniques like “chain-of-thought reasoning,” which mimics human reasoning by breaking down complex tasks into a series of logical steps.

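As a rough illustration of the chain-of-thought idea described above, the Python sketch below composes a prompt that asks a model to work through intermediate steps before answering. The function name, the task wording, and the three steps are all hypothetical. They are not drawn from Waymo’s paper or from any Gemini API, and no model is actually called.

```python
def build_chain_of_thought_prompt(task, steps):
    """Compose a prompt that asks for intermediate reasoning before the answer."""
    lines = [f"Task: {task}", "Work through the following steps in order:"]
    # Number the steps so the model is nudged to address them one at a time.
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    lines.append("Only after completing every step, state the final answer.")
    return "\n".join(lines)


# Hypothetical driving-flavored steps, chosen only to show the structure.
prompt = build_chain_of_thought_prompt(
    task="Decide how the vehicle should proceed.",
    steps=[
        "Describe the scene.",
        "List the objects that matter for driving.",
        "Describe what each of those objects is likely to do.",
    ],
)

if __name__ == "__main__":
    print(prompt)
```

The point of the structure is that the final decision is requested last, after the intermediate observations have been written out.
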
Waymo’s EMMA model. Screenshot: Waymo

Waymo developed EMMA as a tool to help its robotaxis navigate complex environments. The company identified several situations in which the model helped its driverless cars find the right route, including encounters with animals or construction in the road.

Other companies, like Tesla, have spoken extensively about developing end-to-end models for their autonomous cars. Elon Musk claims that the latest version of Tesla’s Full Self-Driving system (12.5.5) uses an “end-to-end neural nets” AI system that translates camera images into driving decisions.

Waymo’s paper is a clear indication that the company, which has a lead on Tesla in deploying real driverless vehicles on the road, is also interested in pursuing an end-to-end system. 

The company said that its EMMA model excelled at trajectory prediction, object detection, and road graph understanding. “This suggests a promising avenue of future research, where even more core autonomous driving tasks could be combined in a similar, scaled-up setup,” the company said in a blog post today.

But EMMA also has its limitations, and Waymo acknowledges that there will need to be future research before the model is put into practice. For example, EMMA couldn’t incorporate 3D sensor inputs from lidar or radar, which Waymo said was “computationally expensive.” And it could only process a small number of image frames at a time.

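One simple way to picture a model that only sees a few frames at a time is a fixed-size sliding window over the camera stream. The Python sketch below is an illustration under stated assumptions, not EMMA’s actual input pipeline: the `FrameWindow` class is hypothetical, the limit of four frames is arbitrary, and frames are stood in for by strings.

```python
from collections import deque


class FrameWindow:
    """Keeps only the most recent `max_frames` camera frames."""

    def __init__(self, max_frames):
        if max_frames < 1:
            raise ValueError("max_frames must be at least 1")
        self._frames = deque(maxlen=max_frames)

    def push(self, frame):
        # Once the deque is full, appending drops the oldest frame.
        self._frames.append(frame)

    def snapshot(self):
        """Return the buffered frames, oldest first."""
        return list(self._frames)


window = FrameWindow(max_frames=4)  # 4 is an arbitrary illustrative limit
for frame_id in range(10):
    window.push(f"frame-{frame_id}")

if __name__ == "__main__":
    print(window.snapshot())  # ['frame-6', 'frame-7', 'frame-8', 'frame-9']
```

Anything older than the window is simply gone, which is why a short window limits how much history the model can draw on.
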
There are also risks in using MLLMs to train robotaxis that go unmentioned in the research paper. Chatbots like Gemini often hallucinate or fail at simple tasks like reading clocks or counting objects. Waymo has very little margin for error when its autonomous vehicles are traveling 40 mph down a busy road.

More research will be needed before these models can be deployed at scale, and Waymo is clear about that. “We hope that our results will inspire further research to mitigate these issues,” the company’s research team writes, “and to further evolve the state of the art in autonomous driving model architectures.”

Original author: Andrew J. Hawkins