A few months ago, my doctor showed off an AI transcription tool he used to record and summarize his patient meetings. In my case, the summary was fine, but researchers cited by ABC News have found that's not always the case with OpenAI's Whisper, which powers a tool many hospitals use — sometimes it just makes things up entirely.
Whisper is used by a company called Nabla for a medical transcription tool that it estimates has transcribed 7 million medical conversations, according to ABC News. More than 30,000 clinicians and 40 health systems use it, the outlet writes. Nabla is reportedly aware that Whisper can hallucinate, and is "addressing the problem."
A group of researchers from Cornell University, the University of Washington, and others found in a study that Whisper hallucinated in about 1 percent of transcriptions, making up entire sentences with sometimes violent sentiments or nonsensical phrases during silences in recordings.
The researchers, who gathered audio samples from TalkBank's AphasiaBank as part of the study, note that silence is particularly common when someone with a language disorder called aphasia is speaking.
One of the researchers, Allison Koenecke of Cornell University, posted examples of such hallucinations in a thread about the study.
The researchers found that hallucinations also included invented medical conditions or phrases you might expect from a YouTube video, such as "Thank you for watching!" (OpenAI reportedly transcribed over a million hours of YouTube videos to train GPT-4.)
The study was presented in June at the Association for Computing Machinery FAccT conference in Brazil. It's not clear if it has been peer-reviewed.
OpenAI spokesperson Taya Christianson emailed a statement to The Verge:

"We take this issue seriously and are continually working to improve, including reducing hallucinations. For Whisper use on our API platform, our usage policies prohibit use in certain high-stakes decision-making contexts, and our model card for open-source use includes recommendations against use in high-risk domains. We thank researchers for sharing their findings."
Original author: Wes Davis