Stories about AI tools behaving badly, like Microsoft’s Bing AI losing track of which year it is, have become a subgenre of AI reporting. But it’s often hard to tell the difference between a bug and poor construction of the underlying AI model, the system that analyzes incoming data and predicts an acceptable response, as when Google’s Gemini image generator drew diverse Nazis because of a filter setting.
Now,
OpenAI
is
releasing
the
first
draft
of
a
proposed
framework,
called
Model
Spec,
that
would
shape
how
AI
tools
like
its
own
GPT-4
model
respond
in
the
future.

The
OpenAI approach proposes three general principles: AI models should assist the developer and end user with helpful responses that follow instructions; benefit humanity, with consideration of potential benefits and harms; and reflect well on OpenAI, with respect to social norms and laws.
It also includes several rules: follow the chain of command, comply with applicable laws, don’t provide information hazards, respect creators and their rights, protect people’s privacy, and don’t respond with NSFW (not safe for work) content.
OpenAI
says
the
idea
is
to
also
let
companies
and
users
“toggle”
how
“spicy”
AI
models
could
get.
One example the company points to involves NSFW content: OpenAI says it is “exploring whether we can responsibly provide the ability to generate NSFW content in age-appropriate contexts through the API and ChatGPT.”
A section of the Model Spec relating to how an AI assistant should deal with information hazards. Screenshot: OpenAI
Joanne
Jang,
product
manager
at
OpenAI,
explains
that
the
idea
is
to
get
public
input
to
help
direct
how
AI
models
should
behave
and
says
that
this
framework
would
help
draw
a
clearer
line
between
what is intentional and what is a bug.
Among the default behaviors OpenAI proposes for the model are assuming the best intentions of the user or developer, asking clarifying questions, not overstepping, taking an objective point of view, discouraging hate, not trying to change anyone’s mind, and expressing uncertainty.
“We
think
we
can
bring
building
blocks
for
people
to
have
more
nuanced
conversations
about
models,
and
ask
questions
like
if
models
should
follow
the
law,
whose
law?”
Jang
tells
The
Verge.
“I
am
hoping
we
can
decouple
discussions
on
whether
or
not
something
is
a
bug
or
a
response
was
a
principle
people
don’t
agree
on
because
that
would
make
conversations
of
what
we
should
be
bringing
to
the
policy
team
easier.”
Model
Spec
will
not
immediately
impact
OpenAI’s
currently
released
models,
like
GPT-4
or
DALL-E
3,
which
continue
to
operate
under
their
existing
usage
policies.
Jang
calls
model
behavior
a
“nascent
science”
and
says
Model
Spec
is
intended
as
a
living
document
that
could
be
updated
often.
For now, OpenAI will wait for feedback from the public and from the stakeholders that use its models (including “policymakers, trusted institutions, and domain experts”), though Jang did not give a timeframe for the release of a second draft of Model Spec.
OpenAI
did
not
say
how
much
of
the
public’s
feedback
may
be
adopted
or
exactly
who
will
determine
what
needs
to
be
changed.
Ultimately,
the
company
has
the
final
say
on
how
its
models
will
behave
and said in a post:
“We
hope
this
will
provide
us
with
early
insights
as
we
develop
a
robust
process
for
gathering
and
incorporating
feedback
to
ensure
we
are
responsibly
building
towards
our
mission.”
Original author: Emilia David