Technical Explainer
The buzz around artificial intelligence has created a rush to craft useful generative AI models and assistants. Much of the latest work centers on open source large language models (LLMs).
However, as these models increasingly draw on data depicting individuals and their actions to deliver services and make assessments, technologists are growing concerned about how inclusively the technology portrays people.
One model that is addressing these concerns is Latimer, a large language model designed to provide cultural information related to African American and Hispanic cultures and to address bias in AI models. Its model development, combined with responsible management of training data, can light the way for reducing data bias in GenAI models.
What Is Latimer?
Latimer was created by John Pasmore as a teaching tool to help people understand how to craft better prompts, particularly ones that involve cultural norms and history.
Latimer is named after Lewis Latimer, an African American inventor and technologist who is best known for refining the carbon filament in the electric light bulb and later becoming a chief draftsman at Thomas Edison's lab. Other inventions included an evaporative air conditioner and an improved toilet system for railcars. In addition, Latimer used his patent expertise to help Alexander Graham Bell file a patent for the telephone.
Figure 1: Latimer was named in honor of African American inventor Lewis Latimer.
How Latimer Works
The look of Latimer's prompt entry page is similar to ChatGPT's. But the difference is more than skin deep. While Latimer relies on Meta's Llama 2, it differs in that it uses a unique foundation model augmented with data representing historical events, stories from oral tradition, local archives, literature, and current events related to communities of color.
Figure 2: Latimer answers the prompt: "Who is Lewis Latimer?"
Another unique technical aspect is Latimer's retrieval-augmented generation (RAG) pipeline. RAG is a data pipeline designed to manage a set of documents so that the documents most relevant to a prompt query are retrieved and supplied to the model.
There are several steps to the RAG pipeline: documents from the data sources are split into "chunks" of information, the chunks are embedded and stored in a vector database, and each incoming prompt is compared against the stored chunks to find the most relevant, accurate, and recent information. The retrieved information is then passed, along with the prompt, to the LLM to derive the final response.
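To make these steps concrete, below is a minimal, self-contained sketch of a generic RAG flow in Python. It is illustrative only: Latimer's actual embedding model, vector store, and retrieval logic are not public, so the toy bag-of-words "embeddings," the sample corpus, and the final hand-off to the model are all assumptions.

```python
# A toy RAG pipeline: chunk/index, retrieve by similarity, augment the prompt.
# Everything here is an illustrative stand-in, not Latimer's implementation.
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words count vector.
    # Real pipelines use a learned embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Step 1: split source documents into chunks and index them.
# This list stands in for a vector database.
chunks = [
    "Lewis Latimer refined the carbon filament used in the electric light bulb.",
    "Lewis Latimer became a chief draftsman at Thomas Edison's lab.",
    "The New York Amsterdam News covers news and events for the Black community.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# Step 2: retrieve the chunks most relevant to the user's prompt.
prompt = "Who is Lewis Latimer?"
query = embed(prompt)
top = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)[:2]

# Step 3: pass the retrieved context plus the prompt to the LLM.
context = "\n".join(chunk for chunk, _ in top)
augmented_prompt = f"Using only this context:\n{context}\n\nAnswer: {prompt}"
print(augmented_prompt)  # in a real pipeline this is sent to the model
```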
RAG-based LLMs are meant to improve the accuracy of finding and citing information for highly complex queries or knowledge-intensive tasks. Many organizations crafting their LLMs are optimizing them with a variation of the RAG pipeline model.
Latimer uses its RAG model to reduce bias in its prompt responses.
Latimer enhances its technological edge by leveraging resources dedicated to ensuring data accuracy. The effort begins with the development team, which collaborates with prominent cultural scholar Molefi Kete Asante, a distinguished professor of African, African American, and communication studies at Temple University, to help continuously refine the model.

Latimer also has an exclusive contract to use licensed content, including from the New York Amsterdam News, a traditional newspaper that covers news and events for the Black community. All of these measures have been implemented to ensure Latimer uses high-quality data and avoids disseminating inaccurate cultural information.
How Addressing AI Bias Can Help Combat Systemic Discrimination
Technology experts have long been worried about the role of AI in data bias. AI models have an inherent potential to scale data bias into their decisions and related outcomes. Artificial intelligence enables programmed devices to execute tasks that once required human intelligence.
Yet AI systems are susceptible to scaling biases because they can tackle tasks quickly while their insights are limited to training data that could be missing vital information. For example, the ACLU warns that AI systems used to evaluate potential tenants can inadvertently perpetuate housing discrimination.
These AI-based decision systems rely on court records and other datasets that have built-in biases reflecting systemic racism, sexism, and ableism, and that are notoriously full of errors. As a result, people are denied housing because they are deemed ineligible regardless of their actual capacity to afford rent.
One factor contributing to the perpetuation of biases in automated decisions is how the input data is associated. This can be particularly tricky with LLMs. Large language models make associations based on how they are trained, which allows the prompt to emphasize which elements are associated and which are not.
For example, when a programming language reads "1 + 1," its syntax and type rules tell the computer that the numerals are numbers, not text, and that they can be added to produce the number 2. In contrast, an LLM concludes that "1 + 1" is "2" based on seeing examples in its prompts. This approach is called chain-of-thought prompting.
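The deterministic half of that contrast can be shown in runnable form. Here is a minimal illustration in Python; the comments note where LLM behavior differs, and nothing in it is specific to any particular model:

```python
# A programming language resolves "1 + 1" by fixed syntax and type rules.
print(1 + 1)      # the operands are integers, so + means addition -> 2
print("1" + "1")  # the operands are strings, so + means concatenation -> "11"
# An LLM has no such type declarations: whether it treats "1 + 1" as math
# or as text depends on learned associations and on examples in the prompt.
```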
Chain of thought is a prompting technique in which the user supplies the model with worked examples that spell out intermediate reasoning steps, guiding how the model handles the wording, tense, and mathematical approach of a given prompt.
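Below is a small illustration of what such a prompt can look like. The first worked example is the well-known one from the Google Brain chain-of-thought paper referenced later in this article; the second question is made up for illustration:

```python
# A few-shot chain-of-thought prompt: the worked example shows intermediate
# reasoning steps, steering the model to reason the same way on new questions.
cot_prompt = """\
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis
balls. 5 + 6 = 11. The answer is 11.

Q: A library has 120 books, lends out 45, and then receives 20 donated
books. How many books does it have now?
A:"""
# Sent to an LLM, the prompt typically elicits step-by-step output such as:
# "The library started with 120 books. 120 - 45 = 75. 75 + 20 = 95.
#  The answer is 95."
print(cot_prompt)
```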
Thus, LLMs operate as a data layer, using concepts and data to craft an outcome or condition, rather than relying on syntax to interpret information for an outcome. This difference is also why processes such as RAG have become important for improving model accuracy beyond prompt engineering.
This underscores the importance of ensuring genuinely diverse representation in AI development, necessitating methodologies that go beyond mere platitudes and concepts to establish robust safeguards against potentially harmful outcomes.

If there is data representing activity from specific communities, it is vital not to omit that data from model training datasets. Such omission leads to discrimination when deciding which communities receive services and investment, denying opportunities and hindering progress.
If we treat data the same way as the numerals in the addition example, it becomes clear that guardrails for LLM assumptions are essential for applications in which there are community and cultural concerns.
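What might such a guardrail look like? Here is a hedged sketch, assuming a RAG-style setup like the one above; it is not Latimer's actual implementation, and the function names and threshold are illustrative. The idea is to decline to answer when retrieval finds no supporting source material, rather than letting the model fall back on its learned associations:

```python
# A minimal guardrail sketch: only answer when retrieval finds support.
# All names and the 0.3 threshold are illustrative assumptions.
def answer_with_guardrail(prompt, retrieve, generate, min_score=0.3):
    # retrieve(prompt) -> list of (chunk, relevance_score) pairs
    # generate(prompt, context) -> the model's response text
    results = retrieve(prompt)
    supported = [(c, s) for c, s in results if s >= min_score]
    if not supported:
        # No grounded sources: decline rather than let the model guess.
        return "I don't have reliable source material to answer that."
    context = "\n".join(chunk for chunk, _ in supported)
    return generate(prompt, context)

# Example wiring with stub functions:
stub_retrieve = lambda p: [("Lewis Latimer refined the carbon filament.", 0.8)]
stub_generate = lambda p, c: f"Answer grounded in: {c}"
print(answer_with_guardrail("Who is Lewis Latimer?", stub_retrieve, stub_generate))
```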
Technologists keen on AI should pay attention to how results from models such as Latimer influence how people prompt and how representative a model's output can be. They should also pay attention to research being done on GenAI techniques such as chain of thought (for example, this white paper from the Google Brain team) and to emerging discoveries in how RAG is used in models.
At this stage, it's too early to assess the results. Currently, Latimer is managing a gradual rollout of expanded access, starting with limited organizational access for universities. Miles University, a historically Black college or university (HBCU), is the first to give its students access to the model, while another HBCU, Morgan State, was added in January. A waitlist for public use is available on the Latimer website.
However, given the significant interest and investment behind it, Latimer shows promise in tackling technological bias around social and educational issues, as well as in implementing inclusive data techniques.