Scouring the Web to Make New Words ‘Lookupable’
By NATASHA SINGER | OCT. 3, 2015
A couple of weeks ago, two of
my New York Times colleagues chronicled digital culture trends that are so
newish and niche-y that conventional English dictionaries don’t yet include
words for either of them.
In an article on Sept. 20,
Stephanie Rosenbloom, a travel columnist, reviewed flight apps that try to
perfect “farecasting” — that is, she explained, the art of “predicting the best
date to buy a ticket” to obtain the lowest fares.
That same day Jenna Wortham,
a columnist for The Times Magazine, described a phenomenon she called “technomysticism,” in which Internet
users embrace medieval beliefs, spells and charms.
These word coinages may be
too fresh — and too little used for now — to be of immediate interest to major
English dictionaries. But Erin McKean, a lexicographer with an egalitarian
approach to language, thinks “madeupical” words such as these deserve to be
documented.
Ms. McKean started a campaign
last month on Kickstarter, the crowdfunding site, to unearth one million “missing” English words
— words that are not currently found in traditional dictionaries. To locate the
underdocumented expressions, she has engaged a pair of data scientists to
scrape and analyze language used in online publications. Ms. McKean said she
planned to incorporate the found words in Wordnik.com, an online dictionary of which she is a
co-founder.
“We really believe that every
word should be lookupable,” Ms. McKean told me recently. “That doesn’t mean
that every word should be used in every situation. But we think that people by
and large are entirely capable of making that decision for themselves.”
Before her analytics project
gets underway next month, Ms. McKean is crowdsourcing a list of missing words for
possible inclusion in Wordnik. Candidates so far include procrastatweeting,
dronevertising and roomnesia, a condition in which people forget why they
walked into a room.
Ms. McKean, who is a former
editor of the New Oxford American Dictionary, and two colleagues introduced the
Wordnik site in 2009 with the aim of addressing some limitations they had
encountered while working for dictionary publishers.
Traditional print
dictionaries employ lexicographers to track and assess words, selecting the
worthiest candidates to be included in published editions. But printed lexicons
naturally have limited space. And with only periodic updates, they are not
intended to keep pace with contemporary spoken language.
In a recent quarterly online
update, the Oxford English Dictionary added the word “hoverboard” 26 years
after the floating skateboards were first mentioned in the movie “Back to the
Future Part II.” An editor’s note explained that the O.E.D. had
decided to add “hoverboard” now because the dictionary’s word-monitoring system
had recently detected an increased use of the term, most likely, the note said,
because of a 2015 date that is an important plot element in the film.
(It doesn’t always take
decades to document a new word. The O.E.D. added “podcast” in 2008, just four
years after the word emerged, according to the dictionary.)
With no space limitations or
publication deadlines, Wordnik is able to incorporate a vast number of new
words on a continuing basis. In addition to human contributors, the site uses
automated online searches to locate sentences that contain certain words on
blogs, social media, news and other sites.
When a person looks up a term
on Wordnik, the site displays full-sentence examples of its usage, taken from
sources like The Huffington Post and Boing Boing. If the word already has an
entry in certain more traditional dictionaries, the site also provides that
definition.
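The article does not describe how Wordnik’s searches actually work. Purely as an illustration of the basic idea, the Python sketch below pulls full-sentence usage examples for a word out of a collection of texts. The function name `usage_examples` and the regex-based sentence splitter are hypothetical simplifications, not Wordnik code.

```python
import re

# Naive sentence splitter: break after ".", "!" or "?" followed by whitespace.
# It will wrongly split after abbreviations such as "Ms."
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def usage_examples(word: str, documents: list[str], limit: int = 3) -> list[str]:
    """Return up to `limit` sentences that contain `word` as a whole word."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    examples: list[str] = []
    for doc in documents:
        for sentence in SENTENCE_END.split(doc):
            if pattern.search(sentence):
                examples.append(sentence.strip())
                if len(examples) == limit:
                    return examples
    return examples


docs = [
    "Farecasting apps predict prices. I tried one yesterday.",
    "Is farecasting reliable? Some say no.",
]
print(usage_examples("farecasting", docs))
# ['Farecasting apps predict prices.', 'Is farecasting reliable?']
```

A real system would also need a proper sentence tokenizer and would keep track of which source each example came from.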
Ms. McKean said Wordnik had
accumulated some information on eight million words, both old and new. Its
inclusive approach makes the site more of a word welcomer than a winnower.
“The question is no longer, ‘Is this a good
word?’ ” Ms. McKean said. “The question is: ‘What is this word good for? Is
this word good for what I need?’ ”
She now plans to expand
Wordnik’s word-acquisition system by turning to data analytics to pinpoint
emerging terms, like farecasting, that writers explained in passing when they
mentioned them. Ms. McKean refers to these readily available explanations as
“free-range definitions.” They are easy to locate, she said, because writers
often use stock phrases, like “also known as” or “scientists term this,” to
signal to their readers that they’re about to introduce a new or unfamiliar
term.
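As a minimal sketch of spotting “free-range definitions,” the following assumes the simplest word order: an explanation, then one of the two cue phrases quoted above, then the new word (as in “…, also known as farecasting”). The function name and the regular expression are illustrative assumptions, not Wordnik’s method.

```python
import re

# The two cue phrases quoted in the article; a real system would need many more.
CUE_PHRASES = ["also known as", "scientists term this"]


def free_range_definitions(text: str) -> list[tuple[str, str]]:
    """Return (term, definition) pairs of the form "<definition>[,;] <cue> <term>".

    This covers only one word order; writers introduce new words in many other ways.
    """
    pairs = []
    for cue in CUE_PHRASES:
        pattern = re.compile(
            # definition: a run of text within one sentence, ending just before the cue;
            # term: the word right after the cue, optionally opened by a quotation mark.
            rf"(?P<definition>[^.;!?]+?)[,;]?\s+{re.escape(cue)}\s+[\"\u201c]?(?P<term>[\w-]+)",
            re.IGNORECASE,
        )
        for match in pattern.finditer(text):
            pairs.append((match["term"], match["definition"].strip()))
    return pairs


text = (
    "People forget why they walked into a room; scientists term this roomnesia. "
    "Predicting the best date to buy a ticket, also known as farecasting, saves money."
)
print(free_range_definitions(text))
```

Because the pattern only captures text before the cue, it would miss constructions such as “X, which scientists call …”; each extra phrasing needs its own pattern.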
To cast a wider net for her
project, Ms. McKean has enlisted Summer.ai, a data analytics firm. The company
plans to use computational techniques to analyze online publications for
language structure and patterns — like quotation marks and dashes — that are
likely to indicate new words accompanied by self-contained definitions.
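The article does not spell out Summer.ai’s techniques. The sketch below is purely an assumption about one punctuation pattern of the kind described: a single quoted word immediately followed by a dash and an explanation, as in the column quoted earlier (perfect “farecasting,” then a dash and “that is, … the art of predicting…”). Unicode escapes stand in for the curly quotes and the dash; the names are hypothetical.

```python
import re

# A single quoted word (straight or curly quotes), then an em dash or "--",
# then the explanation up to the next em dash or period.
QUOTED_TERM_WITH_GLOSS = re.compile(
    r"[\"\u201c](?P<term>[A-Za-z-]+)[\"\u201d]\s*(?:\u2014|--)\s*(?P<gloss>[^\u2014.]+)"
)


def quoted_terms_with_glosses(text: str) -> dict[str, str]:
    """Map each quoted one-word term followed by a dash to the text after the dash."""
    return {
        m["term"].lower(): m["gloss"].strip()
        for m in QUOTED_TERM_WITH_GLOSS.finditer(text)
    }


sample = ("Apps try to perfect \u201cfarecasting\u201d \u2014 that is, "
          "the art of predicting the best date to buy a ticket.")
print(quoted_terms_with_glosses(sample))
# {'farecasting': 'that is, the art of predicting the best date to buy a ticket'}
```

A quoted word with no dash after it, such as “technomysticism,” followed by “in which,” is not matched. Those cases would need a different cue.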
Some lexicographers already
track whether words are nearing the end of their useful life spans. But Manuel
Ebert, a former neuroscientist who is the co-founder of Summer.ai, said the Wordnik research
might help track the speed of new-word adoption.
“We can actually measure when
words get adopted in mainstream lingo,” he said, by looking at when writers
stop explaining neologisms like “infotainment” and start using them as if their
meanings were commonly understood. “It will be interesting to see which words
will very quickly get adopted and which words remain outsiders.”
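Here is one possible way to put Mr. Ebert’s idea into practice, sketched under loose assumptions. Tag each dated use of a word as explained or unexplained (for example, with a cue-phrase detector). Compute the share of explained uses per year. Then treat the first year that share drops below some threshold as the year of adoption. The `Usage` record, the 0.2 threshold and the function names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Usage:
    """One dated occurrence of a word (a hypothetical record type)."""
    year: int
    explained: bool  # True if the writer glossed the word when using it


def explanation_rate_by_year(usages: list[Usage]) -> dict[int, float]:
    """Share of a word's uses in each year that came with an explanation."""
    totals: dict[int, int] = defaultdict(int)
    explained: dict[int, int] = defaultdict(int)
    for usage in usages:
        totals[usage.year] += 1
        explained[usage.year] += int(usage.explained)
    return {year: explained[year] / totals[year] for year in sorted(totals)}


def adoption_year(usages: list[Usage], threshold: float = 0.2) -> Optional[int]:
    """First year in which fewer than `threshold` of the uses were explained.

    Returns None if the word never crosses the (arbitrary) threshold, i.e. it
    still looks like an "outsider".
    """
    for year, rate in explanation_rate_by_year(usages).items():
        if rate < threshold:
            return year
    return None


history = [
    Usage(2005, True), Usage(2005, True),
    Usage(2006, True), Usage(2006, False),
    Usage(2007, True), Usage(2007, False), Usage(2007, False),
    Usage(2008, False), Usage(2008, False),
]
print(adoption_year(history))  # 2008
```

Comparing adoption years across many words would show which ones are picked up quickly and which remain outsiders. Sparse years would need smoothing before the threshold test means much.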
Researchers like Paul Cook, an assistant
professor of computer science at the University
of New Brunswick in Canada, are
using similar techniques to find other kinds of novel words.
Mr. Cook developed a program
several years ago to analyze posts on Twitter that included new lexical blends
— like “jeggings,” a combination of jeans and leggings — and their definitions.
Among other portmanteau words, his Twitter research
turned up “awksome” (awkward plus awesome) and “hilazing” (hilarious plus
amazing). He hopes eventually to use his program to generate a blended-word
lexicon.
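The article does not detail Mr. Cook’s program. The following is only a naive illustration of the core check behind blend detection: whether a word can be split into the beginning of one known word and the end of another. It is not his method. The function name and the tiny vocabulary are assumptions, and with a realistic vocabulary a system would also have to rank the many spurious splits this produces.

```python
def blend_sources(blend: str, vocabulary: set[str]) -> list[tuple[str, str]]:
    """Candidate (first, second) source words whose prefix + suffix spell `blend`.

    Naive: it only checks that `blend` splits into a prefix of one known word
    and a suffix of another, and does no ranking.
    """
    blend = blend.lower()
    candidates = set()
    for i in range(1, len(blend)):
        prefix, suffix = blend[:i], blend[i:]
        firsts = [w for w in vocabulary if w.startswith(prefix) and w != blend]
        seconds = [w for w in vocabulary if w.endswith(suffix) and w != blend]
        for first in firsts:
            for second in seconds:
                if first != second:
                    candidates.add((first, second))
    return sorted(candidates)


vocabulary = {"jeans", "leggings", "awkward", "awesome", "hilarious", "amazing"}
for word in ["jeggings", "awksome", "hilazing"]:
    print(word, blend_sources(word, vocabulary))
# jeggings [('jeans', 'leggings')]
# awksome [('awkward', 'awesome')]
# hilazing [('hilarious', 'amazing')]
```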
“We could have some sort of
automatically generated blend dictionary,” Mr. Cook said. “If you had
information like this, some dictionaries might be interested in providing this
kind of information, as opposed to none.”
This more-words-the-merrier
approach is one that lexicographers like Ms. McKean favor.
“Every new word added to the
expressiveness of English adds to the things that it’s possible to say,” she
said. “English already has one of the world’s largest installed user bases. So
why wouldn’t we want to add to it?”