Tags: A BL CF D DEF E EQ ET ETN HG HL HO IL IPR L LB LF LQ PQP PS PSA Q QP RX S0 S1 S2 S3 S4 S5 S6 S7 S8 SE SF SN SQ ST su T VD VF VL W XDAT XIL XL XR #
OED documents are structures that allow you to restrict your search to specific dictionary components, such as etymologies, definitions, or quotations. In the database, each such component is defined by descriptive tags which delimit its text. For example, etymologies are preceded by a "begin" tag "<ET>" and followed by a matching "end" tag "</ET>". The structures available for searching are listed in two alphabetical sequences, with the more commonly used documents in the first sequence. A highly simplified outline indicates prototypical organization of an entry.
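As a rough illustration of how tag-delimited structures restrict a search,
the following Python sketch pulls the text of one structure out of a tagged
string. It is not how PAT is implemented: the function name
extract_structure and the sample entry text are made up for illustration,
and the sketch assumes that a structure is never nested inside another
structure of the same name.

```python
import re


def extract_structure(text, tag):
    """Return the text enclosed by each <tag>...</tag> pair, in order."""
    name = re.escape(tag)
    pattern = re.compile(rf"<{name}>(.*?)</{name}>", re.DOTALL)
    return pattern.findall(text)


# Hypothetical, heavily simplified entry; the wording is placeholder text.
entry = (
    "<E><HG><HL>sample</HL></HG>"
    "<ET>placeholder etymology</ET>"
    "<Q><D>1900</D><A>Some Author</A><T>placeholder text</T></Q></E>"
)

print(extract_structure(entry, "ET"))  # text of the etymology only
print(extract_structure(entry, "A"))   # text of the author field only
```

Restricting a search to a structure then amounts to looking for the query
only inside the strings returned for that tag.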
Remember that, in searching the OED, PAT considers all letters and symbols, including tags and spaces, as characters. PAT also regards any query as a prefix and locates occurrences that begin with the characters you type. These factors, combined with the fact that PAT interprets the left angle bracket of a tag as a character, have implications if you wish to locate exactly what you type, and nothing else, within a document structure. For instance, you may wish to restrict an "Author" search to "Blake" by using "<A>Blake</A>", avoiding matches to "F.R. Blake" and "O. Blakeston".
Note the effect of name variations and tags (<A>...</A>) on searches.
Querying "W. Scott" within the author structure will locate not
only that form, but "Sir W. Scott" and an "E.W. Scott". Further,
since PAT considers "Scott" as a prefix, it matches "W. Scott-Taggart".
Hitting the space bar after "Scott" (the usual way of specifying a
word ending) results in no matches because authors' surnames are
followed by the left angle bracket of the "end" tag which PAT sees
as a character; thus one could specify "W. Scott</A>". Similarly,
to exclude matches such as "E.W. Scott", the "begin" tag <A> would
also be needed.
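The behaviour described above can be imitated with a short Python sketch.
It is not PAT itself: the function name prefix_matches and the sample
author fields are assumptions made for illustration. The sketch models
only the rule that a query matches wherever the text continues with
exactly the characters typed, with tags and spaces counted as characters.

```python
def prefix_matches(text, query):
    """Return every offset in text at which the query string begins."""
    offsets = []
    start = text.find(query)
    while start != -1:
        offsets.append(start)
        start = text.find(query, start + 1)
    return offsets


# Hypothetical author fields, tagged the way the text describes.
authors = (
    "<A>W. Scott</A>"
    "<A>Sir W. Scott</A>"
    "<A>E.W. Scott</A>"
    "<A>W. Scott-Taggart</A>"
)

print(len(prefix_matches(authors, "W. Scott")))         # all four fields
print(len(prefix_matches(authors, "W. Scott ")))        # trailing space: none
print(len(prefix_matches(authors, "W. Scott</A>")))     # drops Scott-Taggart
print(len(prefix_matches(authors, "<A>W. Scott</A>")))  # exact form only
```

Adding the "end" tag removes the hyphenated surname, and adding the
"begin" tag as well removes the forms with extra initials or titles.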
Bold Sub-Headword <BL>
One of two types of subordinate headword
included within entries; so called because they appear in bold type
in the printed text. These word forms are commonly either derivatives
(typically formed by adding a suffix to the headword), or combinations
(separate, hyphenated or single words that combine the headword,
usually as the first element, with another existing word). Note,
however, that many derivatives and combinations have entries of their
own because they have developed meanings and histories distinct from
their main word. Bold Sub-Headwords are usually defined and illustrated
by quotations, sometimes grouped within pseudo quotation paragraphs.
(See Subentry and compare with Italic Sub-Headword).
IMPORTANT: Note that three forms are included in the Bold Sub-Headword
structure: the Lookup Form <LF>, the Stressed Form <SF>, and the Murray
Form (the stressed form used in the first edition of the OED, but not
included in the printed second edition, nor as a document structure).
Date <D>
Normally the first element in an illustrative quotation.
The date given is usually the year in which the cited work was
first published, although there are some discrepancies, especially
in the dating of texts prepared for the first edition of the OED.
Where precise dates could not be established, the date may be qualified
by "C." ("c" in the printed text meaning "circa" or "about") or "A."
("a" in the printed text meaning "ante" or "before"), or by replacing
the last one or two digits with dots, e.g., 17.. Date of composition
is usual for letters, journals, and diaries, while lectures and speeches
are assigned the date of their first appearance in print. Although most
quotations specify some form of date, there are a few exceptions, the
most notable being the many quotations from the Old English epic poem
"Beowulf".
Definition <DEF>
Generally, a statement explaining the meaning of a
headword, sense, or sub-sense, although definitions in the OED take
several other forms, including cross-references to another sense within
the same entry or within another entry. In addition, a definition may
simply describe the way the word functions in some grammatical or
syntactical context. Definitions should always be read in conjunction
with supporting quotations, since in a historical dictionary, the
latter play an important role in establishing meaning and context.
In fact, in some cases, the actual explanation of the meaning of a
sense is contained in a quotation (see Cross-Reference Date).
Note, however, that this structural tagging was inserted automatically
at Waterloo and not confirmed by OED editors, so its utility remains
somewhat controversial.
Earliest Quotation <EQ>
The quotation having a date which is chronologically
the first in an OED entry. While this facility can be useful, many
words have multiple senses and sub-senses, either in current use or in
their historical development. The earliest chronological date in the
entry must therefore be viewed in the context of the sense which it
supports.
Note also that this structural tagging was inserted automatically
at Waterloo and not confirmed by OED editors, so its utility remains
somewhat controversial.
Entry <E>
Entries are the major structural components of most
modern dictionaries. In the printed OED, entries are arranged
alphabetically by their headwords (the "subject" of the entry)
which appear in dark bold type. There are two types of entries in
the OED: main entries and cross-reference entries. Main entries
contain comprehensive information about the history and meaning of
"main form" headwords. The primary function of cross-reference
entries is to direct the user from an obsolete or variant spelling
of a word to its relevant main entry (see also Status). Specifying
Entry as the document you wish to search for a word or phrase, or
as a match point for "combining and comparing" two or more sets,
means, therefore, that PAT searches the entire Dictionary and identifies
in which entries your results are located.
Etymology <ET>
Etymologies trace the origin or derivation of
headwords and are enclosed in square brackets in the printed text,
normally following a variant form list, if included. Since the
OED was conceived as a history of the English language, the
original policy was to trace non-native words to the foreign word
or word element from which they were immediately adopted or formed,
and native words to their earliest English form. In practice, however,
OED etymologies sometimes exceed these guidelines.
Some etymologies include as their final element a paragraph in small
print, tagged as "<note>" in the database. These are referred to as
"etymological notes" by OED editors and include supplementary comments
or information of an unsubstantiated nature such as "folk" or popular
theories. ("<note>" tags are also used to identify various editorial
comments in small print in other entry elements.) Etymologies are
sometimes attached to individual senses or sub-senses
(see Sub-Etymology).
Headword Group <HG>
Defines the initial group of elements in an entry
and includes headword, pronunciation,
part of speech, and homonym number.
Note that, with the exception of the headword, not all of these
elements necessarily appear in every entry.
Italic Sub-Headword <IL>
One of two types of subordinate headwords
which are included within entries, and so called because they appear
in heavy italics in the printed text. This category consists primarily
of minor combinations (separate, hyphenated or single words that combine
the headword of the entry, usually as their first element, with another
word form, but which do not require definition since their meaning is
obvious), although it may also include phrases and idioms. Groups of
combinations are usually listed alphabetically within one or more senses
and are followed by a pseudo quotation paragraph, containing quotations
illustrating their use in the same order.
(Compare with Bold Sub-Headword.)
IMPORTANT: Note that three forms are included within the Italic Sub-Headword
document structure: the Lookup Form <LF>,
the Stressed Form <SF>, and the Murray Form (the stressed form used in the first edition
of the OED, but not included in the printed second edition, nor as a
document).
Label <LB>
In the printed OED, labels are italicized designations,
usually abbreviated, which inform Dictionary readers of the boundaries
within which a word or sense is, or was, used. In current OED
terminology, there are five categories of labels: status (obsolete,
rare, colloquial, etc.); regional (indicating a geographical area
of usage, such as the U.S.); grammatical (describing the syntactical
role of the word or sense, such as plural or collective); semantic
(indicating the interpretation given to a word or sense in a
particular context, such as figurative, transferred, specific, etc.);
and subject (specifying the discipline, profession, trade, etc. in
which a word or sense is used).
It is important to note that subject labels in particular are not
consistently used and their specificity may vary, often because of
historical change. For instance, the label "Natural History" (Nat.
Hist.) is found in a number of older entries. Since this discipline
has been largely superseded and sub-divided, labelling of more
current entries reflects these changes.
(For definitions and explanations of terms used in OED labels,
see D.L. Berg, 1993, and for an example of a search for words used
in a particular subject field, see D.L. Berg, 1989.)
Language <L>
This structure contains language references in
etymologies and sub-etymologies. OED lexicographers identified
over 1,000 different language forms (including abbreviations and regional
variations) used in these contexts. While the structure is of
considerable assistance in extracting languages that played a part
in the origin or history of a word, care must be exercised in using this
facility to identify the language from which a word passed directly
into English (for examples of problems and techniques associated
with such searches, see D.L. Berg, 1989.) Also, some further
identification refinement is necessary since automatic tagging of
forms includes instances where language names appear attributively
as adjectives specifying nationality, e.g., Italian wine-makers.
Note that language forms are usually abbreviated, not always
consistently, and full forms can be found in the "List of Abbreviations" which appears at the front of each Dictionary volume.
Latest Quotation <LQ>
This term refers to the quotation in an entry
for an obsolete word which exemplifies the last located use of the
form. In other words, the criterion used for the category is the
chronologically most recent date in entries preceded by a "dagger"
status symbol indicating that the headword is an obsolete form
(see Status).
Lookup Form <LF>
This structure includes most word forms defined
in the OED, and includes Headwords <HL>,
Bold Sub-Headwords <BL>
and Italic Sub-Headwords <IL>. As the document name suggests, these
forms are given in the way most users would look them up, that is,
without diacritics, stress marks, etc. (see Stressed Form). However,
there are some combinations included within entries that may not be
located because the OED sometimes lists minor forms in a style similar
to the following example from the entry for "orange": "orange-bloom,
-grove, -juice, kernel, leaf, -pip..". A computer program inserted the
first element in front of (or, in some cases, following) hyphens. Thus,
"orange-grove" and "orange-juice" will be located, but further refinement
of the program is needed in order to find unhyphenated minor combinations
such as "orange kernel" and "orange leaf". These combinations can often
be located by searching quotation texts.
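The expansion of hyphenated minor combinations described above can be
pictured with the Python sketch below. It is a guess at the general idea,
not the program that was actually run on the Dictionary text: the function
name expand_hyphenated and the comma-separated input format are
assumptions. Like the program described, it leaves unhyphenated items
untouched.

```python
def expand_hyphenated(listing):
    """Expand a run such as "orange-bloom, -grove" into full forms.

    Only items that begin or end with a hyphen are expanded; items
    without a hyphen are returned unchanged.
    """
    items = [item.strip() for item in listing.rstrip(". ").split(",")]
    # The first element is whatever precedes the hyphen in the first item.
    first_element = items[0].split("-")[0]
    expanded = []
    for item in items:
        if item.startswith("-"):
            expanded.append(first_element + item)   # "-grove" case
        elif item.endswith("-"):
            expanded.append(item + first_element)   # element follows hyphen
        else:
            expanded.append(item)                   # left as printed
    return expanded


forms = expand_hyphenated("orange-bloom, -grove, -juice, kernel, leaf, -pip..")
print(forms)
```

In the result, "kernel" and "leaf" stay bare, which is why a lookup for
"orange kernel" or "orange leaf" fails under this approach.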
Part of Speech <PS>
A grammatical category (verb, adjective,
adverb, etc.). In print, in the case of headwords, the part of
speech normally appears in abbreviated form following the
pronunciation. A part-of-speech identification may also be used to
describe a sense or subordinate headword (see Subentry). Where
no part of speech is included, the form may be assumed to be a
noun in most cases. Note that in all instances, the OED employs the
term "substantive" (abbreviated "sb.") instead of "noun", in keeping
with the tradition in early grammars of distinguishing between a
"noun substantive" and a "noun adjective". In general, the term
"sb." is only applied when it is necessary to differentiate a noun
entry from an entry for a word of the same spelling, but with a
different part of speech, or sometimes in instances where there are
several noun homonyms, in which case a homonym number is added.
The more usual convention for noun homonyms is to add the number
to the headword itself.
Pronunciation <IPR>
The second edition of the OED employs the
International Phonetic Alphabet for transcribing pronunciation, in
contrast to the first edition which used a system invented by its
primary editor, James Murray (still shown in the database and tagged
<MPR>). In print, pronunciation, when given, appears in brackets
immediately following the headword. The Dictionary gives the
pronunciation of most current, "main" headwords, with the exception
of some derivatives and combinations, and some single-syllable words,
where pronunciation is self-evident. Stress-marks, indicating
emphasis, are sometimes included for these exceptions as well as for
obsolete words for which pronunciation is not normally supplied.
(See also Stressed Form.)
Pronunciation is, in most cases, in accordance with standard
southern British speech, although alternative British or non-British usages may sometimes be included. A special parallels
symbol precedes some foreign pronunciation alternatives (see
Status).
Pseudonym <PSA>
Where the author of a quotation used an assumed
or pen name, he or she is usually cited by the pseudonym which
appears in print in the OED within single quotation marks. The
latter are eliminated in the case of certain well-known pseudonymous
authors such as George Eliot. For authors who have used both their
real names and one or more pseudonyms, the name under which the
particular cited work was published is normally given.
Pseudo Quotation Paragraph <PQP>
Identifies paragraphs of
quotations that illustrate a number of word forms, rather than a
single word or sense. These forms are usually Bold Sub-Headwords
or Italic Sub-Headwords included within entries and often listed in alphabetical sequence within a single sense, e.g., "television announcer,
audience, broadcast, commercial, crew, critic, discussion ..".
The accompanying so-called "pseudo" quotation paragraph usually
organizes citations in the same order. As an aid to readers, an
asterisk often precedes the initial, i.e., chronologically first,
quotation in each grouping.
Quotation <Q>
The second edition of the OED contains nearly two
and a half million quotations which perform the important function of
illustrating the use, form, history, and meaning of word forms in a
given sense. Normally quotations pertaining to a particular sense
are organized in a quotation paragraph in chronological order by
date of publication or composition. Citations typically
include the following elements: date <D>;
author <A>; work <W>
(i.e., title), with the location within the work, such as chapter, page,
act, scene, etc.; and the quotation text <T>. Quotations are drawn
from all forms of written and published works, including books,
manuscripts, journals, newspapers, letters, and diaries, and represent
both literary and popular sources.
The policy of the first edition, which dealt with most of the "core" words in the English language, was to include at least one example of use per century. This ratio, however, was increased considerably for entries added in the 1972-86 Supplement and the second edition.
Occasionally, in entries compiled for the first edition, no
examples of contemporary usage could be found and illustrations
were "made up". Such quotations are introduced by the abbreviation
"Mod." (for "modern") and usually appear without a date. (See also
Subsidiary Quotation.)
Quotation Paragraph <QP>
Definitions of words and senses are
generally followed by a paragraph in smaller print which lists
illustrative quotations in chronological sequence (earliest date
first). Occasionally, when a sense covers both the literal and
figurative use of a word, more than one quotation paragraph is
used. (For an exception to these conventions, see Pseudo Quotation
Paragraph.)
Quotation Text <T>
This structure contains the actual phrase or
passage extracted from the text, as compared to the full citations
included in the Quotation document structure, of all the Dictionary's
illustrative quotations. The texts are printed and spelled as they
appear in the source edition used. Occasionally, a portion of a
quotation text is eliminated and the omission is indicated by two
dots (..), or three (...), if the elision includes a period. Sometimes
an explanatory word may be inserted in square brackets, and the insert
may be preceded by the abbreviation "sc." for "scilicet", meaning
"understand" or "supply". In instances where the text quoted is a
song title, advertisement or other unusual source, this information
is usually given in brackets.
Status <ST>
Status tags enclose several types of symbols that
usually precede a headword or sense and indicate the form's
status in the language. These include the dagger symbol which identifies
an obsolete entry or sense (also usually further identified by a label
"Obs." following the headword); parallels signifying non-naturalized
words or pronunciations; and the so-called "catachrestic" symbol (a reversed
paragraph symbol) identifying a confused or erroneous sense. Within the
<ST> tag field, these symbols are interpreted by the abbreviations "obs"
(for "obsolete"), "ali" (for "alien"), and "err" (for "erroneous").
In addition, status tagging identifies two types of entries:
1. the numerous cross-reference entries, the headwords of which represent obsolete or variant spellings of main words, and which refer the user from these forms to the relevant "main" entry. These are identified by the abbreviation "xref".
2. a small number (387) of "spurious" entries which are entirely enclosed
in square brackets. All of these entries were compiled for the first
edition and consist of words that are erroneous, false, or could not
be authenticated. Their purpose was primarily to correct errors found
in earlier dictionaries resulting from copyists' or translators' errors,
misprints, or misreadings of the text. These are identified by the
abbreviation "spu".
Subentry <SE>
This structure consists mainly of Bold Sub-Headwords
(i.e., defined and illustrated combinations and derivatives included
within the entry for their main word) together with their definition
text. Corresponding quotations can often be found in
pseudo quotation paragraphs.
(See also Headword and Italic Sub-Headword).
Variant Form <VF>
The OED attempts to include all documented
earlier spellings, irregular inflexions, unusual plurals, etc. of
headwords, where appropriate or known. These are contained in a
Variant Forms List preceding the etymology. Regional labels are
sometimes included to indicate the geographic area in which the
particular form prevailed (or prevails). Many of these forms also
appear as headwords in cross-reference entries (see Status).
Work <W>
Refers to the title of the work which was the source for a
quotation. The title usually appears in italics following the author's
name and preceding the actual quotation text. The work's text normally
includes reference to the specific chapter, page, act, scene, etc.
where the cited quotation can be found. Titles are frequently
abbreviated, and the definite article "the", the conjunction "and", and
the preposition "of" are routinely omitted. Abbreviations used for a
single work can vary; for example, Shakespeare's "Comedy of Errors"
appears as "Com. Err.", "C. Err." and "Err." (for an example of a
search by title, see D.L. Berg, 1989). Some works, such as anonymous
early texts like "Beowulf" and "Cursor Mundi" are cited by title only.
The Bible is a special case; for example, books of the Bible are
sometimes tagged <W> with "Bible" as author, especially for the 1611
King James version (for a discussion of the numerous variations in
citing translations of the Bible, see D.L. Berg, 1993).
Identification of early and obscure works is frequently difficult and can be aided by reference to the Bibliography which appears at the end of Volume 20 of the printed text, and which includes most, but not all, of the titles cited. Notable exceptions, since this is a bibliography of English works, are the many foreign dictionaries and other word books often referred to in etymologies. (For a discussion of problems associated with matching citations in the Dictionary text to the Bibliography, see G.V.J. Townsend, "Citation Matching in the Oxford English Dictionary", UW Centre for the New OED, 1989.)
Note that it cannot be concluded that a word form is not defined or
its use illustrated in the OED if it does not appear as a headword.
Many other forms are defined and/or illustrated within entries for
their "main" words (see also Bold Sub-Headword and
Italic Sub-Headword).
Homonym Number <HO>
Homonym numbers are used to distinguish between
or among headwords with the same spelling and part of speech, but
which warrant separate entries because of their distinct meanings and
histories. The number appears in the text as a superscript attached
to a part-of-speech designation, or in the case of some nouns, to the
headword itself. The number gives each headword a specific "address"
which can be used in Dictionary cross-references
(see Cross-Reference Headword).
Relative Cross-Reference <RX>
The OED contains a number of
cross-references which use the terms "prec." (preceding) or "next"
to indicate to Dictionary users that they should refer to the
preceding or next entry, or, in some cases, to the preceding or
next sense in the same entry. A frequent use of "prec.", for
example, is found in etymologies of entries for derivatives
or combinations which combine the headword of the previous or
"preceding" entry with a suffix, combining form, or another word.
The document file distinguishes this particular type of reference by
tagging all the occurrences of "prec." and "next" within
cross-references <XR>.
Sense Level 0 <S0>
The various senses and sub-senses in the OED are organized
in a hierarchical scheme utilizing numbers and letters to distinguish steps
in a headword's development. Sense development is usually chronological,
starting with the earliest sense, except for some entries which follow
"logical order". The simplest form of identifying senses is linear (1, 2, 3..),
but often further subdivisions are required which are ordered a, b, c.. (with
the letters in bold type). Further subdivisions are made by italicized
series (a), (b), (c).. or (i), (ii), (iii).. , or, occasionally, small Greek
letters (alpha, beta..).
When a word's development is not straightforwardly linear (for example, when groups of senses developed simultaneously or diversely), a second level of numbering and lettering employing upper case roman numerals (I, II, III..) identifies branches. Sometimes two parts of speech, such as noun and adjective, are included in one entry, and each "fork" is then identified by the highest level of the scheme, upper case letters (A, B, C..). The two upper levels may be integrated in one entry, and are also occasionally used for other purposes, such as organizing groups of senses syntactically or semantically.
Sense levels 1, 2, 4, 6, and 7 identify groups and senses numbered
according to this scheme. Level 1 refers to A, B.. groups; Level 2 to
the I, II.. groupings; Level 4 to structures numbered 1, 2..; Level 6
to the a, b.. sub-senses; and Level 7 to the italicized bracketed
sub-division of sub-senses - (a), (i), or Greek letters. The remaining
numbers are used as follows: Level 0 (zero) identifies unnumbered sense
sections, such as initial over-arching text preceding a regular sense
numbering, or unnumbered final paragraphs beginning with the word "hence" that
usually contain one or more derivatives. Levels 3 and 5 contain increasing
numbers of asterisks (*, **, ***..) that provide another means of grouping
senses by semantic or syntactical headings in lengthy entries. Level 8
consists of irregular unnumbered senses, such as paragraphs preceded by
a "catachrestic" symbol illustrating erroneous use of a sense (see Status).
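The numbering scheme above can be summarized as a small lookup in Python.
This is only a reading aid built from the description in this section:
the names SENSE_LEVELS and candidate_levels are invented here, and the
function works on the bare marker text, so it cannot use the typographic
cues (bold, italic) that separate the levels in print. Where a marker fits
more than one level, all candidates are returned.

```python
import re

# Summary of the scheme as described in the text above.
SENSE_LEVELS = {
    0: "unnumbered sections (introductory text, final 'hence' paragraphs)",
    1: "upper-case letters A, B, C..",
    2: "upper-case roman numerals I, II, III..",
    3: "asterisk headings *, **, ***..",
    4: "arabic numbers 1, 2, 3..",
    5: "asterisk headings *, **, ***..",
    6: "lower-case bold letters a, b, c..",
    7: "bracketed italics (a), (i), or small Greek letters",
    8: "irregular unnumbered senses",
}


def candidate_levels(marker):
    """Return the sense levels whose numbering style fits the marker."""
    if marker == "":
        return [0, 8]                      # both levels are unnumbered
    if re.fullmatch(r"\*+", marker):
        return [3, 5]                      # asterisks serve both levels
    if re.fullmatch(r"\d+", marker):
        return [4]
    if re.fullmatch(r"\([a-z]+\)|[α-ω]", marker):
        return [7]
    if re.fullmatch(r"[a-z]", marker):
        return [6]
    levels = []
    if re.fullmatch(r"[A-Z]", marker):
        levels.append(1)
    if re.fullmatch(r"[IVX]+", marker):
        levels.append(2)                   # "I" may be a letter or a numeral
    return levels


for marker in ["B", "II", "3", "**", "b", "(ii)"]:
    print(marker, candidate_levels(marker))
```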
Sense Level 1 <S1>
Identifies groups of senses lettered A, B, C.. and is
primarily used to separate two (or more) parts of speech (e.g., noun
and adjective) when they are included in a single entry.
For a further explanation of sense structure and groupings, see
Sense Level 0.
Sense Level 2 <S2>
Used to identify groups of senses numbered I, II.. ,
representing branches of meanings which developed simultaneously or
diversely.
Sense Level 3 <S3>
A structure which takes the form in the printed text
of an increasing number of asterisks (*, **, *** ..), and is sometimes
used in complex and lengthy entries to group senses under semantic or
syntactical headings.
Level 5 <S5> is also sometimes used for the same purpose.
Sense Level 4 <S4>
The most common type of sense development
structure in which senses are numbered consecutively 1, 2, 3..
For a further explanation of sense structure and groupings, see
Sense Level 0.
Sense Level 5 <S5>
A structure which takes the form in the printed
text of an increasing number of asterisks (*, **, *** ..), and
is sometimes used in complex and lengthy entries to group senses under
semantic or syntactical headings. Sense
Level 3 <S3> is also sometimes used for the same purpose.
For a further explanation of sense structure and groupings, see
Sense Level 0.
Sense Level 6 <S6>
Identifies the lower-case bold letter structure
(a, b, c..) used to subdivide senses. For a further explanation of
sense structure and groupings, see Sense Level 0.
Sense Level 7 <S7>
Identifies the structure using italicized and bracketed
letters (a), (b).., or numbers (i), (ii), (iii).., or, rarely, lower case
Greek letters (alpha, beta ..) attached to sub-divisions of sub-senses,
and usually found in lengthy and complex entries.
Sense Level 8 <S8>
Used to identify irregular unnumbered senses. For
a further explanation of sense structure and groupings, see Sense Level 0.
Sense Number <#>
A sense is a numbered and/or lettered entry
component which includes as its major elements a definition and
supporting quotation paragraph. The number or letter enclosed
by sense number tags not only serves to organize the senses, it
also provides a unique address for each sense, an important feature
for cross-referencing. Sense identification is especially important
in the OED since some entries contain 100 or more senses; for
example, the verb "run" has 82 main senses and over 350 sub-senses.
(For an explanation of how senses are structured, see Sense Level 0,
and also compare with Cross-Reference Sense Number.)
Stressed Form <SF>
The full form of main headwords, bold sub-headwords,
and italic sub-headwords. "Full form" means that each form
incorporates diacritics, diphthongs, punctuation, stress marks, etc. as
they appear in the printed Dictionary. In the database, each of these
typographical elements is tagged, although not all headwords contain such
elements, e.g., monosyllabic words and most combinations and derivatives,
for which stress is self-evident. (Compare with Lookup Form.)
Sub-Etymology <ETN>
An etymology attached to a particular sense of
a headword. These subordinate etymologies appear in square brackets
in the printed text, and normally contain historical information
relating to the sense of a word which does not lend itself to inclusion
in the etymology at the head of the entry.
Subsidiary Quotation <SQ>
This structure contains quotations in square
brackets which are occasionally found in quotation paragraphs, usually
as the first citation(s). The convention is used when a quotation does
not actually employ the word in context, but is in some way relevant to
its history. For example, in the case of a word borrowed from another
language, the quotation may document its use in the language of origin.
Superscript <su>
Typographical tagging in this category is attached
to most text in the Dictionary which appears in superscript, with the
exception of homonym numbers. Superscript text includes miscellaneous
typographical conventions used in printing Murray pronunciations (see
Pronunciation), mathematical functions, etc. In addition, it contains
two special superior numbers preceded by a dash (-0 and -1) that sometimes
further define the label "rare". In the first instance, the -0 indicates
the word was found only in an earlier dictionary rather than a contextual
quotation, while -1 means that only one quotation from a text other than
a dictionary was found.
Variant Date <VD>
Earlier forms of spelling, irregular inflexions,
etc. included in variant forms lists, are assigned century ranges
indicating when their usage was prevalent. Centuries appear in abbreviated
form, for example, "5-6" indicates fifteenth to sixteenth century.
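A minimal Python sketch of this abbreviation follows. It generalizes from
the single example above and assumes that each single-digit figure n
stands for century n + 10; that rule and the function name century_range
are assumptions, and the sketch deliberately rejects anything other than
single digits rather than guess at further conventions.

```python
def century_range(abbreviation):
    """Expand an abbreviated range such as "5-6" into (15, 16)."""
    figures = abbreviation.split("-")
    if not all(len(f) == 1 and f.isdigit() for f in figures):
        raise ValueError(f"unsupported century range: {abbreviation!r}")
    # Assumed rule: the leading "1" of the century number is dropped.
    centuries = [int(f) + 10 for f in figures]
    return centuries[0], centuries[-1]


print(century_range("5-6"))  # fifteenth to sixteenth century
print(century_range("7"))    # a single century
```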
Variant Forms List <VL>
Lists of documented historical, or sometimes
contemporary, variants of a headword's spellings, irregular inflexions,
and unusual plurals that normally appear in the printed text immediately
before the etymology. Forms are further identified by the century range
in which they prevailed. Lists are arranged in chronological order with
the earliest variant(s) first. In some cases, two or more branches of
forms may have developed simultaneously and these are grouped by
lower case italic Greek letters. Illustrative quotations that follow
are often referenced by the same Greek letters. (See also Variant
Date for conventions used for century ranges.)