The idea for this project came about when the native Korean speaker in our group noticed differences between the Korean and English versions of the same stories.

Due to the nature of this project, the majority of the hard decisions were made during the marking-up process. We started by understanding that we needed to keep certain elements of organization inherent to the text. This starts with the root element, “collection”, which contains “stories”. These contain paragraphs (“p”), which contain dialog (“q”). This is a pretty straightforward organization that fits into the tree structure of XML.
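The nesting described above can be sketched as a small XML fragment read with Python's standard library. Only the element names (“collection”, “stories”, “p”, “q”) come from our markup. The sample sentences are made up, and we assume here that each story is wrapped in an element named “stories”, as written above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample text; only the element names come from the description above.
SAMPLE = """
<collection>
  <stories>
    <p>She looked up. <q>Good evening, sir.</q></p>
    <p><q>Hello, dear,</q> he said. <q>Sit down.</q></p>
  </stories>
</collection>
"""

root = ET.fromstring(SAMPLE)

# Walk the tree top-down: collection -> stories -> p -> q.
quotes = [
    q.text
    for story in root.findall("stories")
    for p in story.findall("p")
    for q in p.findall("q")
]
print(quotes)
```

Because every quote sits inside exactly one paragraph and one story, any later count can be grouped by story simply by walking the tree in this order.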

Our central question, how formality compares between the English and Korean versions, came down to how we were going to mark up dialog specifically. Therefore, all the interesting things happen inside “q”.

Firstly, there are a few unique qualities to dialog that set it apart from the rest of the text. Dialog is spoken by at least one person, of course, but can be spoken to multiple people. Not only that, but a speaker can talk to their friends (potential speakers) about other people. So, our team started by generating a characters list appropriate to our project. This list consisted of any and all characters speaking, spoken about, or spoken to. This means that it is not a characters list in the usual sense, because characters can exist in a story without speaking or being referenced by other speakers, and such characters are not on our list.

The differences in types of references also needed to be categorized. We marked all the references (instances when someone is being spoken to or about) as “address” elements. These elements are the foundation of our later analysis. Within the “address” elements are other elements: pronouns (marked as “pronoun1st”, “pronoun2nd”, or “pronoun3rd”; for when a speaker is speaking to someone), names (marked as “name”), and possessives (“poss1st”, “poss2nd”, or “poss3rd”). Within each of these is either text or an element called “word” (to be explained shortly).
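As a rough illustration of how “address” elements sit inside a quote, the sketch below pulls out each reference and its kind. The element names are the ones listed above. The sample sentence is invented, and the code is only one possible way to read such markup, not the project's own tooling.

```python
import xml.etree.ElementTree as ET

# Invented sample quote; the element names follow the description above.
SAMPLE = """
<q>I told <address><name>Carter</name></address> that
<address><pronoun2nd>you</pronoun2nd></address> took
<address><poss3rd>his</poss3rd></address> hat.</q>
"""

# The child elements that may appear inside an "address" element.
KINDS = {
    "pronoun1st", "pronoun2nd", "pronoun3rd",
    "name",
    "poss1st", "poss2nd", "poss3rd",
}

q = ET.fromstring(SAMPLE)

references = []
for address in q.iter("address"):
    for child in address:
        if child.tag in KINDS:
            # itertext() also collects text held in a nested "word" element.
            text = "".join(child.itertext()).strip()
            references.append((child.tag, text))

print(references)
```

Using `itertext()` rather than `.text` matters here because a pronoun or name may contain a “word” element instead of bare text.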

In Korean, each of the different types of honorific forms was marked accordingly: plain, high, higherstatus, low, praying, and humble. In the English versions, we read over each story in context and selected any and all words that would lend themselves to pushing the dialog in an informal or formal direction. These are marked up as “word” with an attribute value of either “formal” or “informal.” Of course, this was a highly subjective process; we marked up “word”s based on our instincts as native English speakers and our judgment about the context of the story as best we could. We also considered identifying each quote as a whole as either formal or informal, but felt that this wasn’t specific enough, as it didn’t give a clue as to what was causing the tone to lean one direction or the other.
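To show why marking individual words is more informative than labeling a whole quote, here is a minimal sketch that tallies the “word” elements inside each quote. The attribute name `type` is an assumption made for this example (the description above gives only the attribute values), and the sample dialog is invented.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample; the attribute name "type" is hypothetical.
SAMPLE = """
<p><q>Would you <word type="formal">kindly</word> wait, <word type="formal">madam</word>?</q>
<q>I <word type="informal">ain't</word> got time.</q></p>
"""

p = ET.fromstring(SAMPLE)

# One Counter per quote, keyed by the value of the (assumed) "type" attribute.
per_quote = [
    Counter(w.get("type") for w in q.iter("word"))
    for q in p.iter("q")
]

for number, counts in enumerate(per_quote, start=1):
    print(number, dict(counts))
```

A single formal/informal label on each “q” would hide the fact that the first quote leans formal because of two specific words, which is exactly the information we wanted to keep.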

A Korean translator would have to use their instincts as a native Korean speaker to make certain decisions too, but because the Korean honorific system is more organized than the English honorific system, a more systematic approach to mark-up was possible. Sentences ending with the "-yo" suffix were considered plain forms (e.g., 해요, 먹어요). Sentences ending with the "-pnida" form were considered high forms (e.g., 습니다, 합니다).

Low form is a special kind of form because it can only be used by a male speaker. It is not acceptable for females to use this form. Sentences with the ending "-sso" were considered low forms (e.g., 하겠소). Even though the low form is more polite than “-uing,” the casual form, it conveys very little respect toward the addressee. Humble words were marked if a speaker was lowering herself or himself (e.g., 제가). The honorific form people use when they pray is distinct in the Korean language, so we marked it up as a distinct form. Lastly, we marked up “higherstatus” for a word like "nim" (님) when it was attached to address terms and indicated that the person spoken to was in a higher position than the speaker (e.g., 선생님). We did not mark casual words in Korean because if a word is not marked as honorific, then it is safe to assume that the word is a casual form.
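The ending-based rules above can be approximated in a few lines of code. This is only a rough sketch: our marking was done by reading each sentence in context, and the function name `classify_ending` and the simple suffix checks are assumptions for this example. Matching on the final syllables alone will misfire on some sentences, such as one that ends in a noun whose last syllable happens to be 소.

```python
from typing import Optional

# Characters stripped from the end of a sentence before checking its ending.
TRAILING = " .!?\"'”’"

def classify_ending(sentence: str) -> Optional[str]:
    """Guess the honorific form of a Korean sentence from its final syllables.

    Returns "high", "plain", "low", or None when no rule applies.
    """
    s = sentence.strip().rstrip(TRAILING)
    if s.endswith("니다"):   # "-pnida" endings such as 합니다, 습니다
        return "high"
    if s.endswith("요"):     # "-yo" endings such as 해요, 먹어요
        return "plain"
    if s.endswith("소"):     # endings such as 하겠소
        return "low"
    return None              # unmarked, treated as casual

for example in ["해요", "먹어요.", "합니다.", "하겠소", "먹어"]:
    print(example, classify_ending(example))
```

The humble, praying, and “higherstatus” categories depend on individual words and on who is speaking to whom, so they are left out of this sketch.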

Whereas the Korean language has a system of honorifics with very specific rules, English has no equivalent. What makes a piece of dialog formal or informal? This is a question fundamental to our project. It isn’t enough to know that a character was referred to using a nickname, like “dear.” Who was referring to that character? Who might use the term “dear”? The speaker is crucial to consider when examining aspects of dialog. So, we included the speaker of each quote as well, because the manner in which a person speaks is largely a matter of their relationship to the person being spoken to or spoken about.

Going along with that question, we picked three character qualities (to start with, though there are no doubt many more) that we considered important to compare between speakers and referenced characters. They were age, social status, and gender. Each of these is marked as an attribute in our characters list. We chose these particular qualities because they are basic, universal, and readily visible characteristics. One’s decision to use formality in English can be, and often is, based on these. Characters will have these same qualities in any translation of the story, and they are relevant in determining formality in Korean as well.

So, we added attribute values to label our characters: “young” (a child), “middle” (anywhere from a young adult to middle-aged), or “old” (an obviously elderly person) for age; “low,” “neutral,” or “high” for status; and “male,” “female,” or “unknown” for gender. We based our marking for status on any relevant information about economic status, background, or profession. “Neutral” isn’t so much a middle class as a category for characters without any obvious information pertaining to social status or class.
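A characters list with these three qualities could look like the sketch below. The element names “characters” and “character” and the attribute names `id`, `age`, `status`, and `gender` are assumptions for this example. The allowed values are the ones listed above, and the entries for Masie and Carter use the values we assigned them in “A Lickpenny Lover,” discussed later.

```python
import xml.etree.ElementTree as ET

# Allowed values per quality, as listed above.
ALLOWED = {
    "age": {"young", "middle", "old"},
    "status": {"low", "neutral", "high"},
    "gender": {"male", "female", "unknown"},
}

# Element and attribute names here are hypothetical.
SAMPLE = """
<characters>
  <character id="Masie" age="young" status="low" gender="female"/>
  <character id="Carter" age="middle" status="high" gender="male"/>
</characters>
"""

characters = {c.get("id"): dict(c.attrib) for c in ET.fromstring(SAMPLE)}

# Check that every character uses only the allowed values.
for name, attrs in characters.items():
    for key, allowed in ALLOWED.items():
        assert attrs[key] in allowed, f"{name}: unexpected {key}={attrs[key]!r}"

def differences(speaker: str, addressee: str) -> dict:
    """Return the qualities on which two characters differ."""
    a, b = characters[speaker], characters[addressee]
    return {k: (a[k], b[k]) for k in ALLOWED if a[k] != b[k]}

print(differences("Masie", "Carter"))
```

Looking up the speaker and the addressed character in the same list is what lets a quote be read in terms of the relationship between the two, rather than in isolation.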


During the time allotted for our project, we managed to amass and categorize all of this information, as well as create a few charts and graphs to visualize some of it (these can be found under “statistics” in the menu bar). These graphs don’t conclude anything by themselves, but provide a way to efficiently view information. The two graphs under “general information” show different blocks of form-specific color per story: one bar chart for Korean and one for English. Our gender chart shows the number of honorifics used by male and female speakers per story. There is also information concerning total data above our graphs.

These graphs help us see some things clearly, for example, proportions of use of formality across all stories. The majority of conclusions we made are from either the marking-up process, or the combination of looking at the graphs and the stories together.

An interesting and obvious result was that, overall, not only are there fewer indicators in English, but the majority of them are informal, whereas in the Korean version the majority of indicators are plain or higher honorific forms. This could be on account of our conservative approach to marking up the indicators in English, or it could be because of the differences in how and when formality is appropriate in English and Korean cultures. We also only marked words within dialog as formal or informal. Perhaps the translator would have picked more words as indicators, or placed more weight on particular aspects of a story, even outside of dialog, in order to decide when to use honorifics.

What else did we notice? Complexities in dialog, for example, in tone of voice as related to those qualities we decided were important. In the English version of “The Whirligig of Life,” the two main characters (one male, one female, same age) speak almost entirely in slang. Of course, slang is typically considered informal in English, but these characters use slang even when speaking to a judge. It is known that they do not possess money, and their use of slang with the judge may suggest to a reader that either the characters consider it unimportant to use formality, or the characters never learned to speak in a proper manner. So, we conclude that the use of certain formal/informal mechanisms is relative.


We knew from the beginning that gender was an aspect of dialog we wanted to pay particular attention to. We picked stories that each had both male and female characters.

Females and males use honorifics relatively equally in each story. Males had more total quotes (159) than females (114), but this does not take into account the length of the quote. It may be interesting to note that although female speakers have fewer quotes, the total number of honorific words spoken by women (188) is greater than the total number of honorific words spoken by men (137). We aren’t quite sure what this signifies. The speakers do not lean clearly toward any one age group or status level, but these factors in combination could play a role.
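One simple way to put the totals above on a comparable footing is to divide honorific words by quotes for each gender. The figures are the ones just reported. The per-quote rate itself is our illustration here, not a statistic from the charts, and it still ignores quote length.

```python
# Totals reported above.
quotes = {"male": 159, "female": 114}
honorific_words = {"male": 137, "female": 188}

# Average number of honorific words per quote, by gender of speaker.
per_quote = {g: honorific_words[g] / quotes[g] for g in quotes}

for gender, rate in per_quote.items():
    print(f"{gender}: {rate:.2f} honorific words per quote")
```

On these totals, female speakers average about 1.65 honorific words per quote and male speakers about 0.86, so the gap is wider per quote than the raw totals suggest.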

Although we do not have a graph for gender for the English version at this time, we still made some observations in comparing the translation with the original. For example, in the story “A Lickpenny Lover,” the male character Carter proposes to Masie, a female character. In English, Carter uses 24 formal words while Masie uses none. In the Korean translation, both Masie and Carter use 38 honorific words (not including the low form, which, as stated above, can only be used by male speakers). It could be that this is showing a different gender relation in Korean culture.

Again, we could look at the other qualities to give us understanding. We marked Masie’s status as “low” and age as “young,” and we marked Carter’s status as “high” and his age as “middle.” We take it that in the story, Masie, the “shopgirl,” lives in a different social world than Carter, the “painter, millionaire, traveler, poet, automobilist.” Even so, she certainly would be aware of his class and gender. Masie is skeptical, frank, and a little spunky with the high-class gentleman customer-turned-flirt. She has become accustomed and even numb to men falling for her beauty. These characteristics come across in her tone of voice in the English version.

It is possible, however, that in the Korean translation her lower-status profession or younger age (rather than her skepticism or frankness) were the most important factors, and that these led the translator to treat honorific forms as most appropriate to the situation.

This raises the question of when formality is appropriate in the two different cultures. In Korean, a speaker uses honorifics to recognize that the referenced person’s status is higher than their own and to humble themselves.

In English, formality is used for several different reasons. A speaker can show their own superiority by signaling that they are more cultured, of a higher class, or more educated. A speaker can also use formality to show respect. In general, a speaker can raise themselves rather than lower themselves. There exists formal speech in English that is used humbly, but it has fallen out of use in the modern age.

Unlike in Korean, there is no situation in English where it would be considered appropriate to lower oneself; any sort of use of low language would be considered disrespectful.

We assume that in the English version, Carter's formal language is used to demonstrate his status, and the fact that Masie does not go out of her way to be formal shows that she does not care to appear to be anything but what she is (a poor working girl).

When looking at the translations, one purpose of looking at dialog is to see what was either maintained or lost when translating to Korean. We would argue that by having Masie use honorifics in the Korean, the reader loses the effect of Masie’s indifferent attitude. Her attitude heightens the climax (when Carter thinks he’s finally charmed Masie); it makes the twist ending that is so characteristic of O. Henry’s style more dramatic.

Where could this project go?

This project evolved from an idea to something concrete. We could broaden the scope of the project by adding more stories and more views of the data we have already displayed. By adding more stories, we would be able to further confirm patterns or find outlying data.

The possibilities for expansion are many. We could continue to explore the aspects of sociolinguistics already started in character attributes (gender, status, age). We could add new aspects such as education, ethnicity, or religion. We could also continue to mark up parts of speech.

Possibly translated from Japanese?

It has come to our attention through the insight of a Korean professor that our translation could have been made from Japanese, not directly from English. This is only an educated speculation, but it would mean a further complexity in analysis. Some things to consider would be the similarities and differences between Korean and Japanese honorifics in terms of translating between the two, and translating from English to Korean.