Corpus Analysis

For my corpora, I took texts from two well-known creative short story authors: Joan Didion and David Sedaris. I found some commonalities between their writing, as well as stylistic trends particular to each, but neither author could be considered a counterpart or an opposite to the other. Each writer stuck very much to their own form and style of writing.

Graphics of Terms From David Sedaris' and Joan Didion's Corpora

Joan Didion is a creative nonfiction writer who often writes journalistically about political and social aspects of America, layering in her own rhetoric. Many of her works describe her life, memorable events, and travels, while others recall traumatic events and mourning. Didion has published many books. The Year of Magical Thinking examines her process of grief after her husband’s sudden death. She writes with sensitivity and wisdom, likely gained from her many experiences and self-examination.

David Sedaris writes in a very different genre from Didion, in that he uses comedy in his nonfiction pieces. Often, he writes about his past experiences, some stemming from his childhood and others from outlandish experiences he encountered while living in several countries. His memoir Naked is a collection of humorous essays describing his upbringing and early adulthood. It is not uncommon for his work to contain profanity, and some of the stories and phrases he writes can be rather offensive when taken out of context. For example, the piece titled Six to Eight Black Men is not just about six to eight black men; rather, it describes the absurdity of the Dutch telling of the St. Nicholas story, in which he is accompanied by six to eight black men at any given time. More recently, this depiction has become controversial in the Netherlands.

Voyant Tools' Summary of David Sedaris' and Joan Didion's Corpora

Both corpora came to a similar total word count. Didion’s corpus came to 28,652 words in 8 documents. Sedaris’ corpus came to 28,618 words in 12 documents. Sedaris’ corpus likely needed more documents to match Didion’s word count because his texts contain more dialogue, which could mean fewer words per sentence on average, shorter documents, or both. According to Voyant Tools, both appear to be true.

Sedaris’ longest text was Giant Dreams, Midget Abilities at 4,034 words, not even half the length of Didion’s longest text, Afterlife. Sedaris’ average words per sentence range from 13.5 to 18.2 across his documents, whereas Didion’s range from 18.9 to 39.1. From what I understand of each author’s writing, Sedaris often retells his stories with short dialogue, while Didion prefers to reflect and examine in an almost philosophical way. It is also not uncommon for Didion to write lengthy descriptions. This could be one explanation for the difference. How much the dialogue, as opposed to other factors, affects the length, I could not say.
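To make the words-per-sentence comparison concrete, here is a minimal Python sketch of how such an average could be computed. It is not how Voyant Tools calculates its figures; the sentence splitter is a rough heuristic I am assuming for illustration, and the two sample passages are invented rather than taken from either corpus.

```python
import re

def words_per_sentence(text):
    """Return (word count, sentence count, average words per sentence)."""
    # Rough heuristic: a sentence ends at ., !, or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Count runs of letters, digits, and straight apostrophes as words.
    words = re.findall(r"[A-Za-z0-9']+", text)
    average = len(words) / len(sentences) if sentences else 0.0
    return len(words), len(sentences), average

# Invented samples: one with short, dialogue-like sentences, one reflective.
dialogue_heavy = "Stop. Why? Because I said so. He left the room."
reflective = ("I remember thinking that the house, the street, and the whole "
              "valley had gone quiet in a way I could not explain.")

for label, sample in [("dialogue", dialogue_heavy), ("reflective", reflective)]:
    total_words, total_sentences, average = words_per_sentence(sample)
    print(f"{label}: {total_words} words, {total_sentences} sentences, "
          f"{average:.1f} words per sentence")
```

Short exchanges pull the average down quickly, which fits the idea that dialogue is one driver of the gap, though a sketch like this cannot separate dialogue from other factors.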

Voyant Tools' Phrases for Both Corpora

The repeated phrases were another subject I found particularly interesting. Voyant Tools found that, for both authors, the overwhelming majority of phrases repeat only twice. This holds for the longest phrases and for most of the short ones. Voyant Tools likely excludes single-use phrases, which could overwhelm the program with large corpora. More often than not, though, these twice-used phrases appear in only one document by one author. Even after combining both corpora to view the trends, the same pattern holds. On further inspection, I realized that these phrases often occur in quick succession. Both authors tend to repeat phrases for intended impact.

Voyant Tools' Phrases for Sedaris' Corpus

The phrases themselves told a lot about the authors’ writing as well. From David Sedaris’ corpus, I could tell that his writing was meant to be humorous, given how absurd the phrases are when taken out of context. “The biggest, fattest bull dyke in the entire city of Boston” is my personal favorite phrase. The phrase comes from The Man Upstairs.

Voyant Tools' Phrases for Didion's Corpus

The phrases in Joan Didion’s corpus could just as well be taken out of context and have humor found in their vagueness; however, it is apparent that humor is not the intention. My favorite phrase from her corpus is “As it was in the beginning, is now and ever shall be, world without end.” This phrase comes from Afterlife. Apart from being her longest repeating phrase, it shows much about how she writes. In context, she is exploring the interpretation of this phrase, relating it back to the topic of meaninglessness, which she explored as a child.

AntConc's 2-Grams for both Corpora

In AntConc, both authors’ most common 2-gram is the same prepositional phrase. The N-gram “in the” appears 228 times in Joan Didion’s texts and 101 times in David Sedaris’. To limit such prepositional phrases, only N-grams of length 3 or higher were shown afterward.
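For readers unfamiliar with N-grams, the counting can be sketched in a few lines of Python. This is an illustration only, not AntConc's implementation. The tokenizer below splits on apostrophes, which is my assumption about the tool's settings; under that assumption a contraction such as "don't" counts as two tokens, so "I don't" would register as a 3-gram. The sample sentence is invented.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep runs of letters only, so punctuation and
    # apostrophes act as token boundaries (an assumed setting).
    return re.findall(r"[a-z]+", text.lower())

def ngram_counts(tokens, n):
    # Slide a window of n tokens across the text and count each window.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Invented sample text, not drawn from either corpus.
sample = ("In the morning we sat in the living room. "
          "I don't know why we sat in the dark.")
tokens = tokenize(sample)
bigrams = ngram_counts(tokens, 2)
trigrams = ngram_counts(tokens, 3)

print(bigrams.most_common(3))
print(trigrams.most_common(3))
```

Even in this tiny sample, function-word pairs like "in the" rise to the top of the 2-gram list, which is why moving to longer N-grams can surface more distinctive phrases.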

AntConc's 3-Grams for Joan Didion's Corpus

Didion’s most common N-gram was “in New York,” appearing 20 times. Locations were common in her texts, some more prominent than others. Other frequent location-based N-grams were “the living room” (11), “the San Bernardino” (8), “a Santa Ana” (7), “in Las Vegas” (7), and “in Los Angeles” (7). While there is not much of a pattern between the locations and the texts in which they appear, the cities themselves are quite well known. Most of the texts in Didion’s corpus are taken from one general part of her life, which could explain the occurrence of relatively close locations.

I felt it should be reported that “Lucille Miller’s” ranks as the 7th most common N-gram, with 10 occurrences, despite appearing in only one text. The next name N-gram, “Arthwell Hayton’s,” ranks 35th and also appears in only one text. Didion tends not to mention names in her writing, a pattern that is broken in Some Dreamers of the Golden Dream. This contributes to the impression that Didion spends more of her writing on self-examination.

AntConc's 3-Grams for David Sedaris' Corpus

Sedaris’ most common N-gram was “I don’t” at 17, followed by “my father’s” (16) and “my sister and” (14). AntConc becomes somewhat of a challenge here. Within the 50 top-ranked N-grams, 8 are sections of the phrase “the six to eight black men,” 3 mention “sister,” and 5 mention either “father” or “mother.” Some of these N-grams are counted as parts of the same repeating phrase, while others stand entirely on their own. However, “father,” “sister,” and “mother” are usually found in multiple documents. I feel this confirms a pattern in his writing and contrasts starkly with Didion. Sedaris often writes about his experiences growing up, but it could just as well be that his comedic writing tends to revolve around other people. Locations are not common among his listed N-grams either, again in contrast with Didion.
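The distinction drawn here, between an N-gram that is frequent because one phrase repeats in a single text and one that recurs across many texts, can be checked by tracking two numbers per N-gram: total frequency and the number of documents it appears in. The Python sketch below illustrates the idea; the function names, the tokenizer, and the three tiny documents are all hypothetical and invented for the example.

```python
import re
from collections import Counter

def trigrams(text):
    # Lowercase, keep runs of letters, then take every 3-token window.
    # Windows may cross sentence boundaries in this simplified version.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]

def frequency_and_range(documents):
    """Return (total count per trigram, number of documents per trigram)."""
    freq = Counter()
    doc_range = Counter()
    for text in documents.values():
        grams = trigrams(text)
        freq.update(grams)            # every occurrence counts
        doc_range.update(set(grams))  # each document counts at most once
    return freq, doc_range

# Invented mini-documents, not drawn from either corpus.
docs = {
    "essay_a": "My sister and I waited. My sister and I left.",
    "essay_b": "My sister and my father argued about the car.",
    "essay_c": ("The six to eight men arrived. The six to eight men stayed. "
                "The six to eight men left."),
}
freq, doc_range = frequency_and_range(docs)

for gram in [("my", "sister", "and"), ("the", "six", "to")]:
    print(" ".join(gram), "| frequency:", freq[gram],
          "| documents:", doc_range[gram])
```

Here both trigrams occur three times, but only "my sister and" spans more than one document, which is the kind of evidence that separates a recurring habit from a single repeated phrase.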

Voyant Tools' Graphic for Both Corpora

Each creative writer is unique, and comparing their styles would be even harder than comparing apples to oranges. From further analysis, it is apparent that the authors are dissimilar in most aspects, which could very well be attributed to differences in genre. Some of their non-genre-related differences include average sentence length, the prevalence of characters, the prevalence of locations, and the length of documents.

This isn’t to say that they completely lacked commonalities, however. Both professional writers had unique phrases and formations of words that rarely exceeded 3 occurrences. In Voyant Tools, with both corpora combined and judging by the scroll bar on the side, phrases that occur more than twice account for less than one-fifth of the 2,000 phrases listed. While the shorter phrases were often more general and could be seen in any variety of text, the lengthier phrases were used for impact within the text.

Corpora

Sedaris' Corpus Didion's Corpus Combined Corpora