On a spring evening of 1977, a 15-year-old boy turned a tape recorder on in an apartment on the island of Manhattan in New York City. His mother had volunteered herself and her family when Professor Hervé Varenne, in one of her classes, announced that Professor Clifford Hill and he were interested in talk, conversation, familial talk particularly, everyday talk in a setting "where there were no problems, in a normal family."
Not much guidance was given as to what was needed. A mention was probably made of a "dinner" conversation. Mention was certainly made of the need to have "all" the members of the family present and to try to make the taping during a "normal" time. When the time came to transform these general directions into practical action, the woman recruited her son to help her set up the tape and the recorder, hide the latter in a sewing box, and place the sewing box on the table where she then organized herself to do some sewing while another son was finishing homework and a daughter watched. By doing this she made something, some things: a tape, that is, a magnetic representation of some speech. This was later transformed into various written representations, histories, transcripts, summary figures, analyses, etc. She did not physically make most of these things with her hands or her voice. Rather she triggered their production and got others to act, "for herself" at first, and then in a way that more and more completely escaped her. This work is continuing at the present moment, not only the present of my writing act here, but the present of the reading act in which you, reader, are now involved.
The present of your reading act requires that "I," in the present of my writing act, do certain things. What these are is not obvious to me. But I do have to act, to write something that somehow takes into account the task which I wish you, the reader, to accomplish. This task may be more difficult than the "normal" reading task in that I am trying to get you to keep track of your reading act and, even more importantly for me, of what it is that you will then do with your reading as you "think about it," that is, make internal speeches with it--and then, perhaps, as you "talk about it" with others. To make the current reading easier, I am thus going to start with a straightforward presentation of the people who made the tape and of the physical setting in which they made it.
The woman who set all this in motion will be known as Connie Harvey. Her son who turned on the tape recorder will just be known as Jack. At the present of the taping he is 15. He is the only child of Connie's first marriage. He resides full time with the Harveys. Connie is 41. She has just completed a master's degree in psychiatric social work and has entered a doctoral program in a related field. She is not now practicing her craft, though she did before her marriage to Ray Harvey. Ray is 48 years old. He has a private practice in one of the mental health professions. Both Ray and Connie have spent most of their adult lives in and around New York City. They have been married for ten years.
Ray and Connie have two children of their own: Mike, an 8-year-old boy who suffered some brain damage in a childhood accident, and Kate, a 6-year-old girl. All the children attend private schools in Manhattan. Mike also sees a therapist who is helping him with his handicap.
Since their marriage, Ray and Connie have lived on one of the better sides of Manhattan. For the past 8 years they have occupied a seven-room apartment in a high-rise building; and it is there, in the large living-room, that the recording was made. In one corner is a grand piano and nearby are two sofas, three armchairs, and a coffee table. Large plants stand in another corner, and smaller ones hang near the windows that look out across the river. Four speakers of an audio system are suspended from the ceiling. In another corner, near the kitchen and the corridor to the bedrooms, stands a dining-room table.
At about 7:30, Connie, assisted by Jack, places a small recorder in a sewing basket on the dining-room table, and the hour of recording begins. It had probably been suggested by Hervé Varenne that the tape not be the focus. No mention was made of keeping the recording secret from any of the participants (possibly no mention was made of the matter at all). With these very loose guidelines, Connie makes something. She chooses a moment after the children's dinner, the time when their day is closed and they are put to bed. This seems to be a routine moment, though there is also some indication that it is occurring a little later than usual. She decides that the children should not be aware of the tape, and they seem to have become aware of it only at the very end of the hour. Ray enters the room after the tape has been turned on and, from Connie's account, becomes aware that it is on only after the first half-hour.
Throughout the first half-hour Connie remains seated at the dining-room table, sewing, with Mike on one side and Kate on the other. They mostly stay put, finishing homework, playing with crayons, with objects in the sewing box. From time to time they get up to get a book, or look at what Ray has brought from his shopping for Indian spices. Jack comes in and out. He comments on stories his mother tells, helps the younger children, looks for the time of a television program, discusses a band rehearsal with his mother, checks on the tape recorder. Ray moves about. He measures space in the living room where a china cabinet he has looked at during the day might fit. He puts away the spices he has bought. He checks on the children, on the roast that is cooking in the kitchen. Connie and he discuss various events from her day. And thus the first half-hour passes.
The above paragraphs are straightforward. But this is a deception, for they say certain things, but not others, in a certain genre, but not in another. This is what these people did, the description seems to say, and "that's all there is to it." It is important, however, that this literary effect be questioned: What are we choosing to tell, and what are the rhetorical means which we have used to make this description informative but somewhat flat in terms of affect and decidedly not "thick" interpretatively? These questions remain with us throughout this work. Whatever we say about the way the family members construct something to achieve something with the other members also applies to our own construction of this particular text with the readers.
We are not in fact going to explore the properties of the "descriptive" paragraphs, except from one point of view, and that is the source of the "information" which they "re-present," bring into the present moment again. This is useful, because the nature of the "facts" which we have brought forward at this point is not so different from the nature of the "facts" which we represent later, in what is framed rhetorically as the body of the analysis. How did we learn what we are now repeating? How did we produce the text that you are now reading? What is the relationship between this text and the original production which is the pre-text, if not the pretext, the text before the text which you are now reading? What is involved in establishing the "authority" of the text? What is it that makes it one that can claim a certain, indeed greater, usefulness as a statement of empirical knowledge about human beings and what they do in their natural, that is, cultural, setting? These are fundamental questions which we must ask to establish the validity of our statements within the social sciences to which they claim to be relevant.
What was read earlier, and what will then be read (assuming a linear, beginning-to-end reading of this work), is based on a set of encounters with some of the participants whom we have now met. One of these encounters is the main focus of the work: the transcript of the talk itself. As is always the case, most of the information the talk relied upon to achieve its practical effect is not literally spelled out in the talk itself. Either it is indexed (through the use of pronouns, for example), or it remains implicit, so that the reader of the transcript, initially at least, may not even realize that something is in fact specifically being referred to. For example, an utterance like "found us a proper china thing" (sec. 20-23) signals, through the use of personal pronouns, the need for us to ask:
- who is the subject?
- who is us?
- what is the thing?
- what is it proper to?
All these things are "in" the talk through the indexical power of the talk, but they cannot be specified without recourse to mechanisms other than an examination of the utterance itself.
Some of these "things" can be discovered by examining the transcript itself: The speaker is the man. Good guesses can be made about other things: Given that the children never participate in the talk about the thing, it makes sense to suppose that us excludes them. Later, Connie asks Ray, "tell me more about the china closet" (sec. 107-108), thereby specifying what the thing is and thus possibly stopping our investigation, as we may think that the original question about the nature of this thing has been answered.
Still other things remain audibly present but implicit. There is no expansion of the adjective proper. Still others, like the handicap of the second son, are audible but seem not to emerge in the speech until one is alerted to them by other evidence (easily passed-over statements about the fact that the child has needs for special objects). Other things, like the fact that this was the wife's second marriage, seem to be completely absent. If we had not been told specifically by one of the participants in situations other than the initial one, we would never have known.
These features of the Harvey talk are, we now know, features of all talk. Most of the information available to participants in a conversation is not explicitly mentioned in the talk. Some is; some is brought indirectly through allusion and deixis. Most is not mentioned, as if participants always assumed that others will know--even when there is no evidence that they do. For someone who enters a long-standing conversation--as we all do many times in our life as we move out from our households--only extended participation will eventually fill in the gaps. Ethnographers place themselves in such situations because they know that one cannot imagine a priori what questions to ask human beings so that they can reveal what is significant in their lives. A general question like "tell me what I should question you about" will never produce all that can be produced and that we need for the kinds of research where the collection of information is the primary goal. Then, the only thing that can work is extended fieldwork that allows for one to be present when information emerges.
Our goal here is not the collection of information. Rather it is the more precise specification of the processes through which information is presented, accessed, transformed in speech. Thus, we did not need to conduct a full investigation of the family--at least not one that a "private investigator," or an early 20th-century ethnographer, might have conducted. No doubt such an investigation would have revealed that the talk contained internal references to many more things than we have become aware of. We suspect, however, that it would have confirmed what is a major "finding" of this research, the fact that the information that can be made relevant to talk is not a closed list which could be exhausted through patient investigation. In many ways which we will explore, talk does not "contain" information. Rather, it triggers information in the hearer, including the "over-hearer" of the talk.
Like any over-hearer of the tape which Connie Harvey had her son make for us, we were of course curious to "know more." She very graciously agreed to be interviewed two or three times about her family in general, and about details that had emerged in the talk but had not been developed. In these conversations we learned about the Tiffany china that was to be displayed in the "proper" cabinet that had been discussed. We learned about the existence of a first marriage, the accident that had produced Mike's disability, the therapist who had told Connie not to answer Mike at his first calls to her, Ray's profession, the ages of the children, etc. In a series of interviews 2 years later, we learned about the divorce between Ray and Connie, the kind of difficulties which had triggered it, and some of the arrangements that had then been made.
This set of conversations forms a corpus of cross-referencing texts from which springs everything that we write in this work. We struggled to base our analyses solely on information that the reader can gain through a critical examination of the transcripts, even though some of what we ourselves know about this family comes from the interviews. In this way the "data" of our analyses are made available to critics.
Our use of the word text to represent the activities of the participants on the evening of the tape, the various kinds of transcripts which were produced from this tape, and the analysis as it is now written is not simply a bow to a current and perhaps already fading fashion within a subsection of cultural anthropology. In its original formulation by people like Geertz (1973), Boon (1982), and others, to talk about cultures, that is, the patterns which people make in their history, as texts was to construct a metaphor which highlighted certain aspects of the ethnographic task which preceding metaphors (culture as organism, or as personality, for example) had obscured. Thus, one could remind one's readers that human action always involves making a mark, writing, on the world. The world is thereby transformed, made richer but also always made poorer. What has been written, the myths, rituals, literatures, metaphors, proverbs, etc., but also the buildings, legal and economic systems, political institutions, etc., can then be seen as texts, the end-products of acts of inscription. The ethnographic task can be analogized to the task of the literary critic who takes a text and then interprets it.
Critics of the metaphor "culture is text" have focused on the theory of interpretation which seems to be necessarily attached to it, a theory which, in Geertz's hands particularly, has not been able to shake the accusation of "relativism." If culture is text, it is easy to make Geertz say, then we can say anything about anything, and the scientific task has collapsed into a struggle of voices asserting their own right to be heard in an act of raw power. Those who have followed Geertz have come close to saying that much, and they may be writing themselves out of anthropology.
In our case, however, the word text can be used to refer to our activity in a much more immediate sense. It is less a metaphor than an icon of what it is that we are dealing with. The transcript that is the "data" is indeed a written text in a literal sense. Similarly, what you are now holding is, literally again, a text. Whether what originally happened was also such a text may seem to stretch common sense (words like event, interaction, conversation, etc. seem more appropriate), but we suggest that so to see it does allow for a deeper knowledge than would otherwise be possible.
Theoretically, the issue becomes one of tracing the relationship between the texts which, once each has been produced, together form the historical setting for your reading act and the production of your text. While the texts (event, transcripts, analyses, articles, book) were produced in a historical progression which we must respect, they become available to you in another historical progression (the progression of your reading). Every new act of reading is produced in a setting that does not yet include it. It is only when this reading is completed and has produced something that can then be consulted that it takes its place within the background out of which new texts will be produced.
We argue that this process of production of new texts out of other texts is a process that is essentially iconic to the process that built the original action--whether this action is to be labelled a text or not. A close examination of the way the utterances follow each other in local history reveals that they, too, take into account what has already been said and do new things with it that then become a part of what must now be taken into account. No utterance ever simply replicates what has already been said, and the history of the social sciences is suggesting that no replication of human action is indeed possible. What is possible is the making of new actions, literal texts in our case, which can be demonstrated to pick up on certain aspects of the original action and amplify them for a particular purpose. In other words this present text, and all its various subtexts (figures, charts, summary analyses, etc.), claims to be grounded on specified properties of earlier texts which are retextualized, rewritten, transcribed, for the purpose of making a specified point within a particular conversation that is different from the original one, the conversation, or rather conversations, which have been going on in the various disciplines to which this work is relevant: family and interactional studies, discourse and symbolic analysis, cultural anthropology, general anthropology, and the human sciences in general.
Analysts of recorded speech events often say that their work is guided by two sets of principles, one concerned with transcription and the other with theoretical analysis. These sets of principles are, at times, presented as if they were disparate, those having to do with transcription being applied first so that a physical record of the events can be established independently of any particular analysis that might follow. This independent corpus is then viewed as providing a certain empirical control over variant forms of analysis.
As Elinor Ochs (1979) has pointed out in an article on "Transcription as Theory," the separation of the two is largely a methodological fiction, one particularly well suited to maintaining the myths of empiricism around which so much academic discourse is constructed. As Ochs points out, analysts do not work directly on the speech events themselves, but rather on a selective representation of these events that is itself organized as a distinct event, according to the specific constraints of the medium chosen for the representation.
Before mechanical means of recording, the initial events were necessarily filtered through a particular memory, and from this filtering process field notes were constructed. Hence, by the time the analysis was conducted, the initial experience had been mediated at least twice, first by the psychosocial processes which made a memory available for writing, and second by the socioliterary processes which govern such writing. The experience had to make memory. Memory had to make text, and text was what was used in the making of other texts.
With the use of tape recorders or video machines this double mediation remains, though its character has been altered. First, action is transformed into analogic magnetic signs inscribed on a tape, a step governed by technico-social processes which place boundaries on where and when a machine can be placed, what it can gather within its field of focus, what it can actually inscribe. Second, the inscribed action is transformed again by other technico-social processes that take the signs on the magnetic tape to be "read" by certain machines and make them into signs on paper to be read by other "machines," human beings with eyes and ears that are sensitive to different signals. Action has constrained a tape. The tape has constrained a transcript.
Certainly, a machine generally provides a record of the initial action that no human being could produce, but it also imposes its own constraints. Consider, for example, a situation in which analysts wish to work with variation in voice volume, which can obviously be of great importance in signalling speakers' attitudes. Working with only an audiotape, they will not be in a position to determine whether it reflects actual change in amplitude or merely a change in distance and/or orientation of the speaker's mouth with respect to the microphone. The use of a video recorder might allow the analyst to determine more accurately the source of variation, but it, of course, imposes its own constraints as well. Given the use of a single camera, a great deal necessarily remains unrecorded, particularly when people, as in many family settings, are involved in multiple activities that make them roam over large areas and many rooms. Moreover, what is recorded is subject to a highly selective point of view. In order to record what happens in a greater area of a room for example, the camera may be moved as far away as possible and a wide angle lens introduced. This decision, however, makes it quite difficult to record potentially important information involving facial expression, gesture, and posture. And, of course, there is always the problem of the intrusive presence of the machinery. A more thorough recording generally requires more equipment and personnel; and the more they increase, the more likely it is that local patterns will be transformed.
For all these reasons, most anthropologists prefer to work as one human being, alone in the midst of a large group, taking notes and writing them up. For many purposes, for example, for determining what a person does over the course of a day, whom he meets, and what the formal labels of the activities he engages in with people are, traditional ethnography is both the most efficient and the most reliable technology we have. Other information can be adequately obtained through interviews. Some may even be obtained through structured questionnaires. In each case, however, the decision to use one technology rather than another is a theoretical one that must be grounded in one's understanding of the research goal, an understanding that can justify the trading of certain advantages against certain limitations.
In our case we were curious to see how information emerged in the moment-to-moment unfolding of everyday life, in the real-time process of its production. We were less interested in describing a family than in accounting for the way a family makes a moment of its life. We also decided to leave the decision to turn the tape recorder on to the person making the recording, so that her very decision could be used for analysis.
We did this because we are very aware that one of the most powerful matters that must be taken into account when doing a tape recording is the matter of the effect produced by actually turning on the machine: What happens right after becomes "the beginning" of the tape, the beginning of the transcript and then, too easily, the beginning of "what happened." The turning on of the machine introduces a discontinuity within the flow of experience that may not correspond to any discontinuity the people themselves might otherwise place on their experience. Turning the machine on makes an event which, in the unrecorded world of experience, would not otherwise have emerged; and these events can be reinserted into the participants' experience only artificially, since what came before and after has been cut away.
Having considered the interface between experience and recording, let us now consider that between recording and transcription. The word transcribe itself signals the fundamental operation that is performed: a dynamic phenomenon, speech, is trans-formed into a static artifact, the written text. The initial step in dealing with recorded speech is to change its form so that we may have an access to it that is different from what we might have if we simply listened to it. But this access is purchased at considerable cost, for the fundamental contours of our sensate and cognitive experience of speech are disrupted. Speech signals, by their very nature, disappear continuously, can be retrieved only through memory, and can be made relevant to further speech only to the extent that others agree with our representation of the older speech. A transcript, on the other hand, is made up of graphic signals sensorially retrievable by means of visual scanning, and it is tempting to use it the way sportscasters use the instant replay of, say, a controversial call in a football game. Through an instant replay it can be determined whether some ball was in or out of bounds at a crucial moment. Through a transcript, it can appear that one will be able to disambiguate who had the floor at a particular time, who took turns, who interrupted, who introduced new topics, etc. As we demonstrate later, what is recorded and transcribed only provides evidence of the physical condition for a response. It never provides the consequential identification of this response, that is, the identification that can have consequences. More fundamentally, a recording never allows us to reconstruct the experience of any of the participants during the conversation. As phenomenologists have taught us (Merleau-Ponty, 1969), language is experienced as a seamless flow. Written language, on the other hand, is built on discontinuities--between letters, words, lines, etc. 
This effect of discontinuity is amplified in analyses, such as ours, that focus on differences which our very attention produces.
Our goal, however, is not to recapture experience; it is to recapture the practical conditions of experience. The first task is one at which poets and novelists excel, and we are neither. The other task does require a focus on "external" matters, for they are precisely those which are jointly available and organize what an individual must deal with as he constructs his experience. Thus you, reader, must deal with our transcript and you must train yourself to read between its lines, since its very external form can easily lead you to make the mistakes we are making and which it is your responsibility to correct. To give you some idea of these errors, look for example at an argument made by Elinor Ochs. She writes, and we agree, that
certain formats encourage the reader to link adjacent utterances and turns, whereas others encourage the reader to treat verbal acts more independently. For example, the standard "script" format [one that displays alternating utterances from top to bottom as in a play] tends to impose a contingent relation between immediately adjacent utterances of different speakers. (1979, p. 47)
This is particularly serious when examining the talk young children produce because children
frequently "tune out" the utterances of their partner because they are otherwise absorbed or because their attention span has been exhausted. (1979, p. 46)
What she posits for child talk, we show is equally true of adult talk in such settings as the one we examine. The point is in fact a more general one, for, as Albert Scheflen put it:
in general, people do not take turns talking and listening to each other. They do not respond only to what someone has just said. Rather they act within broader systems of events to what has been said hours or even months before, to something unsaid, to what might be said, and to matters unrelated to the immediate transaction. (1973, p. 6)
To counteract the assumption of strictly contingent relations between adjacent utterances, Ochs, following Bloom, suggests that speakers' turns be represented side by side in parallel columns. She argues that, "in this way, contingency across speaker's turns is not promoted by the transcript. The assessment of pragmatic and semantic links becomes a self-conscious process" (1979, p. 48). This is the format we have used.
Ochs goes on to discuss how even the format of parallel columns carries a bias of its own, given that we associate left-to-right ordering with features of "priority, inception, and prominence." She argues, for example, that we tend to view the speech represented in the left-most column as "opening up an interactional sequence" even though, in fact, it may not. She further points out that, when the verbal and the nonverbal are represented side by side, the former is generally placed further to the left, thereby reinforcing the bias that speech is somehow primary and concomitant behavior only secondary. Finally, she points out the difficulty of representing the verbal and the nonverbal as integrated, given the physical constraint that they must be represented at separate points on a sheet of paper. The mere act of representing them side by side suggests a disjunction that does not exist. She illustrates this with respect to their temporal integration, since transcripts ordinarily do not reveal the "inter-occurrence of the verbal and the nonverbal," that is, the complex ways in which the two overlap within the temporal stream.
In reviewing some of the ways in which transcription procedures organize our understanding of speech events, we do not wish to give the impression that, by merely revising them, we can free ourselves of the set of assumptions, expectancies, and beliefs on which they are based, for these underlie, not merely our making of transcripts, but the entire repertoire of symbolic means by which human beings attempt to represent experience.
Once we understood such limitations, we were still faced with the practical question of just what to include in the transcripts. One issue was the apparently straightforward question of whether to represent words in standard orthography, some modified version of standard orthography, or a phonetic script. It is quite clear that merely transcribing the spoken words in standard orthography is a radical departure from what actually goes on. As generative phonologists such as Chomsky and Halle often point out, standard orthography--with its rules of spelling, word spacing, and punctuation--tends to represent language at a lexical level rather than a phonetic one. For example, the lexical root san- is preserved in sane and sanity, even though the presence of the suffix '-ity' induces a vowel shift from /e/ to /æ/. And could you, given palatalization, is normally pronounced without a distinct word boundary (e.g., /dʒ/), even though the two words are written separately.
There are, of course, multiple ways in which standard orthography can be adapted to represent speech phenomena: A letter may be repeated once or even many times to indicate various degrees of lengthening of a sound (e.g., Meeee!); a letter may be dropped to indicate the omission of a sound (e.g., firs') or even the use of a nonstandard one (e.g., goin'); words may be written together to indicate various blending phenomena involving processes such as devoicing (e.g., hafta) or palatalization (e.g., whatcha). These adaptations have often been used in recent studies (e.g., Sacks, Schegloff, & Jefferson, 1974; Scheflen, 1973; Labov & Fanshel, 1977). As Sacks et al. point out, they opted for such adaptation rather than a phonetic transcription in order "to get as much of the actual sound as possible into their transcripts, while still making them accessible to linguistically unsophisticated readers" (1974, p. 734). They also point out that this modified English spelling carries "derogatory connotations" for certain readers. Certainly such adaptation tends to be associated with "substandard speech," even though it may be designed to represent processes that go on in standard speech. These associations, seemingly unavoidable, have led certain researchers, including us, to avoid such modifications as much as possible. At the same time, few researchers are willing to make a phonetic transcription. In linguistics, such transcription has been used primarily for isolated sounds or words, and few people, linguists included, can read extended stretches of it with any degree of fluency. Moreover, systems of phonetic transcription vary widely, according to the goals of transcription: There is no one commonly accepted system comparable to standard orthography. Finally, phonetic transcription involves a massive investment of time, particularly if it is to be done with sufficient accuracy.
With our own recordings, we found that more refined transcription was, for the most part, impossible, given that they were made in settings where family members were involved in other activities such as cooking, eating, listening to music, or watching television. The transcribing of such talk can be a rather formidable task, requiring the analysts--often with the aid of the participants--constantly to replay the tape, bit by bit, merely to get the words down.
The question of time investment becomes even more acute when analysts move from transcribing the merely verbal to the paraverbal (i.e., from the words uttered to how they were uttered). But at this point other difficulties begin to arise. In the first place, researchers are forced to confront the inherent limitations of the graphic modality that we discussed earlier: Discrete symbols in a static array do not readily represent the continuous variation within the paraverbal. Moreover, methods for analyzing the paraverbal are not particularly well developed. Linguists have developed a number of systems for analyzing pitch, duration, and amplitude. From a traditional perspective, in fact, these phenomena, or at least certain portions of them, have been viewed as verbal rather than paraverbal--that is, they have been treated as representing a complex system involving various degrees of stress and multiple numbers of intonational contours, which cannot, in principle, be separated from the morphosyntactic system of language.
In practice, however, these two systems have not been effectively integrated. It has been demonstrated, for example, that well-trained linguists, in listening to speech samples, differ substantially in transcribing pitch level. This difference is often accounted for by the fact that pitch functions both verbally (i.e., signalling stress patterns and intonation contours that distinguish statements, commands, etc.) and paraverbally (i.e., signalling the complex array of beliefs, feelings, and attitudes that speakers express in their speech). Given the multifunctional roles of pitch, it is difficult to isolate, in any reliable manner, the purely verbal ones. These difficulties have led some to avoid impressionistic methods of analysis, using instead electronic equipment in order to deal more directly with recorded variation in pitch, amplitude, and duration. Rather than abstracting verbal features of stress and intonation, they work directly with the machine-generated data, labelling it all as paraverbal. Labov and Fanshel, for example, using a variable persistence oscilloscope (which measures variation in amplitude and duration) and a real-time spectrum analyzer (which measures variation in pitch), have developed particularly effective ways of displaying their data through the machine printout. These displays are used as a kind of transcript on which they work directly. The displays, however, are quite selective, focusing only on certain bits of data. Finally, there is little agreement on what is accomplished through pitch, intonation, and stress. As we do with modification in phonetic form, we use these only if they seem directly relevant, and do not attempt to transcribe them fully. As for amplitude, one can only rely analytically on recorded variation in loudness if one is certain that participants were not moving relative to the microphone when changes are recorded. Since we could not check on this, we decided not to base any of our statements on it.
Given their exhaustive attention to such issues, it is interesting that most of those who work with recorded speech give but minimal attention to the temporal aspects. Most analyses do not include an account of the length of the sequences or the length of utterances. At most, the length of gaps in one person's speech is measured, but it is not represented visually in the transcript itself. This probably has to do with typographic limitations, given the usual transcript format. The passage of time is something that is quite important in our analysis, and we chose the column format partially because it allows us to display visually the manipulation of time.
Even after analysts have struggled with pitch, amplitude, and duration, there remains a vast area of paraverbal phenomena for which no standardized methods of transcription are available. With our own recordings, we felt this lack most acutely with respect to two phenomena:
1) the wide range of laughter which occurred, varying from a light overlay in speech--which barely deformed normal pronunciation--to sharp outbursts of noise--radically discontinuous with speech itself. We found laughter to be extremely volatile behavior which merited a more refined representation than we could provide. We are quite unhappy with the symbol ***, which merely indicates the presence of laughter but not its modality;
2) the wide range of signals, generally described as voice qualifiers, which are associated impressionistically with tension, release, etc.
We suspect that they are physically available to participants, and that they take them into account in the construction of their responses. But we could not find easy ways to represent them in our transcript and thus rarely use these for analytic purposes.
Elinor Ochs has suggested that a transcript should accord with the "particular interests--the hypotheses to be examined--of the researcher" (1979, p. 44). This, however, could mislead people into making a priori decisions that might close avenues of inquiry. Eventually, the physical world, that which resists us, must remain available, and we must always fear the temptation to substitute another, completely artificial, world that will not resist us anymore. A transcript, from our point of view, should always remain a practical rather than a theoretical task. The only limits one should put on transcripts should be the limits put on us by our social position (the time we can give to transcription, the limitations of publishing technologies, etc.). As these change, so will our transcripts. In the meantime, one can work with what one has. In our case, what we have that remains substantially external to us as producers is a trace that allows for an investigation of the development of a conversation that focuses on semantic issues (the words actually used), on deictic issues (the means used to make relevant various forms of information), on sequencing issues (the succession of utterances in the conversation, their emergence and disappearance from the semantic stream), and on timing issues (how long the group focuses on the various matters that emerge).
The choices about what to include that are reflected in this version of our work are by no means an ideal that we would suggest others emulate uncritically. Indeed, we did not work them out carefully in advance, and we hope that others, as they embark on similar enterprises, will always understand that it is their prerogative as scientists to explore new techniques. If we were to redo the research, we would obviously do it differently. We might even spend less time worrying about the exact details of behavior in one setting and rather explore some of the many other settings that human beings inevitably construct. We would do that, first, because it would help us understand better what remained inaccessible to us in the original tape, because we could not read certain signs. We would also do it because one of the major findings of this work is that an examination of one local pattern cannot tell us anything about the full range of patterns that form the conditions of everyday life. Family life is not one thing--even for one particular family--it is many things at different times and places.
What we did, thus, does not reflect our theoretical understanding of what must now be done. Rather it reflects the conditions that prevailed while we were spending time making tapes, listening to them, and transcribing them. For various reasons, we decided quite early on to work only with audio-recording. We wished also to work with recordings during which only family members were present. In making this decision we clearly recognized that we would not be able to construct the kind of behavioral stream, compounded of the verbal and the nonverbal, which recent work in conversational analysis is often concerned with. Given our own background in symbolic anthropology and linguistics, it did not seem that we would have to concern ourselves with it. Certainly, neither of us was trained in kinesic analysis. But our interest in the ways language in everyday life produces a cultural context that cross-references other contexts did not seem to make it necessary to move in that direction.
We still think that these choices, however contingent they may appear to be, did give us something rich to work with. The very fact that we do not have a visual record of the interaction, while at times a technical handicap, obliged us to focus on the evocative properties of language. We had to notice, and work with, the evidence that people do not ever make fully explicit everything that is relevant to what they are doing. There is no evidence either that they rely solely on the visual to disambiguate what has emerged. What we discovered, eventually, is that much does not have to be disambiguated, as people proceed "as if" it had been. In other words, we rediscovered that reality is socially constructed or, more accurately, that human beings are continually struggling with social realities, some that they find already made, and some that they participate in constructing.
Critics would be right to rejoin at this point that we did not actually "rediscover" what was in fact the guiding theoretical framework of the whole work. What we did, however, is deepen the argument which sociologists, linguists and anthropologists of a structuralist bent have been making by emphasizing how much our approach could highlight what might otherwise remain obscured. This is a rather general, perhaps even abstract, goal. Still, it allowed us not to have to worry about issues of sampling, since our goal was not to contribute to the ethnography of any particular group. As we explain at some length later, we are not proposing the Harveys as illustrative of the ways any labelled group of human beings are: We do not propose them as either "White," "middle class," "urban," "American," or anything else. We propose them as human.
Given these conditions of our production, we obtained a few tapes from a few people. Eventually, we found ourselves working with the Harvey tape, because Connie was most helpful and interested in the project. She was easily available for interviews. She gave invaluable help with the transcript. Soon we had given significant time to her tape, and it was clear that we would not soon exhaust what could be done with it.
Soon after we received the tape we began transcribing and presenting our work and transcript in classes and workshops. It has been noted many times that transcription is an endless task, that one always discovers more in a tape than what one had heard until then. We also became aware of the extent to which the audience participates in the construction of a transcript, and this is reflected in this chapter of our work. While transcribing, one cannot simply take into account what the people on the tape said; one must also be aware of what the people who will listen to the tape or read the transcript will say. It became clear, as we have mentioned before, that it would be a waste of time to try and produce a phonetic transcript. It also became clear that any modification to standard orthography would lead to secondary effects on readers that we decided we would try to control. We did not want to caricature a family that was precisely not caricatural in its ordinariness--as far as accent was concerned, at least. In fact, hearers of the tape report that they experience what they hear as "formal," sometimes even overly so. To have used nonstandard orthography would have given the wrong impression.
The last major decision involved which transcript format to choose. We settled on the columnar model advocated by Bloom and Ochs for the same reasons they settled on it. First, it appeared to us that a linear transcript, when representing the speech of more than two or three speakers, always becomes extremely difficult to read, as one cannot easily figure out who is speaking, particularly when scanning rapidly. Second, horizontal transcripts generally do not transcribe silence, and thus can give the impression that someone who does not speak is not participating. A vertical format makes it easy to do so. Finally, it became very clear to us that the sequential relationships of utterances with each other were always problematic. Rarely did the participants alternate speakership according to expectations one might build using either common sense or the analytic sense developed out of a superficial reading of Sacks et al. Anything that would foster the temptation to search for such a relationship in visually adjacent utterances was thus to be avoided.
Once we had decided that our original decision to use a vertical format was one we would stick with, we had to decide how to organize the columns. Not surprisingly, our first transcript had the five columns arranged in the order (from left to right) "father, mother, children." As our analysis developed we became convinced that there was precisely no reason to follow this order, and we substituted one that is somewhat iconic to the physical situation and to the more obvious features of the social structure of this family: the mother is at the center, the younger children immediately flank her (we place the handicapped child on her left for the same reasons Ochs advises researchers to place children on the left: his utterances are the most problematical and should be represented in such a way that they remain so), while the father and older son are more on the periphery. The father, however, while on the periphery, has many rights of initiation. He has precedence over Jack, whose participation is always fully peripheral and thus ended in the right-most column of our transcript.