Coding Qualitative Data
Camila Torres Rivera
You’ve gathered all your data carefully so …now what?
Qualitative researchers often ask the same question at this point in the process…now what? It can be daunting to look over your massive collection of interviews, field notes, and transcripts and not know how to begin to use it to answer your research question. While you have gathered important information, the connections of this information to your research question may not be completely clear. Organizing the raw data in a way that makes sense to you will require you to read and re-read your raw data carefully and patiently. While there are various ways to prepare data for analysis, this chapter will focus on a commonly used method called coding.
What are codes?
A code is a word or short phrase that assigns an attribute (e.g. translation, feeling, category, summary, idea) to a section of text (Saldaña, 2015). Coding is the act of assigning a code to a section of raw data text for interpretive purposes to gain meaning (Charmaz & Mitchell, 2001). The purpose of coding is to filter and organize the raw data so it will be easier to detect patterns or sequences, to identify themes, and to build theories to answer the research question (Bogdan & Biklen, 1997).
Let’s look at a simple coding example for a portion of an interview conducted to learn the value of the open-access music site called SoundCloud. The interview, organized in a table format below, has two key features: the columns and the rows. The left column has the transcript of the interview; the right column has codes aligned with the related text. When you read, you should read across the entire row (i.e. the question and the question codes together) and then read the next row the same way (i.e. the answer and the answer codes together).
The eight unique codes in the right column (accessible was used three times) are attributes that are associated with the 135 words of the text. Rather than just reading the 135 words and hoping to remember what we read, assigning codes now gives us the following advantages:
- The codes left a trail of the thought process during the reading.
- We will be able to find the ideas later.
- We will be able to associate related sections of the data with each other.
- We know how many times the same attribute was used and which were primary and secondary.
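For researchers who keep their codes electronically, the advantages listed above can be sketched in a few lines of Python. This is only a minimal illustration, not a required tool: the segment texts are placeholders rather than the actual interview, and the names `segments`, `build_code_index`, and `count_codes` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical coded segments: (segment id, text, codes assigned to it).
# The texts are placeholders, not quotations from the SoundCloud interview.
segments = [
    (1, "placeholder text for segment 1", ["accessible", "creator"]),
    (2, "placeholder text for segment 2", ["accessible", "community"]),
    (3, "placeholder text for segment 3", ["accessible", "listener"]),
]

def build_code_index(coded_segments):
    """Map each code to the ids of the segments that carry it."""
    index = defaultdict(list)
    for segment_id, _text, codes in coded_segments:
        for code in codes:
            index[code].append(segment_id)
    return dict(index)

def count_codes(coded_segments):
    """Count how many times each code was assigned."""
    return Counter(code for _id, _text, codes in coded_segments for code in codes)

code_index = build_code_index(segments)
code_counts = count_codes(segments)

print(code_index["accessible"])    # segments that share the code "accessible"
print(code_counts.most_common(1))  # the most frequently assigned code
```

The index makes it possible to find an idea again and to pull related sections together, while the counts show how often each attribute was used. Deciding which codes are primary or secondary remains a judgment for the researcher.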
How do ethnographers prepare for coding?
The purpose of qualitative research is to answer a research question. Therefore, before we begin to work with the raw data, we must be certain that we are also focused on this goal. Aside from concentrating on the research question, ethnographers should consider other factors in their coding such as their approach to the research, topics, and efficiency.
Research approach: Deductive vs. Inductive
- Inductive research is more commonly used in qualitative studies (Bogdan & Biklen, 1997). The inductive research approach consists of the researcher reading the data and developing theories based on what appears in the raw data rather than on any preconceived notions. The researcher will develop codes DURING the reading of the data and connect ideas in their mind through a process called open coding. Open-coding is a reactive and iterative process between the data and the researcher. In other words, open-coding documents the researcher’s reactions to data as the researcher continuously interacts with the data. As an example, imagine the researcher is reading a transcript and something they read strikes them as familiar, unusual, or obvious. The data then causes a natural, spontaneous reaction within the researcher. In this case, the researcher would create a code to mark that section of text for future reference and/or analysis.
- In some circumstances, a researcher may want to know if certain predetermined ideas exist in the data. Sometimes a researcher will already have some experience with a topic and want to further their understanding of it. When the researcher wants to use the raw data to prove their ideas or hypothesis, they engage in deductive research. The researcher will create a code BEFORE they start the coding process and search the data for examples of the code. In this case, the predetermined code is referred to as an a priori code. In previous chapters, we discussed how ethical research requires value neutrality. While deductive research and a priori codes do not automatically introduce bias into the coding, the researcher should be especially careful to code a section of text with an a priori code only when there is a very clear connection between the a priori code and the text. Using both the inductive and deductive approaches can help lessen potential biases.
Coding Families
Some codes are used so often in qualitative data analysis that it is almost expected that they will be mentioned at some point in the analysis regardless of the subject matter. Coding families are not codes; rather, they are categories that suggest different ways in which coding can be accomplished or may be necessary (Glaser, 1978; Bogdan & Biklen, 1997). Because coding families describe large, overarching concepts (e.g. definitions, settings, language structure), there are usually significant overlaps between coding families. As always, the selection of a coding family should be determined by how the coding family lends itself to answering the research question.
Using the same SoundCloud interview raw data we used earlier in this chapter, let’s look at how different coding families relate to the same text.
Notice that the same text could be used as an example for several coding families. However, you should use the research question as a guide when selecting some coding families over others. For example, if the research question asks, “How do musicians interact with technology?”, we may decide that the word “interact” in the research question relates most to the “activity code” family. However, if the research question asks, “What problems do musicians face in the Music Industry?”, we may decide that the word “problem” in the question relates to “the definition of situation code” family and/or “the perspective codes” family. There may be times in your research where you will have to focus more on one type of code over others, although you should strive to use all of them.
Here is a list of some commonly used code families for your work. This is not a complete list of code families, but it can give you some ideas on how to begin your coding work.
Selecting Coding Supplies: Paper/Pencil vs. Technology
Ultimately, the researcher will need to physically mark their ideas on the selected text. Deciding whether this should be done using traditional office supplies (e.g. pencils, pens, highlighters, sticky-notes, paper clips, envelopes) or computer programs (e.g. Microsoft programs like Word or Excel, Google programs like Docs or Sheets, qualitative data software) is a personal decision for the researcher. Both types of supplies have their pros and cons, so the selection of the coding supply should be made based on efficiency and cost.
Traditional office supplies (e.g. pencils, pens, highlighters, colored pencils, sticky-notes, scissors, paper clips, envelopes) are inexpensive, are readily available, and require no training. When researchers use office supplies to code, they simply write codes in the margins of the pages of the data, highlight sections of text using colored highlighters or colored pencils, and/or mark pages with sticky-notes. Sometimes, researchers will assign colors to codes so they can find the related coded text more easily. Sections of text with the same code (or color) can be cut out with scissors and organized into groups with paper clips or envelopes.
There are some drawbacks to consider as well. With traditional office supplies, accounting for all the codes and data manually can become messy and cumbersome. Also, there may be sections of raw data that can be coded with more than one code so those pages would need to be reproduced and coded more than once (once for each code). Finally, there is only one copy of the coded text so if the data is lost the researcher would have to start over.
Using technology to code has advantages and disadvantages too. An advantage to using word processing office technology (e.g. Microsoft Word, Google Docs) is that the data can be duplicated and saved quickly. Sections of text can be coded once, or more times, by using the highlighter or comment features. If sections of text have the same code, they can be cut/pasted into new documents to help organize all the codes into different pages or several different files. Depending on the researcher’s familiarity and comfort level, word processing and spreadsheet program (e.g. Microsoft Excel, Google Sheets) functionality can be combined to automatically create tables with the duplicated comments and codes. Of course, the premier qualitative research programs (e.g. Qualtrics, Quirkos, MAXQDA) offer the greatest functionality by allowing researchers to do things like click-and-drag sections of data into code folders, keep running counts of codes, and create reminders for the researcher.
Technology is a wonderful tool, but drawbacks to using technology should be considered carefully too. Obviously, the researcher will need access to a computer, electricity, and possibly the internet to be able to conduct any work. While some programs are free (e.g. Google Docs, Google Sheets), some programs can cost hundreds of dollars (e.g. Microsoft Office, Qualtrics, Quirkos, MAXQDA). Also, some programs are fairly intuitive and easy to use (e.g. MS Word, Google Docs), but some programs are more complex (e.g. MS Excel, Google Sheets) and specialized programs require training (e.g. Qualtrics, Quirkos, MAXQDA).
The Coding Cycle
After we have made some decisions on how to approach the coding, we can begin the coding cycle. The coding cycle is a repeating cycle of three phases: the coding phase, writing memos phase, and reviewing/revising/refining codes phase.
In the coding phase, the researcher will start to assign codes to the text based on their research question and their other preparation decisions (i.e. inductive vs. deductive research, coding families, efficiency). After a researcher codes a section of text, the researcher writes a memo—a short journal entry to document processes, important research notes, and possible theories. After writing their memos, the researcher will review their codes, both old and new, and decide if there should be any revisions to the codes.
As an example of this entire process, we can use the SoundCloud interview from earlier in this chapter to look at each phase of the cycle in more detail. As our preparation, we will be using the data to answer the question “How are Technology and Equity related within the Music Industry?” We will use an inductive research approach and apply open-coding processes with no a priori codes. We have selected the following coding families as a guide: Settings Codes, Definition of Situations Codes, Perspectives Held By Subjects Codes, Activity Codes, Strategy Codes, and Relationship Codes. The codes will be documented using Microsoft Word’s comment feature.
Coding Phase
As previously described, the interview text shown here was assigned codes using the comment feature on Microsoft Word during the reading of the text. Since some text applied to more than one code, the same section of text was highlighted several times and assigned a different code each time. At the end of the coding period, the text looked like this:
Memo Writing Phase
When the researcher stops coding, they must document their experience in a memo. The memos are short journal entries that document the researcher’s interaction with the data. Since the memos are written for the researcher’s personal use, formal writing (e.g. full sentences, punctuation, paragraphs) is not required. Instead, the researcher will use simple bullets or short phrases to record the experiences quickly.
In our example, we used the benefits of technology to convert the highlighted text with the comments into a table and sort the codes alphabetically. Most of the codes seemed to be within the Relationship Code Family (i.e. creator, listener, fan, community). A note was made that the researcher wondered if a larger community is related to more opportunities which could result in more equity.
Reviewing/Revising/Refining Code Phase
The number of times a code appears does not necessarily imply the code is (or is not) important, and it is certainly not the only indicator. In qualitative research, the quality of the final research report will be directly linked to the quality of the codes (Saldaña, 2015). While reviewing, revising, and/or refining the codes during the coding cycle, the researcher is making decisions on the importance a particular code may have to their research question. After the researcher reviews their codes carefully, they may decide to refine their codes by merging two (or more) codes into one code, dividing one code into two (or more) codes, renaming a code, or eliminating a code completely. When a researcher decides to merge, divide, rename, and/or eliminate codes, a note should be added to the memos. In our example, we added a note to the memo that we decided to merge both the “listener/fan” codes and the “solution/simplify” codes with instructions to only use “Listener” and “Solution” in future coding sessions.
Another good practice in the review/revise/refine phase is to begin to conceptualize the codes. The conceptualization of a code is creating a definition of the code in specific, concrete terms (Saunders et al., 2018). Future use of a conceptualized code is more intentional because the characteristics of the code are clear and, thus, codes are more easily identified in raw data. Any code conceptualizations or working definitions should always be written in the memo. In our example, code definitions were written in italics next to the most frequently used codes.
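When codes are stored electronically, a merge decision like the one in our example can be recorded as a simple lookup table and applied to previously coded text. The following Python sketch is one possible way to do this, assuming codes are kept as lists of strings; the names `MERGE_MAP` and `refine_codes` are hypothetical, and the list passed in the usage line is made up for illustration.

```python
# Merge decisions from the memo: each old code maps to the code to use from now on.
MERGE_MAP = {
    "listener": "Listener",
    "fan": "Listener",
    "solution": "Solution",
    "simplify": "Solution",
}

def refine_codes(codes, merge_map):
    """Apply merge decisions to one segment's codes, dropping duplicates
    while keeping the original order. Codes without a merge rule are unchanged."""
    refined = []
    for code in codes:
        new_code = merge_map.get(code.lower(), code)
        if new_code not in refined:
            refined.append(new_code)
    return refined

# Hypothetical segment that carried both of the merged codes.
print(refine_codes(["listener", "fan", "community"], MERGE_MAP))
```

Keeping the merge rules in one place mirrors the memo: the table documents what was merged, and rerunning it on older segments keeps past and future coding sessions consistent.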
At this point, the researcher has completed the full cycle one time. The researcher will start the cycle again and continue the cycle with new sections of text. Each time a new section of text is coded, the researcher will follow the same three phases – coding, writing a memo to document the progress, and review/revise/refine their codes.
Saturation
The coding cycle continues over and over until the researcher believes saturation has occurred. Saturation is the point in the research study where no new information seems to emerge from the coding cycle and there appears to be enough information to answer the research question (Strauss & Corbin, 1997). There is no one event or threshold that can alert a researcher when saturation has occurred. When interviews seem predictable, the same codes are being used over and over, and possible answers to the research question seem reasonable, the researcher should decide if further iterations of the cycle will result in any valuable insights or the coding cycle should end.
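One of the clues mentioned above, that the same codes are being used over and over, can be tracked by counting how many new codes each coding session introduces. The Python sketch below is only an aid to the researcher's judgment, not a test for saturation: the session data is invented, the function names are hypothetical, and the `quiet_sessions` cutoff is an arbitrary assumption, since no single threshold signals saturation.

```python
def new_codes_per_session(sessions):
    """For each coding session (a list of codes), count the codes not seen before."""
    seen = set()
    counts = []
    for codes in sessions:
        fresh = set(codes) - seen
        counts.append(len(fresh))
        seen |= fresh
    return counts

def looks_saturated(sessions, quiet_sessions=2):
    """Return True when the last `quiet_sessions` sessions added no new codes.
    The cutoff is an assumption chosen for illustration only."""
    counts = new_codes_per_session(sessions)
    if len(counts) < quiet_sessions:
        return False
    return all(count == 0 for count in counts[-quiet_sessions:])

# Invented example: codes assigned in four successive coding sessions.
sessions = [
    ["accessible", "creator"],
    ["creator", "community"],
    ["accessible", "community"],
    ["creator"],
]
print(new_codes_per_session(sessions))
print(looks_saturated(sessions))
```

A run of sessions with no new codes is a prompt to ask whether further iterations would add valuable insights; the researcher still has to decide whether there is enough information to answer the research question.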
Chapter Summary
In this chapter, we learned that the coding cycle is the link between raw data and the theories that will answer the research question. We engaged in a mock coding cycle example to experience how a researcher begins to organize their raw data, record their initial reactions to the data, and document emergent ideas. The key take-aways from the chapter are:
- Coding is the organizational process where researchers assign codes to raw qualitative data in order to answer their research question.
- Coding is a reactive and iterative process between the data and the researcher.
- The research question is the principal guide in preparing to code and selecting codes.
- Coding can be used for either deductive research or inductive research studies. However, it is more commonly used for inductive research studies.
- The coding cycle is comprised of coding, memo writing, and reviewing/revising/refining codes.
- The coding cycle ends when no new information appears to emerge from the raw data. This phenomenon is referred to as saturation.
Questions
- How is coding data different than summarizing data?
- How are the research question and the coding process related?
- “Coding is reactive and iterative.” In your own words, explain the meaning of this statement.
- What are the advantages of writing memos at the end of each coding phase?
- What are some clues that may suggest saturation has occurred? Give an example.
- In your opinion, how does coding help the researcher understand their raw data?
Key Terms
Code – A word or short phrase that assigns an attribute (e.g. translation, feeling, category, summary, idea) to a section of text.
Coding – The act of assigning a code to a section of raw data text for interpretive purposes to gain meaning.
Coding Cycle – A repeating cycle in which the researcher organizes their data. The coding cycle is comprised of three phases: the coding phase, writing memos phase, and reviewing/revising/refining codes phase.
Coding Families – General categories that suggest different ways in which coding can be accomplished.
Conceptualization – Descriptions or definitions of abstract ideas in specific, concrete terms.
Deductive Research – A research approach conducted to further an already existing theory. (The data is used to prove an existing theory).
Inductive Research – A research approach conducted to develop a theory based on what exists in the data. (The theory emerges from the data).
Memo – A short journal entry to document processes, important research notes, and possible theories.
Open-Coding – The assigning of codes to the raw data without any previously selected codes.
A priori code – A code selected before the start of the coding cycle.
Saturation – The point in the research study where no new information seems to emerge from the coding cycle and there appears to be enough information to answer the research question.
References
Bogdan, R., & Biklen, S. K. (1997). Qualitative research for education. Boston, MA: Allyn & Bacon.
Charmaz, K., & Mitchell, R. G. (2001). Grounded theory in ethnography. In Atkinson, P., Coffey, A., Delamont, S., Lofland, J., & Lofland, L. (Eds.), Handbook of ethnography (pp. 160–174). Thousand Oaks, CA: Sage Publications.
Soundcloud. (2010, August 19). Creative Commons. https://creativecommons.org/2010/08/19/soundcloud/
Creswell, J. W., & Poth, C. N. (2016). Qualitative inquiry and research design: Choosing among five approaches. Thousand Oaks, CA: Sage Publications.
Glaser, B. G., & Strauss, A. (1967). The discovery of grounded theory: Strategies for qualitative research. New York: Aldine Publishing Co.
Saldaña, J. (2015). The coding manual for qualitative researchers. Thousand Oaks, CA: Sage Publications.
Saunders, B., Sim, J., Kingstone, T., Baker, S., Waterfield, J., Bartlam, B., & Jinks, C. (2018). Saturation in qualitative research: Exploring its conceptualization and operationalization. Quality & Quantity, 52(4), 1893–1907.
Strauss, A., & Corbin, J. M. (1997). Grounded theory in practice. Thousand Oaks, CA: Sage Publications.