Preparing Data for Analysis

How should you analyze your data? First you need to prepare it.

Author

Eleanor C Sayre

Abstract

How will you analyze your data? “Methodology” covers both the actions you will take (like transcribing or performing statistical tests) and the reasons why those actions are meaningful. Broadly speaking, you want to work to prepare your data for analysis, then analyze your data to develop a narrative about what it tells you. Different data types conceptualize data preparation differently, and different methodological frameworks conceptualize the steps of how you develop your narrative differently. However, there are some broad similarities across many methodologies in education research.

Data preparation and analysis are not linear

It’s tempting to think of these steps as primarily linear, but instead you should conceptualize the work of preparing data and developing a narrative as generative and iterative, just like the rest of the work of your research project.

Prepare your data

When you prepare your data, you convert your raw observations of what happened into a form that you can perform meaningful analyses on. Different data types require different ways to enrich, reduce, and clean your data. The details matter, but the principles are the same.

Enrich your data with metadata

When you enrich your data, you are adding information about what happened and when so that later you know what’s going on. If you have multiple data streams from the same participant or the same event, you can link them together using metadata so that later you can triangulate your narrative across multiple streams of information.
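
To make the linking idea concrete, here is a minimal sketch in Python. It assumes every data stream has been tagged with the same two metadata keys, a participant pseudonym and a session date; the records, file names, and the `link_streams` helper are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical records: every stream carries the same two metadata keys.
streams = [
    {"stream": "video", "participant": "P01", "session": "2024-02-12", "file": "p01_interview.mp4"},
    {"stream": "survey", "participant": "P01", "session": "2024-02-12", "file": "p01_pretest.csv"},
    {"stream": "video", "participant": "P02", "session": "2024-02-13", "file": "p02_interview.mp4"},
]


def link_streams(records):
    """Group stream names by the (participant, session) pair they share."""
    linked = defaultdict(list)
    for record in records:
        key = (record["participant"], record["session"])
        linked[key].append(record["stream"])
    return dict(linked)


linked = link_streams(streams)
for key, names in linked.items():
    print(key, names)
```

Any pair with more than one stream is a place where you could triangulate. If the keys are recorded inconsistently across streams, the grouping silently fails, which is one more reason to add metadata carefully.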

Common kinds of metadata include:

  • When was this collected? Date, time, semester, etc.
  • What task were the participants working on? Linking an assignment sheet or interview protocol is appropriate.
  • What is the context of this data? In class (what class?), before or after an event (what event?), etc.
  • Who are the participants? Who collected the data?
  • Additional information about the participants, like demographics or pseudonyms.

You should strive to enrich your data as soon as possible after you collect it, so that you don’t forget what’s going on. Don’t wait until you’re ready to analyze it!
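
One way to capture this metadata right away is a small structured record saved next to each data file. The sketch below is only an illustration: the `Metadata` class, its field names, and the example values are all assumptions, and your project may need different fields.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class Metadata:
    """Hypothetical metadata record for one piece of data."""
    collected_on: str   # when: date in ISO format
    semester: str       # when: academic term
    task: str           # what task: link to the assignment or protocol
    context: str        # what class, or before/after what event
    participants: list = field(default_factory=list)  # pseudonyms only
    collected_by: str = ""                            # who collected it


record = Metadata(
    collected_on="2024-02-12",
    semester="Spring 2024",
    task="protocols/interview_v2.pdf",
    context="intro physics, after the first midterm",
    participants=["P01"],
    collected_by="RA-1",
)

# Serialize to JSON so the record can be saved alongside the data file.
sidecar = json.dumps(asdict(record), indent=2)
print(sidecar)
```

Filling in a record like this takes a minute on the day of collection and is much harder to reconstruct months later.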

Plan ahead

When you design your research project, plan ahead for what kinds of metadata you will want to collect, and make sure they are covered under your IRB approval.

Reduce and clean your data

When you reduce and clean your data, you are removing information and reformatting it so that you can perform analyses on it. Unlike enriching, which is an additive process, this is a subtractive one. For this reason, some people like to keep the raw data as untouched as possible, just in case there are errors introduced in reduction; however, some kinds of data are prohibitively costly or faintly ridiculous to maintain in raw form (such as paper surveys from students). Some research projects may require that you delete the original data in order to preserve the identities of the participants; check with your IRB.

Common ways to reduce and clean data include:

  • Removing testing or excess “data” (e.g. survey responses to check that your system worked, the last 10 seconds of video as you fumbled to turn off the camera)
  • Identifying typos or nonsensical outliers (e.g. “H” responses to “ABCDE” questions)
  • Reformatting to conform to a standard data format (e.g. converting “M”, “m”, and “male” to “M”)
  • Putting it into a spreadsheet instead of individual files
  • Transcribing video or audio

In the process of cleaning and reducing, you should record what you did and how it affected your data. Afterwards, you need to check that your reduction process has not introduced new errors.
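
As a rough sketch of what this can look like, the Python below cleans a tiny, made-up set of survey responses. It removes a test response, flags an impossible answer, standardizes a demographic code, keeps a log of every change, and then checks the result. The data, the `GENDER_CODES` mapping, and the `clean` function are hypothetical, and setting a flagged answer to `None` is one choice among several.

```python
VALID_ANSWERS = set("ABCDE")
# Hypothetical mapping from the codes people typed to one standard code.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}

raw = [
    {"id": "test", "q1": "A", "gender": "m"},    # system check, not real data
    {"id": "P01", "q1": "B", "gender": "male"},
    {"id": "P02", "q1": "H", "gender": "F"},     # "H" is not a possible answer
    {"id": "P03", "q1": "C", "gender": "M"},
]


def clean(rows):
    """Return cleaned copies of the rows plus a log of what changed."""
    cleaned, log = [], []
    for row in rows:
        if row["id"] == "test":
            log.append(f"{row['id']}: removed test response")
            continue
        new = dict(row)  # copy, so the raw data stays untouched
        if row["q1"] not in VALID_ANSWERS:
            new["q1"] = None
            log.append(f"{row['id']}: flagged invalid q1 answer {row['q1']!r}")
        gender = row["gender"].strip()
        new["gender"] = GENDER_CODES.get(gender.lower(), gender)
        if new["gender"] != row["gender"]:
            log.append(f"{row['id']}: gender {row['gender']!r} -> {new['gender']!r}")
        cleaned.append(new)
    return cleaned, log


cleaned, log = clean(raw)

# Check that cleaning did not introduce new errors.
assert len(cleaned) == len(raw) - 1  # only the test row was dropped
assert all(r["q1"] is None or r["q1"] in VALID_ANSWERS for r in cleaned)

for entry in log:
    print(entry)
```

Because `clean` works on copies, the raw rows are still available if you later discover a mistake in the reduction.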

The data cleaning and reduction process can be boring. If you are mentoring students in research, you should strive to offload this work as much as possible onto other workers – don’t waste valuable researcher hours on washing the glassware!

Develop a narrative

Your data tell you something (possibly many things). As you prepare for analysis, you want to develop a narrative about what’s happening in your data. Your narrative is a story, probably told in words, about what’s meaningful in your data and how it connects to your research questions via your theory. The narrative ties together your research design and results.

Different methodologies are different

Different methodologies differ greatly on how you develop your narrative and what narratives are appropriate. You need to engage with the literature from your chosen methodology to make sure that your choices are aligned with the intent and steps of that methodology; otherwise, your conclusions might not be valid within that methodology. If the literature is vague or confusing, it is appropriate to reach out to human experts for guidance, possibly by engaging them on your advisory board.

Broadly speaking, you’re looking at four basic steps in developing a narrative:

  1. Make sense of your data (what is going on here?)
  2. Develop a preliminary narrative that your data tells you
  3. Check your narrative for consistency and contrastive evidence
  4. Represent your data to make your narrative shine

Throughout this process, your generative writing will help you record what you are doing and figure out what your emerging narrative is. As you go through iterative cycles of data collection and analysis, your work on each cycle should inform what you’re doing next. Generative writing, coupled to iterative design, makes space for emergence.

Emergence is the process of something coming into being or becoming important. In research, new ideas and opportunities will emerge in the course of doing your research project.