Introduction

Quantitative Social Science – Core Concepts, Skills, and Stories
(draft manuscript)

Author: John McLevey (he/him)

Affiliation: Sociology, Memorial University

This open-access book accompanies the quantitative research methods course I teach at Memorial University. It’s under active development and revision. Chapters are in different stages of development, so some may be a little rougher than others. Feedback is welcome!

By the end of this introductory chapter, you will be able to:

  • Explain Alexander’s (2023) “telling stories with data” framework and how it applies to quantitative research
  • Describe the five components of the storytelling workflow (Plan → Simulate → Acquire → Explore → Share) and how they relate to quantitative data analysis
  • Set up and navigate Google Colab for research computing
  • Write and execute basic Python code using variables, strings, lists, and functions
  • Create well-organized research notebooks that combine code, analysis, and documentation

Before diving into specific research methods and techniques, we need to establish two foundations: a conceptual framework for thinking about data analysis as a systematic process, and the practical tools we’ll use to implement that process. This introduction provides both.

First, we’ll explore how quantitative research is fundamentally about storytelling. Not fiction, but the structured, transparent communication of how we move from questions to evidence to conclusions. This perspective, developed by Rohan Alexander, provides a workflow that will guide every analysis in this course.

Second, we’ll set up our research computing environment and learn essential Python programming concepts. You don’t need prior programming experience; we’ll start with basics and build skills progressively through real projects.

By combining a clear analytical framework with hands-on technical skills, you’ll be prepared to tackle meaningful research questions about public opinion, political behavior, and social patterns. The goal isn’t just to learn methods, but to understand how those methods help us tell credible, compelling stories with data.

Note for Sociology/Criminology 3040 Students

Your assigned introductory reading is Chapter 1 of Alexander (2023), available free here. In Telling Stories with Data, Rohan Alexander (2023) proposes a simple cycle for doing credible, useful data work: plan, simulate, acquire, explore/analyze, share. Below is an overview of that workflow, which we’ll reuse throughout the course. It’s followed by a guided introduction to Google Colab, where we’ll run our first bit of Python code together.

Telling Stories with Data

Alexander (2023)

Alexander (2023) frames data analysis and statistics not as purely technical exercises but as storytelling: making a convincing case, communicating with others about what a dataset reveals (and hides). The introductory chapter establishes foundations: what it means to make the world into data; what ethical, measurement, and communicative constraints attend doing so; and how the entire process of turning raw observations into communicated insight involves many decisions.

He then introduces a five-step workflow for telling stories with data:

  1. Plan and sketch an endpoint
  2. Simulate and consider that simulated data
  3. Acquire and prepare the actual data
  4. Explore and understand the actual data
  5. Share what was done and what was found

He also identifies a set of foundational elements (ethics, reproducibility, and so on) and underscores that communication is central: a clear, simple analysis beats a complex one that the audience cannot understand or find persuasive.

Alexander emphasizes that datasets are simplifications of the messy, complex world; decisions about measurement, collection, cleaning, and modeling all matter. Knowing what’s missing, what’s poorly measured, and where bias can creep in is part of being able to tell a credible story. But what does he mean by “stories” in this context, and how is data analysis like storytelling?

When Alexander talks about “stories” in this context, he means a structured, reasoned account, not fiction or narrative in the sense of a novel or myth. Stories here help an audience understand what can and can’t be said with the data, why those findings matter, and how they were obtained.

In this sense, “stories with data” are best understood as arguments or explanations that are grounded in empirical evidence but structured in ways that resemble the elements of a traditional story. Every analysis begins with a setting, which establishes the dataset itself, including where it came from, how it was collected, and what broader phenomenon it represents. Within this setting, there are characters or agents: the people, institutions, and processes that generate the data, as well as those who are represented in it or excluded from it.

The analysis then unfolds like a plot, moving through a sequence of steps in which questions are posed, decisions are made, and discoveries emerge, often accompanied by surprises or tensions. Inevitably, there is conflict or uncertainty, expressed through measurement error, missing values, bias, and the trade-offs analysts must make while cleaning and modeling data. Finally, every data story must reach some form of resolution, in which findings are presented, claims are carefully bounded, and the limits of what the data can support are made clear.

In this sense, “telling stories with data” means constructing an arc that moves from raw data to insight, being explicit about the choices and reasoning that shape the analysis, and communicating in a way that allows an audience to follow the process and trust the outcome.

How Quantitative Data Analysis is Like Storytelling

Alexander draws several analogies between data analysis and storytelling:

Structure and Purpose: Just as stories have a structure (setup, conflict, resolution), a data project has structure: aim/question, method, results, interpretation, limits. He stresses starting with a plan (sketching the endpoint) so the analysis does not wander without purpose.

Audience & Persuasion: A story is designed for an audience. Similarly, data analysis must consider who the audience is, what they know, what they will find credible, and what they need explained. Persuasiveness matters! It’s not enough that you discover something; it has to be credible and you have to be able to convince others.

Decisions and Tension: Stories often have surprises, choices, and trade-offs. So do data analyses. Deciding what to measure, how to clean data, which model to use, and how to interpret results all involve trade-offs. This is where the “messiness” of the world comes in: missing data, measurement error, ethical constraints. These are the tension and conflict in the story that make it interesting and trustworthy.

Clarity / Narrative Arc: Just as good stories take the audience through stages they can follow and be engaged by, good data stories guide the audience: they show what was known, what questions were asked, what methods were used, what was found, and what is uncertain. Without clarity (a narrative arc), the audience may get lost or distrust the claims.

Ethics, Context, and Who is Represented: Good storytelling is not just about entertaining, but about responsibility. Whose perspective is told or excluded? What is left out? Data stories require the same: understanding what the data omit and which voices or groups are missing, recognizing the assumptions that underlie measurement, and being transparent about limitations.

Alexander is not saying that data analysis is literally writing fiction, but that many of the same principles (structure, audience, tension, clarity, ethical responsibility) are essential if your work is to be persuasive, credible, and useful. To that end, he proposes an iterative storytelling workflow, summarized in Figure 1, with five core components, and also identifies a set of foundational elements that support the workflow.

Figure 1: Alexander (2023) proposes a workflow for “telling stories with data” with these five components. Despite the presentation, this workflow is iterative, not linear.

We begin with a plan. Planning doesn’t mean knowing everything in advance; it means sketching the question, the likely form of evidence, and a rough idea of what a useful “finished” output could look like. Start by thinking about what you want the final story or endpoint to be. What questions do you want to answer? What claims might you hope to make?

Planning can be minimal and your endpoint may change. No problem! Having an initial target will still help you make better decisions along the way and will help you avoid scope creep.

Before we tangle with messy real-world data, we often simulate small, made-up datasets that approximate what we expect in terms of data types, distributions, and structure. Simulation lets us “rehearse” our approach: does our code work the way we think it does? Do our models behave sensibly on data where we know the truth? Practicing on tiny, controlled examples builds intuition and helps us catch issues early, when they’re cheap to fix.
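
To make this concrete, here is a minimal sketch of what simulating a tiny dataset could look like in Python, using the numpy and pandas libraries. The variables here (age, province, support) are made up purely for illustration, not taken from any course dataset.

import numpy as np
import pandas as pd

# Set a seed so the "random" data are reproducible
rng = np.random.default_rng(seed=42)

# Simulate a small, made-up dataset with the structure we expect the real data to have
n = 100
simulated = pd.DataFrame({
    'age': rng.integers(18, 90, size=n),                       # whole numbers in a plausible range
    'province': rng.choice(['NL', 'NS', 'ON', 'BC'], size=n),  # a few categories
    'support': rng.choice([0, 1], size=n, p=[0.6, 0.4])        # a binary outcome with a known rate
})

# Because we chose the "truth" ourselves, we can check that our code recovers it
print(simulated.head())
print(simulated['support'].mean())  # should be close to 0.4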

Acquiring data is where most of the hard choices and work live. Sometimes we don’t have enough observations and sometimes we have far too many. Those observations may or may not focus on the things we care about. Data may be in different formats, have ambiguous variables, or plenty of missing values. The decisions you make here (what to include, how to clean, how to define variables) shape everything downstream.
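
In Python, acquiring data often begins by loading a file and taking stock of what is actually in it. The sketch below uses pandas; the file name polling_data.csv is hypothetical, just a stand-in for whatever dataset you are working with.

import pandas as pd

# Load a (hypothetical) data file into a DataFrame
df = pd.read_csv('polling_data.csv')

# Take stock of what we actually have before making any decisions
print(df.shape)         # how many rows and columns?
print(df.columns)       # which variables are included?
print(df.isna().sum())  # how much is missing, and in which columns?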

With data in hand, we explore and analyze. Start with straightforward summaries and visualizations (“Exploratory Data Analysis (EDA)”). Then, when it helps, bring in statistical models. Models are powerful tools, not truth machines. They organize our thinking, but they always rest on our choices about measurement and preparation (McElreath 2018). Good analysis is iterative: look, think, model, check, revise.
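
As a rough illustration, exploratory data analysis in Python might start with simple summaries and a quick plot. This sketch continues with the hypothetical df loaded above and assumes it has columns named age and province.

import matplotlib.pyplot as plt

# Numeric summaries of every numeric column
print(df.describe())

# Counts of a categorical variable
print(df['province'].value_counts())

# A simple look at one variable's distribution
df['age'].plot(kind='hist', title='Distribution of age')
plt.show()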

Finally, we share. Clear communication is part of the research itself, not an optional extra. Document what you did and why, present what you found, and be open about limitations as well as strengths. Use graphs, tables, and narrative, but also your code, data (where appropriate), and so on so others can see exactly what you did. Be transparent about who is included and who excluded, about measurement issues, and about limitations and biases in your approach. Transparency builds trust and makes your work useful to others.

To help see how these pieces fit together, Table 1 maps storytelling elements to components of data analysis. In short, when Alexander says data analysis is like storytelling, he means that it involves constructing a narrative through data: not simply reporting numbers, but setting up a problem, exploring it, confronting difficulties, and arriving at conclusions. It’s transparently showing how you got there so that readers can follow, be persuaded, and understand the strengths and limitations. Telling good stories with data is hard and rewarding. You’ll make mistakes, because that’s part of learning. Aim for openness and steady improvement, not perfection.

Table 1: Understanding Alexander’s analogy by mapping storytelling elements to data analysis components
  • Setting / Background: The data source; how the data were collected; why the dataset exists; the measurement choices behind it.
  • Characters / Agents: The processes and people who produce the data; those represented in the data; those left out.
  • Plot / Sequence: The workflow steps (question → measurement → cleaning → modeling → communicating); the sequence of decisions and discoveries made during analysis.
  • Conflict / Tension / Surprise: Data defects, bias, missing parts; unexpected patterns or anomalies; trade-offs; uncertainty.
  • Resolutions / Conclusions: What findings can be asserted; what claims are supported; what remains uncertain.
  • Audience & Purpose: Who the story is for; what claims or changes you hope to motivate; what understanding you hope to impart.

In short, Alexander’s point is not that analysis is fiction, but that it is narrative: a structured, ethically informed, and persuasive account of how we move from messy, incomplete observations to provisional insights. Telling stories with data requires rigor and transparency, but also clarity and creativity.

Mapping the Framework to Technical Skills

Alexander’s five-step workflow translates directly to programming tasks:

  • Plan → Sketch data schemas in code, define expected structures (see the sketch after this list)
  • Simulate → Generate test datasets with known properties
  • Acquire → Web scraping, API calls, file loading
  • Explore → Data manipulation, visualization, summary statistics
  • Share → Reproducible notebooks that document every decision
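
For the Plan step, for instance, “sketching a data schema in code” can be as simple as writing down the variables you expect to need and the type each one should have. The column names below are illustrative, not a required schema.

# Sketch the endpoint: the variables we expect to need and their types
expected_columns = {
    'respondent_id': int,   # unique identifier
    'age': int,             # in years
    'province': str,        # two-letter code
    'vote_intention': str   # party name or 'undecided'
}

# Later, we can check acquired data against this plan
for name, expected_type in expected_columns.items():
    print(f"Expecting column '{name}' of type {expected_type.__name__}")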

Programming isn’t separate from research; it’s how we implement rigorous, transparent analysis.

Each of the chapters ahead emphasizes one or more steps of the framework, along with the key skills that go with them:

  • Chapter 1, Web Scraping (Acquire): requesting pages, parsing HTML, extracting tables
  • Chapter 2, Data Processing (Acquire → Explore): cleaning data, handling missing values, reshaping
  • Chapter 3, Survey Foundations (Plan): understanding TSE, sampling, measurement
  • Chapter 4, Loading Survey Data (Acquire): working with complex datasets, documentation
  • Chapter 5, Assessing Quality (Acquire → Explore): filtering, outlier detection, validation
  • Chapter 6, Exploration (Explore): visualization, relationships, weighted analysis

Getting Started with Python and Google Colab

Research Computing

To do research with data, we need a place to work. In this course, it’s Google Colab. Think of Colab as a free, online workbench: you open it in your browser, and it gives you a ready-to-use Python environment without any installation headaches. Because Colab runs in the cloud, everyone gets the same setup, and your work saves to your Google Drive.

What is Research Computing?

Research computing involves using specialized computer software to support systematic primary research: for example, collecting data, cleaning and organizing data, simulating theories, fitting models, and communicating results. It blends technical skills (like programming and data management) with substantive and theoretical knowledge from fields like sociology and criminology to answer meaningful research questions and deepen our understanding of the world.

In this course, we’ll use Python as our main tool for research computing. We’ll run our Python code in Google Colab, a free, cloud-based platform that lets you write and execute Python in your web browser without needing to install anything on your computer. It’s like Google Docs, but for coding and data analysis.

Open a new Colab notebook at colab.research.google.com. Colab autosaves to your Google Drive automatically.

A notebook is made of cells. Some cells are for code and contain Python code you can run. Others are for text (explanations, notes, section headings). This mix is what makes notebooks so useful in research: your reasoning and your analysis live side by side. When you (or anyone else) return later, the logic is visible.

Notebooks blend code and explanation in one document. This isn’t just convenient; it’s essential for credible research. When your analysis lives alongside your reasoning, others (including future you) can trace every decision from raw data to conclusion. This transparency is what transforms code from a personal tool into scientific communication.

Text and Code in Colab

Add a text cell with + Text and type a heading. You can format text with Markdown: # creates a big header, **bold** makes text bold, [link](url) creates links.

Now add a code cell with + Code. Type:

print("Hello and welcome to Sociology/Criminology 3040: Quantitative Research Methods!")

Press Shift + Enter. You should see the message printed just below the cell. That’s the full loop: write code → run it → inspect the output.

If you ever see an error, don’t panic. You can read it slowly to figure out exactly where things went wrong, but it’s also okay to ask Gemini (since we’re using Colab) or another GenAI for help in understanding the error message (or “traceback”). You can and should do this in a way that helps you learn, not just mindlessly copy-paste. Ask the AI for clear and simple explanations tailored to your current level of understanding, and then try to understand and apply the answer.
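
For example, the cell below contains a deliberate typo. Running it produces a short traceback whose last line names the type of error and points at the cause:

greeting = 'Hello, 3040!'
print(greting)  # NameError: name 'greting' is not defined (the variable name is misspelled)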

Run code cells with the play button or Shift + Enter (runs cell and moves to next). If a cell takes time, you’ll see a spinner.

Essential Python Concepts for Data Work

Every value in Python has a type. Three types are fundamental:

  • Integers are whole numbers, like 42
  • Floats are numbers with decimals, like 42.0
  • Strings are text, wrapped in quotes, like "Quinn and Nora are obsessed with cats!"

You can use either single quotes ' ' or double quotes " " for strings.
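
If you’re ever unsure what type a value is, Python’s built-in type() function will tell you:

print(type(42))     # <class 'int'>
print(type(42.0))   # <class 'float'>
print(type('Quinn and Nora are obsessed with cats!'))  # <class 'str'>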

Mathematical Expressions

Python can act like a calculator. An expression combines values with operators:

2 + 2
2 * 9
10 / 2
2 + 9 * 7  # follows normal order of operations

If you want to control the order, use parentheses: (2 + 9) * 7.

String Operations

The same operators can mean different things with strings. With text, + means concatenate (join):

print('Data analysis' + ' can be exciting!')

If you mix numbers and strings, you’ll need to convert the number to text:

# str() converts a number to a string
str(42) + ' is the answer'

You can also “multiply” a string by an integer to repeat it:

'Sociology/Criminology ' * 3

Variables and Assignment

A variable is a name you give to a value so you can refer to it later, like putting something in a labeled container. The assignment operator = stores a value in that name:

a_number = 16
print(a_number)
a_number * a_number

Choose variable names that describe what they hold. last_name is better than ln. It makes your code easier to read, especially when you return to it next week or share it with a classmate.

Here’s a small example that builds a sentence from pieces:

city = 'Cologne'
country = 'Germany'
sentence = city + ' is the fourth-most populous city in ' + country + '.'
print(sentence)

Good Habits for Credible, Reproducible Research

Mix code and explanation in notebooks. Use clear names for files and variables. Test your work with Runtime → Restart and run all to ensure everything runs from a clean start.

Saving and Downloading Your Work

Colab autosaves to your Google Drive, so you won’t lose work if your browser closes. You can also share a notebook with a link, just like a Google Doc, or download it in different formats via File → Download. The .ipynb format is the standard Jupyter notebook file, which preserves both code and output. The .py format is a plain Python script that contains only the code.

Note for Sociology/Criminology 3040 Students

When you need to submit something to Brightspace, download it via File → Download as a .ipynb (Jupyter notebook) and a .py (plain Python script). Submit both files together in the Brightspace assignment dropbox.

What’s Ahead

This introduction established both conceptual foundations (the storytelling framework) and technical foundations (Python basics, Colab environment). The next chapter immediately applies these skills to a real research task: collecting polling data from the web.