3  Survey Research Foundations

Quantitative Social Science – Core Concepts, Skills, and Stories
(draft manuscript)

Author: John McLevey (he/him)
Affiliation: Sociology, Memorial University

This open-access book accompanies the quantitative research methods course I teach at Memorial University. It’s under active development and revision. Chapters are in different stages of development, so some may be a little rougher than others. Feedback is welcome!

By the end of this chapter you will be able to:

  • Distinguish between opinion polling and social scientific survey research in scope, purpose, and methodology.
  • Apply the Total Survey Error (TSE) framework to understand and reason about sources of error in survey research.
  • Explain how survey methodology evolved from early quota and voluntary-response polls to modern probability, design, and model-based approaches.
  • Describe the process of translating theoretical concepts into measurable constructs.
  • Navigate survey documentation to understand sampling design, data collection procedures, and ethical considerations.
  • Interpret the structure and design principles of the Canadian Election Study (CES).

The previous chapters taught practical skills for collecting and processing web data. We scraped Wikipedia tables, cleaned messy formats, and built reproducible workflows. These skills generalize to many data sources and will serve you throughout your research career.

In social scientific research, surveys differ fundamentally from web scraping or simple polling. Surveys are designed instruments for systematic measurement, requiring careful attention to sampling, question design, and multiple sources of error. Before working with a large-scale survey dataset, specifically the Canadian Election Study, we need some conceptual foundation.

This chapter establishes those foundations: how surveys differ from polls, what can go wrong (Total Survey Error framework), and how methodological innovations address these challenges. We’ll see how the CES design reflects these foundations, which will guide our quality assessment and analysis in subsequent chapters.

3.1 From Polls to Surveys: A Fundamental Shift

In earlier chapters, we collected and cleaned polling data from the web. Polls are snapshots of public sentiment, taken at a moment in time and often focused on prediction. They are useful for understanding “what” people think in the present.

When surveys are used in social science research, we go much further. The goal is not just to describe current attitudes but to explain “why” people think as they do and “how” those attitudes and behaviors change over time. This difference shapes everything about the design. Polls typically include a handful of questions, often five to fifteen, while major surveys like the Canadian Election Study contain hundreds. Polls measure states; surveys test and build theories. Polls rely on repeated cross-sections to track trends; surveys deploy stratified samples, panel studies, and experimental designs to capture processes across elections or even decades.

The contrast is just as striking in analysis. Polls estimate today’s distribution of opinion. Surveys investigate relationships among variables, test hypotheses, and build models of political attitudes, social identities, and policy preferences.

3.2 The Total Survey Error Framework

No survey is ever error-free. The Total Survey Error (TSE) framework (Groves et al. 2011) provides a structured way of thinking about where errors can creep in, how they differ, and how they can be mitigated.

The framework distinguishes between two broad domains. Representation errors arise when the sample achieved in practice fails to mirror the population the researcher wants to study. These include coverage error, when parts of the population are missing from the sampling frame; sampling error, which reflects the unavoidable variation that comes from studying only a subset of the population; and nonresponse error, when people who decline to participate differ systematically from those who take part.

The second domain concerns measurement error. This occurs when the responses collected fail to reflect the underlying constructs of interest. Poorly worded questions, cultural assumptions, interviewer effects, or mode differences can all introduce distortions.

Finally, the framework recognizes processing and adjustment errors. These happen later in the pipeline, during coding, weighting, or imputation, when researchers attempt to clean or correct the data. Such adjustments can reduce bias, but they can also introduce new distortions if based on weak assumptions.

The central point is straightforward but powerful: all surveys contain error. The task is not to eliminate error entirely (an impossible goal) but to understand it, reduce it where possible, and be transparent about what remains.

3.2.1 Coverage Error

Coverage error occurs when the sampling frame used to draw respondents does not fully cover the target population. The target population is the group you want to generalize about; the frame is the list or mechanism from which you actually select people. If some groups are missing from that frame, they cannot be represented in the survey no matter how large the sample is.

A classic example is a phone survey that relies on landline numbers. Households that rely exclusively on mobile phones are excluded, and if they differ systematically from those with landlines (younger people, for example, are less likely to have them), the results will be biased. Similarly, a university climate survey that draws only from enrolled students excludes faculty and staff, even if their views are central to the study.

The most famous case of coverage error is the Literary Digest poll of 1936. The Digest mailed ballots to names drawn from telephone directories, car registrations, and magazine subscriber lists, groups that disproportionately captured wealthier households during the Depression. This skew in the sampling frame, compounded by voluntary response, led to the spectacularly wrong prediction that Alf Landon would defeat Franklin D. Roosevelt.

The lesson is that coverage problems cannot be fixed by sheer size. Ten million invitations or two million responses will not help if the wrong groups are systematically missing.

Coverage in the CES: The 2021 CES used online panels, excluding Canadians without internet access. Statistics Canada estimates that approximately 94% of Canadian adults have internet access, meaning about 6% of the target population is missing from the sampling frame. This coverage error is smaller than what telephone surveys faced when many households abandoned landlines, but it is still present. The CES documentation acknowledges this limitation and uses weighting to partially compensate.

3.2.2 Sampling Error

Sampling error reflects the fact that when we study a subset of the population, our estimates will almost never match the population values exactly. Even a perfect frame and a 100% response rate would still leave us with sampling error, because random draws differ.

If you randomly survey 1,000 Canadians, your estimate of support for a political party may come out at 32%. Another random sample of the same size might give 34%. Neither is “wrong”; the difference is due to chance. This is why survey researchers calculate margins of error or confidence intervals: to indicate the expected range of variation given the size and structure of the sample.

One advantage of probability sampling is that sampling error can be quantified. Because every person has a known, non-zero chance of being selected, the mathematics of probability allow researchers to estimate how precise their results are. In Bayesian analysis, the same design information helps structure priors and informs posterior uncertainty. The key point is that sampling error is about chance, not bias. It narrows as sample size increases, but it never disappears entirely.
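
To make this concrete, here is a minimal simulation sketch, assuming a hypothetical population in which exactly 33% support a given party. Two independent random samples of 1,000 will typically return slightly different estimates, purely by chance.

```python
# Minimal sketch: sampling error as pure chance, assuming a hypothetical
# population in which exactly 33% support a given party.
import numpy as np

rng = np.random.default_rng(seed=2021)
true_support = 0.33
n = 1_000

# Two independent simple random samples of 1,000 respondents each.
sample_a = rng.binomial(n=1, p=true_support, size=n)
sample_b = rng.binomial(n=1, p=true_support, size=n)

print(f"Sample A estimate: {sample_a.mean():.1%}")
print(f"Sample B estimate: {sample_b.mean():.1%}")
# The two estimates differ, and both differ from 33%, purely by chance.
```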

Sampling in the CES: The 2021 CES recruited approximately 20,000 respondents from online panels. Unlike a simple random sample, the design deliberately oversampled smaller provinces to enable regional analysis. This means someone in Prince Edward Island had a higher probability of selection than someone in Ontario. Survey weights correct for these unequal selection probabilities when estimating population parameters.
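
As a hedged illustration of how such weights work (the provinces, outcomes, and selection probabilities below are invented, not the actual CES design), a design weight is simply the inverse of each respondent's selection probability:

```python
# Sketch of design weights with invented selection probabilities; the actual
# CES probabilities and weights are described in its technical documentation.
import pandas as pd

respondents = pd.DataFrame({
    "province":       ["PEI", "PEI", "Ontario", "Ontario", "Ontario"],
    "supports_party": [1, 0, 1, 1, 0],
    "selection_prob": [0.010, 0.010, 0.001, 0.001, 0.001],  # hypothetical
})

# Design weight = 1 / probability of selection: oversampled provinces get
# smaller weights per respondent when estimating national quantities.
respondents["weight"] = 1 / respondents["selection_prob"]

unweighted = respondents["supports_party"].mean()
weighted = (
    (respondents["supports_party"] * respondents["weight"]).sum()
    / respondents["weight"].sum()
)
print(f"Unweighted estimate: {unweighted:.2f}")
print(f"Design-weighted estimate: {weighted:.2f}")
```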

3.2.3 Nonresponse Error

Nonresponse error arises when people who are selected for a survey do not participate, and their absence is correlated with the outcomes being studied.

Consider political surveys that rely on phone calls. Younger people are harder to reach and less likely to answer, and those who do respond may differ systematically from their peers who decline. If young respondents are underrepresented, and if youth are also more supportive of certain parties or policies, then the survey results will be biased.

The issue is not limited to age. People with lower levels of trust in institutions are often less willing to take part in surveys about politics, which means that the very voices most relevant to understanding political disengagement may be missing.

High nonresponse rates are not fatal in themselves. If the people who refuse to participate resemble those who do respond, the impact is small. But when nonrespondents differ in unobserved ways, bias is inevitable. Weighting, imputation, and model-based adjustments can help, but they depend on auxiliary data that capture the right sources of variation. Without that, no statistical technique can fully solve the problem.

Nonresponse in the CES: The completion rate for the 2021 CES campaign wave was approximately 35-40% (exact rates vary by panel source). This means 60-65% of invited participants declined. If those who declined differ systematically from participants (perhaps less politically interested, more distrustful of research, or busier with work and family), the results will be biased even after weighting. The CES team conducts extensive nonresponse analysis to understand these patterns.

3.2.4 Measurement Error

Measurement error occurs when there is a gap between the construct the researcher intends to measure and the data actually collected. The source can be as simple as a poorly worded question or as complex as a mismatch between cultural assumptions and survey design.

A question like “Do you support social programs?” is ambiguous, since respondents may think of very different policies, from health care to unemployment insurance to child benefits. Recall problems also matter: asking people to report their income, number of books read, or past voting behavior often produces inaccurate answers. Mode effects play a role as well. People tend to give more socially desirable responses in interviewer-administered surveys than in anonymous online questionnaires.

The danger of measurement error is not only bias but also noise. Responses that are systematically off can distort relationships between variables, while random misunderstandings add variance and make it harder to detect genuine patterns. The best defenses are careful questionnaire design, pretesting, cognitive interviewing, and consistent field protocols.

Measurement in the CES: The left-right ideology question asks respondents to place themselves on a 0-10 scale. But “left” and “right” mean different things to different people: some focus on economic issues like taxation and redistribution, others on trust in government, still others on immigration or the environment. This conceptual ambiguity introduces measurement error even when respondents answer carefully. The CES addresses this by including multiple ideology measures and specific policy questions.

3.2.5 Processing and Adjustment Errors

A final set of problems arises during the later stages of data handling. Processing errors occur when coding mistakes, data entry issues, or inconsistent variable construction distort the dataset. Adjustment errors emerge when researchers attempt to correct for known biases through weighting, calibration, or imputation, but do so under questionable assumptions.

These errors are particularly tricky because they often remain hidden. Two researchers may analyze the same dataset differently, applying different weights or constructing variables in slightly different ways, and arrive at subtly different results. Transparency and reproducibility are the best safeguards. Documenting coding choices, sharing weighting procedures, and making analysis scripts public all help keep adjustment errors in check.

3.3 Historical Lessons: 1936 and the Limits of Size

The Literary Digest poll of 1936 remains a touchstone in survey methodology. It demonstrated with painful clarity that coverage error and nonresponse bias can overwhelm sample size. The Digest mailed ten million ballots, received 2.4 million responses, and predicted a decisive Republican win. The reality was an overwhelming victory for Roosevelt.

The reason was not that Gallup had access to modern probability sampling while the Digest did not; both relied on methods that are crude by today’s standards, and Gallup himself used quota sampling. The real problem was the Digest’s skewed frame and the self-selection of respondents. Gallup’s smaller quota sample was far from perfect, but it was less distorted and therefore more accurate.

The story underscores two lessons. First, bigger is not better if bias dominates. Second, non-probability methods are not all equal. Some may produce usable results if biases are limited or well understood; others can be disastrously misleading.

3.4 Probability Sampling

One of the most important innovations in survey methodology was the move to probability sampling. In a probability sample, every eligible unit in the population has a known, non-zero chance of selection. These probabilities do not have to be equal (stratified and clustered designs often involve unequal chances), but they must be explicit and calculable.

Probability sampling means that our respondents are chosen through a known random process, which establishes a transparent mathematical connection between the respondents we actually observe and the population we want to describe. Among other things, this enables us to reason about error and uncertainty.

One (“Frequentist”) way of doing this involves using design-based measures such as confidence intervals and margins of error. For example, suppose we draw a simple random sample of 1,000 Canadians to estimate support for a political party. If 32 percent of the sample reports intending to vote for that party, probability theory enables us to make statements like “32 percent, plus or minus three percentage points, nineteen times out of twenty” rather than just “32 percent.” This is the familiar language of polling margins of error.
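
The arithmetic behind a statement like that is straightforward. Here is a minimal sketch using the standard large-sample formula for a proportion from a simple random sample:

```python
# Sketch: 95% margin of error for a proportion from a simple random sample.
import math

p_hat = 0.32   # sample proportion (32% support)
n = 1_000      # sample size

standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
margin_of_error = 1.96 * standard_error  # 95% level, "19 times out of 20"

print(f"{p_hat:.0%} plus or minus {margin_of_error:.1%}")  # roughly ±3 points
```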

The role of probability sampling is equally important in “model-based” or “Bayesian” approaches. Here the design provides structured information about how the sample relates to the population. State-of-the-art hierarchical models, poststratification, and calibration methods all rely on this design information to properly account for multiple sources of uncertainty.

3.5 Beyond the Binary: Non-Probability Samples

It is tempting to draw a sharp line between “good” probability samples and “bad” non-probability samples. In practice, the distinction is much less absolute. Not all non-probability samples are equally flawed. Their value depends on the context, the available auxiliary information, and the adjustments applied.

At the weak end are convenience and voluntary-response surveys. Online opt-in polls where anyone can participate are a clear example. They provide little control over who enters the sample and are prone to severe, opaque biases. These data may be cheap and abundant, but their quality is low.

In the middle are more structured non-probability designs, such as commercial online panels that recruit participants to match demographic quotas. These panels can be combined with calibration or raking techniques to bring the sample into alignment with known population margins (for example, adjusting to census distributions of age, gender, and region). If the sources of bias are limited and observable, these adjustments can make the data usable for certain purposes.
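
As a rough sketch of the idea (not a production implementation), raking repeatedly rescales the weights so that the weighted sample matches each known population margin in turn. The age and region targets below are invented for illustration:

```python
# Minimal raking (iterative proportional fitting) sketch with invented data
# and census margins; production surveys use dedicated tools for this.
import pandas as pd

sample = pd.DataFrame({
    "age":    ["18-34", "18-34", "35+", "35+", "35+", "35+"],
    "region": ["East",  "West",  "East", "East", "West", "West"],
})
sample["weight"] = 1.0

# Hypothetical population (census) shares for each margin.
targets = {
    "age":    {"18-34": 0.40, "35+": 0.60},
    "region": {"East": 0.50, "West": 0.50},
}

for _ in range(25):  # a handful of passes is usually enough to converge
    for var, margin in targets.items():
        shares = sample.groupby(var)["weight"].sum() / sample["weight"].sum()
        # Rescale each category's weights toward its target share.
        sample["weight"] *= sample[var].map(lambda cat: margin[cat] / shares[cat])

print(sample.groupby("age")["weight"].sum() / sample["weight"].sum())
print(sample.groupby("region")["weight"].sum() / sample["weight"].sum())
```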

At the strong end are model-based approaches that explicitly treat the non-probability design as part of the inference problem. Methods such as multilevel regression with poststratification (MRP), propensity score weighting, and doubly robust estimators allow researchers to combine non-probability data with auxiliary information to produce credible population estimates. In some cases, a carefully modeled non-probability sample can outperform a poorly executed probability survey.

The key point is that neither label (“probability” or “non-probability”) is a guarantee of quality. What matters are the assumptions being made, the auxiliary data available, and the transparency of the design. A “probability sample” can mislead if coverage errors and nonresponse are severe. A “non-probability sample” can yield valuable insights if analyzed with robust models and good auxiliary data.

3.5.1 MRP in Action

Perhaps the most striking demonstration of the potential of non-probability data comes from the work of Andrew Gelman and colleagues during the 2012 U.S. presidential election. The researchers drew on a highly skewed dataset: surveys of Xbox gamers. This group was overwhelmingly young, male, and conservative, hardly a mirror of the American electorate. On its face, the sample looked hopelessly biased.

The team used two techniques to address the problem. First, they applied poststratification, dividing the population into demographic cells based on census categories such as age, gender, race, and education. Each cell in the Xbox sample was weighted according to its share of the broader population. This ensured that, for example, the overabundance of young men did not overwhelm the estimates.

Second, they used Bayesian multilevel regression. This approach allowed the model to “borrow strength” across groups, stabilizing estimates for smaller or underrepresented categories. If only a few older women appeared in the Xbox data, the model could partially pool information from similar groups while still allowing for meaningful differences.
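
The poststratification step itself is just a population-weighted average of cell-level estimates. Here is a minimal sketch with invented numbers; in full MRP, the cell estimates would come from the fitted multilevel model rather than being typed in by hand:

```python
# Sketch of poststratification with invented numbers: combine modeled
# cell-level estimates using each cell's share of the population.
import pandas as pd

cells = pd.DataFrame({
    "cell":          ["young men", "young women", "older men", "older women"],
    "cell_estimate": [0.46, 0.55, 0.44, 0.52],  # modeled support within each cell
    "pop_share":     [0.20, 0.20, 0.28, 0.32],  # cell's share of the population
})

# Population estimate = population-weighted average of cell estimates, so the
# overabundance of young men in the raw sample no longer drives the result.
population_estimate = (cells["cell_estimate"] * cells["pop_share"]).sum()
print(f"Poststratified estimate: {population_estimate:.1%}")
```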

Incredibly, the adjusted forecasts produced by MRP were highly accurate at both the national and state levels despite the biased sample. The study did not prove that any non-probability sample can be rescued, but it showed that with careful modeling and good auxiliary data, even extremely unrepresentative sources can be turned into useful tools for inference.

This case study demonstrates a broader methodological principle: the quality of inference depends on modeling assumptions and auxiliary data, not just sampling design. A biased sample with rich auxiliary information (census demographics, past election results) can sometimes produce better estimates than a probability sample with limited auxiliary data.

This doesn’t mean probability sampling is obsolete. Rather, contemporary survey methodology recognizes that:

  1. Perfect probability samples are increasingly rare (declining response rates, coverage problems)
  2. Model-based methods can address known biases (if we have good auxiliary data)
  3. Transparency about assumptions matters (document what you’re adjusting for and why)

The CES uses both approaches: probability-based panel recruitment combined with model-based weighting adjustments. This hybrid strategy acknowledges the strengths and limitations of each approach.

This case illustrates the broader theme of contemporary survey methodology. The old binary of “probability versus non-probability” is giving way to a more nuanced understanding. Probability designs remain the gold standard, but model-based approaches provide powerful tools for making the most of the data actually available. In a world of declining response rates and rising costs, this flexibility is not just an intellectual advance but a practical necessity.

3.6 From Concepts to Variables: The Measurement Process

One of the central challenges of survey research is transforming abstract theoretical concepts into measurable variables. This is not a straightforward pipeline but an iterative process that requires careful judgment at every step. The researcher must ensure that what ends up in the dataset is a faithful, if imperfect, representation of the ideas that motivated the study in the first place.

Take the example of political efficacy, the belief that ordinary citizens can understand and influence politics. As a theoretical construct, efficacy is abstract. To measure it, researchers must craft concrete survey items, such as the statement, “People like me don’t have any say about what the government does.” Respondents are asked to evaluate this statement using a set of response options, perhaps a five-point scale ranging from strongly agree to strongly disagree. These responses are then coded into a numeric variable, often something like cps21_efficacy_1, which can be combined with other related items into an overall construct such as a political efficacy index.
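
To show what the final step can look like, here is a hedged sketch of building such an index from two agree-disagree items. Only cps21_efficacy_1 is named above; the second item name and all response values are invented for illustration:

```python
# Sketch: reverse-code agree-disagree items (1-5) so higher = more efficacy,
# then average them into an index. Values and the second item are invented.
import pandas as pd

df = pd.DataFrame({
    "cps21_efficacy_1": [1, 2, 4, 5],  # "People like me don't have any say..."
    "cps21_efficacy_2": [2, 1, 5, 4],  # hypothetical second efficacy item
})

# Agreement with these statements signals *low* efficacy, so reverse-code
# each 1-5 item (6 - value) before averaging.
reversed_items = 6 - df[["cps21_efficacy_1", "cps21_efficacy_2"]]
df["efficacy_index"] = reversed_items.mean(axis=1)
print(df)
```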

Each step in this process introduces decisions that shape the data. A slightly different wording, a different number of response categories, or a different coding scheme could all produce slightly different distributions. These choices matter because they influence what conclusions researchers can credibly draw. The goal is not to create a perfect measure (that is impossible) but to create a measure that is transparent, reliable, and closely linked to the underlying concept.

A second example comes from the study of affective polarization. Here the concept refers to the tendency to feel warmly toward one’s own party while viewing opposing parties with hostility. In the Canadian Election Study (CES), this is measured with feeling thermometers: respondents are asked, “How do you feel about the [Liberal/Conservative/NDP] Party?” and record their answer on a scale from 0 (very cold) to 100 (very warm). The resulting variables (for example, cps21_party_rating_23 for the Liberal Party) can be combined into a construct such as an “in-group minus out-group” difference. This captures the extent to which respondents express positive feelings for their preferred party and negative feelings for its rivals.
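
A minimal sketch of that construct, using invented thermometer scores: the column names below are placeholders rather than actual CES variables, which would include items like cps21_party_rating_23.

```python
# Sketch of an "in-group minus out-group" measure from 0-100 feeling
# thermometers. Column names and values are invented; in the CES these would
# be built from the party rating variables (e.g., cps21_party_rating_23).
import pandas as pd

df = pd.DataFrame({
    "in_party_rating":    [85, 70, 90],  # thermometer for respondent's own party
    "out_party_rating_1": [20, 55, 10],  # first out-party thermometer
    "out_party_rating_2": [30, 40, 15],  # second out-party thermometer
})

# Warmth toward own party minus average warmth toward the others;
# larger values indicate more affectively polarized feelings.
out_mean = df[["out_party_rating_1", "out_party_rating_2"]].mean(axis=1)
df["affective_polarization"] = df["in_party_rating"] - out_mean
print(df)
```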

These examples illustrate a general principle. Concepts become data through a chain of transformations, and each link in that chain involves choices. Good measurement is not accidental; it is designed, tested, and refined to ensure that the final numbers capture as much of the underlying concept as possible without introducing unnecessary noise or bias.

CES Example: Measuring Party Identification

The theoretical concept of “party identification” refers to a psychological attachment to a political party, a social identity that shapes how people interpret political events. The CES operationalizes this concept with the question:

“In federal politics, do you usually think of yourself as a: Liberal, Conservative, NDP, Bloc Québécois, Green, or none of these?”

This becomes variable cps21_fed_id with numeric codes (1=Liberal, 2=Conservative, etc.). The single question doesn’t capture all aspects of party identification (strength of attachment, stability over time, multiple identities), but it provides a measurable indicator of the underlying construct.
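
A quick sketch of turning those codes into readable labels. Only the 1=Liberal and 2=Conservative codes are given above; the remaining code below is a placeholder, and the authoritative mapping lives in the CES codebook:

```python
# Sketch: map numeric party-identification codes to labels. Only 1=Liberal and
# 2=Conservative are given above; treat the rest as placeholders and consult
# the CES codebook for the authoritative mapping.
import pandas as pd

df = pd.DataFrame({"cps21_fed_id": [1, 2, 3, 1, 2]})

party_labels = {
    1: "Liberal",
    2: "Conservative",
    3: "NDP",  # placeholder; verify against the codebook
}

df["fed_id_label"] = df["cps21_fed_id"].map(party_labels)
print(df)
```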

Later in the survey, related questions about voting history, party thermometers, and issue positions help validate whether cps21_fed_id captures meaningful variation in party attachment.

3.7 The Science of Asking Questions

If concepts become variables through survey questions, then the quality of the data rests heavily on the quality of those questions. Poorly designed questions are one of the most common sources of measurement error. Contemporary survey methodology emphasizes not only the wording of individual items but also how the design of the entire questionnaire shapes responses.

3.7.1 Evolution of question design

Survey question design has evolved significantly, from early approaches that often focused on isolated elements like question wording toward more comprehensive frameworks that account for the co-occurrence of question characteristics. Research by Schaeffer (2020) highlights how the field has moved toward decision-based frameworks that seek to identify the question characteristics that influence respondent behavior.

One significant change is the expanded use of fully labeled response categories. In the past, it was common to rely on numeric scales or partial labeling, but recent research shows that fully labeled response categories (for example, ranging from “not at all” to “extremely”) increase reliability across diverse modes of data collection, particularly in self-administered surveys. Five response categories have become a common standard, providing enough granularity for nuanced responses without overwhelming the respondent.

Another important shift is that item-specific (IS) response formats, which ask respondents about the topic directly (e.g., “How satisfied are you with your job?”), are now heavily favored over traditional agree-disagree (AD) scales, which pose a statement and ask the respondent whether they agree (e.g., “I am satisfied with my job”: agree or disagree?).

Agree-disagree scales are falling out of favor for several reasons, most importantly their susceptibility to biases like acquiescence (the tendency to agree with statements) and extreme responding. Item-specific formats have been shown to provide more accurate data by reducing these biases. This shift represents an important development in the field, addressing long-standing issues with response quality.

3.7.2 Question characteristics framework

Effective survey questions must balance multiple considerations:

Cognitive Demand: Questions should be understandable without overwhelming respondents. Complex or ambiguous questions increase cognitive burden and reduce response quality.

Comprehension: Respondents must interpret questions as researchers intend. This requires avoiding jargon, cultural assumptions, and ambiguous language.

Retrieval and Judgment: Some questions require recalling information or making evaluative judgments. Understanding these cognitive processes helps design better questions.

Response Formatting: The structure of response options affects how people answer. Modern research shows that item-specific formats reduce bias compared to agree-disagree scales.

3.7.3 Modern challenges and adaptations

The rise of self-administered web and mobile surveys has brought new challenges and innovations in question design. In earlier decades, most surveys were administered face-to-face or via telephone, which allowed interviewers to clarify questions and guide respondents through the survey process. With the shift toward self-administered surveys, researchers have had to design questions that work across diverse modes and platforms, including small mobile screens where question length, complexity, and presentation can significantly impact data quality.

One response to this challenge has been the development of dynamic filtering, which allows follow-up questions to be triggered based on respondents’ initial answers. This technique reduces cognitive load and improves data quality by preventing respondents from having to answer irrelevant questions. Similarly, yes-no checklists are now preferred over traditional check-all-that-apply (CATA) formats, which tend to produce under-reporting. Yes-no checklists prompt respondents to actively consider each option, leading to more complete and accurate answers.

The shift to mobile-first design has led to a reconsideration of grid formats and question batteries. While early research suggested that grids, which present multiple related questions in a single visual format, could save time and reduce respondent burden, more recent research shows that grids can lead to straightlining, where respondents provide the same answer across all questions to expedite the process. This is particularly problematic in mobile surveys, where the small screen size makes grids harder to navigate. Presenting questions individually has been found to improve data quality, especially in mobile contexts.

A survey question is not just text; it is a decision environment in which respondents must interpret, retrieve information, make a judgment, and select an answer. Designing that environment carefully is essential for valid and reliable measurement.

3.8 Understanding Survey Documentation

Before any analysis can begin, researchers must understand how the survey data were collected. This requires careful attention to survey documentation, which provides the essential context for interpreting results. Documentation is not an optional supplement; it is part of the data itself. Without it, the numbers in a dataset are stripped of meaning.

Good documentation usually includes details about the sampling design: what population was targeted, what frame was used, how participants were selected, and what the response rates were. It also provides the full questionnaire, including exact wording, response categories, and skip patterns that determine which respondents saw which questions. Documentation describes the field procedures: how interviews were conducted, how long the survey took, what incentives were provided, and what steps were taken to ensure quality. It should also cover data processing, such as coding decisions, construction of derived variables, and treatment of missing values. Finally, it explains weighting procedures, which are critical for correcting nonresponse and aligning the sample with population benchmarks.

A researcher who skips over this material risks serious mistakes. For example, analyzing raw frequencies without applying weights can exaggerate or suppress the views of under- or over-represented groups. Misinterpreting a variable code can lead to faulty conclusions about public attitudes. Documentation literacy is therefore just as important as statistical literacy for anyone working with survey data.
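
As a small illustration of the weighting point (the variable names, values, and weights below are invented, not CES data), unweighted and weighted shares of the same responses can diverge substantially:

```python
# Sketch: unweighted versus weighted response shares with invented data.
import pandas as pd

df = pd.DataFrame({
    "vote_intention": ["A", "A", "B", "B", "B"],
    "weight":         [2.0, 2.0, 0.5, 0.5, 0.5],  # hypothetical survey weights
})

# Raw frequencies treat every respondent equally...
print(df["vote_intention"].value_counts(normalize=True))

# ...while weighted shares account for who is over- or under-represented.
weighted_shares = df.groupby("vote_intention")["weight"].sum() / df["weight"].sum()
print(weighted_shares)
```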

3.9 Case Study: The Canadian Election Study

The Canadian Election Study (CES) offers an exemplary model of documentation and design. Conducted around every federal election since 1965, the CES has produced one of the most important longitudinal records of political attitudes and behavior in the world.

The CES was built on the theoretical foundations of the Michigan Model, which emphasizes party identification as a psychological anchor shaping how voters perceive issues and candidates. Over time, the CES has expanded to include spatial models of ideological positioning and insights from political psychology on values, social identities, and cognitive biases.

The current design uses an online panel methodology. A large, representative sample of Canadians is recruited before the election to capture baseline attitudes, demographic characteristics, and vote intentions. After the election, the same respondents are recontacted to measure actual vote choice, reactions to the campaign, and post-election evaluations. This panel structure makes it possible to trace how individual-level attitudes shift over the course of an election and how pre-election opinions translate into real-world behavior.

The CES documentation includes complete questionnaires, detailed explanations of skip patterns, and transparent descriptions of coding procedures. It explains how weights are constructed to adjust for both sampling and nonresponse. For students and researchers, learning to navigate this documentation is a crucial step. It teaches not only how to use the CES responsibly but also how to approach survey data in general: with curiosity about design, respect for complexity, and caution about interpretation.

3.10 Ethics in Survey Research

All survey research rests on a foundation of trust. Respondents share personal information on the understanding that it will be used responsibly and that their privacy will be protected. Maintaining this trust is not just an ethical requirement but a condition for the long-term viability of survey science.

The first principle is informed consent. Participants must know what they are agreeing to, including how their data will be stored, who will have access, and what risks might be involved. Closely related is the principle of confidentiality. Even when datasets are anonymized, combinations of demographic variables can sometimes re-identify individuals, especially in small communities or rare subgroups. Researchers must be vigilant in protecting identities.

The rise of record linkage, which combines survey responses with administrative data such as tax records or health files, offers powerful new opportunities for analysis but also magnifies privacy risks. Even when identifiers are stripped, the detailed nature of the data can make re-identification possible. This makes careful review by ethics boards and transparent communication with participants all the more important.

Ethics also extend to the use and interpretation of data. Researchers have an obligation to use survey data only for legitimate scholarly purposes and to respect the conditions under which consent was obtained. They must also avoid overclaiming, particularly when results are based on models that rest on strong assumptions. Honesty about limitations is as important as clarity about findings.

Ethical responsibility is not an add-on to survey methodology. It is woven into every stage, from the moment a question is drafted to the way findings are communicated. Survey science depends on public willingness to participate, and that willingness depends on researchers earning and maintaining trust.

3.11 Chapter Summary

This chapter has established the foundations of survey research. We began by distinguishing between opinion polling and academic surveys, showing how their purposes, designs, and analyses differ. We then introduced the Total Survey Error framework, which highlights the many ways error can creep into surveys, from coverage and sampling problems to nonresponse, measurement, and processing challenges. Historical lessons like the Literary Digest poll of 1936 underscored that bigger samples are not always better, and that the design of a study matters more than its sheer size.

We also explored the role of probability sampling in making inference credible, while recognizing that non-probability samples are not all created equal. With appropriate adjustments, such as poststratification and multilevel regression, even highly biased data sources can yield valuable insights, as demonstrated by the Xbox study of the 2012 U.S. election. This shift reminds us that the old binary of “probability versus non-probability” is giving way to a more nuanced understanding in contemporary research.

Moving from design to measurement, we saw how concepts become variables through the careful crafting of survey items, response categories, and coding schemes. The examples of political efficacy and affective polarization showed how abstract theoretical ideas are transformed into analyzable data. We also considered the science of question design, noting how innovations such as item-specific formats, fully labeled response options, and mobile-first presentation reduce bias and improve data quality.

The chapter then turned to the practical skills needed to work with surveys. Mastery of documentation is just as important as mastery of statistics: without understanding how a survey was designed, weighted, and processed, analysis risks becoming superficial or misleading. The Canadian Election Study provided a case study of how rigorous design, transparent documentation, and theoretical ambition come together in a world-class survey project. Finally, we reflected on some of the ethical responsibilities of survey research, including informed consent, protecting confidentiality, managing the risks of record linkage, and avoiding overclaiming in interpretation.

3.12 Looking Forward: From Theory to Practice

This chapter established the conceptual foundations of survey research: the distinction between polls and academic surveys, the Total Survey Error framework for understanding what can go wrong, methodological innovations like MRP that address modern challenges, and the process of transforming concepts into measurable variables.

We’ve seen how the CES design reflects these principles: deliberate sampling strategies, careful question design, multiple measures of key concepts, and extensive documentation. Now we’re ready to work with the data itself.

The next chapter shifts from theory to practice. We’ll load the CES dataset, understand its structure and naming conventions, create analysis-ready labels, and learn to navigate the documentation that makes responsible analysis possible. The TSE framework will guide our thinking as we encounter real survey data with all its complexity and imperfections.