Polling Fundamentals Sections

This tutorial offers a glimpse into the fundamentals of public opinion polling. Designed for the novice, Polling Fundamentals provides definitions, examples, and explanations that serve as an introduction to the field of public opinion research.

Glossary of Terms

Aggregate
A group of persons (or any other units of analysis) that have some characteristics in common without necessarily having any other connection to each other. For example, teachers in the U.S., Internet users, or voters.
Archive Catalog
The Center’s online catalog of studies. While no opinion data are directly available through this service, researchers interested in secondary analysis of survey data can determine studies relevant to their areas of interest.
Beginning and Ending Dates
The dates of interviewing.
Bivariate Analysis
The analysis of the relationship between an independent and a dependent variable (e.g., a cross-tabulation showing presidential approval for men and women separately).
Boolean
Used to search the Roper database, Boolean logic uses the terms “and,” “or,” and “not” as connectors between keywords or phrases to narrow the results of keyword searches.
Census
A census is often similar to a survey, with the difference that the census collects data from all members of the population while the survey is limited to a sample.
 
Codebook
A list of the variables and how they have been coded in a survey. Every polling organization has its own methods for coding, so every codebook will look a little different.
Coding (Responses)
The process of translating data (respondents’ answers to questions) into a format that can be read and manipulated by a computer and later analyzed by the survey researcher. For example, a female respondent answers the following question:
 
Are you: a) Democrat b) Republican c) Independent d) Don’t Know
 
When this question is coded later, the codes may look like this:
 
Sex: 1=Male 2=Female
Party: 1=Democrat 2=Republican 3=Independent 8=Don’t Know 9=No Answer/Refused
 
The respondent’s answers are then coded accordingly.
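As a rough illustration of the coding step described above, the following Python sketch maps verbal answers to the numeric codes from the example. The dictionary and function names (SEX_CODES, PARTY_CODES, code_response) are hypothetical and are not taken from any real codebook.

```python
# Hypothetical codebook built from the example codes shown above.
SEX_CODES = {"Male": 1, "Female": 2}
PARTY_CODES = {
    "Democrat": 1,
    "Republican": 2,
    "Independent": 3,
    "Don't Know": 8,
    "No Answer/Refused": 9,
}


def code_response(sex, party):
    """Translate a respondent's verbal answers into numeric codes."""
    return {"sex": SEX_CODES[sex], "party": PARTY_CODES[party]}


# A female respondent who answers "Independent".
record = code_response("Female", "Independent")
print(record)  # {'sex': 2, 'party': 3}
```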
Continuous Variable
A continuous variable is a variable that can take any of an infinite number of values within a range. For survey purposes, continuous variables are usually measured on an interval or ratio scale (e.g., time, speed, or weight, since these may be broken down into an infinite number of smaller parts).
  
Cross-Tabulation
A table that shows the influence of an independent variable (located in the columns) on a dependent variable (located in the rows), e.g., a table showing how income influences the likelihood of voting for a certain candidate. Instantly generate cross-tabular analyses using RoperExplorer’s simple point-and-click technology.
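A cross-tabulation of this kind can be sketched in a few lines of Python. The responses below are invented for illustration, and the helper name column_percentages is an assumption; the independent variable (sex) defines the columns and the dependent variable (approval) defines the rows.

```python
from collections import Counter

# Hypothetical coded responses: (sex, presidential approval).
responses = [
    ("Men", "Approve"), ("Men", "Approve"), ("Men", "Approve"),
    ("Men", "Disapprove"),
    ("Women", "Approve"), ("Women", "Disapprove"),
    ("Women", "Disapprove"), ("Women", "Disapprove"),
]


def column_percentages(pairs):
    """Percentage of each column (independent variable) giving each answer."""
    cells = Counter(pairs)                           # count per (column, row) cell
    column_totals = Counter(col for col, _ in pairs)
    return {
        (col, row): 100.0 * count / column_totals[col]
        for (col, row), count in cells.items()
    }


table = column_percentages(responses)
for row in ("Approve", "Disapprove"):
    print(row, {col: table[(col, row)] for col in ("Men", "Women")})
# Approve {'Men': 75.0, 'Women': 25.0}
# Disapprove {'Men': 25.0, 'Women': 75.0}
```

Percentages are computed within each column so that the groups can be compared even when they differ in size.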
Data Mining
A process in which the researcher searches or “digs” through databases for information that may validate his or her own work or inspire ideas for new projects.
Dataset
The individual-level results of a survey (the raw data), conceptualized as a table or “matrix” in which each row contains the coded responses of one respondent. For example, a “1” on presidential approval might mean “approve” while a “2” might mean “disapprove.” “Don’t know” is often coded as “8” or “9.” Datasets may be used for secondary analysis.
Date of Source Document
This is a reference tool used by Roper Center staff and is not necessarily the date of a survey’s first public release. When no release date exists, the last day of the interviewing period is used.
Dependent Variable
In research, a dependent variable (also called the output variable) is the variable that is being measured in an experiment. It “depends” on other factors. For example, if the research question is “Does education level have an effect on annual income?” income is the dependent variable; the question is asking whether income “depends” on education level.
Dichotomous Question
A type of (close-ended) question which has two answer choices.
e.g. Are you: a) Female b) Male
Discrete Variable
A discrete (also known as categorical or nominal) variable is one that has two or more categories with no intrinsic order. For example, sex, hair color, and favorite radio station. You can assign cases to a category on these variables, but the categories have no order from highest to lowest. Therefore, you can’t find “the average” of hair color. (See Ordinal Variable.)
Feeling Thermometer
A type of ratings scale where respondents are asked to gauge their attitudes about a particular topic or person. For example, ratings between 50 degrees and 100 degrees mean that you feel favorable and warm toward a person. Ratings between 0 degrees and 50 degrees mean that you don’t feel favorable toward the person. You would rate the person at the 50 degree mark if you don’t feel particularly warm or cold toward the person. The National Election Studies (NES) has often used this rating scale for questions about presidential candidates.
Filter Question
A type of question used on surveys in order to determine which subsequent (if any) questions to ask.
Focus Group
A small group selected from a wider population that is led by a moderator in an open discussion about the research topic. While a focus group can serve many different functions, there are typically three reasons why one is used in qualitative research. The information gathered from the group interaction is used: 1) to gain insights that will help in developing a survey; 2) as a supplementary source in a study alongside other methods of data collection, in either a preliminary or follow-up stage of the research; 3) in combination with other methods of interviewing for the research project. Focus groups are often used in the areas of market research and political opinion analysis.
Full Question ID
The unique identifier assigned by the Roper Center to every question in the iPOLL database.
Independent Variable
In research, an independent variable (also called experimental variable or predictor variable) is a variable that is measured for its effect on the dependent variable(s). As the independent variable changes, its effect on the dependent variable is observed by the researcher. For example, if the research question is “does education level have an effect on annual income?” education level is the independent variable.
Interval Variable
An interval variable is similar to an ordinal variable except that the intervals between the values of the categories are equidistant, or equally spaced. However, there is no meaningful “zero” point. For example, temperature measured in degrees Fahrenheit.
Interview
A data collection encounter in which one person (an interviewer) asks questions of another (a respondent). Interviews may be conducted face-to-face or by telephone.
Interview Method/Mode
An indication of the method or mode of interviewing: mail, telephone, or in-person.
Interviewer Bias
Interviewers can intentionally or unintentionally prompt respondents to reply in a particular manner. Characteristics like sex, race, age, physical appearance and behavior can have subtle, or sometimes obvious, effects on respondents during the interview process. Some respondents may answer in a manner that they believe would please the interviewer. It is for this reason that survey firms seeking to interview special samples of the population will carefully select interviewers with like characteristics to conduct the survey.
iPOLL
The most comprehensive and up-to-date source for national public opinion data in the United States. The iPOLL Databank is at the core of a full-text question-level retrieval system, designed so that users can locate, examine and, ultimately, capture questions asked on national surveys on a variety of topics.

Likert Scale
A widely used response format that allows typically qualitative attitudes to be converted into quantitative measures. For example, response options may include: “strongly agree”, “agree”, “disagree”, “strongly disagree”.
Mean
The average. To calculate this, simply add up the values for each case and divide by the total number of cases. (e.g. If you want to find out the average number of hours you spend on the computer each week, simply add up your daily hours and divide them by seven.)
Median
The middle score or measurement in a set of ranked scores or measurements.
Mode
1) the most common score or measure in a set of scores or measures;
2) a method of interviewing respondents.
Multi-stage (Cluster) Sampling
A sampling method using more than one stage in the process of gathering the sample. (e.g. You want to interview Missouri voters about their preferences in an upcoming election. However, you have limited resources for contacting them. Therefore, you randomly select 30 voting districts in the state. From there, you randomly select towns. Within those towns, you randomly select neighborhoods. From the neighborhoods, you randomly select streets, and so on.)
Multivariate Analysis
The analysis of more than two variables simultaneously, for the purpose of determining the relationship between and/or among them. For example, opinion on an issue by age and by sex.
N
The number of observed cases in a sample. In polling, N refers to the number of respondents.
Non-probability Sampling
A type of sampling where samples are drawn arbitrarily, without regard to scientific methods, and which therefore should not be used to make statistical inferences about the target population (e.g., “person on the street” samples).
Ordinal Variable
An ordinal variable is similar to a discrete variable; however, the difference is that there is a clear ordering of the categories from low to high. For example, “Education.” There is an order as well as a value from low to high when it comes to measuring years of education. We can code the variable as follows: “1”=Less than High School Graduate; “2”=H.S. Grad/Some College; “3”=College Graduate; “4”=Post Graduate or more
We know that there is an order from low to high in this case, but the size of the difference between each of the categories is not necessarily equidistant from one to the next. (If the difference between each of the categories were equally spaced, the variable would be an interval variable.)
Outlier Effect
An extreme value of a variable in a dataset. This extreme value can distort the results of your survey if you’re dependent solely on the mean statistic for analysis. For example, here is a list of test scores: 90, 87, 99, 95, 85, 43, 91. The score 43 is an outlier and will distort the mean (about 84, in this case). The median (90, in this case) would give a better sense of the overall scores.
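The effect of the outlier on the test scores above can be checked with Python’s standard statistics module. This is only a quick sketch comparing the mean with the median; the variable names are arbitrary.

```python
from statistics import mean, median

# Test scores from the example above; 43 is the outlier.
scores = [90, 87, 99, 95, 85, 43, 91]

print(round(mean(scores), 1))   # 84.3
print(median(scores))           # 90

# Dropping the outlier moves the mean much closer to the median.
without_outlier = [s for s in scores if s != 43]
print(round(mean(without_outlier), 1))  # 91.2
```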
Population
The theoretical population from which the sample was drawn. For example, adults living in the contiguous U.S. Some studies are based on sample sub-sets (such as national samples of women or African Americans).
Population Parameter
A characteristic of the target population described by a statistic. For example, if your target population is runners in the New York City Marathon, the average finishing time would be a population parameter. You use every runner’s finishing time to calculate the parameter. (Not to be confused with Sample Statistic.)
Population Size
In iPOLL, this is the total unweighted count of all completed interviews, also referred to as the sample size.
Probability Sampling
A type of sampling which ensures that each member of the sampling frame has a known, nonzero chance of being selected (the chance is equal for every member only in some designs, such as a simple random sample). This kind of sampling allows researchers to make statistical inferences about the population at large. (See Non-probability Sampling.)
Question
The actual wording of the question. In iPOLL this is preceded by a number such as R18, Q08, or R02. This unique designation is assigned by the Roper Center and does not necessarily reflect the order in which the question appeared in the original study. Whenever possible the original survey instrument is used as the source document by Center staff. In many cases, though, the order of questions on the survey may have been altered in some way for publication in a final report or news release. Researchers requiring information on the original question order should contact the Roper Center.
Question-Level Text
The actual wording of the question used during the survey interview. (See also: iPOLL question-level text example).
Random Digit Dialing
A technique used to obtain a representative sample by using a device that randomly generates telephone numbers in order to contact eligible participants.
Ratio Variable
A ratio variable has all the properties of an interval variable. In addition, it has a meaningful zero point. For example, income.
 
Reliability
The quality of measurement that suggests that the same data would have been collected each time in repeated observations of the same phenomenon.
Research Sponsor
When applicable, the name of the organization that commissioned the survey.
Responses (in iPOLL)
The response categories and percentages of the sample answering each way. Generally, the percentages shown are weighted if the data were weighted to better reflect the population. Any special question-related information clarifying such things as multiple responses, partial responses, and the like will appear after the responses. These notes relate only to specific questions as opposed to the entire study and are referred to as question-level notes.
RoperExpress
Exclusive to the Roper Center, this is a data access tool for on-demand download of data.
Sample
This is the total number of eligible participants randomly selected from the sampling frame of the total population in the survey. The desired sample size is determined by the necessary statistical quality for the survey results. [Note: The total sample size will inevitably be greater than the actual number of completed interviews due to varying response rates and other sources of survey error.]
Sampling
A method of selecting elements (or units) from the target population in a way that is representative. Types of sampling include: simple random sampling, stratified sampling, systematic sampling, and multi-stage cluster sampling.
Sample Error
One type of inaccuracy caused by making inferences about the target population based on the sample. The sampling error is an estimate of how a sample statistic is expected to differ from the population parameter.
Sample Frame
This is the list of eligible participants included in the target population. The sample is chosen from the sampling frame.
Sample Size
This is the total unweighted count of all completed interviews, also referred to in iPOLL as the population size.
Sample Statistic
A statistic which describes the sample. (e.g. If you want to do a survey of New York City Marathon runners, including their finishing times, the average finishing time of those surveyed would be an example of a sample statistic. Not to be confused with population parameter, which would be the average finishing time of all the runners, not just a sample of them.)
Secondary Data
This term refers to materials and information that have previously been documented. For example, a poll, a press release, or a business report.
Simple Random Sample (SRS)
The most common sampling method where each element in the population has an equal chance of being selected.
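A simple random sample can be sketched with Python’s standard random module. The sampling frame of 1,000 labeled people and the function name simple_random_sample are made up for illustration; the seed is fixed only so that the draw can be repeated.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units so that every unit has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


# Hypothetical sampling frame of 1,000 people.
frame = [f"person_{i}" for i in range(1, 1001)]
sample = simple_random_sample(frame, 50, seed=1)

print(len(sample))  # 50
```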
Source Document
The document from which survey information was gathered.
Standard Deviation
A statistic that shows the dispersion of scores in a distribution of scores. It is a measure of the average amount the scores in a distribution deviate from the mean. The more widely spread out the scores are, the larger the standard deviation will be.
Standard Error (of the Mean)
A statistic indicating how much the mean score of a single sample is likely to differ from the mean score of the population. It answers the question, “How good an estimate of the population mean is the sample mean?” (Not to be confused with sampling error)
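The two statistics above can be computed side by side. The sketch below assumes the usual textbook estimate of the standard error of the mean, the sample standard deviation divided by the square root of the number of cases; the glossary itself does not state a formula, and the data are invented.

```python
import math
from statistics import mean, stdev

# Hypothetical weekly hours of computer use reported by eight respondents.
hours = [2, 4, 4, 4, 5, 5, 7, 9]

sample_mean = mean(hours)
sample_sd = stdev(hours)                         # sample standard deviation (n - 1)
standard_error = sample_sd / math.sqrt(len(hours))

print(sample_mean)               # 5
print(round(sample_sd, 2))       # 2.14
print(round(standard_error, 2))  # 0.76
```

The standard error shrinks as the number of cases grows, while the standard deviation describes the spread of the scores themselves.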
Statistic
A number that describes some characteristic of a variable. (e.g. the mean, the standard deviation)
Stratified Sampling
A method of sampling where groups that might not otherwise be equally represented are first divided proportionately into categories (“strata”); then, a sample is randomly selected from each of these categories. (e.g. If you wanted to do a study on hospitals, you’d separate them by size: small, medium-sized, and large hospitals. From there, you would draw samples from each category so that they’d all be equally represented.)
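The hospital example can be sketched as follows. This version assumes proportionate allocation (the same sampling fraction in every stratum), which is only one of several ways to allocate a stratified sample; the hospital list and the function name stratified_sample are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(units, stratum_of, fraction, seed=None):
    """Randomly sample the same fraction of units from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    sample = []
    for name in sorted(strata):
        members = strata[name]
        k = max(1, round(len(members) * fraction))  # at least one unit per stratum
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical frame: 60 small, 30 medium-sized, and 10 large hospitals.
hospitals = (
    [("small", i) for i in range(60)]
    + [("medium", i) for i in range(30)]
    + [("large", i) for i in range(10)]
)

chosen = stratified_sample(hospitals, lambda h: h[0], fraction=0.1, seed=7)
print(Counter(size for size, _ in chosen))
```

With a sampling fraction of 0.1, the sketch draws 6 small, 3 medium-sized, and 1 large hospital, so no size category is left out by chance.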
Study Note
This notation pertains to the entire release, report, or study from which the question was taken.
Subject
The topic classification(s) that best describe the question. The scheme for this categorization was developed by the Roper Center and contains over 100 subject categories.
Subpopulation
In cases where responses are not based on the entire sample, a description of the portion of the sample whose responses are being reported appears here (e.g. women, or those who favor a given policy).
Survey Organization
The name of the survey firm or other organization which conducted the research.
Systematic Sampling
A method of sampling where units are selected from the sampling frame by taking every “nth” unit. (e.g. You have a directory of 100,000 names and you want a sample of 1,000 names. Divide 100,000 by 1,000 to get 100, so you will select one name out of every 100. Randomly select a starting number between 1 and 100, say 42, and then select the 42nd name in each successive group of 100 (42, 142, 242, 342, 442, and so on) to complete your sample.)
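The directory example above can be reproduced with a short sketch. The function name systematic_sample and its parameters are assumptions; the directory is represented simply by the numbers 1 to 100,000, and the starting point is fixed at 42 to match the example.

```python
import random


def systematic_sample(frame, sample_size, start=None, seed=None):
    """Select every nth unit after a random start (positions are 1-based)."""
    interval = len(frame) // sample_size
    if start is None:
        start = random.Random(seed).randint(1, interval)
    positions = range(start, len(frame) + 1, interval)
    return [frame[p - 1] for p in positions][:sample_size]


# A directory of 100,000 names, represented here by their positions.
directory = list(range(1, 100001))
picked = systematic_sample(directory, 1000, start=42)

print(picked[:5])   # [42, 142, 242, 342, 442]
print(len(picked))  # 1000
```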
Topline
The topline is the result of how the aggregated sample answered a specific question. (See also, What is a topline?)
Triangulation
Using more than one method to find meaning in a problem. For example, if you want to interpret the President’s approval rating, you could look at poll results, results of focus groups, and news stories of current events.
Trends
This term typically refers to the long-term patterns over time relating to topics of interest in public opinion that are measured by the repetition of the same question with unchanging wording over many years. An example of a survey that includes many important trends is the GSS, one of the nation’s longest running surveys of social, cultural and political indicators.
US National Adult
A common theoretical population for US “national” polls. Typically it means the adult, non-institutionalized (e.g. no prisons or military bases) population in the 48 contiguous states, since Alaska and Hawaii are often omitted for practical reasons.
Univariate Analysis
The analysis of a single variable, for purposes of description (e.g., averages, or the proportion of cases falling into a given category among the entire sample).
Variable
In survey research, a variable is a characteristic that is being measured (e.g., income, age, presidential approval, or support for a policy). There are different kinds of variables, including: categorical, continuous, interval, ratio, independent, dependent.
Weighting
Also known as sample balancing, weighting is a technique used to reflect differences in the number of population units that each case in a dataset represents. Typically, for surveys designed to be representative of the population of the U.S., units are adjusted to reflect the U.S. Census. While polling organizations may have different methods for their weighting procedures, weighting generally involves the multiplication of survey observations by one or more factors in order to increase or decrease the emphasis that will be given to the observations when analyzing the data.
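As a rough illustration of multiplying observations by a weighting factor, the sketch below weights each case by its group’s population share divided by its sample share. This is one simple approach and not necessarily the procedure any particular polling organization uses; the sample, the population shares, and the function names are all invented.

```python
from collections import Counter


def group_weights(sample_groups, population_shares):
    """Weight for each group = population share / sample share."""
    n = len(sample_groups)
    counts = Counter(sample_groups)
    return {g: population_shares[g] / (counts[g] / n) for g in counts}


def weighted_proportion(values, weights):
    """Weighted share of cases coded 1."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical sample: 40 women and 60 men, drawn from a population
# assumed to be half women and half men.
groups = ["women"] * 40 + ["men"] * 60
approve = [1] * 30 + [0] * 10 + [1] * 24 + [0] * 36  # 1 = approve

factors = group_weights(groups, {"women": 0.5, "men": 0.5})
case_weights = [factors[g] for g in groups]

unweighted = sum(approve) / len(approve)
weighted = weighted_proportion(approve, case_weights)
print(round(unweighted, 3))  # 0.54
print(round(weighted, 3))    # 0.575
```

Women are under-represented in this invented sample, so their answers are weighted up and the estimated approval rises from 54% to 57.5%.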

For further information please contact The Roper Center at 607.255.8129 or support@ropercenter.org.