Statistics & Data for the Social Sciences

Guide to locating data sets commonly used by social scientists and other researchers on a wide range of topics.

Questions to Ask about your Data

Think carefully about any data or statistics you want to use in your research.  The following questions come from Emma Smith, Using Secondary Data in Educational and Social Research (New York: McGraw Hill/Open University Press, 2008).

  • Who collected the data?
  • How were the data collected?
  • What types of questions were used?
  • How relevant are the data to your own research?
  • Do the variables match?
  • Do your definitions match?  (Avoid comparing apples and oranges.)
  • Are the data of good quality?
  • What are the sampling strategies and response rates?
  • How timely are the data?
  • From whom was the information collected?
  • What categories are used to group the data?
  • How precise are the data?
  • Who is missing from the data?
  • Are there any missing data?

A Few Definitions

Understanding some basic terminology will help you to determine whether or not you need statistics, data or both.

Statistics are data that have already been analyzed and processed into an easy-to-read format such as charts, tables, and graphs.  An example is the Statistical Abstract of the United States.  If you're looking for a quick number, it's best to start with statistics.

Data are typically raw observations that need to be processed and analyzed using software.  Data can be quantitative, qualitative, spatial, etc.  The distinction between data and statistics can be confusing because the two terms are often used interchangeably in everyday language.

Numeric Data are made up of numbers.  Numeric Data are processed using statistical software like SPSS, Stata, or SAS.

Qualitative Data are data that describe a property or attribute.  Examples of qualitative data are interview responses, observation notes for a case study or ethnography, comments collected on a questionnaire, text or images for content analysis, etc.

Spatial Data are geographic information that is used for analysis with GIS software like ArcGIS.

More Terminology:

Codebook provides information on the structure, content, and layout of a data file as well as methodology, questionnaire(s) and any other relevant information about the data set.
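
To make the role of a codebook concrete, here is a minimal Python sketch.  The variable names (SEX, TENURE), the numeric codes, and the labels are all invented for illustration and do not come from any real data set; an actual codebook will define its own.

```python
# Hypothetical codebook: each variable has a description and a table of
# value labels. All names and codes here are invented for illustration.
CODEBOOK = {
    "SEX": {
        "label": "Sex of respondent",
        "values": {1: "Male", 2: "Female", 9: "Not reported"},
    },
    "TENURE": {
        "label": "Housing tenure",
        "values": {1: "Owner", 2: "Renter", 9: "Not reported"},
    },
}


def decode(record, codebook):
    """Translate the numeric codes in one record into readable labels."""
    decoded = {}
    for variable, code in record.items():
        entry = codebook[variable]
        # Fall back to a visible marker when a code is not documented.
        decoded[entry["label"]] = entry["values"].get(code, f"Unknown code {code}")
    return decoded


raw_record = {"SEX": 2, "TENURE": 1}  # how a row might look in the data file
decoded_record = decode(raw_record, CODEBOOK)
print(decoded_record)
```

Without the codebook, the raw record is just a pair of numbers; with it, the same record can be read as a respondent's answers.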

Data Archive preserves and makes accessible research data.  Some examples are ICPSR, CPANDA, and CIESIN.

Microdata are data at the lowest level of observation, such as individual answers to questions.  For example, the U.S. Census Bureau's Public Use Microdata Sample (PUMS) files are data sets of individual housing unit responses to census questions.

Primary Data are data collected through your own research study directly through instruments such as surveys, observations, etc.

Secondary Data are data from a research study conducted by someone else.  Usually when you are asked to locate statistics on a topic, you are using secondary data.  Statistics from the Census of Population and Housing are one example of secondary data.

Summary Data are data that have been processed or summarized (see statistics).  For example, the tables you read in statistical sources are summary data.
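
The relationship between microdata and summary data can be sketched in a few lines of Python.  The household records below are invented for illustration, and the field names (household_id, tenure, persons) are assumptions rather than variables from any real file.

```python
from collections import Counter

# Invented microdata: one row per responding household.
microdata = [
    {"household_id": 1, "tenure": "Owner", "persons": 3},
    {"household_id": 2, "tenure": "Renter", "persons": 1},
    {"household_id": 3, "tenure": "Owner", "persons": 4},
    {"household_id": 4, "tenure": "Renter", "persons": 2},
    {"household_id": 5, "tenure": "Owner", "persons": 2},
]


def summarize(rows):
    """Collapse household-level rows into one summary row per tenure group."""
    households = Counter(row["tenure"] for row in rows)
    persons = Counter()
    for row in rows:
        persons[row["tenure"]] += row["persons"]
    return {
        tenure: {
            "households": households[tenure],
            "mean_persons": persons[tenure] / households[tenure],
        }
        for tenure in households
    }


summary = summarize(microdata)
for tenure, stats in summary.items():
    print(tenure, stats)
```

The five individual rows are microdata; the two-row table produced by summarize is summary data.  Note that you can always go from microdata to a summary, but not back again.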

Time Series is a sequence of data points recorded at successive time intervals, such as yearly or monthly values of the same measure.
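
As a small illustration of a time series, the Python sketch below stores one value per year and computes the change from each year to the next.  The years and values are made up for the example.

```python
# Invented yearly series: (year, value) pairs for a single measure.
series = [(2019, 100.0), (2020, 104.0), (2021, 101.0), (2022, 110.0)]


def period_changes(points):
    """Return (period, change from the previous period) for each later point."""
    ordered = sorted(points)  # make sure the points are in time order
    return [
        (later_time, later_value - earlier_value)
        for (_, earlier_value), (later_time, later_value) in zip(ordered, ordered[1:])
    ]


changes = period_changes(series)
print(changes)
```

Because the observations are ordered in time, questions such as "did the value rise or fall from last year?" can be answered directly from the series.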

Books about Social Science Data

The following books at Howard-Tilton may help you learn more about using data and statistics in social science research.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.