Reliability

Wikimedia Training: Designing Effective Questions


Reliability is how consistently a proxy (e.g., a survey) measures a construct. A basic understanding of the types of reliability is helpful when writing a survey, but testing for reliability generally involves some statistics. More detailed statistical background is provided at the end of this page for interested readers.

Internal reliability
This is the most commonly used method of testing reliability. Internal reliability involves testing for homogeneity: whether different questions that aim to measure the same or similar targets are correlated with one another beyond what random chance would produce.


Test-retest reliability
Does the measure produce the same or similar results from the same respondents when administered at different points in time? Usually the questionnaire is administered on two occasions separated by a few days. Ideally, responses should not vary between occasions, except for measures such as health that can genuinely change from day to day.

Statistical background
Internal reliability is measured using Cronbach's alpha (for items with more than two response categories) or the Kuder-Richardson Formula 20 (KR-20) (for items with two response categories, e.g. yes/no). If alpha is below 0.5, internal reliability is regarded as low (i.e., the items may not be measuring the same phenomenon).
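Both statistics can be sketched directly from their standard formulas, which are given in the docstrings. This is a minimal illustration assuming hypothetical data in which each inner list holds one item's scores and respondents appear in the same order in every list. Population variances are used throughout, which makes KR-20 equal to Cronbach's alpha when items are coded 0/1. The 0.5 cutoff in the usage lines follows the threshold stated in this section.

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for items scored on any numeric scale.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # one total per respondent
    item_var = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))

def kr20(items):
    """KR-20 for dichotomous items coded 1 (e.g. yes) and 0 (e.g. no).

    KR-20 = k / (k - 1) * (1 - sum(p * q) / variance(total scores)),
    where p is the proportion answering 1 and q = 1 - p.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    totals = [sum(scores) for scores in zip(*items)]
    pq = sum((sum(scores) / n) * (1 - sum(scores) / n) for scores in items)
    return k / (k - 1) * (1 - pq / pvariance(totals))

# Hypothetical responses from six respondents.
likert = [  # three questions on a 1-5 scale
    [4, 5, 3, 4, 2, 5],
    [4, 4, 3, 5, 2, 4],
    [3, 5, 2, 4, 1, 5],
]
yes_no = [  # three yes/no questions coded 1 = yes, 0 = no
    [1, 1, 0, 1, 0, 1],
    [1, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0, 1],
]

for name, value in [("Cronbach's alpha", cronbach_alpha(likert)),
                    ("KR-20", kr20(yes_no))]:
    verdict = "low" if value < 0.5 else "not flagged as low"
    print(f"{name}: {value:.2f} (internal reliability {verdict})")
```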
Test-retest reliability can be measured with a basic correlation coefficient between the two sets of responses, where the target is a one-to-one correspondence between the first and second administration. It can be assessed more carefully with Cohen's kappa, the statistic also used to measure inter-rater reliability (the agreement between two rating instances), because kappa takes into account the agreement expected by random chance as well as the observed agreement.
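A minimal sketch of Cohen's kappa for categorical answers collected on two occasions, using the standard formula kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each occasion's answer frequencies. The answer data are hypothetical. A kappa of 1 means perfect agreement, and 0 means agreement no better than chance.

```python
from collections import Counter

def cohens_kappa(first, second):
    """Cohen's kappa for two categorical ratings of the same respondents.

    kappa = (p_o - p_e) / (1 - p_e)
    """
    if len(first) != len(second) or not first:
        raise ValueError("ratings must be non-empty and the same length")
    n = len(first)
    # Observed agreement: share of respondents giving the same answer twice.
    p_o = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement: for each category, product of its frequency on
    # each occasion (Counter returns 0 for categories never used).
    counts_1, counts_2 = Counter(first), Counter(second)
    p_e = sum(counts_1[c] * counts_2[c] for c in counts_1) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)

# Hypothetical yes/no answers from the same eight respondents, a few days apart.
time_1 = ["yes", "yes", "no", "yes", "no", "no", "yes", "yes"]
time_2 = ["yes", "no", "no", "yes", "no", "no", "yes", "yes"]
print(f"Cohen's kappa: {cohens_kappa(time_1, time_2):.2f}")
```

Here 7 of 8 answers match (p_o = 0.875), but chance alone would produce p_e = 0.5, so kappa (0.75) is noticeably lower than the raw agreement rate.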