35 Collecting Data
Izabela Mazur and Kim Moshenko
Learning Objectives
- State whether data is quantitative or qualitative
- Discuss potential problems that might arise when sampling from a population
Types of Data
Most data can be categorized as qualitative or quantitative.
Qualitative data are the result of categorizing or describing attributes of a population using our senses such as sight or touch. Hair color, blood type, ethnic group, the car model that a person drives, and the street a person lives on are examples of qualitative data. Qualitative data are generally described by words or letters. For instance, hair color might be black, dark brown, light brown, blonde, gray, or red. Blood type might be AB+, O-, or B+.
Quantitative data are always numbers. Quantitative data are the result of counting or measuring attributes of a population. Amount of money, pulse rate, weight, number of people living in your town, and number of students who take statistics are examples of quantitative data. Researchers often prefer quantitative data over qualitative data because they lend themselves more easily to mathematical analysis. For example, it does not make sense to find an average hair color or a median blood type.
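The difference can be seen in a short Python sketch. The pulse rates and hair colors below are made-up values chosen for illustration, and the variable names are assumptions rather than anything from the text. An average and a median are computed for the quantitative data, while the qualitative data can only be summarized by its most common category.

```python
from statistics import mean, median, mode

# Hypothetical quantitative data: pulse rates in beats per minute.
pulse_rates = [62, 70, 74, 68, 80, 72]

# Hypothetical qualitative data: hair colors recorded as words.
hair_colors = ["black", "dark brown", "black", "red", "blonde", "black"]

# Averages and medians are meaningful for counted or measured values.
average_pulse = mean(pulse_rates)
median_pulse = median(pulse_rates)

# For categories we can only count how often each value occurs
# and report the most common one.
most_common_hair = mode(hair_colors)

print("average pulse:", average_pulse)
print("median pulse:", median_pulse)
print("most common hair color:", most_common_hair)
```

Asking for the mean of the hair colors would not give a meaningful answer, which is the point made above about average hair color.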
Populations and Samples
In statistics, we generally study populations. You can think of a population as a collection of persons, things, or objects under study. It is often not feasible or possible to study the entire population. Instead we can select a sample. The idea of sampling is to select a portion (or subset) of the larger population and study that portion (the sample) to gain information about the population. Data are the result of sampling from a population.
Because it takes a lot of time and money to examine an entire population, sampling is a very practical technique. If you wished to compute the overall grade point average at your school, it would make sense to select a sample of students who attend the school. The data collected from the sample would be the students’ grade point averages. In elections, opinion poll samples of 1,000–2,000 people are taken. The opinion poll is supposed to represent the views of the people in the entire country.
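As a rough illustration of the grade point average example, the sketch below builds a hypothetical population of 5,000 students with made-up GPAs, selects a sample of 200 students, and compares the sample mean with the population mean. The population size, sample size, GPA range, and seed are all assumptions chosen for the illustration.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: GPAs for 5,000 students
# (made-up values between 2.0 and 4.0).
population_gpas = [round(rng.uniform(2.0, 4.0), 2) for _ in range(5000)]

# Select a sample of 200 students without replacement.
sample_gpas = rng.sample(population_gpas, 200)

# In practice the population mean is usually unknown;
# it is computed here only to check the sample against it.
population_mean = mean(population_gpas)
sample_mean = mean(sample_gpas)

print(f"population mean GPA: {population_mean:.2f}")
print(f"sample mean GPA:     {sample_mean:.2f}")
```

The sample mean will typically land close to the population mean even though only a small fraction of the students was examined, which is why sampling is practical.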
Critical Thinking: Potential Survey Issues
Users of statistical studies should be aware of the sampling method before accepting the results of the studies. Common problems to be aware of include:
- Nonrepresentative samples: A sample must be representative of the population under study. A sample that is not representative of the population is biased, and a biased sample gives results that are inaccurate and not valid. An example of a biased sample would be a survey on violence in sports where only the female students in a coed high school are surveyed.
- Self-selected samples: Surveys where responses are voluntary, such as call-in surveys, are often unreliable because the people who choose to respond may differ from those who do not.
- Sample size issues: Samples that are too small may be unreliable. Larger samples are better, if possible. In some situations, small samples are unavoidable and can still be used to draw conclusions. Examples would include crash testing of cars or medical testing for rare conditions.
- Undue influence: Collecting data or asking questions in a way that influences the response can distort the results. An example would be conducting a taste test of two sodas where one is refrigerated and the other is served at room temperature.
- Non-response or refusal of a subject to participate: When subjects do not respond, the collected responses may no longer be representative of the population. People with strong positive or negative opinions are often the most likely to answer surveys, which can skew the results. As an example, reviewers on Internet travel sites may not be representative of the entire population.
- Misleading use of data: Be aware of improperly displayed graphs, incomplete data, or lack of context.
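Two of the problems listed above, nonrepresentative samples and small sample sizes, can be sketched in Python. The school, the two groups, and the agreement rates below are hypothetical values invented for the illustration, and `spread_of_estimates` is a helper name introduced here, not a standard function.

```python
import random
from statistics import mean, pstdev

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical school: 500 students in group A and 500 in group B.
# 1 = agrees with a survey statement, 0 = disagrees (made-up rates).
group_a = [1] * 350 + [0] * 150   # 70% agree
group_b = [1] * 150 + [0] * 350   # 30% agree
population = group_a + group_b    # 50% agree overall

# Nonrepresentative sample: surveying only group A ignores half
# of the population and overstates the level of agreement.
biased_estimate = mean(group_a)
true_rate = mean(population)

def spread_of_estimates(sample_size, repeats=200):
    """Standard deviation of the sample proportion over repeated random samples."""
    estimates = [mean(rng.sample(population, sample_size)) for _ in range(repeats)]
    return pstdev(estimates)

# Sample size: estimates from small samples vary more from one
# sample to the next than estimates from large samples.
small_spread = spread_of_estimates(10)
large_spread = spread_of_estimates(400)

print("true rate:", true_rate, "biased estimate:", biased_estimate)
print("spread with n=10: ", round(small_spread, 3))
print("spread with n=400:", round(large_spread, 3))
```

A larger sample reduces the variation between samples, but it does not repair a biased selection: surveying all 500 students in group A would still report 70% agreement.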
Key Concepts
When conducting a survey, we can choose from several sampling methods:
- Simple random sampling is where each member of the population is as likely to be chosen as any other member.
- Systematic sampling is where the first sample member from a larger population is selected according to a random starting point. Additional sample members are then selected based on a fixed interval.
- Cluster sampling is where the population is divided into clusters (groups) and then a specific number of clusters is randomly selected. Every member from each of the selected clusters will be in the cluster sample.
- Convenience sampling is where the selection is made from a part of the population that is easy to access.
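The four methods above can be sketched in Python on a hypothetical population of 100 numbered members. The function names, the population, the interval of 10, and the clusters of 10 members each are assumptions made for this illustration, not a prescribed procedure.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 member IDs, numbered 1 to 100.
population = list(range(1, 101))

def simple_random_sample(members, n):
    """Choose n members so that each member is equally likely to be picked."""
    return rng.sample(members, n)

def systematic_sample(members, interval):
    """Pick a random starting point, then take every interval-th member."""
    start = rng.randrange(interval)
    return members[start::interval]

def cluster_sample(clusters, n_clusters):
    """Randomly choose whole clusters and keep every member of each one."""
    chosen = rng.sample(clusters, n_clusters)
    return [member for cluster in chosen for member in cluster]

def convenience_sample(members, n):
    """Take whoever is easiest to reach, here simply the first n members."""
    return members[:n]

# Divide the population into 10 clusters of 10 members each.
clusters = [population[i:i + 10] for i in range(0, 100, 10)]

srs = simple_random_sample(population, 10)
systematic = systematic_sample(population, 10)
cluster = cluster_sample(clusters, 2)
convenience = convenience_sample(population, 10)

print("simple random:", sorted(srs))
print("systematic:   ", systematic)
print("cluster:      ", cluster)
print("convenience:  ", convenience)
```

Only the convenience sample involves no random choice at all, which is one reason it is the method most likely to produce a nonrepresentative sample.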
Adapted from Business/Technical Mathematics by Izabela Mazur and Kim Moshenko, CC BY 4.0