Wednesday, 04 July 2018


In statistics, quality assurance, and survey methodology, sampling is the selection of a subset (a statistical sample) of individuals from within a statistical population to estimate characteristics of the whole population. Two advantages of sampling are lower cost and faster data collection than measuring the entire population.

Each observation measures one or more properties (such as weight, location, colour) of observable objects distinguished as independent individuals. In survey sampling, weights can be applied to the data to adjust for the sample design, particularly in stratified sampling. Results from probability theory and statistical theory are employed to guide the practice. In business and medical research, sampling is widely used for gathering information about a population. Acceptance sampling is used to determine whether a production lot of material meets the governing specifications.

The sampling process consists of several stages:

  • Defining the population of interest
  • Specifying a sampling frame, a set of items or events possible to measure
  • Specifying a sampling method for selecting items or events from the frame
  • Determining the sample size
  • Implementing the sampling plan
  • Sampling and data collecting


Population definition

Successful statistical practice is based on focused problem definition. In sampling, this includes defining the population from which our sample is drawn. A population can be defined as including all people or items with the characteristics one wishes to understand. Because there is very rarely enough time or money to gather information from everyone or everything in a population, the goal becomes finding a representative sample (or subset) of that population.

Sometimes what defines a population is obvious. For example, a manufacturer needs to decide whether a batch of material from production is of high enough quality to be released to the customer, or should be sentenced to scrap or rework due to poor quality. In this case, the batch is the population.

Although the population of interest often consists of physical objects, sometimes we need to sample over time, space, or some combination of these dimensions. For instance, an investigation of supermarket staffing could examine checkout line length at various times, or a study on endangered penguins might aim to understand their usage of various hunting grounds over time. For the time dimension, the focus may be on periods or discrete occasions.

In other cases, our 'population' may be even less tangible. For example, Joseph Jagger studied the behaviour of roulette wheels at a casino in Monte Carlo, and used this to identify a biased wheel. In this case, the 'population' Jagger wanted to investigate was the overall behaviour of the wheel (i.e. the probability distribution of its results over infinitely many trials), while his 'sample' was formed from observed results from that wheel. Similar considerations arise when taking repeated measurements of some physical characteristic such as the electrical conductivity of copper.

This situation often arises when we seek knowledge about the cause system of which the observed population is an outcome. In such cases, sampling theory may treat the observed population as a sample from a larger 'superpopulation'. For example, a researcher might study the success rate of a new 'quit smoking' program on a test group of 100 patients, in order to predict the effects of the program if it were made available nationwide. Here the superpopulation is "everybody in the country, given access to this treatment" - a group which does not yet exist, since the program isn't yet available to all.

Note also that the population from which the sample is drawn may not be the same as the population about which information is actually wanted. Often there is a large but not complete overlap between these two groups due to frame issues etc. (see below). Sometimes they may be entirely separate - for instance, one might study rats in order to get a better understanding of human health, or one might study records from people born in 2008 in order to make predictions about people born in 2009.

Time spent in making the sampled population and the population of concern precise is often well spent, because it raises many issues, ambiguities and questions that would otherwise have been overlooked at this stage.

Sampling frame

In the most straightforward case, such as sampling a batch of material from production (acceptance sampling by lots), it would be most desirable to identify and measure every single item in the population and to include any one of them in our sample. However, in the more general case this is not usually possible or practical. There is no way to identify all rats in the set of all rats. Where voting is not compulsory, there is no way to identify which people will actually vote at a forthcoming election (in advance of the election). These imprecise populations are not amenable to sampling in any of the ways below and to which we could apply statistical theory.

As a remedy, we seek a sampling frame which has the property that we can identify every single element and include any in our sample. The most straightforward type of frame is a list of elements of the population (preferably the entire population) with appropriate contact information. For example, in an opinion poll, possible sampling frames include an electoral register and a telephone directory.

A probability sample is a sample in which every unit in the population has a chance (greater than zero) of being selected in the sample, and this probability can be accurately determined. The combination of these traits makes it possible to produce unbiased estimates of population totals, by weighting sampled units according to their probability of selection.

Example: We want to estimate the total income of adults living in a given street. We visit each household in that street, identify all adults living there, and randomly select one adult from each household. (For example, we can allocate each person a random number, generated from a uniform distribution between 0 and 1, and select the person with the highest number in each household.) We then interview the selected person and find their income.

People living on their own are certain to be selected, so we simply add their income to our estimate of the total. But a person living in a household of two adults has only a one-in-two chance of selection. To reflect this, when we come to such a household, we would count the selected person's income twice towards the total. (The person who is selected from that household can be loosely viewed as also representing the person who isn't selected.)
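The inverse-probability weighting described above can be sketched in a few lines. This is a minimal illustration with hypothetical income data; the function name and data layout are made up for the example.

```python
import random

def estimate_street_income(households, seed=None):
    """Estimate total adult income for a street, selecting one adult
    per household and weighting by household size.

    `households` is a list of lists, each inner list holding the
    (hypothetical) incomes of the adults in one household.
    """
    rng = random.Random(seed)
    total = 0.0
    for adults in households:
        chosen = rng.choice(adults)    # each adult has a 1/len(adults) chance
        total += chosen * len(adults)  # weight by the inverse of that chance
    return total
```

For single-adult households the weight is 1; for two-adult households the selected person's income is counted twice, exactly as in the example above.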

In the above example, not everybody has the same probability of selection; what makes it a probability sample is the fact that each person's probability is known. When every element in the population does have the same probability of selection, this is known as an 'equal probability of selection' (EPS) design. Such designs are also referred to as 'self-weighting' because all sampled units are given the same weight.

Probability sampling includes: Simple Random Sampling, Systematic Sampling, Stratified Sampling, Probability-Proportional-to-Size Sampling, and Cluster or Multistage Sampling. These various ways of probability sampling have two things in common:

  1. Every element has a known non-zero probability of being sampled, and
  2. random selection is involved at some point.

Nonprobability sampling

Nonprobability sampling is any sampling method where some elements of the population have no chance of selection (these are sometimes referred to as 'out of coverage' or 'undercovered'), or where the probability of selection cannot be accurately determined. It involves the selection of elements based on assumptions regarding the population of interest, which forms the criteria for selection. Hence, because the selection of elements is nonrandom, nonprobability sampling does not allow the estimation of sampling errors. This condition gives rise to exclusion bias, placing limits on how much information a sample can provide about the population. Information about the relationship between sample and population is limited, making it difficult to extrapolate from the sample to the population.

Example: We visit every household in a given street, and interview the first person to answer the door. In any household with more than one occupant, this is a nonprobability sample, because some people are more likely to answer the door (e.g. an unemployed person who spends most of their time at home is more likely to answer than an employed housemate who might be at work when the interviewer calls) and it is not practical to calculate these probabilities.

Nonprobability sampling methods include convenience sampling, quota sampling and purposive sampling. In addition, nonresponse effects may turn any probability design into a nonprobability design if the characteristics of nonresponse are not well understood, since nonresponse effectively modifies each element's probability of being sampled.

Sampling methods

Within any of the types of frames identified above, a variety of sampling methods can be employed, individually or in combination. Factors commonly influencing the choice between these designs include:

  • Nature and quality of the frame
  • Availability of auxiliary information about units in the frame
  • Accuracy requirements, and the need to measure accuracy
  • Whether detailed analysis of the sample is expected
  • Cost/operational concerns

Simple random sampling

In a simple random sample (SRS) of a given size, all subsets of the frame are given an equal probability. Each element of the frame thus has an equal probability of selection: the frame is not subdivided or partitioned. Furthermore, any given pair of elements has the same chance of selection as any other such pair (and similarly for triples, and so on). This minimizes bias and simplifies analysis of results. In particular, the variance between individual results within the sample is a good indicator of variance in the overall population, which makes it relatively easy to estimate the accuracy of results.
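A minimal sketch of SRS, using Python's standard library sampling without replacement (the function name and frame are illustrative):

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw an SRS of size n: every size-n subset of the frame is
    equally likely, so every element has probability n/len(frame)."""
    rng = random.Random(seed)
    return rng.sample(frame, n)
```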

SRS can be vulnerable to sampling error because the randomness of the selection may result in a sample that does not reflect the makeup of the population. For instance, a simple random sample of ten people from a given country will on average produce five men and five women, but any given trial is likely to over-represent one sex and under-represent the other. Systematic and stratified techniques attempt to overcome this problem by "using information about the population" to choose a more "representative" sample.

SRS may also be cumbersome and tedious when sampling from a large target population. In some cases, investigators are interested in research questions specific to subgroups of the population. For example, researchers might be interested in examining whether cognitive ability as a predictor of job performance is equally applicable across racial groups. SRS cannot accommodate a researcher's needs in this situation, because it does not provide subsamples of the population. "Stratified sampling" addresses this weakness of SRS.

Systematic sampling

Systematic sampling (also known as interval sampling) relies on arranging the study population according to some ordering scheme and then selecting elements at regular intervals through that ordered list. Systematic sampling involves a random start and then proceeds with the selection of every kth element from then onwards. In this case, k = (population size / sample size). It is important that the starting point is not automatically the first in the list, but is instead randomly chosen from within the first to the kth element in the list. A simple example would be to select every 10th name from the telephone directory (an 'every 10th' sample, also referred to as 'sampling with a skip of 10').

As long as the starting point is randomized, systematic sampling is a type of probability sampling. It is easy to implement, and the stratification induced can make it efficient, if the variable by which the list is ordered is correlated with the variable of interest. 'Every 10th' sampling is especially useful for efficient sampling from databases.

For example, suppose we wish to sample people from a long street that starts in a poor area (house No. 1) and ends in an expensive district (house No. 1000). A simple random selection of addresses from this street could easily end up with too many from the high end and too few from the low end (or vice versa), leading to an unrepresentative sample. Selecting (e.g.) every 10th street number along the street ensures that the sample is spread evenly along the length of the street, representing all of these districts. (Note that if we always start at house #1 and end at #991, the sample is slightly biased towards the low end; by randomly selecting the start between #1 and #10, this bias is eliminated.)
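The every-kth procedure with a random start can be sketched as follows; the function name is illustrative and the frame must already be ordered:

```python
import random

def systematic_sample(ordered_frame, k, seed=None):
    """Every-kth sampling with a random start among the first k elements."""
    rng = random.Random(seed)
    start = rng.randrange(k)          # random start: positions 0..k-1
    return ordered_frame[start::k]    # then every kth element onwards
```

Applied to house numbers 1..1000 with k = 10, this yields 100 houses spread evenly along the street.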

However, systematic sampling is especially vulnerable to periodicities in the list. If periodicity is present and the period is a multiple or factor of the interval used, the sample is especially likely to be unrepresentative of the overall population, making the scheme less accurate than simple random sampling.

For example, consider a street where the odd-numbered houses are all on the north (expensive) side of the road, and the even-numbered houses are all on the south (cheap) side. Under the sampling scheme given above, it is impossible to get a representative sample; either the houses sampled will all be from the odd-numbered, expensive side, or they will all be from the even-numbered, cheap side, unless the researcher has previous knowledge of this bias and avoids it by using a skip which ensures jumping between the two sides (any odd-numbered skip).

Another drawback of systematic sampling is that even in scenarios where it is more accurate than SRS, its theoretical properties make it difficult to quantify that accuracy. (In the two examples of systematic sampling given above, much of the potential sampling error is due to variation between neighbouring houses - but because this method never selects two neighbouring houses, the sample will not give us any information on that variation.)

As described above, systematic sampling is an EPS method, because all elements have the same probability of selection (in the example given, one in ten). It is not 'simple random sampling' because different subsets of the same size have different selection probabilities - e.g. the set {4,14,24,...,994} has a one-in-ten probability of selection, but the set {4,13,24,34,...} has zero probability of selection.

Systematic sampling can also be adapted to non-EPS approaches; for example, see the discussion of the PPS sample below.

Stratified sampling

When the population embraces a number of distinct categories, the frame can be organized by these categories into separate "strata." Each stratum is then sampled as an independent sub-population, out of which individual elements can be randomly selected. The ratio of the size of this random selection (or sample) to the size of the population is called a sampling fraction. There are several potential benefits to stratified sampling.

First, dividing the population into distinct and independent strata may allow researchers to draw conclusions about certain subgroups that may be lost in more general random samples.

Second, utilizing a stratified sampling method can lead to more efficient statistical estimates (provided that strata are selected based upon relevance to the criterion in question, instead of availability of the samples). Even if a stratified sampling approach does not lead to increased statistical efficiency, such a tactic will not result in less efficiency than would simple random sampling, provided that each stratum is proportional to the group's size in the population.

Third, it is sometimes the case that data are more readily available for individual, pre-existing strata within a population than for the overall population; in such cases, using a stratified sampling approach may be more convenient than aggregating data across groups (though this may potentially be at odds with the previously noted importance of utilizing criterion-relevant strata).

Finally, since each stratum is treated as an independent population, different sampling approaches can be applied to different strata, potentially enabling researchers to use the approach best suited (or most cost-effective) for each identified subgroup within the population.

There are, however, some potential drawbacks to using stratified sampling. First, identifying strata and implementing such an approach can increase the cost and complexity of sample selection, as well as leading to increased complexity of population estimates. Second, when examining multiple criteria, stratifying variables may be related to some, but not to others, further complicating the design, and potentially reducing the utility of the strata. Finally, in some cases (such as designs with a large number of strata, or those with a specified minimum sample size per group), stratified sampling can potentially require a larger sample than would other methods (although in most cases, the required sample size would be no larger than would be required for simple random sampling).

A stratified sampling approach is most effective when three conditions are met:
  1. Variability within strata is minimized
  2. Variability between strata is maximized
  3. The variables upon which the population is stratified are strongly correlated with the desired dependent variable.
Advantages compared to other sampling methods
  1. Focuses on important sub-populations and ignores irrelevant ones.
  2. Allows the use of different sampling techniques for different subpopulations.
  3. Increases the accuracy/efficiency of the estimates.
  4. Permits greater balancing of statistical power of tests of differences between strata by sampling equal numbers from strata varying widely in size.
Disadvantages
  1. Requires the selection of relevant stratification variables that can be difficult.
  2. Not useful when there are no homogeneous subgroups.
  3. It can be expensive to implement.
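A minimal sketch of proportional stratified sampling - the same sampling fraction applied independently within each stratum. The stratum names and frame are hypothetical:

```python
import random

def stratified_sample(strata, fraction, seed=None):
    """Sample each stratum independently with the same sampling fraction
    (proportional allocation). `strata` maps stratum name -> units."""
    rng = random.Random(seed)
    return {name: rng.sample(units, round(len(units) * fraction))
            for name, units in strata.items()}
```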
Poststratification

Stratification is sometimes introduced after the sampling phase in a process called "poststratification". This approach is typically implemented due to a lack of prior knowledge of an appropriate stratifying variable or when the experimenter lacks the necessary information to create a stratifying variable during the sampling phase. Although the method is susceptible to the pitfalls of post hoc approaches, it can provide several benefits in the right situation. Implementation usually follows a simple random sample. In addition to allowing for stratification on an ancillary variable, poststratification can be used to implement weighting, which can improve the precision of a sample's estimates.
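Poststratification weighting can be sketched as follows: each stratum's weight is its known population share divided by its observed share of the sample, so the reweighted sample matches the population proportions. The function name and toy counts are illustrative:

```python
def poststratification_weights(sample_counts, population_shares):
    """Per-unit weight for each stratum: (population share) / (sample share).

    `sample_counts` maps stratum -> number of sampled units;
    `population_shares` maps stratum -> known population proportion.
    """
    n = sum(sample_counts.values())
    return {h: population_shares[h] * n / sample_counts[h]
            for h in sample_counts}
```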

Oversampling

Choice-based sampling is one of the stratified sampling strategies. In choice-based sampling, the data are stratified on the target and a sample is taken from each stratum so that the rare target class will be more represented in the sample. The model is then built on this biased sample. The effects of the input variables on the target are often estimated with more precision with the choice-based sample, even when a smaller overall sample size is taken, compared to a random sample. The results usually must be adjusted to correct for the oversampling.
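The oversampling of the rare class can be sketched by resampling it with replacement until both classes are equally represented in the modelling sample. This is an illustrative sketch, not a specific library routine, and any subsequent model estimates would still need the correction mentioned above:

```python
import random

def oversample_rare_class(common, rare, seed=None):
    """Resample the rare class with replacement until both classes
    are equally represented in the modelling sample."""
    rng = random.Random(seed)
    extra = [rng.choice(rare) for _ in range(len(common) - len(rare))]
    return common, rare + extra
```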

Probability-proportional-to-size sampling

In some cases, the sample designer has access to an "auxiliary variable" or "size measure", believed to be correlated to the variable of interest, for each element in the population. These data can be used to improve accuracy in sample design. One option is to use the auxiliary variable as a basis for stratification, as discussed above.

Another option is probability-proportional-to-size ('PPS') sampling, in which the selection probability for each element is set to be proportional to its size measure, up to a maximum of 1. In a simple PPS design, these selection probabilities can then be used as the basis for Poisson sampling. However, this has the drawback of variable sample size, and different portions of the population may still be over- or under-represented due to chance variation in selections.

Systematic sampling theory can be used to create a probability-proportional-to-size sample. This is done by treating each count within the size variable as a single sampling unit. Samples are then identified by selecting at even intervals among these counts within the size variable. This method is sometimes called PPS-sequential or monetary unit sampling in the case of audits or forensic sampling.

Example: Suppose we have six schools with populations of 150, 180, 200, 220, 260 and 490 students respectively (total 1500 students), and we want to use the student population as the basis for a PPS sample of size three. To do this, we could allocate the first school the numbers 1 to 150, the second school 151 to 330 (= 150 + 180), the third school 331 to 530, and so on up to the last school (1011 to 1500). We then generate a random start between 1 and 500 (equal to 1500/3) and count through the school populations in multiples of 500. If our random start is 137, we would select the schools which have been allocated the numbers 137, 637 and 1137, i.e. the first, fourth, and sixth schools.
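The school example above can be sketched with cumulative totals and a binary search; the function name is illustrative, and `start` is exposed as a parameter only so the worked example is reproducible:

```python
import random
from itertools import accumulate
from bisect import bisect_left

def pps_systematic_sample(sizes, n, start=None, seed=None):
    """Systematic PPS: cumulate the size measure, then pick the unit
    covering each of n evenly spaced counts after a random start."""
    cum = list(accumulate(sizes))   # e.g. 150, 330, 530, 750, 1010, 1500
    interval = cum[-1] // n         # 1500 // 3 = 500 in the example
    if start is None:
        start = random.Random(seed).randrange(1, interval + 1)
    return [bisect_left(cum, start + i * interval) for i in range(n)]
```

With the six school sizes and a start of 137, this selects indices 0, 3 and 5 - the first, fourth, and sixth schools, as in the worked example.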

The PPS approach can improve accuracy for a given sample size by concentrating the sample on large elements that have the greatest impact on population estimates. PPS sampling is commonly used for surveys of businesses, where element size varies greatly and auxiliary information is often available - for instance, a survey attempting to measure the number of guest-nights spent in hotels might use each hotel's number of rooms as an auxiliary variable. In some cases, an older measurement of the variable of interest can be used as an auxiliary variable when attempting to produce more current estimates.

Cluster sampling

It is sometimes more cost-effective to select respondents in groups ('clusters'). Sampling is often clustered by geography, or by time period. (Nearly all samples are in some sense 'clustered' in time - although this is rarely taken into account in the analysis.) For instance, if surveying households within a city, we might choose to select 100 city blocks and then interview every household within the selected blocks.

Clustering can reduce travel and administrative costs. In the example above, an interviewer can make a single trip to visit several households in one block, rather than having to travel to a different block for each household.

It also means that one does not need a sampling frame listing all elements in the target population. Instead, clusters can be chosen from a cluster-level frame, with an element-level frame created only for the selected clusters. In the example above, the sample only requires a block-level city map for initial selections, and then a household-level map of the 100 selected blocks, rather than a household-level map of the whole city.

Cluster sampling (also known as clustered sampling) generally increases the variability of sample estimates above that of simple random sampling, depending on how the clusters differ between one another as compared with the within-cluster variation. For this reason, cluster sampling requires a larger sample than SRS to achieve the same level of accuracy - but cost savings from clustering might still make this a cheaper option.

Cluster sampling is commonly implemented as multistage sampling. This is a complex form of cluster sampling in which two or more levels of units are embedded one in the other. The first stage consists of constructing the clusters that will be used to sample from. In the second stage, a sample of primary units is randomly selected from each cluster (rather than using all units contained in all selected clusters). In following stages, in each of those selected clusters, additional samples of units are selected, and so on. All ultimate units (individuals, for instance) selected at the last step of this procedure are then surveyed. This technique is thus essentially the process of taking random subsamples of preceding random samples.
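A two-stage version of this can be sketched as follows, using the city-blocks example; the block/household frame is hypothetical:

```python
import random

def two_stage_sample(clusters, n_clusters, n_units, seed=None):
    """Stage 1: randomly select clusters (e.g. city blocks).
    Stage 2: simple random sample of units within each selected cluster.

    `clusters` maps a cluster id -> list of units in that cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: rng.sample(clusters[c], n_units) for c in chosen}
```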

Multistage sampling can substantially reduce sampling costs, where the complete population list would otherwise need to be constructed (before other sampling methods could be applied). By eliminating the work involved in describing clusters that are not selected, multistage sampling can reduce the large costs associated with traditional cluster sampling. However, each sample may not be a full representation of the whole population.

Quota sampling

In quota sampling, the population is first segmented into mutually exclusive sub-groups, just as in stratified sampling. Then judgment is used to select the subjects or units from each segment based on a specified proportion. For example, an interviewer may be told to sample 200 females and 300 males between the ages of 45 and 60.

It is this second step which makes the technique one of non-probability sampling. In quota sampling the selection of the sample is non-random. For example, interviewers might be tempted to interview those who look most helpful. The problem is that these samples may be biased because not everyone gets a chance of selection. This non-random element is its greatest weakness, and quota versus probability has been a matter of controversy for several years.

Minimax sampling

In an imbalanced dataset, where the sampling ratio does not follow the population statistics, one can resample the dataset in a conservative manner called minimax sampling. Minimax sampling has its origin in Anderson's minimax ratio, whose value is proved to be 0.5: in a binary classification, the class-sample sizes should be chosen equally. This ratio can be proved to be a minimax ratio only under the assumption of an LDA classifier with Gaussian distributions. The notion of minimax sampling has recently been developed for a general class of classification rules, called class-wise smart classifiers. In this case, the sampling ratio of classes is selected so that the worst-case classifier error over all the possible population statistics for class prior probabilities is minimized.

Accidental sampling

Accidental sampling (sometimes known as grab, convenience or opportunity sampling) is a type of nonprobability sampling which involves the sample being drawn from that part of the population which is close to hand. That is, a population is selected because it is readily available and convenient. It may be through meeting the person, or including a person in the sample when one meets them, or chosen by finding them through technological means such as the internet or through phone. The researcher using such a sample cannot scientifically make generalizations about the total population from this sample because it would not be representative enough. For example, if the interviewer were to conduct such a survey at a shopping center early in the morning on a given day, the people interviewed would be limited to those present there at that given time, and would not represent the views of other members of society in such an area if the survey were conducted at different times of day and several times per week. This type of sampling is most useful for pilot testing. Several important considerations for researchers using convenience samples include:

  1. Are there controls within the research design or experiment which can serve to lessen the impact of a non-random convenience sample, thereby ensuring the results will be more representative of the population?
  2. Is there good reason to believe that a particular convenience sample would or should respond or behave differently than a random sample from the same population?
  3. Is the question being asked by the research one that can adequately be answered using a convenience sample?

In social science research, snowball sampling is a similar technique, in which existing research subjects are used to recruit more subjects into the sample. Some snowball sampling variants, such as respondent-driven samples, allow selection probability calculations and probability sampling methods under certain conditions.

Voluntary Sampling

The voluntary sampling method is a type of non-probability sampling. Voluntary samples are composed of people who self-select into the survey. Often, these subjects have a strong interest in the main topic of the survey. Volunteers may be invited through advertisements on social media sites. This method is suitable for research that can be conducted by filling out a questionnaire. The target population for advertisements can be selected by characteristics such as demographics, age, gender, income, occupation, education level or interests, using the advertising tools provided by social media sites. The advertisement may include a message about the research and a link to a web survey. After voluntarily following the link and submitting the web-based questionnaire, the respondent is included in the sample population. This method can reach a global population but is limited by the advertising budget. It may also allow volunteers outside the reference population to volunteer and get included in the sample. It is difficult to make generalizations about the total population from this sample because it would not be representative enough.

Line-intercept sampling

Line-intercept sampling is a method of sampling elements in a region whereby an element is sampled if a chosen line segment, called a "transect", intersects the element.

Panel sampling

Panel sampling is the method of first selecting a group of participants through a random sampling method and then asking that group for (potentially the same) information several times over a period of time. Therefore, each participant is interviewed at two or more time points; each period of data collection is called a "wave". The method was developed by sociologist Paul Lazarsfeld in 1938 as a means of studying political campaigns. This longitudinal sampling method allows estimates of changes in the population, for example with regard to chronic illness, to job stress, to weekly food expenditures. Panel sampling can also be used to inform researchers about within-person health changes due to age or to help explain changes in continuous dependent variables such as spousal interaction. There have been several proposed methods of analyzing panel data, including MANOVA, growth curves, and structural equation modeling with lagged effects.

Snowball sampling

Snowball sampling involves finding a small group of initial respondents and using them to recruit more respondents. It is particularly useful in cases where the population is hidden or difficult to enumerate.
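The recruitment process can be sketched as a wave-by-wave walk over a contact graph. Everything here is illustrative: the contact graph is hypothetical, and real snowball studies would of course recruit people, not dictionary keys.

```python
import random

def snowball_sample(contacts, seeds, waves, refs_per_person, seed=None):
    """Recruit respondents wave by wave: each current respondent refers
    up to `refs_per_person` of their contacts who are not yet sampled.

    `contacts` is a hypothetical acquaintance graph: person -> contacts."""
    rng = random.Random(seed)
    sample, frontier = set(seeds), list(seeds)
    for _ in range(waves):
        recruits = []
        for person in frontier:
            new = [c for c in contacts.get(person, []) if c not in sample]
            for c in rng.sample(new, min(refs_per_person, len(new))):
                sample.add(c)
                recruits.append(c)
        frontier = recruits
    return sample
```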

Theoretical sampling

Theoretical sampling occurs when samples are selected on the basis of the results of the data collected so far, with a goal of developing a deeper understanding of the area or of developing theories.

Replacement of selected unit

Sampling schemes may be without replacement ('WOR' - no element can be selected more than once in the same sample) or with replacement ('WR' - an element may appear multiple times in the one sample). For example, if we catch fish, measure them, and immediately return them to the water before continuing with the sample, this is a WR design, because we might end up catching and measuring the same fish more than once. However, if we do not return the fish to the water, this becomes a WOR design.
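The distinction maps directly onto the two standard-library sampling primitives; the pond data is hypothetical:

```python
import random

rng = random.Random(0)
pond = ["fish-%d" % i for i in range(10)]

# With replacement (WR): the same fish can be caught and measured twice.
wr_sample = [rng.choice(pond) for _ in range(6)]

# Without replacement (WOR): each fish can appear at most once.
wor_sample = rng.sample(pond, 6)
```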

Sample size

Formulas, tables, and power function charts are well-known approaches to determining sample size.

Steps to use sample size table

  1. Postulate the effect size of interest, α, and β.
  2. Check sample size table
    1. Select the table corresponding to the selected α
    2. Locate the row corresponding to the desired power
    3. Locate the column corresponding to the estimated effect size.
    4. The intersection of the column and row is the minimum sample size required.
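The values such tables tabulate can also be computed directly. This is a minimal sketch using the common normal-approximation formula for a two-sample comparison of means, not any specific published table; the function name is illustrative.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sample comparison of
    means with standardized effect size d, via the normal approximation:
    n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2, rounded up."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)
```

For a medium effect size of d = 0.5 at α = 0.05 and 80% power, this gives about 63 per group (a t-test correction would add a participant or two).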

Sampling and data collection

Good data collection involves:

  • Following the specified sampling process
  • Storing data in a time sequence
  • Take note of other contextual comments and events
  • Record non-responses



The sampling app

Sampling enables the selection of the right data points from within the larger data set to estimate the characteristics of the whole population. For example, about 600 million tweets are produced every day. It is not necessary to look at all of them to determine the topics discussed during the day, nor is it necessary to look at all the tweets to determine the sentiment on each of those topics. A theoretical formulation for sampling Twitter data has been developed.
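As an illustration, the share of the day's tweets on one topic could be estimated from a modest random sample rather than the full volume; the true topic share below is an assumed number used only to drive the simulation:

```python
import random
from statistics import mean

rng = random.Random(7)
TRUE_SHARE = 0.12  # assumed fraction of the day's tweets on one topic

# Instead of scanning all ~600 million tweets, label a random sample.
sample_labels = [rng.random() < TRUE_SHARE for _ in range(10_000)]
estimate = mean(sample_labels)

# Approximate 95% margin of error for a proportion from n draws:
n = len(sample_labels)
margin = 1.96 * (estimate * (1 - estimate) / n) ** 0.5  # roughly 0.006 here
```

With 10,000 sampled tweets the estimate is typically within well under one percentage point of the true share, which is why inspecting the full population is unnecessary.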

In manufacturing, different types of sensory data such as acoustics, vibration, pressure, current, voltage, and controller data are available at short time intervals. To predict downtime it may not be necessary to look at all the data; a sample may be sufficient.



Error in sample survey

Survey results usually experience some errors. Total errors can be classified into sampling errors and non-sampling errors. The term "error" here includes systematic bias as well as random error.

Error and sampling bias

Sampling error and bias are induced by the sample design. They include:

  1. Selection bias : When the true selection probabilities differ from those assumed in calculating the results.
  2. Random sampling error : Random variation in the results due to the elements in the sample being selected at random.
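Random sampling error can be demonstrated by simulation: repeated samples from a synthetic population show the spread of the sample mean shrinking roughly as 1/√n. This is a sketch; the population and helper function are invented for illustration:

```python
import random
from statistics import mean, pstdev

rng = random.Random(0)
# Synthetic population: mean about 50, standard deviation about 10.
population = [rng.gauss(50, 10) for _ in range(100_000)]

def sampling_error_sd(n, reps=2000):
    """Spread of the sample mean around the population mean for size n."""
    estimates = [mean(rng.sample(population, n)) for _ in range(reps)]
    return pstdev(estimates)

# Quadrupling the sample size roughly halves the random sampling error:
small, large = sampling_error_sd(25), sampling_error_sd(100)
# small is near 10/sqrt(25) = 2.0, large near 10/sqrt(100) = 1.0
```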

Non-sampling error

A non-sampling error is another error that could have an impact on the final survey estimate, caused by problems in data collection, processing, or sample design. They include:

  1. Over-coverage : Inclusion of data from outside the population.
  2. Under-coverage : The sampling frame does not include elements in the population.
  3. Measurement error : e.g. when the respondent misunderstands the question, or finds it difficult to answer.
  4. Processing errors : Errors in data encoding.
  5. Non-response or Participation bias : Failed to get complete data from all selected individuals.

After sampling, a review should be held of the exact process followed in sampling, rather than that intended, in order to study any effects that any divergences might have on subsequent analysis.

A particular problem is that of non-response. Two major types of non-response exist: unit nonresponse (failure to participate in any part of the survey) and item non-response (participating in the survey but failing to complete one or more components/questions of the survey). In survey sampling, many of the individuals identified as part of the sample may be unwilling to participate, may not have the time to participate (opportunity cost), or survey administrators may not have been able to contact them. In this case, there is a risk of differences between respondents and nonrespondents, leading to biased estimates of population parameters. This is often addressed by improving survey design, offering incentives, and conducting follow-up studies which make repeated attempts to contact the unresponsive and to characterize their similarities and differences with the rest of the frame. The effects can also be mitigated by weighting the data when population benchmarks are available, or by imputing data based on answers to other questions. Non-response is particularly a problem in Internet sampling. Reasons for this problem may include improperly designed surveys, over-surveying (or survey fatigue), and the fact that potential participants may hold multiple e-mail addresses, which they no longer use or do not check regularly.



Survey weight

In many situations the sample fraction may be varied by stratum, and the data will have to be weighted to correctly represent the population. For example, a simple random sample of individuals in the United Kingdom might include some on remote Scottish islands who would be inordinately expensive to sample. A cheaper method would be to use a stratified sample with urban and rural strata. The rural sample could be under-represented in the sample, but weighted up appropriately in the analysis to compensate.

In general, data should usually be weighted if the sample design does not give each individual an equal chance of being selected. For instance, when households have equal selection probabilities but one person is interviewed from within each household, this gives people from large households a smaller chance of being interviewed. This can be accounted for using survey weights. Similarly, households with more than one telephone line have a greater chance of being selected in a random digit dialing sample, and weights can adjust for this.
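The household example can be sketched with inverse-probability weights: a respondent chosen from a household of size k had selection probability 1/k, so their design weight is proportional to k. The respondents below are hypothetical:

```python
# Hypothetical survey: one adult interviewed per sampled household.
# Weight = household size, the inverse of within-household selection
# probability (up to a constant).
respondents = [
    {"smokes": True,  "household_size": 1},
    {"smokes": False, "household_size": 1},
    {"smokes": False, "household_size": 4},
]

# Naive (unweighted) estimate of the smoking rate:
unweighted = sum(r["smokes"] for r in respondents) / len(respondents)  # 1/3

# Design-weighted estimate:
total_weight = sum(r["household_size"] for r in respondents)
weighted = sum(r["household_size"] * r["smokes"]
               for r in respondents) / total_weight  # 1/6

# The single-person-household smoker was over-counted before weighting,
# because small households over-represent their members in this design.
```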

Weights can also serve other purposes, such as helping to correct non-responses.



Methods of producing random samples

  • Random number table
  • Mathematical algorithms for pseudo-random number generation
  • Physical randomization devices such as coins, playing cards or advanced devices such as ERNIE
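As a sketch of the second method, a small linear congruential generator (here using the classic Park-Miller "minimal standard" constants) can drive without-replacement selection from a sampling frame. `draw_sample` is a hypothetical helper, and in practice the modulo mapping introduces a slight bias that rejection sampling would avoid:

```python
def lcg(seed, modulus=2**31 - 1, a=48271, c=0):
    """Park-Miller 'minimal standard' linear congruential generator."""
    state = seed
    while True:
        state = (a * state + c) % modulus
        yield state

def draw_sample(frame_size, k, seed=12345):
    """Select k distinct unit indices from a frame of frame_size units."""
    gen = lcg(seed)
    chosen = []
    while len(chosen) < k:
        unit = next(gen) % frame_size  # map the PRNG stream onto the frame
        if unit not in chosen:         # without-replacement selection
            chosen.append(unit)
    return chosen

sample = draw_sample(frame_size=5000, k=10)
```

Because the generator is deterministic, the same seed always reproduces the same sample, which makes the selection auditable.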



History

Random sampling by the use of lots is an old idea, mentioned several times in the Bible. In 1786 Pierre Simon Laplace estimated the population of France by using a sample, together with the ratio estimator. He also computed probabilistic estimates of the error. These were not expressed as a modern confidence interval but as the sample size that would be needed to achieve a particular upper bound on the sampling error with probability 1000/1001. The estimates used Bayes' theorem with a uniform prior probability and assumed that the sample was random. Alexander Ivanovich Chuprov introduced sample surveys to Imperial Russia in the 1870s.

In the United States, the 1936 Literary Digest prediction of a Republican win in the presidential election went badly awry, due to severe bias [1]. More than two million people responded to the study, with their names obtained through magazine subscription lists and telephone directories. It was not appreciated that these lists were heavily biased towards Republicans, and the resulting sample, though very large, was deeply flawed.



See also

  • Data collection
  • Gy's sampling theory
  • Horvitz-Thompson estimator
  • Official statistics
  • Ratio estimator
  • Replication (statistics)
  • Sampling (case studies)
  • Sampling error
  • Random-sampling mechanism
  • Resampling (statistics)
  • Sortition



Note

The textbook by Groves et alia provides an overview of survey methodology, including recent literature on questionnaire development (informed by cognitive psychology):

  • Robert Groves, et al. Survey methodology (2010, second edition; 2004, first edition). ISBN 0-471-48348-6.

The other books focus on the statistical theory of survey sampling and require some knowledge of basic statistics, as discussed in the following textbooks:

  • David S. Moore and George P. McCabe (February 2005). Introduction to the practice of statistics (5th ed.). W.H. Freeman & Company. ISBN 0-7167-6282-X.
  • Freedman, David; Pisani, Robert; Purves, Roger (2007). Statistics (4th ed.). New York: Norton. ISBN 0-393-92972-8.

The elementary book by Scheaffer et alia uses quadratic equations from high-school algebra:

  • Scheaffer, Richard L., William Mendenhall, and R. Lyman Ott. Elementary survey sampling (5th ed.). Belmont: Duxbury Press, 1996.

More mathematical statistics is required for Lohr, for Särndal et alia, and for the classic text by Cochran: Cochran, William G. (1977). Sampling techniques (3rd ed.). Wiley. ISBN 0-471-16240-X.

  • Lohr, Sharon L. (1999). Sampling: Design and analysis. Duxbury. ISBN 0-534-35361-4.
  • Särndal, Carl-Erik; Swensson, Bengt; Wretman, Jan (1992). Model assisted survey sampling. Springer-Verlag. ISBN 0-387-40620-4.

Source of the article: Wikipedia
