Correlations: First, correlations always range from negative 1 to positive 1.
The closer the value is to positive 1 or negative 1, the more closely aligned those variables are, so they
move together either in the same direction if it’s a positive correlation, or in opposite
directions if it’s a negative correlation. A value near 0 indicates little or no linear relationship.
So that’s the fundamental piece of a correlation; there are two key aspects: the size of the number itself, which always falls between minus 1 and plus 1, and the direction it’s headed, which is the sign, plus or minus.
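As a rough illustration of the two aspects (size and sign), the sketch below computes a Pearson correlation coefficient by hand. The text does not say which correlation measure it has in mind, so Pearson's r is an assumption here, and the temperature and sales figures are made up for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: daily temperature and cups sold.
temperature = [60, 65, 70, 75, 80]
sales_up = [10, 12, 14, 16, 18]     # rises as temperature rises
sales_down = [18, 16, 14, 12, 10]   # falls as temperature rises

r_pos = pearson_r(temperature, sales_up)    # perfectly aligned, same direction
r_neg = pearson_r(temperature, sales_down)  # perfectly aligned, opposite direction
print(r_pos, r_neg)
```

Both results have the same size (the variables are perfectly aligned in each case); only the sign differs.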
What is a Variable?
A variable is an attribute that can be used to describe a person, place, or thing. In the case of
statistics, it is any attribute that can be represented as a number. The numbers used to represent
variables fall into two categories:
Quantitative variables are those for which the value has numerical meaning. The value
refers to a specific amount of something. The higher the number, the more of some
attribute the object has. For example, temperature, sales, and number of flyers posted
are quantitative variables. Quantitative variables can be:
o Continuous: A value that is measured along a scale (e.g., temperature) or
o Discrete: A value that is counted in fixed units (e.g., the number of flyers
distributed).
Categorical variables are those for which the value indicates group membership. Thus,
you can’t say that one person, place, or thing has more/less of something based on the
number assigned to it, because that number is an arbitrary label. In Rosie’s data, the location where the drinks are
sold is a categorical variable. Gender is a classic example.
Population vs. Sample
A population includes all elements of interest.
A sample consists of a subset of observations from a population. As a result, multiple samples
can be drawn from the same population. A measurable outcome of a population is called a
parameter; in a sample, it is called a statistic.
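The distinction between a parameter and a statistic can be sketched with a small simulation. The population below is invented for illustration (1,000 hypothetical daily sales figures), and the sample size of 30 is an arbitrary choice.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population: every one of 1,000 daily sales figures of interest.
population = [random.randint(20, 120) for _ in range(1000)]
parameter = statistics.mean(population)  # population mean: a parameter

# Two different samples of 30 observations drawn from the same population.
sample_a = random.sample(population, 30)
sample_b = random.sample(population, 30)
statistic_a = statistics.mean(sample_a)  # sample mean: a statistic
statistic_b = statistics.mean(sample_b)

print(f"parameter:   {parameter:.2f}")
print(f"statistic A: {statistic_a:.2f}")
print(f"statistic B: {statistic_b:.2f}")
```

The parameter is a single fixed value, while each sample yields its own statistic.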
Measures of Central Tendency
Central tendency measures simply provide information on the most typical values in the data for
a given variable.
Mean: Mean represents the average value of the variable and is calculated by summing across all
observations and dividing by the total number of observations.
Median: The median is the middle value in the data for a given variable; 50% of the values are
above, 50% of the values are below. To find the median, you must order your data from smallest
to largest. The position of the median in the ordered data is:
Md = (n+1)/2
where n is the number of observations. If n is even, this position falls between two values, and the median is the average of those two values.
Mode: The mode is the most frequently occurring value for a given variable; if there is more than one
mode, report them all. The best way to identify the mode is to plot the data using a histogram.
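The three measures can be computed with Python's standard statistics module. This is a minimal sketch on made-up data (nine hypothetical daily sales figures), not data from the text.

```python
import statistics

# Hypothetical data: cups sold on nine days.
sales = [12, 15, 11, 15, 18, 13, 15, 17, 10]

mean = statistics.mean(sales)        # sum of observations / number of observations
median = statistics.median(sales)    # sorts the data internally
modes = statistics.multimode(sales)  # list of every most-frequent value

# Locating the median by hand with the (n + 1) / 2 position rule.
ordered = sorted(sales)
position = (len(ordered) + 1) / 2    # 1-based position; a whole number when n is odd
median_by_position = ordered[int(position) - 1]

# When two values tie for most frequent, both are reported.
two_modes = statistics.multimode([1, 1, 2, 2, 3])

print(mean, median, modes, two_modes)
```

multimode is used rather than mode so that ties are all reported, matching the advice above.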
Measures of Variability
Range: simply the minimum value subtracted from the maximum value:
Range = Max(i) - Min(i)
Variance: Variance measures the dispersion of the data from the mean. For a sample of n observations x_i with mean x̄:
S² = Σ (x_i - x̄)² / (n - 1)
Variance is the sum of each observation’s squared deviation from the mean, divided by n - 1. We must
square these deviations because if we didn’t, the sum would always be zero.
Standard deviation: Standard deviation (S) is the square root of the variance.
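A short worked example may help here. It assumes the sample form of the variance (dividing by n - 1) and uses made-up sales figures; the unsquared deviations are summed first to show why squaring is needed.

```python
import math
import statistics

# Hypothetical data: cups sold on nine days.
sales = [12, 15, 11, 15, 18, 13, 15, 17, 10]

data_range = max(sales) - min(sales)

mean = statistics.mean(sales)
deviations = [x - mean for x in sales]
raw_sum = sum(deviations)  # unsquared deviations cancel out to zero

# Sample variance: squared deviations summed, then divided by n - 1.
sum_of_squares = sum(d ** 2 for d in deviations)
variance = sum_of_squares / (len(sales) - 1)
std_dev = math.sqrt(variance)

print(data_range, raw_sum, variance, round(std_dev, 3))
```

The hand-computed values agree with statistics.variance and statistics.stdev, which also use the n - 1 denominator.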
In normal distributions, about 68% of the data fall within +/-1 standard deviation of the mean, about 95%
within 2 standard deviations, and about 99.7% within 3 standard deviations.
However, the data can take on other shapes, including right (positive) skewed, where the tail of
the distribution is on the right side of the curve (as indicated by a median and mode that are
less than the mean) or left (negative) skewed, where the tail is to the left (as indicated by a
median and mode that are greater than the mean).
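The percentages for a normal distribution can be checked from its cumulative distribution function. The sketch below uses NormalDist from Python's standard library; the helper name share_within is made up for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

Because the result depends only on the number of standard deviations, the same shares hold for any mean and standard deviation.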
Standard error: It indicates how close the sample mean is likely to be
to the true population mean. The means obtained from samples are estimates of the
population mean, and they will vary if we calculate the means of different samples from the
same population.
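As a minimal sketch, the standard error of the mean can be estimated from a single sample as the sample standard deviation divided by the square root of the sample size. The function name and the data are assumptions made for illustration.

```python
import math
import statistics

def standard_error(sample):
    """Estimated standard error of the mean: S / sqrt(n)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return s / math.sqrt(n)

# Hypothetical sample of nine daily sales figures.
sample = [12, 15, 11, 14, 18, 13, 16, 17, 10]
se_small = standard_error(sample)

# Repeating the same values four times keeps the spread similar but
# quadruples n, so the estimate shrinks to roughly half.
se_large = standard_error(sample * 4)

print(round(se_small, 3), round(se_large, 3))
```

Larger samples give smaller standard errors, meaning the sample mean is expected to sit closer to the population mean.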
The standard error of the mean is calculated as:
SE = S / √n
where S is the sample standard deviation and n is the sample size.
Kurtosis is a measure of peakedness. Is the distribution tall and narrow, or is it short and
flat?
Skewness is a measure of the symmetry of the data. The skewness value indicates the
direction of the tail. If it is positive, the distribution is right skewed; if negative, the
distribution is left skewed. A normal distribution has a skew of 0.
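Both measures can be sketched from central moments. Several definitions are in use, and statistical packages often apply sample-size corrections, so the simple moment-based versions below are an assumption rather than the formulas the text has in mind. Kurtosis is reported here as excess kurtosis, which is 0 for a normal distribution. The data sets are made up.

```python
import statistics

def central_moment(data, k):
    """k-th central moment: average of (x - mean) ** k."""
    mean = statistics.fmean(data)
    return sum((x - mean) ** k for x in data) / len(data)

def skewness(data):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    m2 = central_moment(data, 2)
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return central_moment(data, 3) / m2 ** 1.5

def excess_kurtosis(data):
    """Moment-based kurtosis minus 3, so a normal distribution scores 0."""
    m2 = central_moment(data, 2)
    if m2 == 0:
        raise ValueError("kurtosis is undefined for constant data")
    return central_moment(data, 4) / m2 ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 4, 5, 20]        # long tail to the right
left_skewed = [-x for x in right_skewed]     # mirror image: tail to the left

print(skewness(symmetric), skewness(right_skewed), skewness(left_skewed))
print(excess_kurtosis(symmetric))
```

In the right-skewed data the mean (about 5.3) exceeds the median (3) and the mode (2), in line with the description of right skew given earlier.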