Outline of Statistics

1. Basics

1.1 Four Core Concepts

  • Population
  • Parameter
  • Sample
  • Statistic

1.2 Two Types of Data

  • Categorical/Qualitative (aka Dimensions)
    • Nominal (not ordered)
    • Ordinal (ordered)
  • Numerical/Quantitative (aka Measures)
    • Discrete (countable values, e.g., counts)
    • Continuous (any value within an interval)
      • Interval (add, subtract; no true zero)
      • Ratio (multiply, divide; true zero)

1.3 Sampling

  • Nonprobability Sampling
  • Probability Sampling
    • Simple Random Sampling with Replacement (A selected unit is placed back in the population and has another chance to be selected again)
    • Simple Random Sampling without Replacement (A selected unit is not placed back in the population and hence is not eligible to be selected again)
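
The difference between the two simple random sampling schemes can be sketched with Python's standard `random` module. The population of ten numbered units, the sample size, and the seed are assumptions made for illustration only.

```python
import random

rng = random.Random(42)          # fixed seed so the sketch is reproducible
population = list(range(1, 11))  # hypothetical population of 10 units

# With replacement: every draw is made from the full population,
# so the same unit may be selected more than once.
with_replacement = rng.choices(population, k=5)

# Without replacement: a selected unit leaves the pool,
# so every unit appears at most once in the sample.
without_replacement = rng.sample(population, k=5)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```

Repeating the draw with a larger `k` makes the contrast clearer: `rng.choices` accepts any `k`, while `rng.sample` raises `ValueError` once `k` exceeds the population size.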

1.4 Sources of Data

  • Primary Data (Collected first-hand by the researchers)
    • Field study
    • Survey design
    • Experimental design
    • Simulation & modeling
  • Secondary Data (collected by other people)
    • Published by researchers & institutions
    • Published by government agencies

1.5 Types of Research

  • Experimental Study
    • Response Variable
    • Treatments/Control Variables
  • Observational Study
    • Ethnography
    • Interview
    • Panel interview
    • Delphi

2. Descriptive Statistics

2.1 Measure of Centrality/Location

  • Mean
    • Arithmetic mean (average)
    • Weighted Mean
    • Trimmed Mean
    • Geometric Mean
  • Median (midpoint, 50% above it, 50% below it)
  • Mode (most frequent occurrence)

2.2 Measure of Dispersion/Scale

  • Range (Maximum - Minimum)
  • Variance (The mean of the squared differences from the mean.)
    • population vs sample (N vs N-1)
    • Degrees of freedom
  • Standard Deviation (The square root of variance)
  • Outliers (unusual observations)
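
As a rough illustration of the dispersion measures above, the following sketch computes the range, the population and sample variance (dividing by N and N - 1 respectively), and the matching standard deviations. The data values are hypothetical.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

data_range = max(data) - min(data)  # maximum - minimum
mean = statistics.mean(data)

# Sum of squared differences from the mean
ss = sum((x - mean) ** 2 for x in data)

pop_var = ss / len(data)         # population variance: divide by N
samp_var = ss / (len(data) - 1)  # sample variance: divide by N - 1 (degrees of freedom)

pop_sd = math.sqrt(pop_var)      # standard deviation = square root of variance
samp_sd = math.sqrt(samp_var)

# The standard library functions agree with the manual formulas
assert math.isclose(pop_var, statistics.pvariance(data))
assert math.isclose(samp_var, statistics.variance(data))
```

The sample variance is always slightly larger than the population variance computed on the same values, because the divisor N - 1 is smaller than N.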

2.3 Percentiles, Quantiles, and Quartiles

  • Quartiles
    • Q1 (25%)
    • Q2 (50%, Median)
    • Q3 (75%)
  • Interquartile Range (IQR) (Q3 - Q1)

2.4 Five-Number Summary (corresponding to Boxplot or Box and Whisker Plot)

  • Min
  • Q1
  • Q2
  • Q3
  • Max
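
The quartiles, IQR, and five-number summary can be sketched with `statistics.quantiles`. The data are hypothetical, and the choice of `method="inclusive"` is an assumption: several quartile conventions exist and they can give slightly different cut points on the same data.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical observations

# n=4 requests the three cut points that split the data into quarters.
# "inclusive" treats the data as the full population; the default
# "exclusive" method can return different values.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

iqr = q3 - q1  # interquartile range
five_number_summary = (min(data), q1, q2, q3, max(data))

print("IQR:", iqr)
print("Five-number summary:", five_number_summary)
```

These five values are what a box plot draws: the box spans Q1 to Q3, with a line at Q2 (the median).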

2.5 Measure of Association Between Two Variables

  • Correlation/Covariance
  • Correlation Coefficient (r is between -1 and 1)
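
A minimal sketch of how covariance and the correlation coefficient relate, using hypothetical paired data and the sample (N - 1) convention: r is the covariance divided by the product of the two standard deviations, which is what keeps it between -1 and 1.

```python
import math

x = [1, 2, 3, 4, 5]  # hypothetical paired observations
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sample covariance: average product of deviations, divided by n - 1
cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# Sample standard deviations
sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))

# Correlation coefficient: covariance rescaled to be unit-free
r = cov_xy / (sd_x * sd_y)
print(f"covariance = {cov_xy:.3f}, r = {r:.3f}")
```

Covariance depends on the units of both variables, whereas r does not, so r is usually the easier quantity to compare across data sets.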

3. Data Visualization

3.1 Categorical Variables

  • Frequency Table/Frequency Distribution
    • Relative Frequency (number between 0 and 1 relative to the total count)
    • Percent Frequency (% of the total count)
  • Bar Charts
    • Pareto Charts (Ordered from highest to lowest, may include a dotted line indicating the cumulative relative or percent frequency)
  • Pie Charts

3.2 Numerical Variables

Typical Traits of Interest:

  • Central Tendency
  • Dispersion
  • Shape
  • Outliers
  • Trend, Seasonality, and Temporal Dependence

Types of Visualizations:

  • Box Plots (display the five-number summary)
  • Dot Plots
  • Histograms
  • Line Charts
  • Area Charts

Logarithmic Scale (commonly used for economic and financial data)

Visualization of Two or More Variables

  • Multiple Categorical Variables
    • Contingency Table
    • Stacked Bar Charts
    • Grouped Bar Charts
  • Multiple Numerical Variables
    • Stacked Dot Plots
    • Stacked Histograms
    • Stacked Area Plots
    • Scatterplots (two variables)
    • Bubble Plots (three variables)

4. Probability

4.1 Basics

  • Sample Space
  • Elementary Event
    • Complement of an Event
    • Intersection of two Events
    • Mutually exclusive or Disjoint
    • Union of two Events
  • Probability of an Event
  • Types of Probability
    • Theoretical Probability
    • Empirical Probability
    • Subjective Probability
  • Law of Large Numbers (LLN) (As the number of observations increases, empirical probabilities converge to theoretical probabilities)
  • Independent Events, Conditional Probability, and Bayes' Theorem
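
The Law of Large Numbers can be illustrated with a small simulation. This is only a sketch: the fair coin, the seed, and the checkpoints are assumptions, and a single run shows the typical behavior rather than proving convergence.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible
theoretical = 0.5       # theoretical probability of heads for a fair coin

heads = 0
flips = 0
empirical = {}          # number of flips -> empirical probability of heads

for n in (10, 100, 1_000, 10_000, 100_000):
    # Keep flipping until n flips have been made in total
    while flips < n:
        heads += rng.random() < theoretical  # True counts as 1
        flips += 1
    empirical[n] = heads / flips

for n, p in empirical.items():
    print(f"{n:>7} flips: empirical = {p:.4f}, gap = {abs(p - theoretical):.4f}")
```

The gap between the empirical and theoretical probability tends to shrink as the number of flips grows, although it need not shrink at every single checkpoint.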

4.2 Discrete Random Variables

  • Probability Mass Function (PMF), also called the Probability Distribution Function: f(x) = Pr(X = x)
  • Cumulative distribution function (CDF): F(x) = Pr(X ≤ x)
  • Bernoulli Random Variables
  • Binomial Random Variables
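
For a binomial random variable, the PMF and CDF can be written out directly. The helper names `binomial_pmf` and `binomial_cdf` are hypothetical, chosen for this sketch; a Bernoulli variable is the special case n = 1.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Pr(X = k): probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_cdf(k: int, n: int, p: float) -> float:
    """Pr(X <= k): sum of the PMF over 0..k."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

# Four tosses of a fair coin
print(binomial_pmf(2, 4, 0.5))  # probability of exactly 2 heads
print(binomial_cdf(2, 4, 0.5))  # probability of at most 2 heads

# Bernoulli is binomial with a single trial
print(binomial_pmf(1, 1, 0.3))  # probability of success
```

The PMF values over all possible outcomes sum to 1, and the CDF at the largest outcome equals 1 for the same reason.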

4.3 Continuous Random Variables

  • Probability Density Function (PDF): f(x), where Pr(a < X < b) is the area under f(x) between a and b
  • Cumulative distribution function (CDF): F(x) = Pr(X ≤ x)
  • Uniform Probability Distribution
  • Normal Distributions
    • Standardization
    • Z Distribution/z Score
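
Standardization can be sketched with `statistics.NormalDist` from the standard library. The mean of 100 and standard deviation of 15 are hypothetical values; the point is that a probability under any normal distribution equals the probability under the standard normal (Z) distribution at the corresponding z score.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0  # hypothetical population mean and standard deviation
x = 130.0

# z score: how many standard deviations x lies from the mean
z = (x - mu) / sigma

p_original = NormalDist(mu, sigma).cdf(x)  # Pr(X <= x) on the original scale
p_standard = NormalDist().cdf(z)           # Pr(Z <= z) on the standard normal

print(f"z = {z}, Pr(X <= {x}) = {p_original:.4f}, Pr(Z <= z) = {p_standard:.4f}")
```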

5. Sample Statistics and Sampling Distribution

5.1 Basics

  • Population Distribution
  • Data Distribution
  • Sampling Distribution

5.2 Sample Statistics

  • Sample Mean
  • Sample Variance (degrees of freedom)
  • Sample Standard Deviation (degrees of freedom)
  • Sample Standard Error
  • Central Limit Theorem (CLT)
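
A small simulation can illustrate the standard error and the Central Limit Theorem together. The setup is an assumption for illustration: samples of size 30 are drawn from a uniform distribution on [0, 1), whose mean is 0.5 and whose standard deviation is sqrt(1/12).

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is reproducible
n = 30                  # size of each sample
reps = 5_000            # number of repeated samples

# Sampling distribution of the mean: one sample mean per repetition
sample_means = [
    statistics.fmean([rng.random() for _ in range(n)])
    for _ in range(reps)
]

population_sd = math.sqrt(1 / 12)              # SD of Uniform(0, 1)
theoretical_se = population_sd / math.sqrt(n)  # standard error = sigma / sqrt(n)
observed_se = statistics.stdev(sample_means)   # spread of the simulated means

print(f"mean of sample means: {statistics.fmean(sample_means):.4f}")
print(f"theoretical SE: {theoretical_se:.4f}, observed SE: {observed_se:.4f}")
```

Although the population here is uniform rather than normal, a histogram of `sample_means` would look approximately bell-shaped, which is what the CLT describes.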

6. Interval Estimation

6.1 Basics

  • Point Estimators
  • Margin of Error
  • Confidence Interval
  • Confidence Level (of Confidence Interval)
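
The four terms above fit together as: confidence interval = point estimate ± margin of error. The sketch below builds a 95% interval for a mean from hypothetical data. It assumes a normal (z) critical value for simplicity; for a sample this small, a t critical value would usually be more appropriate and would give a somewhat wider interval.

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical sample of 10 measurements
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.1]
confidence_level = 0.95

point_estimate = mean(sample)                            # sample mean
standard_error = stdev(sample) / math.sqrt(len(sample))  # s / sqrt(n)

# Two-sided critical value: leaves (1 - level) / 2 in each tail
z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)

margin_of_error = z * standard_error
ci = (point_estimate - margin_of_error, point_estimate + margin_of_error)

print(f"point estimate = {point_estimate:.3f}")
print(f"margin of error = {margin_of_error:.3f}")
print(f"{confidence_level:.0%} CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Raising the confidence level increases the critical value and therefore widens the interval; increasing the sample size shrinks the standard error and narrows it.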

7. Hypothesis Testing (Challenge the Status Quo)

7.1 One Population

7.2 Two Populations

8. Analysis of Variance (ANOVA)

9. Regression

9.1 Simple Linear Regression

9.2 Multiple Linear Regression

10. Time Series and Forecasting