Applied Statistics with R
1 Introduction
1.1 About This Book
1.2 Conventions
1.3 Acknowledgements
1.4 License
2 Introduction to R
2.1 Getting Started
2.2 Basic Calculations
2.3 Getting Help
2.4 Installing Packages
3 Data and Programming
3.1 Data Types
3.2 Data Structures
3.2.1 Vectors
3.2.2 Vectorization
3.2.3 Logical Operators
3.2.4 More Vectorization
3.2.5 Matrices
3.2.6 Lists
3.2.7 Data Frames
3.3 Programming Basics
3.3.1 Control Flow
3.3.2 Functions
4 Summarizing Data
4.1 Summary Statistics
Central Tendency
Spread
Categorical
4.2 Plotting
4.2.1 Histograms
4.2.2 Barplots
4.2.3 Boxplots
4.2.4 Scatterplots
5 Probability and Statistics in R
5.1 Probability in R
5.1.1 Distributions
5.2 Hypothesis Tests in R
5.2.1 One Sample t-Test: Review
5.2.2 One Sample t-Test: Example
5.2.3 Two Sample t-Test: Review
5.2.4 Two Sample t-Test: Example
5.3 Simulation
5.3.1 Paired Differences
5.3.2 Distribution of a Sample Mean
6 R Resources
6.1 Beginner Tutorials and References
6.2 Intermediate References
6.3 Advanced References
6.4 Quick Comparisons to Other Languages
6.5 RStudio and RMarkdown Videos
6.6 RMarkdown Template
7 Simple Linear Regression
7.1 Modeling
7.1.1 Simple Linear Regression Model
7.2 Least Squares Approach
7.2.1 Making Predictions
7.2.2 Residuals
7.2.3 Variance Estimation
7.3 Decomposition of Variation
7.3.1 Coefficient of Determination
7.4 The lm Function
7.5 Maximum Likelihood Estimation (MLE) Approach
7.6 Simulating SLR
7.7 History
7.8 R Markdown
8 Inference for Simple Linear Regression
8.1 Gauss–Markov Theorem
8.2 Sampling Distributions
8.2.1 Simulating Sampling Distributions
8.3 Standard Errors
8.4 Confidence Intervals for Slope and Intercept
8.5 Hypothesis Tests
8.6 cars Example
8.6.1 Tests in R
8.6.2 Significance of Regression, t-Test
8.6.3 Confidence Intervals in R
8.7 Confidence Interval for Mean Response
8.8 Prediction Interval for New Observations
8.9 Confidence and Prediction Bands
8.10 Significance of Regression, F-Test
8.11 R Markdown
9 Multiple Linear Regression
9.1 Matrix Approach to Regression
9.2 Sampling Distribution
9.2.1 Single Parameter Tests
9.2.2 Confidence Intervals
9.2.3 Confidence Intervals for Mean Response
9.2.4 Prediction Intervals
9.3 Significance of Regression
9.4 Nested Models
9.5 Simulation
9.6 R Markdown
10 Model Building
10.1 Family, Form, and Fit
10.1.1 Fit
10.1.2 Form
10.1.3 Family
10.1.4 Assumed Model, Fitted Model
10.2 Explanation versus Prediction
10.2.1 Explanation
10.2.2 Prediction
10.3 Summary
10.4 R Markdown
11 Categorical Predictors and Interactions
11.1 Dummy Variables
11.2 Interactions
11.3 Factor Variables
11.3.1 Factors with More Than Two Levels
11.4 Parameterization
11.5 Building Larger Models
11.6 R Markdown
12 Analysis of Variance
12.1 Experiments
12.2 Two-Sample t-Test
12.3 One-Way ANOVA
12.3.1 Factor Variables
12.3.2 Some Simulation
12.3.3 Power
12.4 Post Hoc Testing
12.5 Two-Way ANOVA
12.6 R Markdown
13 Model Diagnostics
13.1 Model Assumptions
13.2 Checking Assumptions
13.2.1 Fitted versus Residuals Plot
13.2.2 Breusch-Pagan Test
13.2.3 Histograms
13.2.4 Q-Q Plots
13.2.5 Shapiro-Wilk Test
13.3 Unusual Observations
13.3.1 Leverage
13.3.2 Outliers
13.3.3 Influence
13.4 Data Analysis Examples
13.4.1 Good Diagnostics
13.4.2 Suspect Diagnostics
13.5 R Markdown
14 Transformations
14.1 Response Transformation
14.1.1 Variance Stabilizing Transformations
14.1.2 Box-Cox Transformations
14.2 Predictor Transformation
14.2.1 Polynomials
Response Transformations
Predictor Transformations
14.2.2 A Quadratic Model
14.2.3 Overfitting and Extrapolation
14.2.4 Comparing Polynomial Models
14.2.5 poly() Function and Orthogonal Polynomials
14.2.6 Inhibit Function
14.2.7 Data Example
14.3 R Markdown
15 Collinearity
15.1 Exact Collinearity
15.2 Collinearity
15.2.1 Variance Inflation Factor
15.3 Simulation
15.4 R Markdown
16 Variable Selection and Model Building
16.1 Quality Criterion
16.1.1 Akaike Information Criterion
16.1.2 Bayesian Information Criterion
16.1.3 Adjusted R-Squared
16.1.4 Cross-Validated RMSE
16.2 Selection Procedures
16.2.1 Backward Search
16.2.2 Forward Search
16.2.3 Stepwise Search
16.2.4 Exhaustive Search
16.3 Higher Order Terms
16.4 Explanation versus Prediction
16.4.1 Explanation
16.4.2 Prediction
16.5 R Markdown
17 Logistic Regression
17.1 Generalized Linear Models
17.2 Binary Response
17.2.1 Fitting Logistic Regression
17.2.2 Fitting Issues
17.2.3 Simulation Examples
17.3 Working with Logistic Regression
17.3.1 Testing with GLMs
17.3.2 Wald Test
17.3.3 Likelihood-Ratio Test
17.3.4 SAheart Example
17.3.5 Confidence Intervals
17.3.6 Confidence Intervals for Mean Response
17.3.7 Formula Syntax
17.3.8 Deviance
17.4 Classification
17.4.1 spam Example
17.4.2 Evaluating Classifiers
17.5 R Markdown
18 Beyond
18.1 What’s Next
18.2 RStudio
18.3 Tidy Data
18.4 Visualization
18.5 Web Applications
18.6 Experimental Design
18.7 Machine Learning
18.7.1 Deep Learning
18.8 Time Series
18.9 Bayesianism
18.10 High Performance Computing
18.11 Further R Resources
19 Appendix
© 2016 - 2022 David Dalpiaz
Chapter 19 Appendix