Report, 2nd draft

Authors
Affiliation

Description

As its name might suggest, a statistical analysis plan (SAP) provides a detailed outline of the planned statistical analyses for a study. In the third phase of the SDS 210 project, you will return to and expand on your initial project proposal to create a statistical analysis plan. You will refine your study population, aims, primary response variables, and explanatory variables based on your data wrangling and exploratory data analysis experience. You will also outline which inferential statistical procedures you will use to formally investigate your primary and secondary hypotheses.

Format

Your final statistical analysis plan should be submitted as a PDF file, rendered from a Quarto file (*.qmd).

Content

Edit your previous draft so that it contains the following elements in the appropriate sections:

At the beginning of the Introduction

Write one paragraph providing an introduction to your study. You should aim to answer the following questions as part of your response:

  • What is the purpose of your study? In other words, what phenomenon or trend are you planning to investigate?
  • What is the importance or relevance of this topic?

In the middle of the Introduction

Study Objectives and Endpoints

Identify and describe the primary objectives of your study, as well as the relevant outcome(s) or response variable(s) that you will use to address these objectives. You should describe these variables in sufficient detail that someone unfamiliar with your project and dataset could understand:

  • what response variable(s) you plan to study, and
  • how those variables are measured/defined.

Study Population

Provide an updated short description of the population that you intend to include in your study, as well as the larger population/phenomenon to which you will generalize your results. Please be sure to address the following in your response:

  • What are the observational units for this population?
  • What exclusion criteria do you intend to implement? In other words, do you intend to remove observations from the dataset that have missing data? Have you chosen to restrict your focus to only a subset of your original sample? If so, please describe that process.
  • What limitations, biases, or threats to generalizability do you anticipate in your study?

In the Data section

Study Variables

Provide an updated list of the explanatory variables that you will include in your analysis. You should describe these variables in sufficient detail that someone unfamiliar with your project and dataset could understand:

  • what explanatory variables you plan to study, and
  • how those variables are measured/defined.

It is not sufficient to simply list the column names of the relevant variables in your dataset. If you have derived any new variables (for example, by collapsing the levels of an existing categorical variable, merging in a new dataset, or transforming an existing variable, such as by changing its units of measurement), please also describe how you constructed these new predictors.
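
As an illustration of the kind of construction worth documenting, the sketch below derives two predictors from raw columns: it collapses a five-level categorical variable into three levels and converts a height from inches to centimeters. The column names (`education`, `height_in`), the level labels, and the records are all invented for this example, and the tools you use in your own Quarto document may differ.

```python
# Hypothetical raw records; the column names and values are invented.
records = [
    {"education": "Some high school", "height_in": 64.0},
    {"education": "High school diploma", "height_in": 70.5},
    {"education": "Some college", "height_in": 62.0},
    {"education": "Bachelor's degree", "height_in": 68.0},
    {"education": "Graduate degree", "height_in": 66.5},
]

# Collapse the five original levels into three broader levels.
EDUCATION_COLLAPSED = {
    "Some high school": "High school or less",
    "High school diploma": "High school or less",
    "Some college": "Some college",
    "Bachelor's degree": "Bachelor's or higher",
    "Graduate degree": "Bachelor's or higher",
}

CM_PER_INCH = 2.54  # exact conversion factor


def derive_variables(record):
    """Return a copy of the record with two derived predictors added."""
    derived = dict(record)
    derived["education_3"] = EDUCATION_COLLAPSED[record["education"]]
    derived["height_cm"] = round(record["height_in"] * CM_PER_INCH, 1)
    return derived


derived_records = [derive_variables(r) for r in records]
for row in derived_records:
    print(row["education_3"], row["height_cm"])
```

A written description of these predictors would state the mapping from old levels to new levels and the conversion factor, so that a reader could reproduce both variables from the raw data.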

In the Methods section

Analysis Plan

Based in part on the results of your exploratory data analysis, propose an appropriate set of statistical analyses to address your research question(s). You must also expand the simple linear regression model from your exploratory data analysis to include at least one additional predictor.
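
For instance, if your exploratory data analysis fit a simple linear regression of a response \(Y\) on one predictor \(X_1\), adding a second predictor \(X_2\) gives a multiple regression model. The symbols below are generic placeholders, not variables from any particular dataset:

```latex
% Simple linear regression from the exploratory data analysis
Y = \beta_0 + \beta_1 X_1 + \varepsilon

% Expanded model with one additional predictor
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon
```

In the expanded model, \(\beta_1\) describes the change in the mean of \(Y\) associated with a one-unit increase in \(X_1\) while holding \(X_2\) fixed, so its interpretation differs from that of the slope in the simple model.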

Primary Analysis

Given what you previously identified as the primary objectives of your study, highlight one primary statistical analysis that you will conduct. For this primary analysis:

  • State the null and alternative hypotheses, both in words and using mathematical/statistical notation.
  • Describe briefly what statistical method(s) you plan to use to analyze these data.
    • Do you plan to fit a linear regression model? If so, please write out the population regression model you intend to fit.
    • Do you plan to use computational procedures to construct a confidence interval or conduct a hypothesis test? If so, what statistic will you be bootstrapping/randomizing, and how many resamples/randomizations do you intend to conduct?
    • Do you plan to use a probabilistic model instead? If so, what reference distribution will you use (e.g., Normal, \(\chi^2\), \(t\))? Verify that your data meet the technical conditions for your chosen approach.
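
The computational option in the list above can be sketched as follows: a percentile bootstrap interval for a regression slope, which makes explicit both the statistic being resampled and the number of resamples. This is a minimal sketch under stated assumptions: the data are invented, the 2,000 resamples and the seed are arbitrary choices, and resampling whole (x, y) cases is only one of several bootstrap schemes.

```python
import random
import statistics

# Hypothetical paired observations (x, y); replace with your own data.
data = [
    (1.0, 2.1), (2.0, 2.9), (3.0, 4.2), (4.0, 4.8), (5.0, 6.1),
    (6.0, 6.8), (7.0, 8.3), (8.0, 8.7), (9.0, 10.2), (10.0, 10.9),
]


def slope(pairs):
    """Least-squares slope of y on x."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def bootstrap_slope_ci(pairs, n_resamples=2000, level=0.95, seed=210):
    """Percentile bootstrap interval for the slope, resampling whole cases."""
    rng = random.Random(seed)
    slopes = []
    while len(slopes) < n_resamples:
        resample = rng.choices(pairs, k=len(pairs))
        # Skip the rare resample where every x is identical (slope undefined).
        if len({x for x, _ in resample}) < 2:
            continue
        slopes.append(slope(resample))
    slopes.sort()
    alpha = 1 - level
    lower = slopes[round((alpha / 2) * n_resamples)]
    upper = slopes[round((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


observed = slope(data)
lower, upper = bootstrap_slope_ci(data)
print(f"observed slope = {observed:.3f}, 95% interval = ({lower:.3f}, {upper:.3f})")
```

In a plan, the corresponding statement would name the statistic (the sample slope), the resampling scheme, the number of resamples, and the confidence level.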

Secondary Analyses

All other statistical analyses should be considered secondary or exploratory and listed separately from your primary analysis. For each of these secondary/exploratory analyses, please once again:

  • State the null and alternative hypotheses, both in words and using mathematical/statistical notation.
  • Describe briefly what statistical method(s) you plan to use to analyze these data, answering the same questions as in the Primary Analysis section above.
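
As one illustration of the notation, a secondary analysis comparing the mean response between two groups might state its hypotheses as follows. Here \(\mu_1\) and \(\mu_2\) are placeholder symbols for the two population means, not quantities defined in this assignment:

```latex
H_0: \mu_1 - \mu_2 = 0
\qquad \text{versus} \qquad
H_A: \mu_1 - \mu_2 \neq 0
```

In words: the null hypothesis states that the population mean response is the same in both groups, and the alternative states that the two means differ.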

In the Discussion section

Limitations

Missing Data

Describe briefly (in one or two sentences) what approach you will take to address missing data in your sample. Discuss any potential limitations of this approach. If you did not have any missing data in your original dataset, you may leave this section blank.
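
One possible approach, shown here only as a sketch, is a complete-case analysis: count the missing values in each analysis variable, then keep only the rows with no missing values. The records and variable names below are invented, and `None` is assumed to mark a missing value.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": None, "income": 61000},
    {"id": 3, "age": 45, "income": None},
    {"id": 4, "age": 29, "income": 48000},
    {"id": 5, "age": 51, "income": 75000},
]
analysis_vars = ["age", "income"]

# Count missing values in each analysis variable.
missing_counts = {
    var: sum(1 for r in records if r[var] is None) for var in analysis_vars
}

# Complete-case analysis: keep rows with no missing analysis variables.
complete_cases = [
    r for r in records if all(r[var] is not None for var in analysis_vars)
]

n_dropped = len(records) - len(complete_cases)
print(missing_counts)
print(f"dropped {n_dropped} of {len(records)} rows")
```

Complete-case analysis is simple to describe, but it can bias results when the rows with missing values differ systematically from the rest, which is the kind of limitation worth noting here.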

Statistical Significance and Multiple Testing

What confidence level/significance level will you use for your analyses? Do you have any concerns about multiple testing? If so, please articulate your concerns.
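
To make the multiple-testing concern concrete, the sketch below computes the chance of at least one false positive across several tests, assuming the tests are independent and every null hypothesis is true. It also shows the per-test level under a Bonferroni correction, which is one common adjustment and not one this assignment prescribes. The significance level of 0.05 and the numbers of tests are example values.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive across independent tests
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test level that keeps the family-wise rate at or below alpha."""
    return alpha / n_tests


alpha = 0.05  # example significance level
for n_tests in (1, 3, 5, 10):
    fwer = familywise_error_rate(alpha, n_tests)
    per_test = bonferroni_threshold(alpha, n_tests)
    print(f"{n_tests:>2} tests: FWER = {fwer:.3f}, Bonferroni level = {per_test:.4f}")
```

The family-wise rate grows quickly with the number of tests, which is one reason to designate a single primary analysis and treat the rest as secondary or exploratory.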

Submission

Please have one member of your group turn in a PDF to Moodle by the due date.