Report, final draft
Your technical report will be an annotated Quarto file (.qmd) that contains your R code, interspersed with explanations of what the code is doing, and what it tells you about the problem.
Format
Your final technical report should be submitted as a PDF file, rendered from a Quarto file (*.qmd).
Content
Continue to build on your previous draft, fleshing out all of the sections. At this stage, you must write the Results section, add an Abstract (use the abstract field in the YAML header), and revise all of the sections in the paper.
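As a rough sketch of where the abstract goes, a YAML header along the following lines should work for a PDF report. The title, author, and abstract text are placeholders, not required values, and your own header may contain other fields.

```yaml
---
title: "Your project title"        # placeholder
author: "Names of group members"   # placeholder
format: pdf
abstract: |
  One paragraph of five or six sentences that says what you studied
  and what you found.
---
```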
You should not present all of the R code that you wrote throughout the process of working on this project. Rather, the technical report should contain the minimal set of R code that is necessary to understand your results and findings in full. If you make a claim, it must be justified by explicit calculation. A knowledgeable reviewer should be able to compile your .qmd file without modification, and verify every statement that you have made. All of the R code necessary to produce your figures and tables must appear in the technical report. In short, the technical report should enable a reviewer to reproduce your work in full.
Please read the Statistical & Data Sciences Writing Plan for more information.
Organization
Your technical report should follow this basic format:
- Abstract: a short, one paragraph explanation of your project. The abstract should not consist of more than 5 or 6 sentences, but should relate what you studied and what you found. It need only convey a general sense of what you actually did. The purpose of the abstract is to give a prospective reader enough information to decide if they want to read the full paper.
- Introduction: an overview of your project. In a few paragraphs, you should explain clearly and precisely what your research question is, why it is interesting, and what contribution you have made towards answering that question. You should give an overview of the specifics of your model, but not the full details. Most readers never make it past the introduction, so this is your chance to hook the reader, and is in many ways the most important part of the paper!
- Data: a brief description of your data set. What variables are included? Where did they come from? What are the units of measurement? What is the population that was sampled? How was the sample collected? You should also include some basic univariate analysis. Include at least one data graphic and at least one table in this section.
- Methods: a specification for the methods you used. Describe the models, techniques, and/or inferential procedures that you used. Write the mathematical formula for your regression model, with all of the coefficients identified clearly. State any assumptions necessary to use whatever methods you are using.
- Results: an explanation of what your model tells us about the research question. You should interpret coefficients in context and explain their relevance. What does your model tell us that we didn’t already know before? You may want to include negative results, but be careful about how you interpret them. For example, you may want to say something along the lines of: “we found no evidence that explanatory variable \(x\) is associated with response variable \(y\),” or “explanatory variable \(x\) did not provide any additional explanatory power above what was already conveyed by explanatory variable \(z\).” On the other hand, you probably shouldn’t claim: “there is no relationship between \(x\) and \(y\).”
- Conclusion: a summary of your findings and a discussion of their limitations. First, remind the reader of the question that you originally set out to answer, and summarize your findings. Second, discuss the limitations of your model, and what could be done to improve it. You might also want to do the same for your data. This is your last opportunity to clarify the scope of your findings before a journalist misinterprets them and makes wild extrapolations! Protect yourself by being clear about what is not implied by your research.
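To illustrate the Methods item above, a multiple regression model with two explanatory variables might be written as follows. This is only a sketch: the symbols \(y\), \(x_1\), and \(x_2\) are placeholders for your own variables, and your model may have a different form.

```latex
y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2) \text{ independently}
```

Here \(\beta_0\) is the intercept, \(\beta_1\) and \(\beta_2\) are the slopes for \(x_1\) and \(x_2\), and the statement about \(\varepsilon_i\) records the usual assumptions of independent, normally distributed errors with constant variance. The form of the equation itself encodes the linearity assumption.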
Tone
This document should be written for peer reviewers, who comprehend statistics at least as well as you do. You should aim for a level of complexity that is more statistically sophisticated than an article in the Science section of The New York Times, but less sophisticated than an academic journal. [Chance magazine might provide a good example.] For example, you may use terms that you will likely never see in the Times (e.g., bootstrap), but should not dwell on technical points with no obvious ramifications for the reader (e.g., reporting F-statistics). Your goal for this paper is to convince a statistically-minded reader (e.g., a student in this class, or a student from another school who has taken an introductory statistics class) that you have addressed an interesting research question in a meaningful way. Even a reader with no background in statistics should be able to read your paper and get the gist of it.
Additional Thoughts
The technical report is not simply a dump of all the R code you wrote during this project. Rather, it is a narrative, with technical details, that describes how you addressed your research question. You should not present tables or figures without a written explanation of the information that is supposed to be conveyed by that table or figure.
Keep in mind the distinction between data and information. Data is just numbers, whereas information is the result of analyzing that data and digesting it into meaningful ideas that human beings can understand. Your technical report should allow a reviewer to follow your steps from converting data into information.
There is no limit to the length of the technical report, but it should not be longer than it needs to be. You will not receive extra credit for simply describing your data ad infinitum. For example, simply displaying a table with the means and standard deviations of your variables is not meaningful. Writing a sentence that reiterates the content of the table (e.g. “the mean of variable \(x\) was 34.5 and the standard deviation was 2.8…”) is equally meaningless. What you should strive to do is interpret these values in context (e.g. “although variables \(x_1\) and \(x_2\) have similar means, the spread of \(x_1\) is much larger, suggesting…”).
You should present figures and tables in your technical report in context. These items should be understandable on their own – in the sense that they have understandable titles, axis labels, legends, and captions. Someone glancing through your technical report should be able to make sense of your figures and tables without having to read the entire report. That said, you should also include a discussion of what you want the reader to learn from your figures and tables.
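As one possible illustration of a figure that stands on its own, the chunk below sets a caption through Quarto chunk options and gives the plot a title and labeled axes. It uses the built-in mtcars data purely as a stand-in for your own data, and the label and caption text are made up for this example. In a .qmd file the chunk would open with ```{r} rather than ```r.

```r
#| label: fig-mpg-weight
#| fig-cap: "Fuel economy tends to be lower for heavier cars (mtcars data)."

# mtcars ships with R and is used here only as example data.
plot(
  mpg ~ wt, data = mtcars,
  main = "Fuel economy versus vehicle weight",
  xlab = "Weight (1000 lbs)",
  ylab = "Fuel economy (miles per gallon)"
)

# A summary number to cite in the accompanying discussion,
# rounded so that it is easy to read.
r_wt <- round(cor(mtcars$wt, mtcars$mpg), 2)
```

The surrounding text should then say what the reader is meant to learn from the figure, for example by citing the correlation and what it suggests about the research question.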
Your report should be prepared in Quarto (.qmd) and submitted to Moodle as the corresponding rendered output (.pdf) file.
General advice
Every project is different, but here is some general advice culled from years of experience.
- DO read the Statistical & Data Sciences Writing Plan
- DO examine the distribution of your response variable. If it is non-normal, and your residuals are non-normal, you might want to try to transform the response variable. If the distribution of the response is right-skewed, then applying a log() transformation can often fix many problems.
  - DO think very carefully about how this changes the interpretation of your coefficients!! Try writing out the equation of the regression model. How does a one unit change in the explanatory variable translate into a change in the response?
- DON’T spend time worrying about model selection (e.g. backwards-elimination, etc.). Remember that there is no “best” model, and model pruning is most useful for predicting future values of the response variable, which is not the focus of most of your projects. We are far more interested in your ability to correctly interpret model coefficients, assess statistical significance, and analyze residuals, than we are in your ability to do model selection.
- DO include more advanced elements of Quarto – such as images, tables, links, and references – in your write-up where appropriate.
- DO investigate the bivariate relationships between your variables. Are any of them highly correlated?
- DO carefully consider what to do about missing data. If you have a single variable that has a lot of missingness, can it be omitted from the model?
- DO disclose what you have done with the missing data. Were the data missing at random? Could the omission of these data have introduced some bias into your sample?
- DO spend a few minutes at the beginning of your paper introducing and motivating the problem, and providing context for your data and your question.
- DON’T forget to include a short limitations section at the end of your write-up.
- DON’T forget to do residual analysis.
- DO employ judicious rounding to make your results easier to read.
- DON’T forget about the Ecological Fallacy – if you have data about states or counties, you cannot interpret your results in terms of individual people.
- DO focus on relating to your audience what they can learn about the real-world problem from your model.
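Several of the items above (missing data, bivariate relationships, a log-transformed response, residual analysis, and rounding) can be sketched in a few lines of R. The example below is a minimal illustration that assumes the built-in mtcars data as a stand-in for your own. The choice of variables is arbitrary and is not a recommended model.

```r
# Illustration only: mtcars ships with R and stands in for your own data.
dat <- mtcars

# Missing data: count missing values in each variable (mtcars has none).
n_missing <- colSums(is.na(dat))

# Bivariate relationships: pairwise correlations among candidate
# explanatory variables, rounded for readability.
cor_mat <- round(cor(dat[, c("wt", "hp", "disp")]), 2)

# Fit a regression with the response on the log scale.
fit <- lm(log(mpg) ~ wt + hp, data = dat)

# With a logged response, a one-unit increase in an explanatory variable
# multiplies the predicted response on the original scale by exp(beta),
# holding the other variables fixed. Express that as a percent change.
pct_change <- round(100 * (exp(coef(fit)[-1]) - 1), 1)

# Residual analysis: residuals vs. fitted values and a normal Q-Q plot.
par(mfrow = c(1, 2))
plot(fit, which = 1:2)
```

In the write-up, pct_change would be interpreted in context: each value is the approximate percent change in predicted miles per gallon associated with a one-unit increase in that variable, with the other variable held fixed.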
Submission
Please have one member of your group turn in a PDF to Moodle by the due date.