args(dplyr::filter)function (.data, ..., .by = NULL, .preserve = FALSE)
NULL
args(print)function (x, ...)
NULL
In this lab, we will learn how to find out about functions and the principles of lexical scoping.
Goal: by the end of this lab, you should be able to retrieve information about functions and understand how lexical scoping works in R.
There are many functions in R, and nobody knows how all of them work. From time to time it is helpful to retrieve information about these functions. The documentation in help() is great to read, but it doesn’t always provide the information you need.
The first question you may have is what arguments a function takes. We can learn this by running args() on the bare name of the function.
function (.data, ..., .by = NULL, .preserve = FALSE)
NULL
function (x, ...)
NULL
Note that for S3 generic function like print(), you can also find the arguments for specific methods.
function (x, ..., digits = NULL, quote = FALSE, right = TRUE,
row.names = TRUE, max = NULL)
NULL
tidyr::pivot_longer().function (data, cols, ..., cols_vary = "fastest", names_to = "name",
names_prefix = NULL, names_sep = NULL, names_pattern = NULL,
names_ptypes = NULL, names_transform = NULL, names_repair = "check_unique",
values_to = "value", values_drop_na = FALSE, values_ptypes = NULL,
values_transform = NULL)
NULL
args().The body() function returns the actual code that runs inside a function. There are a several reasons why you might want to see this:
Inspecting the code inside a function will help you to think more like a developer, as opposed to a user.
Note that a function may call unexported functions (e.g., tibble_quos()) that you won’t recognize. Since the function is not exported, you have to use the triple colon operator to view its source.
{
col_names <- given_col_names <- names2(xs)
empty_col_names <- which(col_names == "")
col_names[empty_col_names] <- names(quos_auto_name(xs[empty_col_names]))
lengths <- rep_along(xs, 0L)
output <- rep_along(xs, list(NULL))
env <- new_environment()
mask <- new_data_mask_with_data(env)
first_size <- .rows
for (j in seq_along(xs)) {
res <- eval_tidy(xs[[j]], mask)
if (!is.null(res)) {
if (single_row) {
if (vec_is(res)) {
if (vec_size(res) != 1) {
abort_tibble_row_size_one(j, given_col_names[[j]],
vec_size(res))
}
}
else {
res <- list(res)
}
}
else {
res <- check_valid_col(res, col_names[[j]], j,
call)
lengths[[j]] <- current_size <- vec_size(res)
if (is.null(first_size)) {
first_size <- current_size
}
else if (first_size == 1L && current_size !=
1L) {
idx_to_fix <- seq2(1L, j - 1L)
output[idx_to_fix] <- fixed_output <- map(output[idx_to_fix],
vec_recycle, current_size)
map2(output[idx_to_fix], col_names[idx_to_fix],
add_to_env2, env = env)
first_size <- current_size
}
else {
res <- vectbl_recycle_rows(res, first_size,
j, given_col_names[[j]], call)
}
}
output[[j]] <- res
col_names[[j]] <- add_to_env2(res, given_col_names[[j]],
col_names[[j]], env)
}
}
names(output) <- col_names
is_null <- map_lgl(output, is.null)
output <- output[!is_null]
output <- splice_dfs(output)
output <- set_repaired_names(output, repair_hint = TRUE,
.name_repair = .name_repair, call = call)
new_tibble(output, nrow = first_size %||% 0L)
}
If you try to view the body of an S3 generic function, you will find that it is rather short.
A generic function usually doesn’t do anything other than call one of its methods. Those methods have the interesting code.
{
n <- length(row.names(x))
if (length(x) == 0L) {
cat(sprintf(ngettext(n, "data frame with 0 columns and %d row",
"data frame with 0 columns and %d rows"), n), "\n",
sep = "")
}
else if (n == 0L) {
print.default(names(x), quote = FALSE)
cat(gettext("<0 rows> (or 0-length row.names)\n"))
}
else {
if (is.null(max))
max <- getOption("max.print", 99999L)
if (!is.finite(max))
stop("invalid 'max' / getOption(\"max.print\"): ",
max)
omit <- (n0 <- max%/%length(x)) < n
m <- as.matrix(format.data.frame(if (omit)
x[seq_len(n0), , drop = FALSE]
else x, digits = digits, na.encode = FALSE))
if (!isTRUE(row.names))
dimnames(m)[[1L]] <- if (isFALSE(row.names))
rep.int("", if (omit)
n0
else n)
else row.names
print(m, ..., quote = quote, right = right, max = max)
if (omit)
cat(" [ reached 'max' / getOption(\"max.print\") -- omitted",
n - n0, "rows ]\n")
}
invisible(x)
}
{
args <- pairlist(digits = digits, quote = quote, na.print = na.print,
print.gap = print.gap, right = right, max = max, width = width,
useSource = useSource, ...)
missings <- c(missing(digits), missing(quote), missing(na.print),
missing(print.gap), missing(right), missing(max), missing(width),
missing(useSource))
.Internal(print.default(x, args, missings))
}
summary method for an lm object. Do you see where the \(R^2\) is computed?{
z <- object
p <- z$rank
rdf <- z$df.residual
if (p == 0) {
r <- z$residuals
n <- length(r)
w <- z$weights
if (is.null(w)) {
rss <- sum(r^2)
}
else {
rss <- sum(w * r^2)
r <- sqrt(w) * r
}
resvar <- rss/rdf
ans <- z[c("call", "terms", if (!is.null(z$weights)) "weights")]
class(ans) <- "summary.lm"
ans$aliased <- is.na(coef(object))
ans$residuals <- r
ans$df <- c(0L, n, length(ans$aliased))
ans$coefficients <- matrix(NA_real_, 0L, 4L, dimnames = list(NULL,
c("Estimate", "Std. Error", "t value", "Pr(>|t|)")))
ans$sigma <- sqrt(resvar)
ans$r.squared <- ans$adj.r.squared <- 0
ans$cov.unscaled <- matrix(NA_real_, 0L, 0L)
if (correlation)
ans$correlation <- ans$cov.unscaled
return(ans)
}
if (is.null(z$terms))
stop("invalid 'lm' object: no 'terms' component")
if (!inherits(object, "lm"))
warning("calling summary.lm(<fake-lm-object>) ...")
Qr <- qr.lm(object)
n <- NROW(Qr$qr)
if (is.na(z$df.residual) || n - p != z$df.residual)
warning("residual degrees of freedom in object suggest this is not an \"lm\" fit")
r <- z$residuals
f <- z$fitted.values
if (!is.null(z$offset)) {
f <- f - z$offset
}
w <- z$weights
if (is.null(w)) {
mss <- if (attr(z$terms, "intercept"))
sum((f - mean(f))^2)
else sum(f^2)
rss <- sum(r^2)
}
else {
mss <- if (attr(z$terms, "intercept")) {
m <- sum(w * f/sum(w))
sum(w * (f - m)^2)
}
else sum(w * f^2)
rss <- sum(w * r^2)
r <- sqrt(w) * r
}
resvar <- rss/rdf
if (is.finite(resvar) && resvar < (mean(f)^2 + var(c(f))) *
1e-30)
warning("essentially perfect fit: summary may be unreliable")
p1 <- 1L:p
R <- chol2inv(Qr$qr[p1, p1, drop = FALSE])
se <- sqrt(diag(R) * resvar)
est <- z$coefficients[Qr$pivot[p1]]
tval <- est/se
ans <- z[c("call", "terms", if (!is.null(z$weights)) "weights")]
ans$residuals <- r
ans$coefficients <- cbind(Estimate = est, `Std. Error` = se,
`t value` = tval, `Pr(>|t|)` = 2 * pt(abs(tval), rdf,
lower.tail = FALSE))
ans$aliased <- is.na(z$coefficients)
ans$sigma <- sqrt(resvar)
ans$df <- c(p, rdf, NCOL(Qr$qr))
if (p != attr(z$terms, "intercept")) {
df.int <- if (attr(z$terms, "intercept"))
1L
else 0L
ans$r.squared <- mss/(mss + rss)
ans$adj.r.squared <- 1 - (1 - ans$r.squared) * ((n -
df.int)/rdf)
ans$fstatistic <- c(value = (mss/(p - df.int))/resvar,
numdf = p - df.int, dendf = rdf)
}
else ans$r.squared <- ans$adj.r.squared <- 0
ans$cov.unscaled <- R
dimnames(ans$cov.unscaled) <- dimnames(ans$coefficients)[c(1,
1)]
if (correlation) {
ans$correlation <- (R * resvar)/outer(se, se)
dimnames(ans$correlation) <- dimnames(ans$cov.unscaled)
ans$symbolic.cor <- symbolic.cor
}
if (!is.null(z$na.action))
ans$na.action <- z$na.action
class(ans) <- "summary.lm"
ans
}
It is also important to know in what environment a function is created. The environment() function tells us.
We’ll learn more about environments in the next chapter.
R uses lexical scoping. This means that when R looks for the value of names, it depends on the state of things when the function is defined.
R looks for variables inside a function definition first (local variables), but if it can’t find them, it looks in the parent environment. In this case, when my_fun() is executed, R looks for the value of global_var. Many programming languages will throw an error in this situation, because global_var is not defined in the environment created by my_fun(). However, R just keeps looking for global_var in the parent environment (which in this case is the global environment).
[1] 24
<environment: R_GlobalEnv>
Note also that if we try to re-define global_var inside the function definition, we are masking the name global_var.
Note the difference reported by findGlobals().
[1] "{" "+" "global_var"
[1] "{" "+" "<-"
my_fun() in Python or Java. Does it work?Local variables that are created inside a function never see the light of day in the parent environment.
[1] 24
Error: object 'local_var' not found
Because of name masking, values of objects in the parent environment can affect the behavior of a function. This behavior is often useful in data analysis scripts, but can be problematic in more formal programming.
This means that if we reset the value of global_var, we change the behavior of my_fun().
Give an example of a situation in which dynamic lookup is helpful.
Give an example of a situation in which dynamic lookup could be problematic.
A function that depends only on its inputs is called pure.
Take a minute to think about what questions you still have about subsetting. Review what questions have been posted (in the #questions channel) recently by other students and either:
Prompt: Which topic(s) from the chapter on functions could use further elaboration?