Skip to contents

Tidy methods for changepoint analysis

The tidychangepoint package allows you to use any number of algorithms for detecting changepoint sets in univariate time series with a common, tidyverse-compliant interface. It also provides model-fitting procedures for commonly-used parametric models, tools for computing various penalty functions, and graphical diagnostic displays.

Changepoint sets are computed using the segment() function, which takes a numeric vector that is coercible into a ts object, and a string indicating the algorithm you wish you use. segment() always returns a tidycpt object.

x <- segment(DataCPSim, method = "pelt")
class(x)
#> [1] "tidycpt"

Various methods are available for tidycpt objects. For example, as.ts() returns the original data as ts object, and changepoints() returns the set of changepoint indices.

changepoints(x)
#> [1] 547 822 972

Retrieving information using the broom interface

tidychangepoint follows the design interface of the broom package. Therefore, augment(), tidy() and glance() methods exists for tidycpt objects.

  • augment() returns a tsibble that is grouped according to the regions defined by the changepoint set.
augment(x)
#> # A tsibble: 1,096 x 5 [1]
#> # Groups:    region [4]
#>    index     y region  .fitted  .resid
#>    <int> <dbl> <fct>     <dbl>   <dbl>
#>  1     1  35.5 [0,547)    35.3   0.232
#>  2     2  29.0 [0,547)    35.3  -6.27 
#>  3     3  35.6 [0,547)    35.3   0.357
#>  4     4  33.0 [0,547)    35.3  -2.29 
#>  5     5  29.5 [0,547)    35.3  -5.74 
#>  6     6  25.4 [0,547)    35.3  -9.87 
#>  7     7  28.8 [0,547)    35.3  -6.45 
#>  8     8  50.3 [0,547)    35.3  15.0  
#>  9     9  24.9 [0,547)    35.3 -10.3  
#> 10    10  58.9 [0,547)    35.3  23.6  
#> # ℹ 1,086 more rows
  • tidy() returns a tbl that provides summary statistics for each region. These include any parameters that were fit, which are prefixed in the output by param_.
tidy(x)
#> # A tibble: 4 × 10
#>   region  num_obs   min   max  mean    sd begin   end param_mu param_sigma_hatsq
#>   <chr>     <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>             <dbl>
#> 1 [0,547)     546  13.7  92.8  35.3  11.3     0   547     35.3              127.
#> 2 [547,8…     275  20.5 163.   58.1  19.3   547   822     58.1              372.
#> 3 [822,9…     150  39.2 215.   96.7  30.5   822   972     96.7              924.
#> 4 [972,1…     125  67.2 299.  156.   49.6   972  1096    156.              2442.
  • glance() returns a tbl that provides summary statistics for the algorithm. This includes the fitness, which is the value of the penalized objective function that was used.
glance(x)
#> # A tibble: 1 × 8
#>   pkg      version algorithm seg_params model_name criteria fitness elapsed_time
#>   <chr>    <pckg_> <chr>     <list>     <chr>      <chr>      <dbl> <drtn>      
#> 1 changep… 2.2.4   PELT      <list [1]> meanvar    MBIC       9403. 0.063 secs

Other methods

The plot() method leverages ggplot2 to provide an informative plot, with the regions defined by the changepoint set clearly demarcated, and the means within each region also indicated.

plot(x)

Other generic functions defined for tidycpt objects include fitness(), as.model(), and exceedances(). For example, fitness() returns a named vector with the value of the penalized objective function used.

fitness(x)
#>     MBIC 
#> 9403.391

Structure

Every tidycpt objects contains two main children:

  • segmenter: The object that results from the changepoint detection algorithm. These can be of any class. Methods for objects of class cpt, ga, and wbs are currently implemented, and as well as seg_basket (the default internal class). Given a data set, a model, and a penalized objective function, a segmenter’s job is to search the exponentially-large space of possible changepoint sets for the one that optimizes the penalized objective function (over the space of possible changepoint sets). Some segmenting algorithms (e.g., PELT) are deterministic, while others (e.g., genetic algorithms) are randomized.
  • model: A model object inheriting from mod_cpt, an internal class for representing model objects. Model objects are created by model-fitting functions, all of whose names start with fit_. The model of a tidycpt object is the model object returned by the fit_*() function that corresponds to the one used by the segmenter. Given a data set, a model description, and a set of changepoints, the corresponding model-fitting function finds the values of the model parameters that optimize the model fit to the data.

Both segmenters and models implement methods for the generic functions changepoints(), as.ts(), nobs(), logLik(), model_name(), and glance(). However, it is important to note that while tidychangepoint does its best to match the model used by the segmenter to its corresponding model-fitting function, exact matches do not always exist. Thus, the logLik() of the segmenter may not always match the logLik() of the model. Nevertheless, squaring these values is the focus of ongoing work.

Segmenters

In the example above, the segmenter is of class cpt, because segment() simply wraps the cpt.meanvar() function from the changepoint package.

x |>
  as.segmenter() |>
  str()
#> Formal class 'cpt' [package "changepoint"] with 12 slots
#>   ..@ data.set : Time-Series [1:1096] from 1 to 1096: 35.5 29 35.6 33 29.5 ...
#>   ..@ cpttype  : chr "mean and variance"
#>   ..@ method   : chr "PELT"
#>   ..@ test.stat: chr "Normal"
#>   ..@ pen.type : chr "MBIC"
#>   ..@ pen.value: num 28
#>   ..@ minseglen: num 2
#>   ..@ cpts     : int [1:4] 547 822 972 1096
#>   ..@ ncpts.max: num Inf
#>   ..@ param.est:List of 2
#>   .. ..$ mean    : num [1:4] 35.3 58.2 96.8 156.5
#>   .. ..$ variance: num [1:4] 127 371 921 2406
#>   ..@ date     : chr "Wed Apr 24 21:56:29 2024"
#>   ..@ version  : chr "2.2.4"

In addition to the generic functions listed above, segmenters implement methods for the generic functions fitness(), model_args(), and seg_params().

Models

The model object in this case is created by fit_meanvar(), and is of class mod_cpt.

x |>
  as.model() |>
  str()
#> List of 6
#>  $ data         : Time-Series [1:1096] from 1 to 1096: 35.5 29 35.6 33 29.5 ...
#>  $ tau          : int [1:3] 547 822 972
#>  $ region_params: tibble [4 × 3] (S3: tbl_df/tbl/data.frame)
#>   ..$ region           : chr [1:4] "[0,547)" "[547,822)" "[822,972)" "[972,1.1e+03]"
#>   ..$ param_mu         : num [1:4] 35.3 58.1 96.7 155.9
#>   ..$ param_sigma_hatsq: Named num [1:4] 127 372 924 2442
#>   .. ..- attr(*, "names")= chr [1:4] "[0,547)" "[547,822)" "[822,972)" "[972,1.1e+03]"
#>  $ model_params : NULL
#>  $ fitted_values: num [1:1096] 35.3 35.3 35.3 35.3 35.3 ...
#>  $ model_name   : chr "meanvar"
#>  - attr(*, "class")= chr "mod_cpt"

In addition to the generic functions listed above, models implement methods for the generic functions fitted(), residuals(), coef(), augment(), tidy(), and plot().