A wrapper function that encapsulates various algorithms for detecting changepoint sets in univariate time series.

Usage

segment(x, method = "null", ...)

# S3 method for tbl_ts
segment(x, method = "null", ...)

# S3 method for xts
segment(x, method = "null", ...)

# S3 method for numeric
segment(x, method = "null", ...)

# S3 method for ts
segment(x, method = "null", ...)

Arguments

x

a numeric vector coercible into a stats::ts object

method

a character string indicating the algorithm to use. See Details.

...

arguments passed to methods
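
The S3 methods listed under Usage all share one signature and forward `...` to whichever method is selected. The following base R sketch illustrates that dispatch pattern only. The generic `toy_segment()` and its return value are hypothetical, and the coerce-then-redispatch step in the numeric method is an assumption suggested by the description of `x`, not a statement about how the package is implemented.

```r
# Hypothetical generic, named toy_segment() so it cannot clash with segment()
toy_segment <- function(x, method = "null", ...) UseMethod("toy_segment")

# Method for stats::ts objects: a real method would run the algorithm here.
# This toy version just records what it received.
toy_segment.ts <- function(x, method = "null", ...) {
  list(n = length(x), method = method, extra_args = list(...))
}

# Method for plain numeric vectors: coerce to ts, then dispatch again
# (assumed behavior, for illustration only)
toy_segment.numeric <- function(x, method = "null", ...) {
  toy_segment(stats::as.ts(x), method = method, ...)
}

# A numeric vector is routed through the numeric method to the ts method,
# and the extra argument tau travels along inside `...`
res <- toy_segment(c(1, 2, 3, 10, 11, 12), method = "manual", tau = 4)
str(res)
```

Keeping the algorithm-specific arguments in `...` lets one generic serve algorithms with very different parameters, such as `penalty` for PELT or `tau` for the manual method.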

Value

An object of class tidycpt. Every tidycpt object contains:

  • segmenter: The object returned by the underlying changepoint detection algorithm.

  • model: A model object inheriting from mod_cpt, as created by as.model() when called on the segmenter.

  • elapsed_time: The clock time that passed while the algorithm was running.

  • time_index: If available, the labels for the time indices of the time series.

Details

Currently, segment() can use the following algorithms, depending on the value of the method argument:

  • pelt: Uses the PELT algorithm as implemented in changepoint::cpt.meanvar(). The segmenter is of class cpt.

  • binseg: Uses the Binary Segmentation algorithm as implemented by changepoint::cpt.meanvar(). The segmenter is of class cpt.

  • segneigh: Uses the Segment Neighbourhood algorithm as implemented by changepoint::cpt.meanvar(). The segmenter is of class cpt.

  • single-best: Uses the At Most One Change (AMOC) criterion as implemented by changepoint::cpt.meanvar(). The segmenter is of class cpt.

  • wbs: Uses the Wild Binary Segmentation algorithm as implemented by wbs::wbs(). The segmenter is of class wbs.

  • ga: Uses the genetic algorithm implemented by segment_ga(), which wraps GA::ga(). The segmenter is of class tidyga.

  • coen: Uses Coen's heuristic as implemented by segment_coen(). The segmenter is of class seg_basket.

  • manual: Uses the vector of changepoints in the tau argument. The segmenter is of class seg_cpt.

  • null: The default. Uses no changepoints. The segmenter is of class seg_cpt.
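
To make the manual and null cases concrete, here is a minimal base R sketch of how a changepoint vector tau can partition a series into regions with one mean each. It is not the package's implementation. The toy data are invented, and the half-open interval convention is an assumption inferred from region labels such as "[0,826)" in the example output.

```r
# Toy series with an obvious shift in mean at index 4 (illustrative data)
x <- c(1, 1, 1, 5, 5, 5, 5)
tau <- 4  # changepoint set, as would be supplied via the tau argument

# Assumption: each changepoint starts a new region, so every region is
# half-open on the right except the last, which is closed
breaks <- c(0, tau, length(x))
region <- cut(seq_along(x), breaks = breaks, right = FALSE, include.lowest = TRUE)

# Per-region mean, then the piecewise-constant fitted values
region_means <- tapply(x, region, mean)
fitted_values <- as.numeric(region_means[as.integer(region)])

print(region_means)
print(fitted_values)
```

Setting `tau <- integer(0)` in this sketch leaves a single region covering the whole series, which corresponds to the idea behind the null method.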

Examples

mod_null <- segment(DataCPSim)
augment(mod_null)
#> # A tsibble: 1,096 x 5 [1]
#> # Groups:    region [1]
#>    index     y region      .fitted .resid
#>    <int> <dbl> <fct>         <dbl>  <dbl>
#>  1     1  35.5 [0,1.1e+03]    63.2 -27.7 
#>  2     2  29.0 [0,1.1e+03]    63.2 -34.2 
#>  3     3  35.6 [0,1.1e+03]    63.2 -27.5 
#>  4     4  33.0 [0,1.1e+03]    63.2 -30.2 
#>  5     5  29.5 [0,1.1e+03]    63.2 -33.6 
#>  6     6  25.4 [0,1.1e+03]    63.2 -37.8 
#>  7     7  28.8 [0,1.1e+03]    63.2 -34.3 
#>  8     8  50.3 [0,1.1e+03]    63.2 -12.9 
#>  9     9  24.9 [0,1.1e+03]    63.2 -38.2 
#> 10    10  58.9 [0,1.1e+03]    63.2  -4.28
#> # ℹ 1,086 more rows
tidy(mod_null)
#> # A tibble: 1 × 9
#>   region      num_obs   min   max  mean    sd begin   end param_mu
#>   <chr>         <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
#> 1 [0,1.1e+03]    1096  13.7  299.  63.2  45.7     0  1096     63.2
glance(mod_null)
#> # A tibble: 1 × 8
#>   pkg      version algorithm seg_params model_name criteria fitness elapsed_time
#>   <chr>    <pckg_> <chr>     <list>     <chr>      <chr>      <dbl> <drtn>      
#> 1 tidycha… 0.0.1   manual    <list [0]> meanshift… BIC       11503. 0.004 secs  
segment(DataCPSim, method = "pelt")
#> A tidycpt object
#> Class 'cpt' : Changepoint Object
#>        ~~   : S4 class containing 12 slots with names
#>               cpttype date version data.set method test.stat pen.type pen.value minseglen cpts ncpts.max param.est 
#> 
#> Created on  : Wed Apr 24 21:56:29 2024 
#> 
#> summary(.)  :
#> ----------
#> Created Using changepoint version 2.2.4 
#> Changepoint type      : Change in mean and variance 
#> Method of analysis    : PELT 
#> Test Statistic  : Normal 
#> Type of penalty       : MBIC with value, 27.99769 
#> Minimum Segment Length : 2 
#> Maximum no. of cpts   : Inf 
#> Changepoint Locations : 547 822 972 
#> List of 6
#>  $ data         : Time-Series [1:1096] from 1 to 1096: 35.5 29 35.6 33 29.5 ...
#>  $ tau          : int [1:3] 547 822 972
#>  $ region_params: tibble [4 × 3] (S3: tbl_df/tbl/data.frame)
#>   ..$ region           : chr [1:4] "[0,547)" "[547,822)" "[822,972)" "[972,1.1e+03]"
#>   ..$ param_mu         : num [1:4] 35.3 58.1 96.7 155.9
#>   ..$ param_sigma_hatsq: Named num [1:4] 127 372 924 2442
#>   .. ..- attr(*, "names")= chr [1:4] "[0,547)" "[547,822)" "[822,972)" "[972,1.1e+03]"
#>  $ model_params : NULL
#>  $ fitted_values: num [1:1096] 35.3 35.3 35.3 35.3 35.3 ...
#>  $ model_name   : chr "meanvar"
#>  - attr(*, "class")= chr "mod_cpt"
segment(DataCPSim, method = "pelt", penalty = "AIC")
#> A tidycpt object
#> Class 'cpt' : Changepoint Object
#>        ~~   : S4 class containing 12 slots with names
#>               cpttype date version data.set method test.stat pen.type pen.value minseglen cpts ncpts.max param.est 
#> 
#> Created on  : Wed Apr 24 21:56:29 2024 
#> 
#> summary(.)  :
#> ----------
#> Created Using changepoint version 2.2.4 
#> Changepoint type      : Change in mean and variance 
#> Method of analysis    : PELT 
#> Test Statistic  : Normal 
#> Type of penalty       : AIC with value, 6 
#> Minimum Segment Length : 2 
#> Maximum no. of cpts   : Inf 
#> Number of changepoints: 205 
#> List of 6
#>  $ data         : Time-Series [1:1096] from 1 to 1096: 35.5 29 35.6 33 29.5 ...
#>  $ tau          : int [1:205] 4 7 10 13 15 17 19 27 30 32 ...
#>  $ region_params: tibble [206 × 3] (S3: tbl_df/tbl/data.frame)
#>   ..$ region           : chr [1:206] "[0,4)" "[4,7)" "[7,10)" "[10,13)" ...
#>   ..$ param_mu         : num [1:206] 33.4 29.3 34.7 41 37.7 ...
#>   ..$ param_sigma_hatsq: Named num [1:206] 9.59 9.59 124.55 162.27 26.02 ...
#>   .. ..- attr(*, "names")= chr [1:206] "[0,4)" "[4,7)" "[7,10)" "[10,13)" ...
#>  $ model_params : NULL
#>  $ fitted_values: num [1:1096] 33.4 33.4 33.4 29.3 29.3 ...
#>  $ model_name   : chr "meanvar"
#>  - attr(*, "class")= chr "mod_cpt"
segment(DataCPSim, method = "binseg", penalty = "AIC")
#> Warning: The number of changepoints identified is Q, it is advised to increase Q to make sure changepoints have not been missed.
#> A tidycpt object
#> Class 'cpt' : Changepoint Object
#>        ~~   : S4 class containing 14 slots with names
#>               cpts.full pen.value.full data.set cpttype method test.stat pen.type pen.value minseglen cpts ncpts.max param.est date version 
#> 
#> Created on  : Wed Apr 24 21:56:29 2024 
#> 
#> summary(.)  :
#> ----------
#> Created Using changepoint version 2.2.4 
#> Changepoint type      : Change in mean and variance 
#> Method of analysis    : BinSeg 
#> Test Statistic  : Normal 
#> Type of penalty       : AIC with value, 6 
#> Minimum Segment Length : 2 
#> Maximum no. of cpts   : 5 
#> Changepoint Locations : 547 809 813 822 972 
#> Range of segmentations:
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]  809   NA   NA   NA   NA
#> [2,]  809  547   NA   NA   NA
#> [3,]  809  547  972   NA   NA
#> [4,]  809  547  972  822   NA
#> [5,]  809  547  972  822  813
#> 
#>  For penalty values: 1485.679 462.0479 160.3649 15.04514 15.04514 
#> List of 6
#>  $ data         : Time-Series [1:1096] from 1 to 1096: 35.5 29 35.6 33 29.5 ...
#>  $ tau          : int [1:5] 547 809 813 822 972
#>  $ region_params: tibble [6 × 3] (S3: tbl_df/tbl/data.frame)
#>   ..$ region           : chr [1:6] "[0,547)" "[547,809)" "[809,813)" "[813,822)" ...
#>   ..$ param_mu         : num [1:6] 35.3 57.9 83.9 52.5 96.7 ...
#>   ..$ param_sigma_hatsq: Named num [1:6] 127 341 2182 122 924 ...
#>   .. ..- attr(*, "names")= chr [1:6] "[0,547)" "[547,809)" "[809,813)" "[813,822)" ...
#>  $ model_params : NULL
#>  $ fitted_values: num [1:1096] 35.3 35.3 35.3 35.3 35.3 ...
#>  $ model_name   : chr "meanvar"
#>  - attr(*, "class")= chr "mod_cpt"
segment(DataCPSim, method = "segneigh", penalty = "BIC")
#> Warning: SegNeigh is computationally slow, use PELT instead
#> A tidycpt object
#> Class 'cpt' : Changepoint Object
#>        ~~   : S4 class containing 14 slots with names
#>               cpts.full pen.value.full data.set cpttype method test.stat pen.type pen.value minseglen cpts ncpts.max param.est date version 
#> 
#> Created on  : Wed Apr 24 21:56:29 2024 
#> 
#> summary(.)  :
#> ----------
#> Created Using changepoint version 2.2.4 
#> Changepoint type      : Change in mean and variance 
#> Method of analysis    : SegNeigh 
#> Test Statistic  : Normal 
#> Type of penalty       : BIC with value, 20.99827 
#> Minimum Segment Length : 2 
#> Maximum no. of cpts   : 5 
#> Changepoint Locations : 547 822 972 
#> Range of segmentations:
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]  809   NA   NA   NA   NA
#> [2,]  547  822   NA   NA   NA
#> [3,]  547  822  972   NA   NA
#> [4,]  547  822  939  980   NA
#> 
#>  For penalty values: 1485.679 475.6009 152.0772 15.72419 
#> List of 6
#>  $ data         : Time-Series [1:1096] from 1 to 1096: 35.5 29 35.6 33 29.5 ...
#>  $ tau          : int [1:3] 547 822 972
#>  $ region_params: tibble [4 × 3] (S3: tbl_df/tbl/data.frame)
#>   ..$ region           : chr [1:4] "[0,547)" "[547,822)" "[822,972)" "[972,1.1e+03]"
#>   ..$ param_mu         : num [1:4] 35.3 58.1 96.7 155.9
#>   ..$ param_sigma_hatsq: Named num [1:4] 127 372 924 2442
#>   .. ..- attr(*, "names")= chr [1:4] "[0,547)" "[547,822)" "[822,972)" "[972,1.1e+03]"
#>  $ model_params : NULL
#>  $ fitted_values: num [1:1096] 35.3 35.3 35.3 35.3 35.3 ...
#>  $ model_name   : chr "meanvar"
#>  - attr(*, "class")= chr "mod_cpt"
segment(DataCPSim, method = "random")
#> Seeding initial population with probability: 0.0063863343681642
#> A tidycpt object
#> An object of class "ga"
#> 
#> Call:
#> GA::ga(type = "binary", fitness = obj_fun, nBits = n, population = ..1,     maxiter = 1)
#> 
#> Available slots:
#>  [1] "data"          "model_fn_args" "call"          "type"         
#>  [5] "lower"         "upper"         "nBits"         "names"        
#>  [9] "popSize"       "iter"          "run"           "maxiter"      
#> [13] "suggestions"   "population"    "elitism"       "pcrossover"   
#> [17] "pmutation"     "optim"         "fitness"       "summary"      
#> [21] "bestSol"       "fitnessValue"  "solution"     
#> List of 6
#>  $ data         : Time-Series [1:1096] from 1 to 1096: 35.5 29 35.6 33 29.5 ...
#>  $ tau          : int [1:9] 214 224 315 597 677 811 847 960 1012
#>  $ region_params: tibble [10 × 2] (S3: tbl_df/tbl/data.frame)
#>   ..$ region  : chr [1:10] "[0,214)" "[214,224)" "[224,315)" "[315,597)" ...
#>   ..$ param_mu: num [1:10] 36 31.7 36.6 38.5 59.9 ...
#>  $ model_params : Named num 619
#>   ..- attr(*, "names")= chr "sigma_hatsq"
#>  $ fitted_values: num [1:1096] 36 36 36 36 36 ...
#>  $ model_name   : chr "meanshift_norm"
#>  - attr(*, "class")= chr "mod_cpt"
segment(DataCPSim, method = "manual", tau = c(826))
#> A tidycpt object
#> List of 8
#>  $ data        : Time-Series [1:1096] from 1 to 1096: 35.5 29 35.6 33 29.5 ...
#>  $ pkg         : chr "tidychangepoint"
#>  $ algorithm   : chr "manual"
#>  $ changepoints: num 826
#>  $ fitness     : Named num 10571
#>   ..- attr(*, "names")= chr "BIC"
#>  $ seg_params  : list()
#>  $ model_name  : chr "meanshift_norm"
#>  $ penalty     : chr "BIC"
#>  - attr(*, "class")= chr "seg_cpt"
#> List of 6
#>  $ data         : Time-Series [1:1096] from 1 to 1096: 35.5 29 35.6 33 29.5 ...
#>  $ tau          : int 826
#>  $ region_params: tibble [2 × 2] (S3: tbl_df/tbl/data.frame)
#>   ..$ region  : chr [1:2] "[0,826)" "[826,1.1e+03]"
#>   ..$ param_mu: num [1:2] 43.2 123.8
#>  $ model_params : Named num 882
#>   ..- attr(*, "names")= chr "sigma_hatsq"
#>  $ fitted_values: num [1:1096] 43.2 43.2 43.2 43.2 43.2 ...
#>  $ model_name   : chr "meanshift_norm"
#>  - attr(*, "class")= chr "mod_cpt"
two_cpts <- segment(DataCPSim, method = "manual", tau = c(365, 826))
plot(two_cpts)

diagnose(two_cpts)

segment(bogota_pm, method = "pelt")
#> A tidycpt object
#> Class 'cpt' : Changepoint Object
#>        ~~   : S4 class containing 12 slots with names
#>               cpttype date version data.set method test.stat pen.type pen.value minseglen cpts ncpts.max param.est 
#> 
#> Created on  : Wed Apr 24 21:56:29 2024 
#> 
#> summary(.)  :
#> ----------
#> Created Using changepoint version 2.2.4 
#> Changepoint type      : Change in mean and variance 
#> Method of analysis    : PELT 
#> Test Statistic  : Normal 
#> Type of penalty       : MBIC with value, 27.99769 
#> Minimum Segment Length : 2 
#> Maximum no. of cpts   : Inf 
#> Changepoint Locations : 12 14 393 468 821 1023 1025 
#> List of 6
#>  $ data         : Time-Series [1:1096] from 1 to 1096: 39.3 27.7 30.3 33.6 32.9 20.1 23.7 23.5 29.2 27.6 ...
#>  $ tau          : int [1:7] 12 14 393 468 821 1023 1025
#>  $ region_params: tibble [8 × 3] (S3: tbl_df/tbl/data.frame)
#>   ..$ region           : chr [1:8] "[0,12)" "[12,14)" "[14,393)" "[393,468)" ...
#>   ..$ param_mu         : num [1:8] 28.9 16.6 28.1 44.5 30.7 ...
#>   ..$ param_sigma_hatsq: Named num [1:8] 26.1 174.2 147.4 309.4 200 ...
#>   .. ..- attr(*, "names")= chr [1:8] "[0,12)" "[12,14)" "[14,393)" "[393,468)" ...
#>  $ model_params : NULL
#>  $ fitted_values: num [1:1096] 28.9 28.9 28.9 28.9 28.9 ...
#>  $ model_name   : chr "meanvar"
#>  - attr(*, "class")= chr "mod_cpt"
if (FALSE) {
x <- segment(DataCPSim, method = "gbmdl", num_generations = 10)
}