Segment a time series using a variety of algorithms
A wrapper function that encapsulates various algorithms for detecting changepoint sets in univariate time series.
Usage
segment(x, method = "null", ...)
# S3 method for tbl_ts
segment(x, method = "null", ...)
# S3 method for xts
segment(x, method = "null", ...)
# S3 method for numeric
segment(x, method = "null", ...)
# S3 method for ts
segment(x, method = "null", ...)
Arguments
- x
a numeric vector coercible into a stats::ts object
- method
a character string indicating the algorithm to use. See Details.
- ...
arguments passed to methods
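The Usage block lists one S3 method per input class, and x is described as coercible into a stats::ts object. A minimal base-R sketch of that dispatch pattern follows. The generic segment_sketch() and its return value are hypothetical names made up for illustration; this is not the package's source.

```r
# Hypothetical generic illustrating S3 dispatch on the class of `x`.
segment_sketch <- function(x, method = "null", ...) {
  UseMethod("segment_sketch")
}

# The ts method does the work; here it only records what it received.
segment_sketch.ts <- function(x, method = "null", ...) {
  list(n = length(x), method = method, input_class = class(x)[1])
}

# The numeric method coerces to a ts object, then re-dispatches.
segment_sketch.numeric <- function(x, method = "null", ...) {
  segment_sketch(stats::as.ts(x), method = method, ...)
}

res <- segment_sketch(c(35.5, 29.0, 35.6, 33.0))
str(res)
```

Routing every input class through a single ts method is one way to keep the algorithm code in one place; whether segment() is organized this way is an assumption.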
Value
An object of class tidycpt. Every tidycpt object contains:
- segmenter: The object returned by the underlying changepoint detection algorithm.
- model: A model object inheriting from mod_cpt, as created by as.model() when called on the segmenter.
- elapsed_time: The clock time that passed while the algorithm was running.
- time_index: If available, the labels for the time indices of the time series.
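As a rough illustration of the components listed above, the sketch below inspects a fitted object. It assumes the tidychangepoint package is installed and that a tidycpt object is a list whose components can be reached by name with `$`; the page states what the object contains but not how it is stored, so treat the accessors as an assumption.

```r
library(tidychangepoint)

cpts <- segment(DataCPSim, method = "pelt")

# Assumption: components are reachable by name with `$`.
class(cpts$segmenter)   # class of the object returned by the backend
class(cpts$model)       # should inherit from mod_cpt
cpts$elapsed_time       # clock time spent running the algorithm
```

If the object is stored differently, the tidy(), glance(), and augment() calls shown in the Examples are the documented way to get at the same information.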
Details
Currently, segment() can use the following algorithms, depending on the value of the method argument:
- pelt: Uses the PELT algorithm as implemented in changepoint::cpt.meanvar(). The segmenter is of class cpt.
- binseg: Uses the Binary Segmentation algorithm as implemented by changepoint::cpt.meanvar(). The segmenter is of class cpt.
- segneigh: Uses the Segment Neighborhood algorithm as implemented by changepoint::cpt.meanvar(). The segmenter is of class cpt.
- single-best: Uses the AMOC (At Most One Change) criterion as implemented by changepoint::cpt.meanvar(). The segmenter is of class cpt.
- wbs: Uses the Wild Binary Segmentation algorithm as implemented by wbs::wbs(). The segmenter is of class wbs.
- ga: Uses the genetic algorithm implemented by segment_ga(), which wraps GA::ga(). The segmenter is of class tidyga.
- coen: Uses Coen's heuristic as implemented by segment_coen(). The segmenter is of class seg_basket.
- manual: Uses the vector of changepoints in the tau argument. The segmenter is of class seg_cpt.
- null: The default. Uses no changepoints. The segmenter is of class seg_cpt.
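The method list above can be restated as a small base-R lookup table, which may help when writing code that branches on the class of the segmenter. The table is transcribed from the list; the names method_info and segmenter_class_for() are hypothetical helpers for illustration and are not part of the package.

```r
# Lookup table transcribed from the Details list; not exported by any package.
method_info <- data.frame(
  method = c("pelt", "binseg", "segneigh", "single-best", "wbs",
             "ga", "coen", "manual", "null"),
  backend = c(rep("changepoint::cpt.meanvar()", 4), "wbs::wbs()",
              "segment_ga()", "segment_coen()", NA, NA),
  segmenter_class = c("cpt", "cpt", "cpt", "cpt", "wbs",
                      "tidyga", "seg_basket", "seg_cpt", "seg_cpt"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: expected segmenter class for a given method name.
segmenter_class_for <- function(method) {
  hit <- method_info$segmenter_class[method_info$method == method]
  if (length(hit) == 0) {
    stop("method not listed in Details: ", method)
  }
  hit
}

segmenter_class_for("wbs")
```

Method names that appear only in the Examples are deliberately left out, since the list above does not describe them.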
Examples
mod_null <- segment(DataCPSim)
augment(mod_null)
#> # A tsibble: 1,096 x 5 [1]
#> # Groups: region [1]
#> index y region .fitted .resid
#> <int> <dbl> <fct> <dbl> <dbl>
#> 1 1 35.5 [0,1.1e+03] 63.2 -27.7
#> 2 2 29.0 [0,1.1e+03] 63.2 -34.2
#> 3 3 35.6 [0,1.1e+03] 63.2 -27.5
#> 4 4 33.0 [0,1.1e+03] 63.2 -30.2
#> 5 5 29.5 [0,1.1e+03] 63.2 -33.6
#> 6 6 25.4 [0,1.1e+03] 63.2 -37.8
#> 7 7 28.8 [0,1.1e+03] 63.2 -34.3
#> 8 8 50.3 [0,1.1e+03] 63.2 -12.9
#> 9 9 24.9 [0,1.1e+03] 63.2 -38.2
#> 10 10 58.9 [0,1.1e+03] 63.2 -4.28
#> # ℹ 1,086 more rows
tidy(mod_null)
#> # A tibble: 1 × 9
#> region num_obs min max mean sd begin end param_mu
#> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 [0,1.1e+03] 1096 13.7 299. 63.2 45.7 0 1096 63.2
glance(mod_null)
#> # A tibble: 1 × 8
#> pkg version algorithm seg_params model_name criteria fitness elapsed_time
#> <chr> <pckg_> <chr> <list> <chr> <chr> <dbl> <drtn>
#> 1 tidycha… 0.0.1 manual <list [0]> meanshift… BIC 11503. 0.004 secs
segment(DataCPSim, method = "pelt")
#> A tidycpt object
#> Class 'cpt' : Changepoint Object
#> ~~ : S4 class containing 12 slots with names
#> cpttype date version data.set method test.stat pen.type pen.value minseglen cpts ncpts.max param.est
#>
#> Created on : Wed Apr 24 21:56:29 2024
#>
#> summary(.) :
#> ----------
#> Created Using changepoint version 2.2.4
#> Changepoint type : Change in mean and variance
#> Method of analysis : PELT
#> Test Statistic : Normal
#> Type of penalty : MBIC with value, 27.99769
#> Minimum Segment Length : 2
#> Maximum no. of cpts : Inf
#> Changepoint Locations : 547 822 972
#> List of 6
#> $ data : Time-Series [1:1096] from 1 to 1096: 35.5 29 35.6 33 29.5 ...
#> $ tau : int [1:3] 547 822 972
#> $ region_params: tibble [4 × 3] (S3: tbl_df/tbl/data.frame)
#> ..$ region : chr [1:4] "[0,547)" "[547,822)" "[822,972)" "[972,1.1e+03]"
#> ..$ param_mu : num [1:4] 35.3 58.1 96.7 155.9
#> ..$ param_sigma_hatsq: Named num [1:4] 127 372 924 2442
#> .. ..- attr(*, "names")= chr [1:4] "[0,547)" "[547,822)" "[822,972)" "[972,1.1e+03]"
#> $ model_params : NULL
#> $ fitted_values: num [1:1096] 35.3 35.3 35.3 35.3 35.3 ...
#> $ model_name : chr "meanvar"
#> - attr(*, "class")= chr "mod_cpt"
segment(DataCPSim, method = "pelt", penalty = "AIC")
#> A tidycpt object
#> Class 'cpt' : Changepoint Object
#> ~~ : S4 class containing 12 slots with names
#> cpttype date version data.set method test.stat pen.type pen.value minseglen cpts ncpts.max param.est
#>
#> Created on : Wed Apr 24 21:56:29 2024
#>
#> summary(.) :
#> ----------
#> Created Using changepoint version 2.2.4
#> Changepoint type : Change in mean and variance
#> Method of analysis : PELT
#> Test Statistic : Normal
#> Type of penalty : AIC with value, 6
#> Minimum Segment Length : 2
#> Maximum no. of cpts : Inf
#> Number of changepoints: 205
#> List of 6
#> $ data : Time-Series [1:1096] from 1 to 1096: 35.5 29 35.6 33 29.5 ...
#> $ tau : int [1:205] 4 7 10 13 15 17 19 27 30 32 ...
#> $ region_params: tibble [206 × 3] (S3: tbl_df/tbl/data.frame)
#> ..$ region : chr [1:206] "[0,4)" "[4,7)" "[7,10)" "[10,13)" ...
#> ..$ param_mu : num [1:206] 33.4 29.3 34.7 41 37.7 ...
#> ..$ param_sigma_hatsq: Named num [1:206] 9.59 9.59 124.55 162.27 26.02 ...
#> .. ..- attr(*, "names")= chr [1:206] "[0,4)" "[4,7)" "[7,10)" "[10,13)" ...
#> $ model_params : NULL
#> $ fitted_values: num [1:1096] 33.4 33.4 33.4 29.3 29.3 ...
#> $ model_name : chr "meanvar"
#> - attr(*, "class")= chr "mod_cpt"
segment(DataCPSim, method = "binseg", penalty = "AIC")
#> Warning: The number of changepoints identified is Q, it is advised to increase Q to make sure changepoints have not been missed.
#> A tidycpt object
#> Class 'cpt' : Changepoint Object
#> ~~ : S4 class containing 14 slots with names
#> cpts.full pen.value.full data.set cpttype method test.stat pen.type pen.value minseglen cpts ncpts.max param.est date version
#>
#> Created on : Wed Apr 24 21:56:29 2024
#>
#> summary(.) :
#> ----------
#> Created Using changepoint version 2.2.4
#> Changepoint type : Change in mean and variance
#> Method of analysis : BinSeg
#> Test Statistic : Normal
#> Type of penalty : AIC with value, 6
#> Minimum Segment Length : 2
#> Maximum no. of cpts : 5
#> Changepoint Locations : 547 809 813 822 972
#> Range of segmentations:
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 809 NA NA NA NA
#> [2,] 809 547 NA NA NA
#> [3,] 809 547 972 NA NA
#> [4,] 809 547 972 822 NA
#> [5,] 809 547 972 822 813
#>
#> For penalty values: 1485.679 462.0479 160.3649 15.04514 15.04514
#> List of 6
#> $ data : Time-Series [1:1096] from 1 to 1096: 35.5 29 35.6 33 29.5 ...
#> $ tau : int [1:5] 547 809 813 822 972
#> $ region_params: tibble [6 × 3] (S3: tbl_df/tbl/data.frame)
#> ..$ region : chr [1:6] "[0,547)" "[547,809)" "[809,813)" "[813,822)" ...
#> ..$ param_mu : num [1:6] 35.3 57.9 83.9 52.5 96.7 ...
#> ..$ param_sigma_hatsq: Named num [1:6] 127 341 2182 122 924 ...
#> .. ..- attr(*, "names")= chr [1:6] "[0,547)" "[547,809)" "[809,813)" "[813,822)" ...
#> $ model_params : NULL
#> $ fitted_values: num [1:1096] 35.3 35.3 35.3 35.3 35.3 ...
#> $ model_name : chr "meanvar"
#> - attr(*, "class")= chr "mod_cpt"
segment(DataCPSim, method = "segneigh", penalty = "BIC")
#> Warning: SegNeigh is computationally slow, use PELT instead
#> A tidycpt object
#> Class 'cpt' : Changepoint Object
#> ~~ : S4 class containing 14 slots with names
#> cpts.full pen.value.full data.set cpttype method test.stat pen.type pen.value minseglen cpts ncpts.max param.est date version
#>
#> Created on : Wed Apr 24 21:56:29 2024
#>
#> summary(.) :
#> ----------
#> Created Using changepoint version 2.2.4
#> Changepoint type : Change in mean and variance
#> Method of analysis : SegNeigh
#> Test Statistic : Normal
#> Type of penalty : BIC with value, 20.99827
#> Minimum Segment Length : 2
#> Maximum no. of cpts : 5
#> Changepoint Locations : 547 822 972
#> Range of segmentations:
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 809 NA NA NA NA
#> [2,] 547 822 NA NA NA
#> [3,] 547 822 972 NA NA
#> [4,] 547 822 939 980 NA
#>
#> For penalty values: 1485.679 475.6009 152.0772 15.72419
#> List of 6
#> $ data : Time-Series [1:1096] from 1 to 1096: 35.5 29 35.6 33 29.5 ...
#> $ tau : int [1:3] 547 822 972
#> $ region_params: tibble [4 × 3] (S3: tbl_df/tbl/data.frame)
#> ..$ region : chr [1:4] "[0,547)" "[547,822)" "[822,972)" "[972,1.1e+03]"
#> ..$ param_mu : num [1:4] 35.3 58.1 96.7 155.9
#> ..$ param_sigma_hatsq: Named num [1:4] 127 372 924 2442
#> .. ..- attr(*, "names")= chr [1:4] "[0,547)" "[547,822)" "[822,972)" "[972,1.1e+03]"
#> $ model_params : NULL
#> $ fitted_values: num [1:1096] 35.3 35.3 35.3 35.3 35.3 ...
#> $ model_name : chr "meanvar"
#> - attr(*, "class")= chr "mod_cpt"
segment(DataCPSim, method = "random")
#> Seeding initial population with probability: 0.0063863343681642
#> A tidycpt object
#> An object of class "ga"
#>
#> Call:
#> GA::ga(type = "binary", fitness = obj_fun, nBits = n, population = ..1, maxiter = 1)
#>
#> Available slots:
#> [1] "data" "model_fn_args" "call" "type"
#> [5] "lower" "upper" "nBits" "names"
#> [9] "popSize" "iter" "run" "maxiter"
#> [13] "suggestions" "population" "elitism" "pcrossover"
#> [17] "pmutation" "optim" "fitness" "summary"
#> [21] "bestSol" "fitnessValue" "solution"
#> List of 6
#> $ data : Time-Series [1:1096] from 1 to 1096: 35.5 29 35.6 33 29.5 ...
#> $ tau : int [1:9] 214 224 315 597 677 811 847 960 1012
#> $ region_params: tibble [10 × 2] (S3: tbl_df/tbl/data.frame)
#> ..$ region : chr [1:10] "[0,214)" "[214,224)" "[224,315)" "[315,597)" ...
#> ..$ param_mu: num [1:10] 36 31.7 36.6 38.5 59.9 ...
#> $ model_params : Named num 619
#> ..- attr(*, "names")= chr "sigma_hatsq"
#> $ fitted_values: num [1:1096] 36 36 36 36 36 ...
#> $ model_name : chr "meanshift_norm"
#> - attr(*, "class")= chr "mod_cpt"
segment(DataCPSim, method = "manual", tau = c(826))
#> A tidycpt object
#> List of 8
#> $ data : Time-Series [1:1096] from 1 to 1096: 35.5 29 35.6 33 29.5 ...
#> $ pkg : chr "tidychangepoint"
#> $ algorithm : chr "manual"
#> $ changepoints: num 826
#> $ fitness : Named num 10571
#> ..- attr(*, "names")= chr "BIC"
#> $ seg_params : list()
#> $ model_name : chr "meanshift_norm"
#> $ penalty : chr "BIC"
#> - attr(*, "class")= chr "seg_cpt"
#> List of 6
#> $ data : Time-Series [1:1096] from 1 to 1096: 35.5 29 35.6 33 29.5 ...
#> $ tau : int 826
#> $ region_params: tibble [2 × 2] (S3: tbl_df/tbl/data.frame)
#> ..$ region : chr [1:2] "[0,826)" "[826,1.1e+03]"
#> ..$ param_mu: num [1:2] 43.2 123.8
#> $ model_params : Named num 882
#> ..- attr(*, "names")= chr "sigma_hatsq"
#> $ fitted_values: num [1:1096] 43.2 43.2 43.2 43.2 43.2 ...
#> $ model_name : chr "meanshift_norm"
#> - attr(*, "class")= chr "mod_cpt"
two_cpts <- segment(DataCPSim, method = "manual", tau = c(365, 826))
plot(two_cpts)
diagnose(two_cpts)
segment(bogota_pm, method = "pelt")
#> A tidycpt object
#> Class 'cpt' : Changepoint Object
#> ~~ : S4 class containing 12 slots with names
#> cpttype date version data.set method test.stat pen.type pen.value minseglen cpts ncpts.max param.est
#>
#> Created on : Wed Apr 24 21:56:29 2024
#>
#> summary(.) :
#> ----------
#> Created Using changepoint version 2.2.4
#> Changepoint type : Change in mean and variance
#> Method of analysis : PELT
#> Test Statistic : Normal
#> Type of penalty : MBIC with value, 27.99769
#> Minimum Segment Length : 2
#> Maximum no. of cpts : Inf
#> Changepoint Locations : 12 14 393 468 821 1023 1025
#> List of 6
#> $ data : Time-Series [1:1096] from 1 to 1096: 39.3 27.7 30.3 33.6 32.9 20.1 23.7 23.5 29.2 27.6 ...
#> $ tau : int [1:7] 12 14 393 468 821 1023 1025
#> $ region_params: tibble [8 × 3] (S3: tbl_df/tbl/data.frame)
#> ..$ region : chr [1:8] "[0,12)" "[12,14)" "[14,393)" "[393,468)" ...
#> ..$ param_mu : num [1:8] 28.9 16.6 28.1 44.5 30.7 ...
#> ..$ param_sigma_hatsq: Named num [1:8] 26.1 174.2 147.4 309.4 200 ...
#> .. ..- attr(*, "names")= chr [1:8] "[0,12)" "[12,14)" "[14,393)" "[393,468)" ...
#> $ model_params : NULL
#> $ fitted_values: num [1:1096] 28.9 28.9 28.9 28.9 28.9 ...
#> $ model_name : chr "meanvar"
#> - attr(*, "class")= chr "mod_cpt"
if (FALSE) {
x <- segment(DataCPSim, method = "gbmdl", num_generations = 10)
}