Build an initial population set for genetic algorithms

Usage

build_gabin_population(x, ...)

log_gabin_population(x, ...)

Arguments

x

a numeric vector coercible into a stats::ts object

...

arguments passed to methods

Value

A function that can be passed to the population argument of GA::ga() (through segment_ga())

Details

Genetic algorithms require a method for randomly generating an initial population (i.e., a first generation). For binary problems such as changepoint detection, the default used by GA::ga() is GA::gabin_Population(), which includes each candidate changepoint independently with probability 0.5. This produces an initial population whose candidate changepoint sets are excessively large (on the order of \(n/2\), where \(n\) is the length of the time series), which makes the genetic algorithm slow.

  • build_gabin_population() takes a ts object and runs several fast changepoint detection algorithms on it, then sets the initial probability so that the expected size of a seeded changepoint set is 3 times the average size of the changepoint sets returned by those algorithms (that is, 3 times the average size, divided by the length of the time series). This is a conservative guess as to the likely size of the optimal changepoint set.

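  • log_gabin_population() takes a ts object and sets the initial probability to the natural logarithm of the length of the time series divided by that length, \(\log(n)/n\), so that the expected size of a seeded changepoint set is \(\log(n)\).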
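
To make the effect of the seeding probability concrete, the following is a minimal base-R sketch, not the package's implementation. The helper seed_population() and the population size of 50 are assumptions introduced here for illustration only; the series length of 362 matches the CET data used in the examples. It draws two binary populations, one with the default probability of 0.5 and one with the \(\log(n)/n\) rule, and compares the average number of candidate changepoints per individual.

```r
# Hypothetical helper (not part of tidychangepoint or GA): draw a binary
# population matrix in which each bit is 1 with probability p.
# Rows are individuals; columns are candidate changepoint locations.
seed_population <- function(pop_size, n_bits, p) {
  stopifnot(p >= 0, p <= 1)
  matrix(
    stats::rbinom(pop_size * n_bits, size = 1, prob = p),
    nrow = pop_size,
    ncol = n_bits
  )
}

n <- 362              # length of the CET series used in the examples
p_log <- log(n) / n   # seeding probability under the log rule
p_default <- 0.5      # fair coin flip for every candidate location

set.seed(1)
pop_log <- seed_population(50, n, p_log)
pop_default <- seed_population(50, n, p_default)

# Average number of candidate changepoints per individual
mean(rowSums(pop_log))
mean(rowSums(pop_default))
```

In expectation, an individual seeded with probability 0.5 carries \(n/2 = 181\) candidate changepoints, whereas one seeded with \(\log(n)/n\) carries about \(\log(362) \approx 5.9\), which is why the sparser seeding gives the genetic algorithm a much smaller starting point.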

Examples

# Build a function to generate the population
f <- build_gabin_population(CET)

# Segment the time series using the population generation function
segment(CET, method = "ga", population = f, maxiter = 5)
#> Seeding initial population with probability: 0.0276243093922652
#> A tidycpt object
#> An object of class "ga"
#> 
#> Call:
#> GA::ga(type = "binary", fitness = obj_fun, nBits = n, population = ..1,     maxiter = 5)
#> 
#> Available slots:
#>  [1] "data"          "model_fn_args" "call"          "type"         
#>  [5] "lower"         "upper"         "nBits"         "names"        
#>  [9] "popSize"       "iter"          "run"           "maxiter"      
#> [13] "suggestions"   "population"    "elitism"       "pcrossover"   
#> [17] "pmutation"     "optim"         "fitness"       "summary"      
#> [21] "bestSol"       "fitnessValue"  "solution"     
#> List of 6
#>  $ data         : Time-Series [1:362] from 1 to 362: 8.87 9.1 9.78 9.52 8.63 9.34 8.29 9.86 8.52 9.51 ...
#>  $ tau          : int [1:5] 40 205 284 305 330
#>  $ region_params: tibble [6 × 2] (S3: tbl_df/tbl/data.frame)
#>   ..$ region  : chr [1:6] "[0,40)" "[40,205)" "[205,284)" "[284,305)" ...
#>   ..$ param_mu: num [1:6] 8.7 9.15 9.27 9.64 9.43 ...
#>  $ model_params : Named num 0.322
#>   ..- attr(*, "names")= chr "sigma_hatsq"
#>  $ fitted_values: num [1:362] 8.7 8.7 8.7 8.7 8.7 ...
#>  $ model_name   : chr "meanshift_norm"
#>  - attr(*, "class")= chr "mod_cpt"

# Alternatively, seed the population using the log(n) / n rule
f <- log_gabin_population(CET)
segment(CET, method = "ga", population = f, maxiter = 10)
#> Seeding initial population with probability: 0.0162752602536624
#> A tidycpt object
#> An object of class "ga"
#> 
#> Call:
#> GA::ga(type = "binary", fitness = obj_fun, nBits = n, population = ..1,     maxiter = 10)
#> 
#> Available slots:
#>  [1] "data"          "model_fn_args" "call"          "type"         
#>  [5] "lower"         "upper"         "nBits"         "names"        
#>  [9] "popSize"       "iter"          "run"           "maxiter"      
#> [13] "suggestions"   "population"    "elitism"       "pcrossover"   
#> [17] "pmutation"     "optim"         "fitness"       "summary"      
#> [21] "bestSol"       "fitnessValue"  "solution"     
#> List of 6
#>  $ data         : Time-Series [1:362] from 1 to 362: 8.87 9.1 9.78 9.52 8.63 9.34 8.29 9.86 8.52 9.51 ...
#>  $ tau          : int [1:4] 48 74 237 337
#>  $ region_params: tibble [5 × 2] (S3: tbl_df/tbl/data.frame)
#>   ..$ region  : chr [1:5] "[0,48)" "[48,74)" "[74,237)" "[237,337)" ...
#>   ..$ param_mu: num [1:5] 8.71 9.33 9.14 9.49 10.31
#>  $ model_params : Named num 0.321
#>   ..- attr(*, "names")= chr "sigma_hatsq"
#>  $ fitted_values: num [1:362] 8.71 8.71 8.71 8.71 8.71 ...
#>  $ model_name   : chr "meanshift_norm"
#>  - attr(*, "class")= chr "mod_cpt"