Initialize populations in genetic algorithms
Build an initial population set for genetic algorithms
Arguments
- x
a numeric vector coercible into a stats::ts object
- ...
arguments passed to methods
Value
A function that can be passed to the population argument of GA::ga() (through segment_ga()).
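The sketch below shows what such a function looks like in base R. It is only an illustration of the interface. It assumes GA's convention for population functions: the function is called with the "ga" object and returns a popSize × nBits matrix of 0/1 values. The class fake_ga and the factory make_bernoulli_population() are hypothetical names invented for this example. The real GA object has many more slots.

```r
library(methods)

# Hypothetical stand-in for the "ga" S4 object that GA::ga() passes to its
# population function; only the two slots used below are defined.
setClass("fake_ga", slots = c(popSize = "numeric", nBits = "numeric"))

# Hypothetical factory: returns a population function that switches on each
# bit (candidate changepoint) independently with probability `prob`.
make_bernoulli_population <- function(prob) {
  function(object, ...) {
    matrix(
      stats::rbinom(object@popSize * object@nBits, size = 1, prob = prob),
      nrow = object@popSize,
      ncol = object@nBits
    )
  }
}

pop_fun <- make_bernoulli_population(prob = 0.05)
set.seed(1)
pop <- pop_fun(new("fake_ga", popSize = 20, nBits = 366))
dim(pop)
#> [1]  20 366
```

Each row is one chromosome, that is, one candidate changepoint set.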
Details
Genetic algorithms require a method for randomly generating initial
populations (i.e., a first generation).
For binary-encoded problems such as changepoint detection, the default population method used by GA::ga() is GA::gabin_Population(). This method includes each time index as a candidate changepoint independently at random with probability 0.5.
As a result, the initial population contains excessively large candidate changepoint sets (on the order of \(n/2\), where \(n\) is the length of the time series), which makes the genetic algorithm slow.
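The following base-R simulation illustrates the problem. It does not call GA::gabin_Population() itself. Instead it mimics that method's sampling rule, where each bit is 1 with probability 0.5, for a series of length 366 (the length of the CET data used in the examples). It then averages the number of candidate changepoints per chromosome.

```r
# Mimic the default scheme: each of n candidate time indices is switched on
# independently with probability 0.5.
set.seed(42)
n <- 366
pop_size <- 50
default_pop <- matrix(stats::rbinom(pop_size * n, size = 1, prob = 0.5),
                      nrow = pop_size, ncol = n)

# Number of candidate changepoints in each chromosome (row).
set_sizes <- rowSums(default_pop)
mean(set_sizes)  # close to n / 2 = 183
```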
build_gabin_population() takes a ts object and runs several fast changepoint detection algorithms on it. It then sets the selection probability so that the expected size of each candidate changepoint set is 3 times the average size of the changepoint sets returned by those algorithms. This is a conservative guess as to the likely size of the optimal changepoint set.
log_gabin_population() takes a ts object and sets the selection probability to \(\log(n)/n\). As a result, the expected size of each candidate changepoint set is the natural logarithm of the length \(n\) of the time series.
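The arithmetic behind the two seeding rules can be sketched as follows, again for n = 366. The changepoint-set sizes in hypothetical_sizes are invented for illustration. Any sizes averaging 10/3 reproduce the probability printed in the first example below, and log(366)/366 matches the probability printed in the second.

```r
n <- 366  # length of the CET series used in the examples

# build_gabin_population() rule: expected set size = 3 x the average number
# of changepoints found by the fast methods (sizes here are hypothetical).
hypothetical_sizes <- c(3, 3, 4)
p_build <- 3 * mean(hypothetical_sizes) / n

# log_gabin_population() rule: expected set size = log(n).
p_log <- log(n) / n

probs <- c(build = p_build, log = p_log, default = 0.5)
probs
# Expected number of candidate changepoints per chromosome under each rule
n * probs
```

Under the default rule each chromosome carries about 183 candidate changepoints, compared with about 10 and about 6 under the two seeding rules.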
Examples
# Build a function to generate the population
f <- build_gabin_population(CET)
# Segment the time series using the population generation function
segment(CET, method = "ga", population = f, maxiter = 5)
#> Seeding initial population with probability: 0.0273224043715847
#> A tidycpt object
#> An object of class "ga"
#>
#> Call:
#> GA::ga(type = "binary", fitness = obj_fun, nBits = n, population = ..1, maxiter = 5)
#>
#> Available slots:
#> [1] "data" "model_fn_args" "call" "type"
#> [5] "lower" "upper" "nBits" "names"
#> [9] "popSize" "iter" "run" "maxiter"
#> [13] "suggestions" "population" "elitism" "pcrossover"
#> [17] "pmutation" "optim" "fitness" "summary"
#> [21] "bestSol" "fitnessValue" "solution"
#> List of 6
#> $ data : Time-Series [1:366] from 1 to 366: 8.87 9.1 9.78 9.52 8.63 9.34 8.29 9.86 8.52 9.51 ...
#> $ tau : int [1:4] 40 205 284 330
#> $ region_params: tibble [5 × 2] (S3: tbl_df/tbl/data.frame)
#> ..$ region : chr [1:5] "[1,40)" "[40,205)" "[205,284)" "[284,330)" ...
#> ..$ param_mu: num [1:5] 8.7 9.15 9.27 9.5 10.34
#> $ model_params : Named num 0.326
#> ..- attr(*, "names")= chr "sigma_hatsq"
#> $ fitted_values: num [1:366] 8.7 8.7 8.7 8.7 8.7 ...
#> $ model_name : chr "meanshift_norm"
#> - attr(*, "class")= chr "mod_cpt"
# Alternatively, seed the population using the log-based probability
f <- log_gabin_population(CET)
segment(CET, method = "ga", population = f, maxiter = 10)
#> Seeding initial population with probability: 0.0161274134792387
#> A tidycpt object
#> An object of class "ga"
#>
#> Call:
#> GA::ga(type = "binary", fitness = obj_fun, nBits = n, population = ..1, maxiter = 10)
#>
#> Available slots:
#> [1] "data" "model_fn_args" "call" "type"
#> [5] "lower" "upper" "nBits" "names"
#> [9] "popSize" "iter" "run" "maxiter"
#> [13] "suggestions" "population" "elitism" "pcrossover"
#> [17] "pmutation" "optim" "fitness" "summary"
#> [21] "bestSol" "fitnessValue" "solution"
#> List of 6
#> $ data : Time-Series [1:366] from 1 to 366: 8.87 9.1 9.78 9.52 8.63 9.34 8.29 9.86 8.52 9.51 ...
#> $ tau : int [1:3] 70 233 333
#> $ region_params: tibble [4 × 2] (S3: tbl_df/tbl/data.frame)
#> ..$ region : chr [1:4] "[1,70)" "[70,233)" "[233,333)" "[333,367)"
#> ..$ param_mu: num [1:4] 8.89 9.15 9.44 10.35
#> $ model_params : Named num 0.34
#> ..- attr(*, "names")= chr "sigma_hatsq"
#> $ fitted_values: num [1:366] 8.89 8.89 8.89 8.89 8.89 ...
#> $ model_name : chr "meanshift_norm"
#> - attr(*, "class")= chr "mod_cpt"