Redesigns API to allow for pre/post generation data generation. (#3)
* Demonstrates implementation of epsilon lexicase selection.
* Demonstrates implementation of downsampling.
* Expands the toolbox.
erp12 authored Jan 3, 2022
1 parent 7984dac commit e3c764d
Showing 14 changed files with 624 additions and 262 deletions.
17 changes: 8 additions & 9 deletions README.md
@@ -15,7 +15,7 @@ Outline
- Not easily extended via interop.
- Decoupling from assumptions
- Genomes can be any data.
- Phenomes/individuals are open maps that hold genomes, errors, and anything else the user wants to add to them.
- Individuals are open maps that hold genomes, errors, and anything else the user wants to add to them.
- "breeding" new genomes is done via a user supplied function.
- Solving common issues
- Ensure same, well tested, implementations of common algorithms. (aka toolbox)
@@ -35,16 +35,17 @@ In the future, we may also publish releases to Clojars.

## Guide

Current ga-clj only supports generational genetic algorithms.
Currently, ga-clj only supports generational genetic algorithms.

### Terminology

- `genome-factory`: A nullary function for creating random genomes.
- `genome->phenome`: A function from genome to a "phenome" map containing additional used to drive breeding.
Often this map contains the errors/fitness associated with the genome but could also contain any other values.
- `breed`: A function that takes the current population of individuals as input and returns a new child genome.
Often this function will perform parent selection and variation operators. The `erp12.ga-clj.toolbox` namespace
provides implementations of commonly used algorithms that will likely be useful to call in breed functions.
- `genome->individual`: A function from genome to an "individual" map containing additional data used to drive breeding.
Often this map contains the errors/fitness associated with the genome but could also contain any other values.
- `breed`: A function that takes the current population of individuals and maybe other data about the state of evolution
and returns a new child genome. Often this function will perform parent selection and variation operators.
The `erp12.ga-clj.toolbox` namespace provides implementations of commonly used algorithms that will likely be useful
to call in breed functions.
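
The three roles above can be sketched without the library. This is a minimal, self-contained illustration on a hypothetical toy problem (evolving a vector of 10 bits toward all ones). Every name in it is an assumption made for illustration, it does not call ga-clj, and `genome->individual` is simplified to take only the genome.

```clojure
;; Hypothetical toy problem, not part of ga-clj:
;; evolve a vector of 10 bits toward all ones.

(defn genome-factory
  "Nullary function that creates a random genome."
  []
  (vec (repeatedly 10 #(rand-int 2))))

(defn genome->individual
  "Maps a genome to the data that drives breeding. Here, a scalar :error
  equal to the number of bits that are not 1."
  [genome]
  {:error (count (remove #{1} genome))})

(defn breed
  "Takes the population of individuals and returns one new child genome."
  [population]
  ;; Selection: best of 3 randomly sampled individuals (lowest :error).
  (let [parent (apply min-key :error (repeatedly 3 #(rand-nth population)))]
    ;; Variation: flip one random bit of the selected parent's genome.
    (update (:genome parent) (rand-int 10) #(- 1 %))))

(defn evaluate
  "Builds an individual from a genome and attaches the :genome to it."
  [genome]
  (assoc (genome->individual genome) :genome genome))

(defn next-generation
  "One generational step: breed a full set of child genomes, then evaluate them."
  [population]
  (vec (repeatedly (count population) #(evaluate (breed population)))))

(def initial-population
  (vec (repeatedly 20 #(evaluate (genome-factory)))))
```

Calling `(next-generation initial-population)` returns a new vector of 20 individuals, each holding its `:genome` and `:error`.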

Additional terminology and configuration parameters can be found in the docstring of `evolve` functions found
in namespaces that provide a specific kind of genetic algorithm. For example:
@@ -71,6 +72,4 @@ See the `CONTRIBUTING.md` for more information, including how to run tests.
- Add more examples that test the design/abstraction in a wider range of scenarios.
- TSP?
- Knapsack problem?
- Figure out how best to handle logging, monitoring, data collection, etc.
- In the library or in user code?
- Rationale and Guide
12 changes: 9 additions & 3 deletions build.clj
@@ -22,12 +22,18 @@
(process-result (b/process {:command-args ["node" "out/node-tests.js"]}))
opts)

(defn example
[opts]
;; @todo Pass args to example file.
(process-result (b/process {:command-args ["clojure" "-M:examples" "-m" (name (:ns opts))]})))

(defn examples
[_]
(doseq [example-ns ['erp12.ga-clj.examples.alphabet]]
[opts]
(doseq [example-ns ['erp12.ga-clj.examples.alphabet
'erp12.ga-clj.examples.symbolic-regression]]
(println "\nRunning example" example-ns)
;; @todo Pass smaller population sizes and max generations to examples via command args to keep CI fast.
(process-result (b/process {:command-args ["clojure" "-M:examples" "-m" (name example-ns)]}))))
(example (assoc opts :ns example-ns))))

(defn ci
[opts]
16 changes: 12 additions & 4 deletions deps.edn
@@ -1,13 +1,21 @@
{:paths ["src"]
:deps {}
:deps {kixi/stats {:mvn/version "0.5.4"}}
:aliases {:build {:extra-deps {io.github.seancorfield/build-clj {:git/tag "v0.5.4" :git/sha "bc9c0cc"}}
:ns-default build}
:test {:extra-paths ["test"]
:extra-deps {com.cognitect/test-runner {:git/url "https://github.com/cognitect-labs/test-runner.git"
:git/tag "v0.5.0" :git/sha "b3fd0d2"}}
:extra-deps {io.github.cognitect-labs/test-runner {:git/tag "v0.5.0" :git/sha "b3fd0d2"}}
:main-opts ["-m" "cognitect.test-runner"]
:exec-fn cognitect.test-runner.api/test}
:test-cljs {:extra-paths ["test"]
:extra-deps {thheller/shadow-cljs {:mvn/version "2.16.6"}}
:main-opts ["-m" "shadow.cljs.devtools.cli"]}
:examples {:extra-paths ["examples"]}}}
:examples {:extra-paths ["examples"]}
:codox {:extra-deps {codox/codox {:mvn/version "0.10.8"}}
:exec-fn codox.main/generate-docs
:exec-args {:source-paths ["src"]
:doc-paths ["docs"]
:output-path "../ga-clj-DOC"
:source-uri "https://github.com/erp12/ga-clj/blob/{version}/{filepath}#L{line}"
:project {:name "GA CLJ"
:version "0.0.0"
:description "No-assumptions genetic algorithms in Clojure"}}}}}
3 changes: 3 additions & 0 deletions docs/Intro.md
@@ -0,0 +1,3 @@
# Getting Started

Write me!
28 changes: 14 additions & 14 deletions examples/erp12/ga_clj/examples/alphabet.cljc
@@ -14,29 +14,29 @@
[& _]
(println
(ga/evolve {;; Generates random genomes as a permutation of the target genome.
:genome-factory #(shuffle target)
;; Phenomes are a map containing a scalar `:error` for the genome.
:genome-factory #(shuffle target)
;; Individuals are a map containing a scalar `:error` for the genome.
;; In this case, we use the hamming distance.
;; The `:genome` is added implicitly.
:genome->phenome (fn [gn] {:error (tb/hamming-distance gn target)})
:genome->individual (fn [gn _] {:error (tb/hamming-distance gn target)})
;; To "breed" a new genome from the population, we:
;; 1. Select 2 parents with tournament selection.
;; 2. Pass their genomes to uniform-crossover.
;; 3. Mutate the resulting genome by swapping the position of 2 genes.
:breed (fn [population]
(->> (repeatedly 2 #(tournament population))
(map :genome)
tb/uniform-crossover
tb/swap-2-genes))
:breed (fn [generation]
(->> (repeatedly 2 #(tournament generation))
(map :genome)
tb/uniform-crossover
tb/swap-2-genes))
;; We compare individuals on the basis of the error values. Lower is better.
:phenome-cmp (comparator #(< (:error %1) (:error %2)))
:individual-cmp (comparator #(< (:error %1) (:error %2)))
;; We stop evolution when either:
;; 1. We find an individual with zero error or
;; 2. We reach 300 generations.
:stop-fn (fn [{:keys [generation best]}]
(cond
(= (:error best) 0) :solution-found
(= generation 300) :max-generation-reached))
:stop-fn (fn [{:keys [generation best]}]
(cond
(= (:error best) 0) :solution-found
(= generation 300) :max-generation-reached))
;; Each generation will contain 1000 individuals.
:population-size 1000}))
:population-size 1000}))
(shutdown-agents))
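
For readers unfamiliar with the two building blocks this example relies on, here is a rough, self-contained sketch of Hamming distance and tournament selection. It is not the `erp12.ga-clj.toolbox` implementation. The name `make-tournament`, its `:by` and `:size` options, and the assumption that selection receives a plain vector of individuals are all hypothetical.

```clojure
(defn hamming-distance
  "Number of positions at which two equal-length sequences differ."
  [a b]
  (count (filter false? (map = a b))))

(defn make-tournament
  "Returns a selection function that samples `size` individuals at random
  (with replacement) and returns the one with the lowest value of `by`."
  [{:keys [by size]}]
  (fn [individuals]
    (apply min-key by (repeatedly size #(rand-nth individuals)))))

;; Hypothetical usage on a tiny, hand-written population.
(def target (vec "abcde"))

(def individuals
  (mapv (fn [genome]
          {:genome genome
           :error  (hamming-distance genome target)})
        [(vec "abcde") (vec "bacde") (vec "edcba")]))

(def tournament (make-tournament {:by :error :size 3}))

;; Returns one of the individuals, biased toward lower :error.
(tournament individuals)
```

Larger tournament sizes increase selection pressure, because the best of a bigger random sample is more likely to be one of the best individuals overall.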
47 changes: 0 additions & 47 deletions examples/erp12/ga_clj/examples/alphabet_lexicase.cljc

This file was deleted.

134 changes: 134 additions & 0 deletions examples/erp12/ga_clj/examples/symbolic_regression.cljc
@@ -0,0 +1,134 @@
(ns erp12.ga-clj.examples.symbolic-regression
  "Symbolic regression to fit data generated from x^2 + 3x + 2 with a bit of noise added.
  Adapted from: Lee Spector (https://github.com/lspector/gp)"
  (:require [kixi.stats.math :as math]
            [kixi.stats.distribution :as dist]
            [erp12.ga-clj.generational :as ga]
            [erp12.ga-clj.toolbox :as tb]))

;; x^2 + 3x + 2 + noise
(def x-train (vec (range -20.0 20.0 0.05)))
(def y-train (mapv (fn [x noise]
                     (+ (+ (+ (* x x) (* 3 x)) 2)
                        noise))
                   x-train
                   ;; Add some noise to the data.
                   (dist/sample (count x-train) (dist/normal {:mu 0 :sd 0.1}))))

(defn p-div
  "Protected division."
  [n d]
  (if (zero? d)
    0
    (/ n d)))

(def fn->arity
  {+ 2
   - 2
   * 2
   p-div 2
   math/sin 1
   math/cos 1})

(defn random-function
  []
  (rand-nth (keys fn->arity)))

(defn random-terminal
  []
  (rand-nth (list 'x (- (rand 5) 1))))

(defn random-code
  "Create a random tree of code."
  [depth]
  (if (or (zero? depth)
          (zero? (rand-int 2)))
    (random-terminal)
    (let [f (random-function)]
      (cons f (repeatedly (get fn->arity f)
                          #(random-code (dec depth)))))))

(def mutate
  ;; We mutate trees by replacing a random subtree with a random tree.
  (tb/make-subtree-mutation {:tree-generator #(random-code 2)}))

(def select
  ;; Creates a parent selection function that uses the lexicase selection
  ;; algorithm. We supply a function-like value to the `:epsilon` field
  ;; to specify how a value for epsilon will be pulled from the generation.
  ;; In this case, we will store a vector of per-case epsilon under the `:epsilon`
  ;; key of the generation map.
  (tb/make-lexicase-selection {:epsilon :epsilon})

  ;; To use traditional lexicase selection (no epsilon) we can specify `:epsilon` as
  ;; a falsey value, which is also the default.
  ; (tb/make-lexicase-selection)

  ;; We can also supply a static, constant, value for epsilon.
  ; (tb/make-lexicase-selection {:epsilon 0.1})
  )
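
The comments above describe epsilon lexicase selection with per-case epsilon values. As a rough illustration of the idea (not the toolbox's implementation), the sketch below filters candidates one shuffled case at a time, keeping those within epsilon of the best error on that case, and computes epsilon per case as the median absolute deviation. The function names and the `:errors` layout (one error per case, in the same order for every individual) are assumptions.

```clojure
(defn median
  "Median of a non-empty collection of numbers."
  [xs]
  (let [sorted (vec (sort xs))
        n      (count sorted)
        mid    (quot n 2)]
    (if (odd? n)
      (nth sorted mid)
      (/ (+ (nth sorted (dec mid)) (nth sorted mid)) 2))))

(defn median-absolute-deviation
  "Median of the absolute deviations from the median."
  [xs]
  (let [m (median xs)]
    (median (map #(Math/abs (double (- % m))) xs))))

(defn epsilon-per-case
  "One epsilon per case, computed from that case's errors across all individuals."
  [individuals]
  (apply mapv
         (fn [& case-errors] (median-absolute-deviation case-errors))
         (map :errors individuals)))

(defn lexicase-select
  "Selects one individual. `epsilons` holds one tolerance per case;
  all zeros gives traditional lexicase selection."
  [individuals epsilons]
  (loop [candidates individuals
         cases      (shuffle (range (count epsilons)))]
    (if (or (empty? cases) (= 1 (count candidates)))
      ;; Ties among the survivors are broken at random.
      (rand-nth (vec candidates))
      (let [case-idx  (first cases)
            best      (apply min (map #(nth (:errors %) case-idx) candidates))
            threshold (+ best (nth epsilons case-idx))]
        ;; Keep only candidates within epsilon of the best error on this case.
        (recur (filter #(<= (nth (:errors %) case-idx) threshold) candidates)
               (rest cases))))))

;; Hypothetical population with two cases per individual.
(def individuals
  [{:id :a :errors [0.0 5.0]}
   {:id :b :errors [0.1 4.0]}
   {:id :c :errors [3.0 0.0]}])

(lexicase-select individuals (epsilon-per-case individuals))
```

With all epsilons at zero, `:b` can never be selected in this population because it is not strictly best on either case. A nonzero epsilon lets near-best individuals like `:b` survive the filter.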

(defn -main
  [& _]
  (println
    (ga/evolve {;; Creates a random tree, corresponding to an equation, with a max depth of 3.
                :genome-factory #(random-code 3)
                ;; Before each generation (evaluation), randomly select 5% of the training
                ;; cases to use for evaluation. All genomes in the generation will be evaluated
                ;; on these same cases.
                :pre-generation (fn []
                                  {:batch-cases (random-sample 0.05 (range (count x-train)))})
                ;; Individuals are maps containing:
                ;; 1. A `:model` represented as a callable Clojure function.
                ;; 2. A vector of predictions, stored under `:y-pred`.
                ;; 3. A vector of `:errors`, one for each training case in this generation's batch.
                ;; 4. The mean absolute error across all cases in the batch, stored under `:mae`.
                ;; 5. The `:genome` tree which created the model. This is added implicitly.
                :genome->individual (fn [gn {:keys [batch-cases]}]
                                      (let [model (eval `(fn ~(vector 'x) ~gn))
                                            x-batch (mapv #(nth x-train %) batch-cases)
                                            y-batch (mapv #(nth y-train %) batch-cases)
                                            y-pred (mapv model x-batch)
                                            errors (mapv #(Math/abs (- %1 %2)) y-pred y-batch)]
                                        {:model model
                                         :y-pred y-pred
                                         :errors errors
                                         :mae (tb/mean errors)}))
                ;; After each generation is evaluated, compute a vector of `:epsilon` values
                ;; to use in parent selection. In this case, we will use the default computation
                ;; of epsilon: the median absolute deviation.
                :post-generation (fn [population]
                                   {:epsilon (tb/compute-epsilon-per-case population)})
                ;; To "breed" a new genome from the population, we:
                ;; 1. Select 2 parents with lexicase selection. This will look up
                ;;    the `:epsilon` values stored in the generation map.
                ;; 2. Pass their genomes to subtree crossover.
                ;; 3. Mutate the resulting genome with subtree mutation.
                :breed (fn [generation]
                         (->> (repeatedly 2 #(select generation))
                              (map :genome)
                              (apply tb/subtree-crossover)
                              mutate))
                ;; We compare individuals on the basis of their mean absolute error. Lower is better.
                :individual-cmp (comparator #(and (< (:mae %1) (:mae %2))
                                                  (not (math/infinite? (:mae %1)))))
                ;; We stop evolution when either:
                ;; 1. We reach 300 generations.
                ;; 2. We find an individual whose MAE on the entire dataset is at most 0.2.
                :stop-fn (fn [{:keys [generation-number best new-best?]}]
                           (println "Generation:" generation-number
                                    "Best MAE:" (:mae best)
                                    "Best Tree Size:" (tb/tree-size (:genome best))
                                    "Best Tree Depth:" (tb/tree-depth (:genome best)))
                           (cond
                             ;; Stop evolution after 300 generations.
                             (= generation-number 300) :max-generation-reached
                             ;; If a new "best" individual is found (based on the MAE of a batch),
                             ;; test the new best individual on the full training set.
                             ;; If the full MAE is at most 0.2, report that the solution is found.
                             new-best? (let [y-pred (mapv (:model best) x-train)
                                             mae (tb/mae y-pred y-train)]
                                         (when (<= mae 0.2)
                                           :solution-found))))
                ;; Each generation will contain 1000 individuals.
                :population-size 1000}))
  (shutdown-agents))
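
The down-sampling done in `:pre-generation` and `:genome->individual` above can be tried in isolation. The sketch below is a simplified, library-free version under stated assumptions: the data is noise-free, the names `sample-batch` and `evaluate` are hypothetical, and the fallback to a single random case when the sample comes back empty is an addition that the example above does not include.

```clojure
;; Hypothetical, noise-free training data for x^2 + 3x + 2.
(def xs (vec (range -2.0 2.0 0.5)))
(def ys (mapv #(+ (* % %) (* 3 %) 2) xs))

(defn sample-batch
  "Picks each of the `n` case indices independently with probability `p`.
  Falls back to one random case if the sample is empty (an assumption of this sketch)."
  [p n]
  (let [batch (vec (random-sample p (range n)))]
    (if (seq batch)
      batch
      [(rand-int n)])))

(defn mean
  [coll]
  (/ (reduce + coll) (count coll)))

(defn evaluate
  "Evaluates `model` on only the cases whose indices are in `batch-cases`."
  [model batch-cases]
  (let [errors (mapv (fn [i]
                       (Math/abs (double (- (model (nth xs i)) (nth ys i)))))
                     batch-cases)]
    {:errors errors
     :mae    (mean errors)}))

;; Draw one batch per generation and evaluate every model on the same cases,
;; so that their errors are comparable within that generation.
(let [batch (sample-batch 0.5 (count xs))]
  (mapv #(evaluate % batch)
        [(fn [x] (+ (* x x) (* 3 x) 2)) ; the true function
         (fn [x] (* 3 x))]))            ; a poorer candidate
```

Because a batch MAE only estimates the error on the full dataset, it makes sense to re-check a promising individual on all cases before declaring a solution, as the `:stop-fn` in the example does.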