- November 2022.
This is a revised version of the earlier Unsupervised Natural Language Learning README Version 3, updated to reflect a new integrated process that enables continuous learning.
A general overview, including prerequisites, is provided by the Unsupervised Natural Language Learning README file.
This integration is in development; the instructions below are incomplete. Look at the older version for general guidance.
There are technical challenges to fully implementing continuous learning. The instructions below will be a hybrid of the older-style batch process, and the newer style.
The setup of the integrated pipeline requires many prerequisites and preliminaries. These are given in the earlier README, in the following sections:
- Setting up the AtomSpace
- Bulk Pair Counting
- Mutual Information of Word Pairs
- The Vector Structure Encoded in Pairs
- Maximum Spanning Trees
- MST Disjunct Counting
- Disjunct Marginal Statistics
- Determining Grammatical Classes
- Creating Grammatical Classes
- Exporting a Lexis
- Clustering
- Precomputed LXC containers
The goal is to have MI be computed dynamically, on the fly. The code
to get this working is half written, but incomplete. So, for now, do
it as before, as a batch process. Run the code in
run-common/marginals-pair.scm. It works.
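To make the batch step concrete, here is a minimal Python sketch of what computing MI from word-pair counts involves: sum the marginals first, then score each pair. It uses the standard pointwise mutual information formula. Whether run-common/marginals-pair.scm computes exactly this quantity is an assumption, and the function name `pair_mi` and the toy counts are hypothetical, not taken from the project.

```python
from collections import Counter
from math import log2


def pair_mi(pair_counts):
    """Return {(left, right): MI} from raw word-pair counts.

    MI(l, r) = log2( N(l, r) * N(*, *) / (N(l, *) * N(*, r)) )
    The marginals N(l, *), N(*, r) and the total N(*, *) need a full pass
    over the data, which is why this is naturally a batch step.
    """
    total = sum(pair_counts.values())
    left = Counter()   # N(l, *): how often l appears as the left word
    right = Counter()  # N(*, r): how often r appears as the right word
    for (l, r), n in pair_counts.items():
        left[l] += n
        right[r] += n
    return {
        (l, r): log2(n * total / (left[l] * right[r]))
        for (l, r), n in pair_counts.items()
    }


# Hypothetical toy counts, for illustration only.
toy_counts = {
    ("the", "cat"): 3,
    ("the", "dog"): 1,
    ("a", "cat"): 1,
    ("a", "dog"): 3,
}
toy_mi = pair_mi(toy_counts)
for pair, value in sorted(toy_mi.items()):
    print(pair, round(value, 3))
```

Because every MI value depends on the marginals and the grand total, a single new observation changes the MI of many pairs at once. That is one reason an on-the-fly version is harder than the batch version.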
As a demo of what is about to happen, aim the link-parser at a
running instance of the CogServer, containing word-pairs (and
word-pair MI data.) Type in any sentence, and then patiently wait
(about 5-10 seconds) for data to fly over the net. The resulting
parses will be maximal planar graphs (MPG), which are similar to
maximal spanning trees (MST), but contain loops. What's being
maximized is the sum-total of all of the MI of the links between
words.
Use the dictionary in run-config/dict-combined after adjusting
the URL in it. Like so:
link-parser run-config/dict-combined
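The difference between an MST parse and an MPG parse can be illustrated with a small Python sketch. It is a greedy approximation, not the algorithm used by link-parser or by this project: phase 1 is Kruskal's maximum-spanning-tree algorithm with crossing links rejected, and phase 2 adds any leftover links that still do not cross, which is what creates loops. Treating "planar" as "no two links cross when drawn above the sentence" is an assumption, and the names `crosses`, `greedy_parse` and the MI values are hypothetical.

```python
def crosses(e, f):
    """True if links e and f cross when both are drawn above the sentence."""
    (a, b), (c, d) = sorted(e), sorted(f)
    return a < c < b < d or c < a < d < b


def greedy_parse(n_words, mi, planar_extras=False):
    """Greedily pick high-MI, non-crossing links between word positions.

    mi maps (i, j) word-position pairs, with i < j, to an MI score.
    Phase 1 builds a spanning tree (Kruskal, highest MI first).
    Phase 2 (optional) adds remaining non-crossing links, creating loops.
    Being greedy, this is not guaranteed to find the true maximum.
    """
    parent = list(range(n_words))  # union-find forest over word positions

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    ranked = sorted(mi, key=mi.get, reverse=True)
    links = []
    for e in ranked:
        if any(crosses(e, f) for f in links):
            continue
        ra, rb = find(e[0]), find(e[1])
        if ra != rb:  # link joins two separate components: keep it
            parent[ra] = rb
            links.append(e)
    if planar_extras:
        for e in ranked:
            if e not in links and not any(crosses(e, f) for f in links):
                links.append(e)
    return links


# Hypothetical MI scores for "the cat chased a mouse" (positions 0..4).
sentence_mi = {
    (0, 1): 4.0, (1, 2): 3.0, (2, 4): 3.5, (3, 4): 4.2,
    (1, 4): 1.0, (2, 3): 0.8, (0, 2): 0.5, (0, 3): 0.2,
}
mst_links = greedy_parse(5, sentence_mi)
mpg_links = greedy_parse(5, sentence_mi, planar_extras=True)
print("MST:", mst_links)
print("MPG:", mpg_links)
```

In this toy case the tree has four links, one fewer than the number of words, while the planar graph keeps those four and adds two more that close loops. The links (0, 2) and (0, 3) are rejected because each would cross a link already chosen.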
As before, but with modernized infrastructure. (This is not yet the "continuous learning" design...)
- Edit run-config/3-mpg-conf.sh and modify as needed.
- Start CogServer with run/3-mst-parsing/run-mst-cogserver.sh or simply guile -l run-common/cogserver-mst.scm.
- Place text data into $CORPORA_DIR as configured in 3-mpg-conf.sh
- Run ./run/3-mst-parsing/mst-submit.sh
THE END.