%SOM_DEMO2 Basic usage of the SOM Toolbox.
% Contributed to SOM Toolbox 2.0, February 11th, 2000 by Juha Vesanto
% http://www.cis.hut.fi/projects/somtoolbox/
% Version 1.0beta juuso 071197
% Version 2.0beta juuso 070200
clf reset;
figure(gcf)
echo on
clc
% ==========================================================
% SOM_DEMO2 - BASIC USAGE OF SOM TOOLBOX
% ==========================================================
% som_data_struct - Create a data struct.
% som_read_data - Read data from file.
%
% som_normalize - Normalize data.
% som_denormalize - Denormalize data.
%
% som_make - Initialize and train the map.
%
% som_show - Visualize map.
% som_show_add - Add markers on som_show visualization.
% som_grid - Visualization with free coordinates.
%
% som_autolabel - Give labels to map.
% som_hits - Calculate hit histogram for the map.
% BASIC USAGE OF THE SOM TOOLBOX
% The basic usage of the SOM Toolbox proceeds like this:
% 1. construct data set
% 2. normalize it
% 3. train the map
% 4. visualize map
% 5. analyse results
% The first four items are - if default options are used - very
% simple operations, each executable with a single command. For
% the last, several different kinds of functions are provided in
% the Toolbox, but as the needs of analysis vary, a general default
% function or procedure does not exist.
pause % Strike any key to construct data...
clc
% STEP 1: CONSTRUCT DATA
% ======================
% The SOM Toolbox has a special struct, called data struct, which
% is used to group information regarding the data set in one
% place.
% Here, a data struct is created using function SOM_DATA_STRUCT.
% The first argument is the data matrix itself, followed by the
% name given to the data set and the names of the components
% (variables) in the data matrix.
D = rand(1000,3); % 1000 samples from unit cube
sData = som_data_struct(D,'name','unit cube','comp_names',{'x','y','z'});
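% As a brief illustration (a sketch added here, not part of the
% original demo), the fields of the data struct can be inspected
% directly. The fields 'data' and 'comp_names' are used later in
% this script; 'name' is assumed to hold the name given above.
sData.name
sData.comp_names
size(sData.data)      % number of samples x number of components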
% Another option is to read the data directly from an ASCII file.
% Here, the IRIS data set is loaded from a file (please make sure
% the file can be found from the current path):
try
  sDiris = som_read_data('iris.data');
catch
  echo off
  warning('File ''iris.data'' not found. Using simulated data instead.')
  D = randn(50,4);
  D(:,1) = D(:,1)+5; D(:,2) = D(:,2)+3.5;
  D(:,3) = D(:,3)/2+1.5; D(:,4) = D(:,4)/2+0.3;
  D(D<=0) = 0.01;
  D2 = randn(100,4); D2(:,2) = sort(D2(:,2));
  D2(:,1) = D2(:,1)+6.5; D2(:,2) = D2(:,2)+2.8;
  D2(:,3) = D2(:,3)+5; D2(:,4) = D2(:,4)/2+1.5;
  D2(D2<=0) = 0.01;
  sDiris = som_data_struct([D; D2],'name','iris (simulated)',...
                           'comp_names',{'SepalL','SepalW','PetalL','PetalW'});
  sDiris = som_label(sDiris,'add',(1:50)','Setosa');
  sDiris = som_label(sDiris,'add',(51:100)','Versicolor');
  sDiris = som_label(sDiris,'add',(101:150)','Virginica');
  echo on
end
% Here are the histograms and scatter plots of the four variables.
echo off
k=1;
for i=1:4
  for j=1:4
    if i==j
      subplot(4,4,k);
      hist(sDiris.data(:,i)); title(sDiris.comp_names{i})
    elseif i<j
      subplot(4,4,k);
      plot(sDiris.data(:,i),sDiris.data(:,j),'k.')
      xlabel(sDiris.comp_names{i})
      ylabel(sDiris.comp_names{j})
    end
    k=k+1;
  end
end
echo on
% Actually, as you saw in SOM_DEMO1, most SOM Toolbox functions
% can also handle plain data matrices, but then one is without the
% convenience offered by component names, labels and
% denormalization operations.
pause % Strike any key to normalize the data...
clc
% STEP 2: DATA NORMALIZATION
% ==========================
% Since the SOM algorithm is based on Euclidean distances, the scale
% of the variables is very important in determining what the map
% will be like. If the range of values of some variable is much
% larger than that of the other variables, that variable will
% probably dominate the map organization completely.
% For this reason, the components of the data set are usually
% normalized, for example so that each component has unit
% variance. This can be done with function SOM_NORMALIZE:
sDiris = som_normalize(sDiris,'var');
% The function also provides other normalization methods, for
% example 'range', 'log' and 'logistic'.
% However, interpreting the values may be harder when they have
% been normalized. Therefore, the normalization operations can be
% reversed with function SOM_DENORMALIZE:
x = sDiris.data(1,:)
orig_x = som_denormalize(x,sDiris)
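% As a quick check (a sketch added for illustration), after 'var'
% normalization each component should have approximately zero mean
% and unit variance. This assumes the data contains no missing
% (NaN) values, since MEAN and STD would otherwise return NaN.
mean(sDiris.data)
std(sDiris.data)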
pause % Strike any key to train the map...
clc
% STEP 3: MAP TRAINING
% ====================
% The function SOM_MAKE is used to train the SOM. By default, it
% first determines the map size, then initializes the map using
% linear initialization, and finally uses the batch algorithm to
% train the map. Function SOM_DEMO1 has a more detailed description
% of the training process.
sMap = som_make(sDiris);
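% A short sketch (added for illustration): the map size selected by
% SOM_MAKE and the dimensions of the resulting codebook can be
% checked from the map struct. Each row of the codebook is the
% prototype vector of one map unit.
sMap.topol.msize
size(sMap.codebook)   % number of map units x number of components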
pause % Strike any key to continue...
% The IRIS data set also has labels associated with the data
% samples. Actually, the data set consists of 50 samples from each
% of three species of Iris flowers (a total of 150 samples), and
% the measurements are the length and width of the sepal and petal
% leaves. The label associated with each sample is the species
% information: 'Setosa', 'Versicolor' or 'Virginica'.
% Now, the map can be labelled with these labels. The best
% matching unit of each sample is found from the map, and the
% species label is given to the map unit. Function SOM_AUTOLABEL
% can be used to do this:
sMap = som_autolabel(sMap,sDiris,'vote');
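% A minimal sketch (added for illustration) for checking the
% result: count how many map units received a label. This assumes
% that unlabelled units have an empty entry in sMap.labels.
labelled = ~cellfun('isempty',sMap.labels(:,1));
fprintf('%d of %d map units have a label\n',sum(labelled),length(labelled));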
pause % Strike any key to visualize the map...
clc
% STEP 4: VISUALIZING THE SELF-ORGANIZING MAP: SOM_SHOW
% =====================================================
% The basic visualization of the SOM is done with function SOM_SHOW.
colormap(1-gray)
som_show(sMap,'norm','d')
% Notice that the names of the components are included as the
% titles of the subplots. Notice also that the variable values
% have been denormalized to the original range and scale.
% The component planes ('PetalL', 'PetalW', 'SepalL' and 'SepalW')
% show what kind of values the prototype vectors of the map units
% have. The value is indicated with color, and the colorbar on the
% right shows what the colors mean.
% The 'U-matrix' shows distances between neighboring units and thus
% visualizes the cluster structure of the map. Note that the
% U-matrix visualization has many more hexagons than the
% component planes. This is because distances *between* map units
% are shown, and not only the distance values *at* the map units.
% High values on the U-matrix mean large distance between
% neighboring map units, and thus indicate cluster
% borders. Clusters are typically uniform areas of low
% values. Refer to colorbar to see which colors mean high
% values. In the IRIS map, there appear to be two clusters.
pause % Strike any key to continue...
% The subplots are linked together through similar position. In
% each axis, a particular map unit is always in the same place. For
% example:
h=zeros(sMap.topol.msize); h(1,2) = 1;
som_show_add('hit',h(:),'markercolor','r','markersize',0.5,'subplot','all')
% the red marker is on top of the same unit on each axis.
pause % Strike any key to continue...
clf
clc
% STEP 4: VISUALIZING THE SELF-ORGANIZING MAP: SOM_SHOW_ADD
% =========================================================
% The SOM_SHOW_ADD function can be used to add markers, labels and
% trajectories on top of SOM_SHOW created figures. The function
% SOM_SHOW_CLEAR can be used to clear them away.
% Here, the U-matrix is shown on the left, and an empty grid
% named 'Labels' is shown on the right.
som_show(sMap,'umat','all','empty','Labels')
pause % Strike any key to add labels...
% Here, the labels added to the map with SOM_AUTOLABEL function
% are shown on the empty grid.
som_show_add('label',sMap,'Textsize',8,'TextColor','r','Subplot',2)
pause % Strike any key to add hits...
% Important tools in data analysis using the SOM are so-called hit
% histograms. They are formed by taking a data set, finding the
% best matching unit (BMU) of each data sample from the map, and
% increasing a counter in a map unit each time it is the BMU. The
% hit histogram shows the distribution of the data set on the map.
% Here, the hit histogram for the whole data set is calculated
% and visualized on the U-matrix.
h = som_hits(sMap,sDiris);
som_show_add('hit',h,'MarkerColor','w','Subplot',1)
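% The same histogram can be sketched by hand (added for
% illustration): find the BMU of each sample with SOM_BMUS and
% count how often each unit is selected. This assumes every sample
% has a valid BMU, i.e. no sample consists only of missing values.
bmus = som_bmus(sMap,sDiris);
hManual = accumarray(bmus,1,[size(sMap.codebook,1) 1]);
isequal(hManual,h)    % compare with the SOM_HITS result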
pause % Strike any key to continue...
% Multiple hit histograms can be shown simultaneously. Here, three
% hit histograms corresponding to the three species of Iris
% flowers are calculated and shown.
% First, the old hit histogram is removed.
som_show_clear('hit',1)
% Then, the histograms are calculated. The first 50 samples in
% the data set are of the 'Setosa' species, the next 50 samples
% of the 'Versicolor' species and the last 50 samples of the
% 'Virginica' species.
h1 = som_hits(sMap,sDiris.data(1:50,:));
h2 = som_hits(sMap,sDiris.data(51:100,:));
h3 = som_hits(sMap,sDiris.data(101:150,:));
som_show_add('hit',[h1, h2, h3],'MarkerColor',[1 0 0; 0 1 0; 0 0 1],'Subplot',1)
% Red color is for 'Setosa', green for 'Versicolor' and blue for
% 'Virginica'. One can see that the three species are pretty well
% separated, although 'Versicolor' and 'Virginica' are slightly
% mixed up.
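% To put a number on this (a sketch added for illustration), count
% the map units that are hit by samples of more than one species.
H = [h1, h2, h3];
mixedUnits = sum(sum(H>0,2) > 1)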
pause % Strike any key to continue...
clf
clc
% STEP 4: VISUALIZING THE SELF-ORGANIZING MAP: SOM_GRID
% =====================================================
% There's also another visualization function: SOM_GRID. This
% allows visualization of the SOM in freely specified coordinates,
% for example the input space (of course, only up to 3D space). This
% function has quite a lot of options, and is pretty flexible.
% Basically, the SOM_GRID visualizes the SOM network: each unit is
% shown with a marker and connected to its neighbors with lines.
% The user has control over:
% - the coordinate of each unit (2D or 3D)
% - the marker type, color and size of each unit
% - the linetype, color and width of the connecting lines
% There are also some other options.
pause % Strike any key to see some visualizations...
% Here are four visualizations made with SOM_GRID:
% - The map grid in the output space.
subplot(2,2,1)
som_grid(sMap,'Linecolor','k')
view(0,-90), title('Map grid')
% - A surface plot of distance matrix: both color and
% z-coordinate indicate average distance to neighboring
% map units. This is closely related to the U-matrix.
subplot(2,2,2)
Co=som_unit_coords(sMap); U=som_umat(sMap); U=U(1:2:size(U,1),1:2:size(U,2));
som_grid(sMap,'Coord',[Co, U(:)],'Surf',U(:),'Marker','none');
view(-80,45), axis tight, title('Distance matrix')
% - The map grid in the input space. The first three components
% determine the 3D-coordinates of the map unit, and the size
% of the marker is determined by the fourth component.
% Note that the values have been denormalized.
subplot(2,2,3)
M = som_denormalize(sMap.codebook,sMap);
som_grid(sMap,'Coord',M(:,1:3),'MarkerSize',M(:,4)*2)
view(-80,45), axis tight, title('Prototypes')
% - Map grid as above, but the original data has also been
% plotted: coordinates show the values of the first three
% components and color indicates the species of each sample.
% The fourth component is not shown.
subplot(2,2,4)
som_grid(sMap,'Coord',M(:,1:3),'MarkerSize',M(:,4)*2)
hold on
D = som_denormalize(sDiris.data,sDiris);
plot3(D(1:50,1),D(1:50,2),D(1:50,3),'r.',...
D(51:100,1),D(51:100,2),D(51:100,3),'g.',...
D(101:150,1),D(101:150,2),D(101:150,3),'b.')
view(-72,64), axis tight, title('Prototypes and data')
pause % Strike any key to continue...
% STEP 5: ANALYSIS OF RESULTS
% ===========================
% The purpose of this step depends heavily on the purpose of the
% whole data analysis: is it segmentation, modeling, novelty
% detection, classification, or something else? For this reason,
% there is not a single general-purpose analysis function, but
% a number of individual functions which may, or may not, prove
% useful in any specific case.
% Visualization is of course part of the analysis of
% results. Examination of labels and hit histograms is another
% part. Yet another is validation of the quality of the SOM (see
% the use of SOM_QUALITY in SOM_DEMO1).
[qe,te] = som_quality(sMap,sDiris)
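% For reference, a sketch (added for illustration) of how these two
% measures can be computed by hand, assuming the data has no missing
% values: qe is the mean distance from each sample to its BMU, and
% te is the proportion of samples whose first and second BMUs are
% not adjacent units on the map grid.
[b,q] = som_bmus(sMap,sDiris,[1 2]);     % first and second BMUs
qeManual = mean(q(:,1))
Ne = som_unit_neighs(sMap.topol);        % 1 for adjacent map units
teManual = mean(double(full(Ne(sub2ind(size(Ne),b(:,1),b(:,2)))) ~= 1))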
% People have contributed a number of functions to the Toolbox
% which can be used for the analysis. These include functions for
% vector projection, clustering, pdf-estimation, modeling,
% classification, etc. However, ultimately the use of these
% tools is up to you.
% More about visualization is presented in SOM_DEMO3.
% More about data analysis is presented in SOM_DEMO4.
echo off
warning on