<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
<author>
<name>Sigfrid Lundberg</name>
</author>
<title>Sigfrid Lundberg's Stuff</title>
<id>http://sigfrid-lundberg.se/files.atom</id>
<link rel="self" href="https://feeds.feedburner.com/SigfridLundbergsStuff?format=xml"/>
<updated>2006-06-08T00:03:00+01:00</updated>
<entry>
<author>
<name>Sigfrid Lundberg</name>
</author>
<title>The Copenhagen cholera epidemic 1853 in contemporary
Danish newspaper prose: Some back-of-an-envelope calculations</title>
<link href="/entries/2022/12/cholera/"/>
<summary>Inspired by the fact that we were all hiding away from
Covid-19 in 2020 – 2021, I wanted to take a closer look at some other
epidemic. My hope was to find patterns of change in language use
mirroring sentiments and attitudes expressed in words, bigrams and
trigrams and frequency distributions. I started to write this in May
2020 and completed it about two years later.</summary>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<p>A tentative analysis of how the cholera epidemic of 1853 is covered
in contemporary Danish newspapers. Words related to the
epidemic increase rapidly in frequency at the outbreak, but
decrease much more slowly. Obviously the public was not
prepared for its appearance, but the discussion of the outbreak
continued for several years afterwards. The temporal distribution of
trigrams containing locations (like <em>cholera in</em> place name) and
temporal information (like <em>cholera er</em>, <em>har</em> or <em>havde</em>, i.e., is, has and
had, followed by some word, often an adjective) follows the
outbreak as well.</p>
<p>
<a href="https://raw.githubusercontent.com/siglun/traces-of-historical-plagues/master/bar_diagram.pdf">PDF file</a><br/>
<a href="https://github.com/siglun/traces-of-historical-plagues">github project</a>
</p>
</div>
</content>
<dc:date>2022</dc:date>
<category label="essays" term="Essays"/>
<category label="stories" term="Stories"/>
<category label="media" term="Media"/>
<updated>2022-12-02T12:52:03+01:00</updated>
<id>https://sigfrid-lundberg.se/entries/2022/12/cholera/</id>
</entry>
<entry><author>
<name>Sigfrid Lundberg</name>
</author><title>Sex, death and sonnets</title>
<link href="/entries/2022/11/sonnets/"/>
<summary>A note on how to analyse poetry encoded in Text Encoding Initiative XML</summary>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml"><!-- shit comes below --><h1 class="title">Sex, death and sonnets<br/>Musings of a software developer</h1><p class="author">Sigfrid Lundberg<br/>slu@kb.dk<br/>Digital Transformation<br/>Royal Danish Library<br/>Post box 2149<br/>1016 Copenhagen K<br/>Denmark<br/></p><blockquote class="abstract"><h3>Abstract</h3>
<p>This note discusses how software can recognize sonnets, by
analysis of text length, strophe structure and number of syllables
per line. It also makes a simple content analysis based on
word frequency analyses.</p>
<p>The results clearly show that simple Unix™ for Poets
analyses combine seamlessly with TEI markup and XML technologies.</p>
</blockquote>
<h2>Introduction</h2>
<p>If there are any sonnets, do they rhyme and what are they about?</p>
<p>I have for many years been a great fan of the tutorial <em>Unix™ for Poets</em> by <a href="#kennethchurch">Kenneth Ward Church.</a>
This note is an investigation of what can be done with a corpus of literary text with very simple tools similar to the ones described by Church in his tutorial.
I do not claim that there is anything novel or even significant in
this text. Being a scientist, I think like a scientist and don't
expect any deep literary theory here.</p>
<h2>Finding poems</h2>
<p>The ADL text corpus contains <a href="#adlcorpus">literary texts.</a>
Since the texts are encoded according to the <a href="#teiguidelines">TEI guidelines</a> it is easy to find poetry in those files.
Typically a piece of poetry is encoded as <a href="#tei-ref-lg">lines within line groups</a>.
More often than not the line groups are embedded in <kbd><div> ... </div></kbd> elements.</p>
<p>A poem may look like this in the source.
The poem is by <a href="#sophus">Sophus Michaëlis (1893).</a></p>
<pre>
<div decls="#biblid68251">
<head>Jeg elsker —</head>
<lg>
<l>Jeg elsker Himlens høje Harmoni,</l>
<l>dens Purpurblomst, som blaaner i det Fjærne,</l>
<l>den Fred, som risler ned fra Nattens Stjerne,</l>
<l>det Glimt af Gud, der glider mig forbi;</l>
</lg>
<lg>
<l>og Evighedens tavse Melodi,</l>
<l>de svundne Slægters kaldende Orkester,</l>
<l>et Tonehav om en usynlig Mester,</l>
<l>en Klang af Gud, der bruser mig forbi;</l>
</lg>
<lg>
<l>en magisk Magt fra Hjertets mørke Celle,</l>
<l>de stærke Længsler, som mod Lyset vælde,</l>
<l>Naturens evigunge Fantasi;</l>
</lg>
<lg>
<l>det Liv, der spirer midt i selve Døden,</l>
<l>den Sol, der stiger midt i Aftenrøden,</l>
<l>— o Glimt af Gud, der glider mig forbi!</l>
</lg>
<p>
<date>12. Septbr. 1893.</date>
</p>
</div>
</pre>
<p>The default namespace is declared as
xmlns="http://www.tei-c.org/ns/1.0", which we refer to in the following
with the namespace prefix 't'.</p>
<p>The poem comprises four line groups with four, four, three
and three lines. That is a very common strophe structure
(according to the <a href="#sonnets">Sonnets</a> article
in Wikipedia), at least in Scandinavia. Sonnets are not always
divided like that, but they all contain 14 lines.</p>
<p>Shakespeare often set his 14 lines typographically as one
strophe, whereas Francesco Petrarca wrote them in two strophes
with eight and six lines, respectively (again see the article
<a href="#sonnets">Sonnets</a> in Wikipedia).</p>
<p>To be more precise, a sonnet has one more characteristic
besides having 14 lines: the lines should be in <a href="#pentameter">iambic pentameter.</a></p>
<h2>Finding sonnets</h2>
<p>You can easily find all poems in the corpus with an
XPath query like:</p>
<pre>
//t:div[t:lg and @decls]
</pre>
<p>We can use that query in XSLT like this:</p>
<pre>
<xsl:for-each select="//t:div[t:lg and @decls]">
<xsl:if test="count(.//t:lg/t:l)=14">
<!-- script's got to do what a script's got to do -->
</xsl:if>
</xsl:for-each>
</pre>
<p>So we iterate over all <kbd><div>...</div></kbd>s that have
line groups inside and a <kbd>@decls</kbd> attribute containing a
reference to metadata in the TEI header.
The latter is not universal, but we use it in ADL, and that attribute is only set on pieces that a cataloger has designated as a <em>work.</em>
The decision as to what counts as a work was based on experience of what library patrons ask for at the information desk.
I have implemented this using the shell script <a href="https://github.com/siglun/danish-sonnets/blob/main/find_sonnet_candidates.sh">find_sonnet_candidates.sh</a> and a transform <a href="https://github.com/siglun/danish-sonnets/blob/main/sonnet_candidate.xsl">sonnet_candidate.xsl</a>.
Finally, we don't do anything unless there are 14 lines of poetry.</p>
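<p>For readers without an XSLT processor at hand, the same selection can be sketched in Python. This is a minimal illustration and not the code in the repository: it only assumes the TEI namespace quoted above, and the names <kbd>sonnet_candidates</kbd>, <kbd>make_div</kbd> and the toy document are made up for the example.</p>

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def sonnet_candidates(xml_text):
    """Return the decls value of every div that has a decls attribute,
    a line group as a direct child, and exactly 14 lines in line groups."""
    root = ET.fromstring(xml_text)
    found = []
    for div in root.iter(TEI + "div"):
        # Mirrors the XPath //t:div[t:lg and @decls]
        if div.get("decls") is None or div.find(TEI + "lg") is None:
            continue
        # Mirrors count(.//t:lg/t:l) = 14
        lines = [l for lg in div.iter(TEI + "lg") for l in lg.findall(TEI + "l")]
        if len(lines) == 14:
            found.append(div.get("decls"))
    return found


def make_div(decls, strophes):
    """Build a toy poem (hypothetical data): one lg per strophe with dummy lines."""
    groups = "".join("<lg>" + "<l>x</l>" * n + "</lg>" for n in strophes)
    return '<div decls="{}">{}</div>'.format(decls, groups)


sample = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>'
    + make_div("#biblid1", [4, 4, 3, 3])   # 14 lines: a candidate
    + make_div("#biblid2", [4, 4, 4])      # 12 lines: skipped
    + "</body></text></TEI>"
)
print(sonnet_candidates(sample))  # ['#biblid1']
```

<p>The filtering is done in plain Python rather than in a single path expression, because the path language in <kbd>xml.etree.ElementTree</kbd> is only a subset of XPath.</p>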
<p>This transformation creates a long table, <a href="https://github.com/siglun/danish-sonnets/blob/main/sonnet_candidates.xml">sonnet_candidates.xml</a>, with data about
the sonnet candidates it finds.</p>
<h2>Approximately pentametric</h2>
<p>Finding <kbd> <div>...</div></kbd>s having 14 lines of poetry isn't good
enough. We are expecting iambic pentameter, aren't we? To actually analyse
the texts for their rhythmical properties is beyond me, but we could
make an approximation.</p>
<p>Iambic verse consists of feet with two syllables, i.e. if there are
five feet per line we could say that iambic verse has approximately 10
syllables, and hence approximately 10 vowels, per line. It is an
approximation since an iamb should have the
stress on the second syllable (due to ignorance I ignore the musical
aspect of this; we will include false positives since lines of poetry
with five feet need not be <strong>iambic</strong>).</p>
<p>Anyway, this script calculates the average number of
vowels per line in poems with 14 lines:</p>
<pre>
<xsl:variable name="vowel_numbers" as="xs:integer *">
<xsl:for-each select=".//t:lg/t:l">
<xsl:variable name="vowels">
<xsl:value-of select="replace(.,'[^iyeæøauoå]','')"/>
</xsl:variable>
<xsl:value-of select="string-length($vowels)"/>
</xsl:for-each>
</xsl:variable>
<xsl:value-of select="format-number(sum($vowel_numbers) div 14, '#.####')"/>
</pre>
<p>We use the replace function and a regular expression to
remove everything in each line except the vowels. Then we
measure the string length, which should equal the number of
vowels per line, and add them together for all lines in the
poem. Finally we divide that sum by 14 and get the average
number of vowels per line.</p>
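<p>The vowel count can be tried out on single lines with a few lines of Python. This is a sketch using the same character class as the XSLT; the function names <kbd>vowels_in</kbd> and <kbd>average_vowels</kbd> are made up for the example, and the average is taken over the lines passed in rather than over a fixed 14.</p>

```python
import re

# Same character class as in the XSLT: everything that is NOT a lower-case vowel.
NON_VOWELS = re.compile(r"[^iyeæøauoå]")


def vowels_in(line):
    """Number of lower-case Danish vowels in one line of verse."""
    return len(NON_VOWELS.sub("", line))


def average_vowels(lines):
    """Average number of vowels per line for a list of verse lines."""
    return sum(vowels_in(line) for line in lines) / len(lines)


# Two lines from the Michaëlis poem quoted earlier.
first = "Jeg elsker Himlens høje Harmoni,"
fifth = "og Evighedens tavse Melodi,"
print(vowels_in(first))                 # 10
print(vowels_in(fifth))                 # 9, the capital E is not counted
print(average_vowels([first, fifth]))   # 9.5
```

<p>Note that the character class only lists lower-case letters, so a capitalised vowel at the start of a word is not counted. That is one more reason to treat the figure as an approximation.</p>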
<p>For a sonnet it would be about 10,
<a href="#hendecasyllable">or occasionally a little more.</a>
Danish is a language rich in diphthongs,
which could be another reason for lines deviating from the expected 10 vowels.
In the Michaëlis poem quoted above it is 10.4.</p>
<h2>Strophe structure</h2>
<p>You can write a lot of nice poetry with 14 lines.
Take Gustaf Munch-Petersen's <a href="https://tekster.kb.dk/text/adl-texts-munp1-shoot-workid62017">en borgers livshymne</a>, which has one strophe with one line,
then three strophes with four lines and finally a single line.
The number of syllables per line seems to decrease towards the end.
Gustaf was a modernist. There are no fixed structures and very few rhymes in his poetry.</p>
<p>You can easily find out the strophe structure for each poem:</p>
<pre>
<xsl:variable name="lines_per_strophe" as="xs:integer *">
<xsl:for-each select=".//t:lg[t:l]">
<xsl:value-of select="count(t:l)"/>
</xsl:for-each>
</xsl:variable>
<xsl:value-of select="$lines_per_strophe"/>
</pre>
<p>That is, iterate over the line groups in a poem, and count the lines
in each of them.</p>
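<p>As a cross-check, the lines-per-strophe count can also be sketched in Python. The function name <kbd>strophe_structure</kbd> and the dummy poem are assumptions made for the illustration; only the TEI namespace is taken from the text above.</p>

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def strophe_structure(div):
    """Lines per line group, e.g. [4, 4, 3, 3]; groups without lines are skipped."""
    counts = []
    for lg in div.iter(TEI + "lg"):
        n = len(lg.findall(TEI + "l"))
        if n:  # corresponds to the [t:l] predicate in the XSLT
            counts.append(n)
    return counts


# A dummy poem with the strophe structure of the Michaëlis sonnet.
poem = ET.fromstring(
    '<div xmlns="http://www.tei-c.org/ns/1.0">'
    + "".join("<lg>" + "<l>x</l>" * n + "</lg>" for n in (4, 4, 3, 3))
    + "</div>"
)
print(" ".join(str(n) for n in strophe_structure(poem)))  # 4 4 3 3
```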
<p>I have summarized these data about all poems in ADL with 14 lines.
There are 243 of them (there might be more, but then they have erroneous markup).</p>
<p>You find these sonnet candidates in a table here <a href="https://github.com/siglun/danish-sonnets/blob/main/sonnet_candidates.xml">sonnet_candidates.xml.</a>
An extract from it is shown below.</p>
<table>
<h2>sonnet candidates</h2>
<tr>
<th>File name (link to source)</th>
<th>Title (link to view)</th>
<th>xml:id</th>
<th>metadata reference</th>
<th>Strophe structure</th>
<th>average number of vowels per line</th>
</tr>
<tr>
<td>
<a href="https://github.com/kb-dk/public-adl-text-sources/blob/master/texts/aarestrup07val.xml">./aarestrup07val.xml</a>
</td>
<td>
<a href="https://tekster.kb.dk/text/adl-texts-aarestrup07val-shoot-workid73888">Jeg havde faaet Brev fra dig, Nanette</a>
</td>
<td>workid73888</td>
<td>#biblid73888</td>
<td>4 4 3 3</td>
<td>11.0</td>
</tr>
<tr>
<td>
<a href="https://github.com/kb-dk/public-adl-text-sources/blob/master/texts/aarestrup07val.xml">./aarestrup07val.xml</a>
</td>
<td>
<a href="https://tekster.kb.dk/text/adl-texts-aarestrup07val-shoot-workid75376">Tag dette Kys, og tusind til, du Søde ...</a>
</td>
<td>workid75376</td>
<td>#biblid75376</td>
<td>4 4 3 3</td>
<td>11.0714</td>
</tr>
<tr>
<td>
<a href="https://github.com/kb-dk/public-adl-text-sources/blob/master/texts/aarestrup07val.xml">./aarestrup07val.xml</a>
</td>
<td>
<a href="https://tekster.kb.dk/text/adl-texts-aarestrup07val-shoot-workid76444">Sonet</a>
</td>
<td>workid76444</td>
<td>#biblid76444</td>
<td>4 4 3 3</td>
<td>11.5</td>
</tr>
<tr><td><a href="https://github.com/kb-dk/public-adl-text-sources/blob/master/texts/./brorson03grval.xml">./brorson03grval.xml</a></td>
<td><a href="https://tekster.kb.dk/text/adl-texts-brorson03grval-shoot-workid76607">1.</a></td>
<td>workid76607</td>
<td>#biblid76607</td>
<td>14</td>
<td>8.7143</td>
</tr>
<tr>
<td>
<a href="https://github.com/kb-dk/public-adl-text-sources/blob/master/texts/claussen07val.xml">./claussen07val.xml</a>
</td>
<td>
<a href="https://tekster.kb.dk/text/adl-texts-claussen07val-shoot-workid63580">SKUMRING</a>
</td>
<td>workid63580</td>
<td>#biblid63580</td>
<td>14</td>
<td>10.8571</td>
</tr>
<tr>
<td>
<a href="https://github.com/kb-dk/public-adl-text-sources/blob/master/texts/claussen07val.xml">./claussen07val.xml</a>
</td>
<td>
<a href="https://tekster.kb.dk/text/adl-texts-claussen07val-shoot-workid66036">TAAGE OG REGNDAGE</a>
</td>
<td>workid66036</td>
<td>#biblid66036</td>
<td>4 4 3 3</td>
<td>13.9286</td>
</tr>
<tr>
<td>
<a href="https://github.com/kb-dk/public-adl-text-sources/blob/master/texts/claussen07val.xml">./claussen07val.xml</a>
</td>
<td>
<a href="https://tekster.kb.dk/text/adl-texts-claussen07val-shoot-workid66131">MAANENS TUNGSIND</a>
</td>
<td>workid66131</td>
<td>#biblid66131</td>
<td>4 4 3 3</td>
<td>13.8571</td>
</tr>
<tr>
<td>
<a href="https://github.com/kb-dk/public-adl-text-sources/blob/master/texts/jacobjp08val.xml">./jacobjp08val.xml</a>
</td>
<td><a href="https://tekster.kb.dk/text/adl-texts-jacobjp08val-shoot-workid63094">I Seraillets Have</a></td>
<td>workid63094</td>
<td>#biblid63094</td>
<td>14</td>
<td>6.7143</td>
</tr>
</table>
<p>Sophus Claussen's first poem may or may not be a sonnet,
Brorson's poem is not. All of those with strophe structure 4
4 3 3 are definitely sonnets, as implied by strophe
structure and the "approximately pentametric" number of
vowels per line (and, by the way, Aarestrup often points out
that he is actually writing sonnets in text or titles).</p>
<h2>Then we have the rhymes</h2>
<p>Beauty is in the eye of the beholder, says Shakespeare. I believe that
he is right. Then, however, I would like to add that the rhymes and
meters of poetry (like the pentameter) are in the ear of the listener. It
is time consuming to read hundreds of poems aloud and figure out the
rhyme structure. So an approximate idea of the rhymes could be had by
comparing the verse line endings.</p>
<p>This is error prone, though. Consider this <a href="https://tekster.kb.dk/text/adl-texts-moeller01val-shoot-workid62307">sonnet by P.M. Møller</a>.</p>
<div id="">
<p><small>SONET</small></p>
<p>
Den Svend, som Tabet af sin elskte frister,<br/>
Vildfremmed vanker om blandt Jordens Hytter;<br/>
Med Haab han efter Kirkeklokken lytter,<br/>
Som lover ham igen, hvad her han mister.<br/>
</p>
<p>
Men næppe han med en usalig bytter,<br/>
Hvis Hjerte, stedse koldt for Elskov, brister,<br/>
Som sig uelsket gennem Livet lister,<br/>
Hans Armod kun mod Tabet ham beskytter.<br/></p>
<p>
Til Livets Gaade rent han savner Nøglen,<br/>
Hver Livets Blomst i Hjærtets Vinter fryser,<br/>
Han gaar omkring med underlige Fagter.<br/>
</p>
<p>
Ræd, Spøgelser han ser, naar Solen lyser,<br/>
Modløs og syg, foragtet han foragter<br/>
Det skønne Liv som tom og ussel Gøglen.<br/>
</p>
</div>
<p>The last syllable of each of the first eight lines is the same, '-ter'. If
you use some script to compare the endings you'll only find single
syllable rhymes and miss the double syllable ones. I.e., you can
erroneously categorize feminine rhymes (with two syllables) as
masculine ones (with one syllable). (Sorry, I don't know a
politically correct vocabulary for these concepts.)</p>
<p>In order to understand what we hear when reading, we have to consider
'-ister' and '-ytter'. I.e., it starts with rhyme structure 'abbabaab'
not 'aaaaaaaa'. Furthermore, it continues 'cdedec'.</p>
<p>I have written a set of scripts that traverse the
<a href="https://github.com/siglun/danish-sonnets/blob/main/sonnet_candidates.xml">sonnet_candidates.xml</a>
table.
Transforming that file with <a href="https://github.com/siglun/danish-sonnets/blob/main/iterate_the_rhyming.xsl">iterate_the_rhyming.xsl</a>
selects poems with 14 lines and strophe structure 4 4 3 3.
It generates a shell script which, when executed, pipes the content through other scripts that retrieve the content,
remove punctuation and finally detag it.
The actual text is then piped through a Perl script that
analyses the endings according to the silly and flawed method described
above.</p>
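<p>To make the weakness of comparing line endings concrete, here is a small Python sketch of the idea. It is not the Perl script from the repository; the function name <kbd>rhyme_scheme</kbd> and the choice to compare the last <kbd>n</kbd> letters of the final word are assumptions made for the illustration. The input is the list of line-ending words of the Møller sonnet quoted above.</p>

```python
import re
import string


def rhyme_scheme(lines, n):
    """Label verse lines by the last n letters of their final word.

    Lines with the same ending get the same letter: a, b, c, ...
    """
    labels = {}
    scheme = []
    for line in lines:
        # Take the final word; punctuation is dropped by the word pattern.
        last_word = re.findall(r"\w+", line.lower())[-1]
        ending = last_word[-n:]
        if ending not in labels:
            labels[ending] = string.ascii_lowercase[len(labels)]
        scheme.append(labels[ending])
    return "".join(scheme)


# Line endings of the sonnet by P.M. Møller.
endings = [
    "frister,", "Hytter;", "lytter,", "mister.",
    "bytter,", "brister,", "lister,", "beskytter.",
    "Nøglen,", "fryser,", "Fagter.",
    "lyser,", "foragter", "Gøglen.",
]
print(rhyme_scheme(endings, 3))  # aaaaaaaabcacab
print(rhyme_scheme(endings, 4))  # abbabaabcdedec
```

<p>With three letters the first eight lines collapse into a single rhyme, while four letters happen to recover the scheme we hear for this particular poem. No fixed number of letters works for all poems, which is why the method is only a rough approximation.</p>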
<p>It works, sort of, until it doesn't. For poems with 4
4 3 3 strophe structure, you can find the result in <a href="https://github.com/siglun/danish-sonnets/blob/main/rhymes_3chars.text">rhymes_3chars.text</a> and <a href="https://github.com/siglun/danish-sonnets/blob/main/rhymes_2chars.text">rhymes_2chars.text</a> for three
and two letter rhymes, respectively. Run </p>
<pre>
grep -P '^[a-q]{14}' rhymes_3chars.text | sort | uniq -c | sort -rn
</pre>
<p>to get a list of rhyme structures and their frequencies. The rhyme
structures that occur more than twice are:</p>
<pre>
6 abbaabbacdecde
5 abbaabbacdcdcd
4 abcaadeafgghii
4 abbaabbacdcede
3 abcaadeafghgig
</pre>
<p>This silly algorithm does actually give two of the most common rhyme structures
for sonnets, but misses a lot of order in the remaining chaos:</p>
<pre>abbaabbacdcdcd</pre>
<p>and</p>
<pre>abbaabbacdecde</pre>
<p>So while it may fail more often than it succeeds, the successes give
results that are reasonable.</p>
<p>The rhyme structure abbaabbacdecde is one of the most
common ones found. It is also one of the so-called Petrarchan
rhyme schemes (<a href="#everysonnet">Eberhart, 2018</a>).</p>
<h2>What are the sonnets about?</h2>
<p>Any piece of art is meant to be consumed by humans. Poems should
ideally be understood when read aloud and listened to. By humans.</p>
<p>The cliché says that art and literature are about what it means to be
human. Could we therefore hypothesize that the sonnets address this
from the point of view of dead Danish male poets who wrote sonnets
some 100 – 200 years ago?</p>
<p>Assume that, at least as a first approximation, the words chosen by
poets mirror those subjects. For instance, if being human implies
mortality, we could, on a statistical level, hypothesize that words like
"mourning", "grief", "death", "grave", etc. appear in the sonnet corpus
more than in a random sample of text. The opposites would also be
expected: Concepts related to "love", "birth", "compassion" belong
to the sphere of being human.</p>
<p>I have detagged the poems with 14 lines and strophe structure 4 4 3 3,
tokenized their texts and calculated the word frequencies. As a matter
of fact, I've done that in two ways:</p>
<p>(i) The first is a classical tokenization followed by
piping the result through</p>
<pre>
sort | uniq -c | sort -n
</pre>
<p>such that I get a list of the 4781 Danish words that are used in our
sonnet sample, sorted by their frequencies.</p>
<p>(ii) The second way uses the same pipeline, but in two passes. First I
run it once for each sonnet, such that I get a list of distinct words for
each sonnet. Then I run it again on the concatenated lists for all sonnets.</p>
<p>This means that I get </p>
<ul>
<li>one list of word frequencies in the entire sample and </li>
<li>a second list giving not the number of occurrences of each word, but the number of sonnets the word occurs in.</li>
</ul>
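<p>The difference between the two lists is easy to see in a small Python sketch. It is an illustration only, not the shell pipeline actually used; the function names and the two toy "sonnets" are made up for the example.</p>

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens; a crude stand-in for the tokenization step."""
    return re.findall(r"[^\W\d_]+", text.lower())


def frequencies(sonnets):
    """Return (occurrences per word, number of sonnets per word)."""
    total = Counter()
    per_sonnet = Counter()
    for text in sonnets:
        tokens = tokenize(text)
        total.update(tokens)            # every occurrence counts
        per_sonnet.update(set(tokens))  # each word counts once per sonnet
    return total, per_sonnet


# Two toy texts standing in for sonnets.
sonnets = ["Døden og Livet og Døden", "Livet og Elskov"]
total, per_sonnet = frequencies(sonnets)
print(total["og"], per_sonnet["og"])        # 3 2
print(total["døden"], per_sonnet["døden"])  # 2 1
```

<p>The first number of each pair corresponds to list (i), the second to list (ii).</p>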
<p>There are 160 sonnets in the selection, and the most frequent word occurs in all of them.
These are the fifteen most common words measured by the <a href="https://github.com/siglun/danish-sonnets/blob/main/poem_frequencies.text">number of sonnets they occur in</a>.
The number of poems is in the left column.</p>
<pre>
75 du
76 sig
82 er
85 jeg
86 det
89 for
94 den
101 paa
104 en
105 af
106 til
119 som
122 med
150 i
160 og
</pre>
<p>And this is the same kind of list,
but measured as the grand total <a href="https://github.com/siglun/danish-sonnets/blob/main/frequencies.text">occurrence of the words in the corpus</a>.
The number of occurrences in the corpus is in the left column.</p>
<pre>
109 min
130 for
144 du
148 er
155 paa
164 til
167 det
169 den
173 af
206 en
217 med
229 som
246 jeg
382 i
588 og
</pre>
<p>As you can see, this corroborates the established observation that the
most frequent words in a corpus hardly ever describe the subject
matter of the texts (the words are conjunctions, pronouns,
prepositions and the like). The distribution of the number of sonnets
the words appear in:</p>
<div id="">
<img src="https://github.com/siglun/danish-sonnets/raw/main/distro.png"/>
</div>
<p>The distribution shows number of words graphed against
number of sonnets. There are 3304 words occurring in just one
sonnet. The leftmost, and highest, point on the graph has the
coordinate (1,3304).</p>
<p>There is just one word appearing in all 160 sonnets. It is
'og', meaning 'and', corresponding to the rightmost point on the
graph, which has the coordinate (160,1). As a rule of thumb the
most common words are conjunctions, next to them come
prepositions and after those come pronouns.</p>
<p>The <a href="https://github.com/siglun/danish-sonnets/blob/main/distribution.text">distribution.text</a>
is generated from <a href="https://github.com/siglun/danish-sonnets/blob/main/poem_frequencies.text">poem_frequencies.text</a>
using (the line has been folded)</p>
<pre>
sed 's/\ [a-z]*$//' poem_frequencies.text | sort | uniq -c |
sort -n -k 2 > distribution.text
</pre>
<p>See above. Column 1 is plotted against column 2.</p>
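<p>The same distribution can be computed directly from a mapping of words to the number of sonnets they occur in. The following Python is a sketch with invented numbers; the function name <kbd>distribution</kbd> and the sample data are assumptions for the illustration.</p>

```python
from collections import Counter


def distribution(sonnets_per_word):
    """Pairs (number of sonnets, number of words found in that many sonnets),
    sorted by number of sonnets."""
    histogram = Counter(sonnets_per_word.values())
    return sorted(histogram.items())


# Invented sample: word -> number of sonnets it occurs in.
sample = {"og": 3, "i": 3, "som": 2, "døden": 1, "elskov": 1, "grav": 1}
print(distribution(sample))  # [(1, 3), (2, 1), (3, 2)]
```

<p>Each pair has the same orientation as the coordinates quoted for the graph: the number of sonnets first, then the number of words.</p>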
<p>In this particular corpus, it seems that <strong>aboutishness</strong> starts at words occurring in about 25% of the sonnets, or less.
I.e., words occurring in 40 sonnets, or fewer.</p>
<p>In what follows,
I have simply used the utility <kbd>grep</kbd> to find words and derivatives in the file <a href="https://github.com/siglun/danish-sonnets/blob/main/poem_frequencies.text">poem_frequencies.text</a> mentioned above.</p>
<p>As an example we have death, dead, lethal etc. (basically
words containing <em>død</em>) in a number of
sonnets. The left column gives the number of sonnets containing
the word. These appear in about 7% of the sonnets.</p>
<pre>
1 dødehavet
1 dødeklokker
1 dødelige
1 dødeliges
1 dødningvuggeqvad
1 dødsberedthed
1 glemselsdøden
1 udødeliges
2 dødes
5 dødens
9 død
9 døden
11 døde
</pre>
<p>There are interesting derivatives and compound words on the list,
like <em>dødsberedthed</em>, meaning preparedness for death.
<em>Glemselsdøden</em> refers, I believe, to the death or disappearance that comes
when the traces or memories of someone who belonged to earlier generations disappear.</p>
<p>Love (elskov) is not as popular as death (about 5% of the sonnets).</p>
<pre>
1 elskoven
1 elskovsbrev
1 elskovsbrevet
2 elskovsild
6 elskovs
7 elskov
</pre>
<p><em>Elskovsild</em> means the fire of
love. <em>Elskovsbrev</em> has to be love
letter. Women (<em>kvinde</em>) are not as
popular as love.</p>
<pre>
1 dobbeltkvinde
1 kvindens
1 kvindetække
4 kvinder
</pre>
<p>Men occur more often than women, in particular in words implying bravery and male virtues.</p>
<pre>
1 baadsmandstrille
1 dobbeltmand
1 ejermand
1 manddom
1 manddomstrods
1 manden
2 mand
2 manddoms
5 mandens
</pre>
<p>Remember that these sonnets are by men.
<em>Manddom</em> implies a man's existence as a grown-up man.
Originally, in <a href="#oldnorse">Old Norse</a>, <em>mand</em> meant, just as in Old English, human.
That, however, was when it was doubtful if women were actually human.
<em>Baadsmandstrille</em> is a derivative of <em>baadsmand</em> (boatswain), which is another name for a sailor or petty officer.
A <em>baadsmandstrille</em> is presumably a song sung by sailors.</p>
<p>Graves occur, for some reason, less often than deaths.</p>
<pre>
1 begravet
1 graven
1 gravene
1 gravhøi
1 indgraves
3 grav
3 grave
4 gravens
</pre>
<p><em>Indgraves</em> is most likely a kind of <em>homonym</em>; if you look up that sonnet it is
clear that it means engrave. There are both the verb in the past tense,
<em>begravet</em> (buried) from <em>begrave</em> (as in bury), and <em>grav</em> (as in
grave) and <em>gravhøi</em> (tumulus).</p>
<h2>Conclusions</h2>
<p>I think I could go on studying this for quite some
time. However, I have to conclude this here, before the actual
conclusions. There are interesting things to find here,
though.
Some of them are possible to study using simple methods,
such as those described by <a href="#kennethchurch">Kenneth Ward Church</a>
in his
<em>Unix™ for Poets</em>.</p>
<p>The preliminary result from my armchair text processing exercise supports the
notion that life in early modern Europe was already about sex, death
and rock 'n' roll. Since rock wasn't there just yet, people had to be
content with sonnets for the time being.</p>
<h2>References</h2>
<p id="kennethchurch"><span class="biblAuthor">Church, Kenneth Ward</span>,
[date unknown].
<em>Unix™ for Poets</em>. <a href="https://web.stanford.edu/class/cs124/kwc-unix-for-poets.pdf">https://web.stanford.edu/class/cs124/kwc-unix-for-poets.pdf</a></p><p id="adlcorpus"><span class="biblAuthor">Det Kgl. Bibliotek</span>, and <span class="biblAuthor">Det Danske Sprog- og Litteraturselskab</span>,
2000 - 2022.
<em>The ADL text corpus</em>. <a href="https://github.com/kb-dk/public-adl-text-sources">https://github.com/kb-dk/public-adl-text-sources</a></p><p id="everysonnet"><span class="biblAuthor">Eberhart, Larry</span>,
2018.
Italian or Petrarchan Sonnet. In: <em>Every Sonnet: The sonnet forms database</em>. <a href="https://poetscollective.org/everysonnet/tag/abbaabbacdecde/#post-119">https://poetscollective.org/everysonnet/tag/abbaabbacdecde/#post-119</a></p><p id="hendecasyllable"> Hendecasyllable. In: <em>Wikipedia</em>. <a href="https://en.wikipedia.org/wiki/Hendecasyllable">https://en.wikipedia.org/wiki/Hendecasyllable</a></p><p id="pentameter"> Iambic pentameter. In: <em>Wikipedia</em>. <a href="https://en.wikipedia.org/wiki/Iambic_pentameter">https://en.wikipedia.org/wiki/Iambic_pentameter</a></p><p id="sophus"><span class="biblAuthor">Michaëlis, Sophus</span>,
1893.
Jeg elsker —. In: <em>Solblomster</em>. <a href="https://tekster.kb.dk/text/adl-texts-michs_03-shoot-workid68251">https://tekster.kb.dk/text/adl-texts-michs_03-shoot-workid68251</a></p><p id="oldnorse"> Old Norse. In: <em>Wikipedia</em>. <a href="https://en.wikipedia.org/wiki/Old_Norse">https://en.wikipedia.org/wiki/Old_Norse</a></p><p id="sonnets"> Sonnet. In: <em>Wikipedia</em>. <a href="https://en.wikipedia.org/wiki/Sonnet">https://en.wikipedia.org/wiki/Sonnet</a></p><p id="teiguidelines"><span class="biblAuthor">The TEI Consortium</span>,
2022.
<em>TEI P5: Guidelines for Electronic Text Encoding and Interchange</em>. <a href="https://tei-c.org/release/doc/tei-p5-doc/en/html/index.html">https://tei-c.org/release/doc/tei-p5-doc/en/html/index.html</a></p><p id="tei-ref-lg"><span class="biblAuthor">The TEI Consortium</span>,
2022.
Passages of Verse or Drama. In: <em>TEI P5: Guidelines for Electronic Text Encoding and Interchange</em>. <a href="https://tei-c.org/release/doc/tei-p5-doc/en/html/CO.html#CODV">https://tei-c.org/release/doc/tei-p5-doc/en/html/CO.html#CODV</a></p>
</div>
</content>
<dc:date>2022</dc:date>
<category label="literature" term="Literature"/>
<category label="poetry" term="Poetry"/>
<category label="essays" term="Essays"/>
<updated>2022-11-22T10:13:05+01:00</updated>
<id>https://sigfrid-lundberg.se/entries/2022/11/sonnets/</id>
</entry>
<entry>
<author>
<name>Sigfrid Lundberg</name>
</author>
<title>Between the market and the arts</title>
<link href="/entries/2014/11/artmarkets/"/>
<summary>
A discussion of photography as an art form, and its relation to the art markets.
</summary>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<div style="width:50%;float:right;margin:1%;">
<a href="https://www.flickr.com/photos/sigfridlundberg/15774812515" title="Between Realities">
<img src="https://farm6.staticflickr.com/5613/15774812515_529ac95fdc_c.jpg" width="100%" alt="Between Realities"/></a>
<p><small>Between Realities. 1769 gram photography. The cover
is covered by a bubble gum bubble.</small></p>
</div>
<p>In November 2014 I started to write this text after having seen
the exhibition <em>Between realities</em> at Dunkers. For
various reasons I never completed it. Then, in spring 2017, I felt
that I had to polish it a little, and publish it. It was my
reading of Charlotte Cotton's <em>The Photograph as Contemporary
Art,</em> a text I have difficulty really appreciating, which
triggered me to continue. It never happened, though.</p>
<p>19 August 2014 marked the 175th anniversary of the art of
photography. At least if we see the release of <a href="https://en.wikipedia.org/wiki/Daguerreotype#Invention">Daguerre's
patent</a> as the starting point of photography. Of those 175
years, about 50 were good years for photojournalism. <a href="https://en.wikipedia.org/wiki/Photojournalism#Golden_age">Its
Golden Age started in the 1930s</a>.</p>
<div style="width:50%;float:left;margin:1%;">
<a title="" href="/entries/2014/11/artmarkets/bubble_in_the_art_market_22.png">
<img src="/entries/2014/11/artmarkets/bubble_in_the_art_market_22.png" alt="Sales volume of paintings in different segments" width="100%"/>
</a>
<p>
<small>Figure 1. The number of sales of paintings recorded at major auction
houses from 1970 to 2013. The four graphs represent four
segments. These are, from the top to the bottom graph in year 2000: (1)
Impressionist and modern (solid curve), (2) post-World War II and
contemporary (short dashed), (3) American (long dashed) and (4)
Latin American (long and short dashed). For more precise
definitions of the categories please refer to Kräussl <em>et
al.</em> (2014).</small>
</p>
</div>
<blockquote>
<p>The 'Golden Age of Photojournalism' is often considered to be roughly
the 1930s through the 1950s. It was made possible by the
development of the commercial 35mm Leica camera in 1925, and the first
flash bulbs between 1927 and 1930 that allowed the journalist true
flexibility in taking pictures.</p>
</blockquote>
<p>The glossy magazines continued for another 20 years, though. <a href="https://en.wikipedia.org/wiki/Photojournalism#Decline">But
then</a>:</p>
<blockquote>
<p>The Golden Age of Photojournalism ended in the 1970s when many
photo-magazines ceased publication. They found that they could not
compete with other media for advertising revenue to sustain their
large circulations and high costs.</p>
</blockquote>
<p>Those other media were, in many parts of the world, commercial
television. Interestingly, now, thirty years after the demise of the
photo-magazines, we see how the daily newspapers are losing advertising
revenue as advertising moves to the Internet.</p>
<p>It is also interesting to note that the genre started because of one
technical innovation, and that in its current form it started to decline
because two others changed the viability of the business models of the press:
television and the Internet.</p>
<div style="width:50%;float:right;margin:1%;">
<a href="https://www.flickr.com/photos/sigfridlundberg/12971021693" title="L1007144_v1 by Sigfrid Lundberg, on Flickr">
<img src="https://farm8.staticflickr.com/7304/12971021693_8c859c16c1_c.jpg" width="100%" alt="L1007144_v1"/></a>
<p>
<small>
From <q>A WAY OF LIFE</q>.<br/>
This is<br/>
This is where<br/>
This is where I<br/>
This is where I'm from<br/>
Hommage à J.H. Engström
</small>
</p>
</div>
<h3>The Photograph as Contemporary Art</h3>
<p>Be that as it may, curators have recently been even
more inclined to summarize their stories of Swedish photography
in a number of retrospective exhibitions here in Sweden. There
have also been group exhibitions following some threads from the
past into the present. Less than half a year ago one such exhibition
opened at Moderna Malmö, and it has now moved to the Stockholm head
office. It is entitled <em>A WAY OF LIFE: Swedish photography from
Christer Strömholm until Today</em>.</p>
<p>The whole idea of this uncompleted entry is to correlate the
demise of the glossy magazine with the rise of photography as an art
form, and to see how the prices of photography follow those of art
into the hypothetical bubbles of the western economies.</p>
<p>As usual the reality isn't as easy to grasp as you believe
when you start formulating the hypotheses.</p>
<h3>Addendum</h3>
<p>A lot of things distracted me: someone didn't want to share
their data, and I got a new computer. Going from SUSE to
Ubuntu forced me to migrate my environment, and when that was
done there was no energy left for what is important: writing and
making photographs. My attention was drawn in other directions,
the most important one being social media, <a href="https://twitter.com/sigfridlundberg">notably Twitter</a>,
which didn't require a whole computer and permitted me to sit
and participate in the global exchange of fast-food-like content.</p>
<div style="width:50%;float:left;margin:1%;">
<a href="https://www.flickr.com/photos/sigfridlundberg/15154883253" title="L1011113_v1 by Sigfrid Lundberg, on Flickr"><img src="https://farm6.staticflickr.com/5615/15154883253_2c6d33e2b4_c.jpg" width="100%" alt="L1011113_v1"/></a>
</div>
<h3>References</h3>
<p id="kristoffer-louise-niclas">Kristoffer Arvidsson, Louise Wolthers & Niclas Östlind (editors),
2014. <em>Between realities. Photography in Sweden 1970–2000</em>
Bokförlaget Arena, Lund 2014.</p>
<p>Charlotte Cotton, <em>The Photograph as Contemporary
Art</em>, Thames & Hudson, 2014.</p>
<p>A. E. Scorcu & R. Zanola, 2011. "<b><a href="https://ideas.repec.org/p/rim/rimwps/36_11.html">Survival in the
Cultural Market: The Case of Temporary Exhibitions</a></b>," <em><a href="https://ideas.repec.org/s/rim/rimwps.html">Working Paper Series</a></em>
36_11, The Rimini Centre for Economic Analysis.</p>
<p>Kenneth Wieand, Jeff Donaldson & Socorro Quintero, 1998. "<b><a href="https://ideas.repec.org/a/mfj/journl/v2y1998i3p167-187.html">Are
Real Assets Priced Internationally? Evidence from the Art
Market</a></b>," <em><a href="https://ideas.repec.org/s/mfj/journl.html">Multinational Finance
Journal</a></em>, vol. 2(3), pages 167-187,
September.</p>
<p>Nandini Srivastava & Stephen Satchell, 2012. "<b><a href="https://ideas.repec.org/p/bbk/bbkefp/1209.html">Are There Bubbles
in the Art Market? The Detection of Bubbles when Fair Value is
Unobservable</a></b>," <a href="https://ideas.repec.org/s/bbk/bbkefp.html"><em>Birkbeck Working Papers
in Economics and Finance</em></a> 1209, Birkbeck, Department of Economics,
Mathematics & Statistics.</p>
<p>Roman Kräussl, Thorsten Lehnert & Nicolas Martelin, 2014.
"<b><a href="http://ideas.repec.org/p/crf/wpaper/14-07.html">Is
there a Bubble in the Art Market?</a></b>,"
<a href="http://ideas.repec.org/s/crf/wpaper.html"><em>LSF Research Working
Paper Series</em></a>
14-07, Luxembourg School of Finance, University of Luxembourg.</p>
<p>Jeffrey Pompe, 1996.
<a href="https://www.jstor.org/stable/1061182">An Investment Flash:
The Rate of Return for Photographs</a>.
<em>Southern Economic Journal</em>, Vol. 63, No. 2, pp. 488-495</p>
</div>
</content>
<dc:date>2014</dc:date>
<category label="art" term="Art"/>
<category label="photography" term="Photography"/>
<category label="economics" term="Economics"/>
<category label="essays" term="Essays"/>
<updated>2018-04-27T10:19:33+01:00</updated>
<id>https://sigfrid-lundberg.se/entries/2014/11/artmarkets/</id>
<!-- $Id$ -->
</entry>
<entry>
<author>
<name>Sigfrid Lundberg</name>
</author>
<title>Are the Ways of Seeing Francesca Woodman and Edith Gowin the
same?</title>
<link href="/entries/2015/10/francesca/"/>
<summary>Francesca Woodman's work is on exhibition in Stockholm. I looked at
her photography, and felt that I had to reread parts of John Berger's <cite>Ways
of seeing</cite>.</summary>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<div style="margin: +1%; width:45%;float:left;">
<a data-flickr-embed="true" href="https://www.flickr.com/photos/sigfridlundberg/21376316183/in/dateposted/" title="In the background are two of Francesca Woodman's Caryatids. There is a large sample of photobooks about Francesca on the table in the foreground">
<img src="https://farm6.staticflickr.com/5671/21376316183_216f52d8ee_z.jpg" width="100%" alt="L1014105_v1"/></a>
<p><small>In the background are two of Francesca Woodman's
Caryatids. There is a large sample of photobooks about
Francesca on the table in the foreground.</small></p>
</div>
<p>Francesca Woodman took her life in 1981 at 22 years of age. By then she
had become a serious, hardworking and very good artistic
photographer. She had started her exploration of art photography when
she was 13. When considering these early works, we are talking of the
serious endeavour of a precocious, talented teenager, not child's
play.</p>
<p>Much of her work is self-portraiture, and often she portrays herself
undraped. There are several female photographers who have earned
well-deserved fame for (among other things) photographing naked humans,
such as Imogen Cunningham and Ruth Bernhard. They both depicted naked
women, but as far as I know, neither Cunningham nor Bernhard turned their
cameras towards themselves, with or without clothes.</p>
<p>Francesca Woodman's work is on exhibition, entitled <cite>On Being an
Angel</cite>, at Moderna Museet, Stockholm. According to Anna Tellgren,
the curator, the reason why Woodman used herself as a model was purely
practical<a href="#note0"><sup>1</sup></a>. She was always there when
she needed a model, at a lower cost than the alternatives, and there
would never be any problems with model release contracts.</p>
<p>However, I am sure there is more to it than that. There were brilliant
contemporaries, like Cindy Sherman, who started similar projects in the late
1970s, photographers who have since become important players on the
photography scene. The interest in self-portraiture has increased
through the decades. It is now a fairly common genre and I think it is
more common among women than men.</p>
<p>This text is about my attempt to understand that difference between
the sexes.</p>
<div style="margin: +1%; width:500;float:right;">
<iframe src="https://player.vimeo.com/video/113375054" width="500" height="281" frameborder="0" webkitallowfullscreen="webkitallowfullscreen" mozallowfullscreen="mozallowfullscreen" allowfullscreen="allowfullscreen">
<!-- iframe -->
</iframe>
<p>
<small>
<a href="https://vimeo.com/113375054">Emmett Gowin</a> from
<a href="https://vimeo.com/landscapestories">Landscape Stories</a> on
<a href="https://vimeo.com">Vimeo</a>.
</small>
</p>
</div>
<p>The only things I know about Edith Gowin are that she is a beautiful
woman and that her husband is photographer Emmet Gowin. He is a
well-known photographer who earned some of his fame for portraits of his
often scantily clad wife. The two celebrated their golden
wedding anniversary several years ago. In the interviews you find with
them (for example on YouTube) they still seem to be a loving
couple. It might be that Emmet doesn't take as many photographs
of Edith now as he did 40-50 years ago.</p>
<p>Other husbands and photographers take photos of their wives. Some go
far artistically, such as Alfred Stieglitz' (1864-1946) portraits
of Georgia O'Keeffe (1887-1986). They arose from an intense love story,
marriage and a long relationship. <sup><a href="#note1">2</a></sup>
Edward Weston shot countless nudes, of which, according to Robert
Adams, most are fairly uninteresting:</p>
<blockquote>
<p><em>With the exception of two full length nudes of Tina Modotti and
five of Charis Wilson in the Oceano dunes (not many, considering the
number of nudes Weston took), the pictures that supposedly resulted
from Weston's love for his subjects are relative to the rest of his
life's work, unsuccessful; they are cold to the point of being
dead.</em> <a href="#note2"><sup>3</sup></a></p>
</blockquote>
<p>These photos are to be found on many web sites. Suitable searches in
Google images:</p>
<ul>
<li>Edward Weston's <a href="https://goo.gl/dGVnjY">portraits of Charis