Commit

OPENNLP-1648 Update Website with new models (1.2)
- updates models.ad to cover all new 9 languages
- adds lemmatization subsection to models.ad
- adds news entry for the models release 1.2 (2024-11-23)
- adjusts number of models: 23 -> 32 in models
mawiesne committed Nov 23, 2024
1 parent 48f52b8 commit ef80e3c
Showing 8 changed files with 713 additions and 239 deletions.
24 changes: 12 additions & 12 deletions src/main/jbake/assets/sitemap.xml
@@ -6,27 +6,27 @@
http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
<url>
<loc>https://opennlp.apache.org/</loc>
-<lastmod>2017-11-02T10:51:58+00:00</lastmod>
+<lastmod>2024-11-23T10:10:00+00:00</lastmod>
<priority>1.00</priority>
</url>
<url>
<loc>https://opennlp.apache.org/download.html</loc>
-<lastmod>2017-12-25T21:04:09+00:00</lastmod>
+<lastmod>2024-11-23T10:10:00+00:00</lastmod>
<priority>0.80</priority>
</url>
<url>
<loc>https://opennlp.apache.org/models.html</loc>
-<lastmod>2017-12-25T21:04:09+00:00</lastmod>
+<lastmod>2024-11-23T10:10:00+00:00</lastmod>
<priority>0.80</priority>
</url>
<url>
<loc>https://opennlp.apache.org/maven-dependency.html</loc>
-<lastmod>2017-12-25T21:04:09+00:00</lastmod>
+<lastmod>2024-10-29T12:30:00+00:00</lastmod>
<priority>0.80</priority>
</url>
<url>
<loc>https://opennlp.apache.org/news/</loc>
-<lastmod>2017-12-25T21:04:09+00:00</lastmod>
+<lastmod>2024-11-23T10:10:00+00:00</lastmod>
<priority>0.80</priority>
</url>
<url>
@@ -46,17 +46,17 @@
</url>
<url>
<loc>https://opennlp.apache.org/docs/</loc>
-<lastmod>2017-12-25T21:04:09+00:00</lastmod>
+<lastmod>2024-11-23T10:10:00+00:00</lastmod>
<priority>0.80</priority>
</url>
<url>
<loc>https://opennlp.apache.org/faq.html</loc>
-<lastmod>2017-11-02T10:51:58+00:00</lastmod>
+<lastmod>2024-11-23T10:10:00+00:00</lastmod>
<priority>0.80</priority>
</url>
<url>
<loc>https://opennlp.apache.org/get-involved.html</loc>
-<lastmod>2017-11-19T19:45:09+00:00</lastmod>
+<lastmod>2024-11-23T10:10:00+00:00</lastmod>
<priority>0.80</priority>
</url>
<url>
@@ -66,12 +66,12 @@
</url>
<url>
<loc>https://opennlp.apache.org/using-git.html</loc>
-<lastmod>2017-11-02T10:51:58+00:00</lastmod>
+<lastmod>2024-11-23T10:10:00+00:00</lastmod>
<priority>0.80</priority>
</url>
<url>
<loc>https://opennlp.apache.org/building.html</loc>
-<lastmod>2023-06-09T00:10:58+00:00</lastmod>
+<lastmod>2024-11-23T10:10:00+00:00</lastmod>
<priority>0.80</priority>
</url>
<url>
@@ -81,12 +81,12 @@
</url>
<url>
<loc>https://opennlp.apache.org/release.html</loc>
-<lastmod>2017-11-02T10:51:58+00:00</lastmod>
+<lastmod>2024-11-23T10:10:00+00:00</lastmod>
<priority>0.80</priority>
</url>
<url>
<loc>https://opennlp.apache.org/team.html</loc>
-<lastmod>2017-11-02T10:51:58+00:00</lastmod>
+<lastmod>2024-05-20T07:20:00+00:00</lastmod>
<priority>0.80</priority>
</url>
<url>
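The updated `<lastmod>` values above follow the W3C datetime form with an explicit `+00:00` offset. As a minimal sketch of how such a value could be generated rather than typed by hand, the following Java snippet formats a timestamp in that shape. The class `SitemapLastmod` and its methods are hypothetical names introduced for illustration; they are not part of this commit or of the OpenNLP site build.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of the commit: renders sitemap <lastmod> values.
public class SitemapLastmod {

    // "xxx" prints the offset as +HH:MM and keeps "+00:00" for UTC instead of "Z".
    private static final DateTimeFormatter LASTMOD =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ssxxx");

    // Normalizes the timestamp to UTC and formats it like the values in sitemap.xml.
    static String format(OffsetDateTime timestamp) {
        return LASTMOD.format(timestamp.withOffsetSameInstant(ZoneOffset.UTC));
    }

    // Wraps the formatted value in a <lastmod> element.
    static String element(OffsetDateTime timestamp) {
        return "<lastmod>" + format(timestamp) + "</lastmod>";
    }

    public static void main(String[] args) {
        OffsetDateTime published = OffsetDateTime.of(2024, 11, 23, 10, 10, 0, 0, ZoneOffset.UTC);
        System.out.println(element(published));
    }
}
```

Normalizing to UTC before formatting keeps every entry on the same offset, which matches the values shown in the diff.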
2 changes: 1 addition & 1 deletion src/main/jbake/content/faq.ad
@@ -27,7 +27,7 @@ link:https://stackoverflow.com/questions/tagged/opennlp[forums,role=external,win

[qanda]
Where can I download the pre-trained models used in OpenNLP?::
-Models for 23 languages are available at the project's link:/models.html[Models download] page or
+Models for 32 languages are available at the project's link:/models.html[Models download] page or
bundled in JAR files distributed via *Maven Central*
(link:https://central.sonatype.com/search?q=opennlp+models+sentdetect[Sentence-Detector,role=external,window=_blank],
link:https://central.sonatype.com/search?q=opennlp+models+tokenizer[Tokenization,role=external,window=_blank],
856 changes: 640 additions & 216 deletions src/main/jbake/content/models.ad

Large diffs are not rendered by default.

4 changes: 3 additions & 1 deletion src/main/jbake/content/news/news-2021-05-30.ad
@@ -26,5 +26,7 @@ Apache OpenNLP
:idprefix:

Pre-trained sentence, parts of speech, and token models are now available for English, French, Italian, German, and Dutch.
-These models were trained on Universal Dependencies and are intended to provide usable models under the Apache 2.0 license.
+These models were trained on link:https://universaldependencies.org[Universal Dependencies,role=external,window=_blank] (UD) and are intended to provide usable models under the Apache 2.0 license.
See the models' README for more information on the models including how each was created and evaluated.
+
+--The Apache OpenNLP Team
16 changes: 10 additions & 6 deletions src/main/jbake/content/news/news-2024-10-28.ad
@@ -16,7 +16,7 @@
specific language governing permissions and limitations
under the License.
////
-= New OpenNLP Pre-trained Models released
+= OpenNLP Pre-trained Models 1.1 released
Apache OpenNLP
2024-10-28
:jbake-type: post
@@ -25,16 +25,20 @@ Apache OpenNLP
:category: news
:idprefix:

-New pre-trained sentence, parts of speech, and token models for 18 (Indo-European) languages are now available for:
+New pre-trained _sentence detection_, _tokenization_, and _parts of speech tagging_ models for 18 (Indo-European) languages are now available for:

* Bulgarian, Czech, Croatian, Danish, Estonian, Finnish, Latvian, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, and Ukrainian.

-The existing sentence, parts of speech, and token models for these 5 languages:
+The existing _sentence detection_, _tokenization_, and _parts of speech tagging_ models for these 5 languages:

* Dutch, English, French, German, and Italian

-were re-trained. The French models are now based on a GSD treebank, as the previously used FTB treebank https://universaldependencies.org/fr/index.html[is not maintained] and has therefore been discontinued by the https://universaldependencies.org[Universal Dependencies] (UD) project.
+were re-trained. The French models are now based on a GSD treebank, as the previously used
+FTB treebank link:https://universaldependencies.org/fr/index.html[is not maintained,role=external,window=_blank] and
+has therefore been discontinued by the link:https://universaldependencies.org[Universal Dependencies,role=external,window=_blank] (UD) project.

-All models were trained with OpenNLP 2.4.0 based on the UD release https://hdl.handle.net/11234/1-5502[2.14] and are intended to provide usable models under the Apache 2.0 license.
+All models were trained with OpenNLP 2.4.0 based on the UD release link:https://hdl.handle.net/11234/1-5502[2.14,role=external,window=_blank] and are intended to provide usable models under the Apache 2.0 license.
These models are available as JAR artifacts via Maven Central, or directly as plain, binary files via our link:/models.html[models page].
-See the models' README for more information on the models including how each was created and evaluated.
+See the models' link:https://dist.apache.org/repos/dist/release/opennlp/models/ud-models-1.1/README[README,role=external,window=_blank] for more information on the models including how each was created and evaluated.
+
+--The Apache OpenNLP Team
43 changes: 43 additions & 0 deletions src/main/jbake/content/news/news-2024-11-23.ad
@@ -0,0 +1,43 @@
////
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
////
= OpenNLP Pre-trained Models 1.2 released
Apache OpenNLP
2024-11-23
:jbake-type: post
:jbake-tags: news
:jbake-status: published
:category: news
:idprefix:

New pre-trained _sentence detection_, _tokenization_, _parts of speech tagging_, and _lemmatization_ models for 9 languages are now available for:

* Armenian, Basque, Catalan, Georgian, Greek, Kazakh, Korean, Icelandic, and Turkish.

The existing _sentence detection_, _tokenization_, and _parts of speech tagging_ models for the 23 languages,
published with the link:/news/news-2024-10-28.html[models release 1.1], have been re-trained.
In addition, new _lemmatization_ models have been trained and added,
based on https://universaldependencies.org[Universal Dependencies,role=external,window=_blank] (UD) treebanks.

All 32 models were trained with OpenNLP 2.5.0 based on the UD release https://hdl.handle.net/11234/1-5787[2.15,role=external,window=_blank]
and are intended to provide usable models under the Apache 2.0 license.
These models will be available as JAR artifacts via Maven Central, or directly as binary files via our link:/models.html[models page].
See the models' link:https://dist.apache.org/repos/dist/release/opennlp/models/ud-models-1.2/README[README,role=external,window=_blank]
for more information, including how each was created and evaluated.

--The Apache OpenNLP Team
2 changes: 1 addition & 1 deletion src/main/jbake/content/news/release-250.ad
@@ -45,6 +45,6 @@ The OpenNLP Brat Annotator component has been moved to the OpenNLP sandbox repos

Thank you to everyone who contributed to this release, including all of our users and the people who submitted bug reports, contributed code or documentation enhancements.

-For a full list of improvements, please see the full list found in link:https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12311215&version=12354554[Jira].
+For a full list of improvements, please see the full list found in link:https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12311215&version=12354554[Jira,role=external,window=_blank].

--The Apache OpenNLP Team
5 changes: 3 additions & 2 deletions src/main/jbake/jbake.properties
@@ -34,5 +34,6 @@ asciidoctor.attributes.export=true
asciidoctor.attributes.export.prefix=
opennlp.version=2.5.0
opennlp.next.version=2.5.1-SNAPSHOT
-opennlp.downloads=https://downloads.apache.org/opennlp/models/ud-models-1.1
-ud.version=2.14
+opennlp.models=ud-models-1.2
+opennlp.downloads=https://downloads.apache.org/opennlp/models
+ud.version=2.15
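
The change above splits the former single download URL into a base URL (`opennlp.downloads`) and a release directory (`opennlp.models`). How the site templates combine the two is not visible in this diff, since the `models.ad` changes are not rendered. The Java sketch below assumes they are joined with a single slash, which would reproduce the shape of the old 1.1 value for release 1.2. The class `ModelDownloadUrl` and its methods are hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical illustration, not part of the commit or of JBake.
public class ModelDownloadUrl {

    // Property values copied from the updated jbake.properties shown in the diff.
    static final String SAMPLE = String.join("\n",
            "opennlp.models=ud-models-1.2",
            "opennlp.downloads=https://downloads.apache.org/opennlp/models",
            "ud.version=2.15");

    // Parses properties text; wraps the checked exception for convenience.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Assumption: base URL and release directory are joined with exactly one slash.
    static String releaseUrl(Properties props) {
        String base = props.getProperty("opennlp.downloads");
        String release = props.getProperty("opennlp.models");
        return base.replaceAll("/+$", "") + "/" + release;
    }

    public static void main(String[] args) {
        System.out.println(releaseUrl(load(SAMPLE)));
    }
}
```

Keeping the release directory in its own property means a future models release would only need `opennlp.models` and `ud.version` to change, while the base URL stays fixed.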
