forked from kurtmckee/feedparser
-
Notifications
You must be signed in to change notification settings - Fork 0
/
NEWS
324 lines (270 loc) · 17 KB
/
NEWS
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
5.1 - ?? ??, 2011
* Extensive, extensive unit test refactoring
* Support Jython somewhat (almost all unit tests pass)
* Support Python 3.2
* Fix Python 3 issues exposed by improved unit tests
* Fix international domain name issues exposed by improved unit tests
* Issue 148 (loose parser doesn't always return unicode strings)
* Issue 247 (mssql date parser uses hardcoded tokyo timezone)
* Issue 249 (KeyboardInterrupt and SystemExit exceptions being caught)
* Issue 250 (`updated` can be a 9-tuple or a string, depending on context)
* Issue 252 (running setup.py in Python 3 fails due to missing sgmllib)
* Issue 260 (Python 3 doesn't decompress gzip'ed or deflate'd content)
* Issue 261 (popping from empty tag list)
* Issue 264 (vcard parser crashes on non-ascii characters)
* Issue 265 (http header comparisons are case sensitive)
* Issue 271 (monkey-patching sgmllib breaks other libraries)
* Issue 272 (can't pass bytes or str to `parse()` in Python 3)
* Issue 275 (`_parse_date()` doesn't catch OverflowError)
* Issue 276 (mutable types used as default values in `parse()`)
* Issue 277 (`python3 setup.py install` fails)
* Issue 281 (`_parse_date()` doesn't catch ValueError)
* Issue 282 (`_parse_date()` crashes when passed `None`)
* Issue 285 (crash on empty xmlns attribute)
* Issue 286 ('apos' character entity not handled properly)
* Issue 289 (add an option to disable microformat parsing)
* Issue 290 (Blogger's invalid img tags are unparseable)
* Issue 292 (atom id element not explicitly supported)
* Issue 294 ('categories' key exists but raises KeyError)
* Issue 297 (unresolvable external doctype causes crash)
* Issue 298 (nested nodes clobber actual values)
* Issue 300 (performance improvements)
* Issue 303 (unicode characters cause crash during relative uri resolution)
* Remove "Hot RSS" support since the format doesn't actually exist
* Remove the feedparser command line interface
* Remove the Zope interoperability hack
* Remove extraneous whitespace
5.0.1 - February 20, 2011
* Fix issue 91 (invalid text in XML declaration causes sanitizer to crash)
* Fix issue 254 (sanitization can be bypassed by malformed XML comments)
* Fix issue 255 (sanitizer doesn't strip unsafe URI schemes)
5.0 - January 25, 2011
* Improved MathML support
* Support microformats (rel-tag, rel-enclosure, xfn, hcard)
* Support IRIs
* Allow safe CSS through sanitization
* Allow safe HTML5 through sanitization
* Support SVG
* Support inline XML entity declarations
* Support unescaped quotes and angle brackets in attributes
* Support additional date formats
* Added the `request_headers` argument to parse()
* Added the `response_headers` argument to parse()
* Support multiple entry, feed, and source authors
* Officially make Python 2.4 the earliest supported version
* Support Python 3
* Bug fixes, bug fixes, bug fixes
===============================================================================
1.0 - 9/27/2002 - MAP - fixed namespace processing on prefixed RSS 2.0 elements,
added Simon Fell's test suite
1.1 - 9/29/2002 - MAP - fixed infinite loop on incomplete CDATA sections
2.0 - 10/19/2002
JD - use inchannel to watch out for image and textinput elements which can
also contain title, link, and description elements
JD - check for isPermaLink='false' attribute on guid elements
JD - replaced openAnything with open_resource supporting ETag and
If-Modified-Since request headers
JD - parse now accepts etag, modified, agent, and referrer optional
arguments
JD - modified parse to return a dictionary instead of a tuple so that any
etag or modified information can be returned and cached by the caller
2.0.1 - 10/21/2002 - MAP - changed parse() so that if we don't get anything
because of etag/modified, return the old etag/modified to the caller to
indicate why nothing is being returned
2.0.2 - 10/21/2002 - JB - added the inchannel to the if statement, otherwise its
useless. Fixes the problem JD was addressing by adding it.
2.1 - 11/14/2002 - MAP - added gzip support
2.2 - 1/27/2003 - MAP - added attribute support, admin:generatorAgent.
start_admingeneratoragent is an example of how to handle elements with
only attributes, no content.
2.3 - 6/11/2003 - MAP - added USER_AGENT for default (if caller doesn't specify);
also, make sure we send the User-Agent even if urllib2 isn't available.
Match any variation of backend.userland.com/rss namespace.
2.3.1 - 6/12/2003 - MAP - if item has both link and guid, return both as-is.
2.4 - 7/9/2003 - MAP - added preliminary Pie/Atom/Echo support based on Sam Ruby's
snapshot of July 1 <http://www.intertwingly.net/blog/1506.html>; changed
project name
2.5 - 7/25/2003 - MAP - changed to Python license (all contributors agree);
removed unnecessary urllib code -- urllib2 should always be available anyway;
return actual url, status, and full HTTP headers (as result['url'],
result['status'], and result['headers']) if parsing a remote feed over HTTP --
this should pass all the HTTP tests at <http://diveintomark.org/tests/client/http/>;
added the latest namespace-of-the-week for RSS 2.0
2.5.1 - 7/26/2003 - RMK - clear opener.addheaders so we only send our custom
User-Agent (otherwise urllib2 sends two, which confuses some servers)
2.5.2 - 7/28/2003 - MAP - entity-decode inline xml properly; added support for
inline <xhtml:body> and <xhtml:div> as used in some RSS 2.0 feeds
2.5.3 - 8/6/2003 - TvdV - patch to track whether we're inside an image or
textInput, and also to return the character encoding (if specified)
2.6 - 1/1/2004 - MAP - dc:author support (MarekK); fixed bug tracking
nested divs within content (JohnD); fixed missing sys import (JohanS);
fixed regular expression to capture XML character encoding (Andrei);
added support for Atom 0.3-style links; fixed bug with textInput tracking;
added support for cloud (MartijnP); added support for multiple
category/dc:subject (MartijnP); normalize content model: 'description' gets
description (which can come from description, summary, or full content if no
description), 'content' gets dict of base/language/type/value (which can come
from content:encoded, xhtml:body, content, or fullitem);
fixed bug matching arbitrary Userland namespaces; added xml:base and xml:lang
tracking; fixed bug tracking unknown tags; fixed bug tracking content when
<content> element is not in default namespace (like Pocketsoap feed);
resolve relative URLs in link, guid, docs, url, comments, wfw:comment,
wfw:commentRSS; resolve relative URLs within embedded HTML markup in
description, xhtml:body, content, content:encoded, title, subtitle,
summary, info, tagline, and copyright; added support for pingback and
trackback namespaces
2.7 - 1/5/2004 - MAP - really added support for trackback and pingback
namespaces, as opposed to 2.6 when I said I did but didn't really;
sanitize HTML markup within some elements; added mxTidy support (if
installed) to tidy HTML markup within some elements; fixed indentation
bug in _parse_date (FazalM); use socket.setdefaulttimeout if available
(FazalM); universal date parsing and normalization (FazalM): 'created', modified',
'issued' are parsed into 9-tuple date format and stored in 'created_parsed',
'modified_parsed', and 'issued_parsed'; 'date' is duplicated in 'modified'
and vice-versa; 'date_parsed' is duplicated in 'modified_parsed' and vice-versa
2.7.1 - 1/9/2004 - MAP - fixed bug handling " and '. fixed memory
leak not closing url opener (JohnD); added dc:publisher support (MarekK);
added admin:errorReportsTo support (MarekK); Python 2.1 dict support (MarekK)
2.7.4 - 1/14/2004 - MAP - added workaround for improperly formed <br/> tags in
encoded HTML (skadz); fixed unicode handling in normalize_attrs (ChrisL);
fixed relative URI processing for guid (skadz); added ICBM support; added
base64 support
2.7.5 - 1/15/2004 - MAP - added workaround for malformed DOCTYPE (seen on many
blogspot.com sites); added _debug variable
2.7.6 - 1/16/2004 - MAP - fixed bug with StringIO importing
3.0b3 - 1/23/2004 - MAP - parse entire feed with real XML parser (if available);
added several new supported namespaces; fixed bug tracking naked markup in
description; added support for enclosure; added support for source; re-added
support for cloud which got dropped somehow; added support for expirationDate
3.0b4 - 1/26/2004 - MAP - fixed xml:lang inheritance; fixed multiple bugs tracking
xml:base URI, one for documents that don't define one explicitly and one for
documents that define an outer and an inner xml:base that goes out of scope
before the end of the document
3.0b5 - 1/26/2004 - MAP - fixed bug parsing multiple links at feed level
3.0b6 - 1/27/2004 - MAP - added feed type and version detection, result['version']
will be one of SUPPORTED_VERSIONS.keys() or empty string if unrecognized;
added support for creativeCommons:license and cc:license; added support for
full Atom content model in title, tagline, info, copyright, summary; fixed bug
with gzip encoding (not always telling server we support it when we do)
3.0b7 - 1/28/2004 - MAP - support Atom-style author element in author_detail
(dictionary of 'name', 'url', 'email'); map author to author_detail if author
contains name + email address
3.0b8 - 1/28/2004 - MAP - added support for contributor
3.0b9 - 1/29/2004 - MAP - fixed check for presence of dict function; added
support for summary
3.0b10 - 1/31/2004 - MAP - incorporated ISO-8601 date parsing routines from
xml.util.iso8601
3.0b11 - 2/2/2004 - MAP - added 'rights' to list of elements that can contain
dangerous markup; fiddled with decodeEntities (not right); liberalized
date parsing even further
3.0b12 - 2/6/2004 - MAP - fiddled with decodeEntities (still not right);
added support to Atom 0.2 subtitle; added support for Atom content model
in copyright; better sanitizing of dangerous HTML elements with end tags
(script, frameset)
3.0b13 - 2/8/2004 - MAP - better handling of empty HTML tags (br, hr, img,
etc.) in embedded markup, in either HTML or XHTML form (<br>, <br/>, <br />)
3.0b14 - 2/8/2004 - MAP - fixed CDATA handling in non-wellformed feeds under
Python 2.1
3.0b15 - 2/11/2004 - MAP - fixed bug resolving relative links in wfw:commentRSS;
fixed bug capturing author and contributor URL; fixed bug resolving relative
links in author and contributor URL; fixed bug resolvin relative links in
generator URL; added support for recognizing RSS 1.0; passed Simon Fell's
namespace tests, and included them permanently in the test suite with his
permission; fixed namespace handling under Python 2.1
3.0b16 - 2/12/2004 - MAP - fixed support for RSS 0.90 (broken in b15)
3.0b17 - 2/13/2004 - MAP - determine character encoding as per RFC 3023
3.0b18 - 2/17/2004 - MAP - always map description to summary_detail (Andrei);
use libxml2 (if available)
3.0b19 - 3/15/2004 - MAP - fixed bug exploding author information when author
name was in parentheses; removed ultra-problematic mxTidy support; patch to
workaround crash in PyXML/expat when encountering invalid entities
(MarkMoraes); support for textinput/textInput
3.0b20 - 4/7/2004 - MAP - added CDF support
3.0b21 - 4/14/2004 - MAP - added Hot RSS support
3.0b22 - 4/19/2004 - MAP - changed 'channel' to 'feed', 'item' to 'entries' in
results dict; changed results dict to allow getting values with results.key
as well as results[key]; work around embedded illformed HTML with half
a DOCTYPE; work around malformed Content-Type header; if character encoding
is wrong, try several common ones before falling back to regexes (if this
works, bozo_exception is set to CharacterEncodingOverride); fixed character
encoding issues in BaseHTMLProcessor by tracking encoding and converting
from Unicode to raw strings before feeding data to sgmllib.SGMLParser;
convert each value in results to Unicode (if possible), even if using
regex-based parsing
3.0b23 - 4/21/2004 - MAP - fixed UnicodeDecodeError for feeds that contain
high-bit characters in attributes in embedded HTML in description (thanks
Thijs van de Vossen); moved guid, date, and date_parsed to mapped keys in
FeedParserDict; tweaked FeedParserDict.has_key to return True if asking
about a mapped key
3.0fc1 - 4/23/2004 - MAP - made results.entries[0].links[0] and
results.entries[0].enclosures[0] into FeedParserDict; fixed typo that could
cause the same encoding to be tried twice (even if it failed the first time);
fixed DOCTYPE stripping when DOCTYPE contained entity declarations;
better textinput and image tracking in illformed RSS 1.0 feeds
3.0fc2 - 5/10/2004 - MAP - added and passed Sam's amp tests; added and passed
my blink tag tests
3.0fc3 - 6/18/2004 - MAP - fixed bug in _changeEncodingDeclaration that
failed to parse utf-16 encoded feeds; made source into a FeedParserDict;
duplicate admin:generatorAgent/@rdf:resource in generator_detail.url;
added support for image; refactored parse() fallback logic to try other
encodings if SAX parsing fails (previously it would only try other encodings
if re-encoding failed); remove unichr madness in normalize_attrs now that
we're properly tracking encoding in and out of BaseHTMLProcessor; set
feed.language from root-level xml:lang; set entry.id from rdf:about;
send Accept header
3.0 - 6/21/2004 - MAP - don't try iso-8859-1 (can't distinguish between
iso-8859-1 and windows-1252 anyway, and most incorrectly marked feeds are
windows-1252); fixed regression that could cause the same encoding to be
tried twice (even if it failed the first time)
3.0.1 - 6/22/2004 - MAP - default to us-ascii for all text/* content types;
recover from malformed content-type header parameter with no equals sign
('text/xml; charset:iso-8859-1')
3.1 - 6/28/2004 - MAP - added and passed tests for converting HTML entities
to Unicode equivalents in illformed feeds (aaronsw); added and
passed tests for converting character entities to Unicode equivalents
in illformed feeds (aaronsw); test for valid parsers when setting
XML_AVAILABLE; make version and encoding available when server returns
a 304; add handlers parameter to pass arbitrary urllib2 handlers (like
digest auth or proxy support); add code to parse username/password
out of url and send as basic authentication; expose downloading-related
exceptions in bozo_exception (aaronsw); added __contains__ method to
FeedParserDict (aaronsw); added publisher_detail (aaronsw)
3.2 - 7/3/2004 - MAP - use cjkcodecs and iconv_codec if available; always
convert feed to UTF-8 before passing to XML parser; completely revamped
logic for determining character encoding and attempting XML parsing
(much faster); increased default timeout to 20 seconds; test for presence
of Location header on redirects; added tests for many alternate character
encodings; support various EBCDIC encodings; support UTF-16BE and
UTF16-LE with or without a BOM; support UTF-8 with a BOM; support
UTF-32BE and UTF-32LE with or without a BOM; fixed crashing bug if no
XML parsers are available; added support for 'Content-encoding: deflate';
send blank 'Accept-encoding: ' header if neither gzip nor zlib modules
are available
3.3 - 7/15/2004 - MAP - optimize EBCDIC to ASCII conversion; fix obscure
problem tracking xml:base and xml:lang if element declares it, child
doesn't, first grandchild redeclares it, and second grandchild doesn't;
refactored date parsing; defined public registerDateHandler so callers
can add support for additional date formats at runtime; added support
for OnBlog, Nate, MSSQL, Greek, and Hungarian dates (ytrewq1); added
zopeCompatibilityHack() which turns FeedParserDict into a regular
dictionary, required for Zope compatibility, and also makes command-
line debugging easier because pprint module formats real dictionaries
better than dictionary-like objects; added NonXMLContentType exception,
which is stored in bozo_exception when a feed is served with a non-XML
media type such as 'text/plain'; respect Content-Language as default
language if not xml:lang is present; cloud dict is now FeedParserDict;
generator dict is now FeedParserDict; better tracking of xml:lang,
including support for xml:lang='' to unset the current language;
recognize RSS 1.0 feeds even when RSS 1.0 namespace is not the default
namespace; don't overwrite final status on redirects (scenarios:
redirecting to a URL that returns 304, redirecting to a URL that
redirects to another URL with a different type of redirect); add
support for HTTP 303 redirects
4.0 - MAP - support for relative URIs in xml:base attribute; fixed
encoding issue with mxTidy (phopkins); preliminary support for RFC 3229;
support for Atom 1.0; support for iTunes extensions; new 'tags' for
categories/keywords/etc. as array of dict
{'term': term, 'scheme': scheme, 'label': label} to match Atom 1.0
terminology; parse RFC 822-style dates with no time; lots of other
bug fixes
4.1 - MAP - removed socket timeout; added support for chardet library