Rough draft implementation #1

geowurster · 2015-08-05T13:17:02Z

This PR and the associated draft branch are intended to start a discussion, not necessarily to be merged.

For those not aware of the background, this was spurred by Toblerity/fio-buffer#4. Fiona has a fio cat CLI command that reads an OGR-supported vector datasource and prints each feature on its own line. Being able to stream vector features and geometries around is very unixy and makes it easier to develop standalone tools that are good at doing one thing, but there is no standard method for reading and writing these streams.

This PR aims to make reading GeoJSON feature sequences just like reading a text file with open().

import geojseq

with geojseq.open('coutwildrnp.geojson') as src, geojseq.open('-', 'w', use_rs=True) as dst:
    for feat in src:
        dst.write(feat)

So a tool that supports both OGR datasources (via Fiona) and feature streams would look something like:

import sys

import fiona
import geojseq

infile = sys.argv[1]
input_is_sequence = sys.argv[2]

with geojseq.open(infile) if input_is_sequence else fiona.open(infile) as src:
    for feat in src:
        pass

sgillies · 2015-08-12T05:51:43Z

@geowurster i'm back in my home office after a work trip, have shipped Rasterio 0.26, and am 👀 on this at last.

sgillies · 2015-08-13T05:56:12Z

@geowurster I suggest we put geojseq.core.open() aside for now and just consider the stream class.

I feel like FeatureStream in read mode is right on. Using fio cat --rs, I wrote Fiona's test dataset out as an RS-delimited sequence of GeoJSON features and FeatureStream handles these exactly as I'd like. The analogy to csv.reader is going to be super useful for Python programmers.

>>> from geojseq.core import FeatureStream
>>> with open('/tmp/coutwldrnp.jseq', 'r') as f, FeatureStream(f) as src:
...     for ftr in src:
...         print((ftr['id'], ftr['properties']['NAME']))
...
('0', 'Mount Naomi Wilderness')
('1', 'Wellsville Mountain Wilderness')
('2', 'Mount Zirkel Wilderness')
('3', 'High Uintas Wilderness')
('4', 'Rawah Wilderness')
('5', 'Mount Olympus Wilderness')
('6', 'Comanche Peak Wilderness')
('7', 'Cache La Poudre Wilderness')
...

Let's consider separate classes for the reading and writing duties.
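To make the idea of separate reading and writing classes concrete, here is a minimal sketch. The names FeatureReader and FeatureWriter are hypothetical and are not the geojseq API; the sketch only assumes that features arrive either one per line or prefixed with the ASCII record separator (what fio cat --rs emits), and it borrows the use_rs flag from the example earlier in this thread.

```python
import io
import json

RS = "\x1e"  # ASCII record separator that prefixes each record in RS mode


class FeatureReader:
    """Hypothetical reader: iterate features from a text file-like object."""

    def __init__(self, f):
        self._f = f

    def __iter__(self):
        buffer = []  # lines of the RS-delimited record being assembled
        for line in self._f:
            if line.startswith(RS):
                # A new record starts here, so flush the previous one.
                text = "".join(buffer)
                if text.strip():
                    yield json.loads(text)
                buffer = [line[1:]]
            elif buffer:
                # Continuation line of a multi-line RS-delimited record.
                buffer.append(line)
            elif line.strip():
                # Plain newline-delimited mode: one feature per line.
                yield json.loads(line)
        text = "".join(buffer)
        if text.strip():
            yield json.loads(text)


class FeatureWriter:
    """Hypothetical writer: one feature per line, optionally RS-prefixed."""

    def __init__(self, f, use_rs=False):
        self._f = f
        self._prefix = RS if use_rs else ""

    def write(self, feature):
        self._f.write(self._prefix + json.dumps(feature) + "\n")


# Round trip through an in-memory buffer.
features = [
    {"type": "Feature", "id": "0", "properties": {"NAME": "A"}, "geometry": None},
    {"type": "Feature", "id": "1", "properties": {"NAME": "B"}, "geometry": None},
]
buf = io.StringIO()
dst = FeatureWriter(buf, use_rs=True)
for feat in features:
    dst.write(feat)
buf.seek(0)
round_tripped = list(FeatureReader(buf))
```

Keeping the two classes separate means neither has to branch on a mode string, which is the same split csv.reader and csv.writer make.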

geowurster · 2015-08-15T19:46:00Z

@sgillies While the csv module is certainly very similar to what we're trying to accomplish, I'm not convinced it's what we should use as a model. I have been using a bunch of different I/O libraries lately and have found that modules lacking something similar to open() feel antiquated. I can get behind focusing on the core class or classes first, but I think the module.open() pattern is important to a modern I/O library.

I didn't use it for this project because our core file-like object will need to have GeoJSON-specific properties etc., but I already have a NewlineJSON project (that needs a bit of work). It started out with Reader() and Writer() classes that were intended to be drop-in replacements for csv.DictReader/Writer(), but I found that a central file-like object and newlinejson.open() worked a lot better.

Are there more compelling reasons for the csv model that I'm missing? My guess is that csv doesn't have an open() because of the additional complication that headers and DictReader/Writer() introduce, but we don't have that problem.

Some examples of this change in the stdlib:

Library	Python 2.7	Python 3
gzip	`gzip.open()` for file paths but `gzip.GzipFile(filename=None, fileobj=None)` does both.	`gzip.open()` transparently handles file paths and file-like objects.
bz2	`bz2.BZ2File()` only opens file paths. To decompress a file-like object, `BZ2Decompressor().decompress()` must be used; the same applies to compression with `BZ2Compressor()`.	Introduced `bz2.open()` for transparently reading file paths and file-like objects.
lzma	External library	`lzma.open()` that behaves like `gzip` and `bz2`.
tarfile	`tarfile.open(filename=None, fileobj=None)`	Same as Python 2

In contrast, libraries like csv and MsgPack just aren't as streamlined:

import csv

with open('data.csv') as f:
    for line in csv.DictReader(f):
        pass  # Do something

import msgpack

# msgpack is a binary format, so open the files in binary mode
with open('data.msg', 'rb') as f:
    for msg in msgpack.Unpacker(f):
        pass  # Do something

with open('data.msg', 'wb') as f:
    packer = msgpack.Packer()
    for item in something_else:
        f.write(packer.pack(item))

Objects like BZ2Compressor() are still useful so you can do stuff like compress data before sending across the network, but they seem antiquated when reading and writing files.
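For illustration, the module.open() pattern argued for above could be sketched roughly like this. open_stream is a hypothetical helper, not part of geojseq or newlinejson; it assumes that '-' means stdin or stdout (as in the first example in this thread) and that objects passed in by the caller should not be closed on exit.

```python
import sys
from contextlib import contextmanager


@contextmanager
def open_stream(path_or_file, mode="r"):
    """Hypothetical helper: accept a path, '-', or an open file-like object."""
    if path_or_file == "-":
        # Standard streams belong to the process, so never close them.
        yield sys.stdout if "w" in mode else sys.stdin
    elif hasattr(path_or_file, "read") or hasattr(path_or_file, "write"):
        # The caller opened this object, so the caller closes it.
        yield path_or_file
    else:
        # A plain path: open it here and close it on exit.
        with open(path_or_file, mode) as f:
            yield f
```

With a helper like this, a tool can take an input argument and not care whether the user passed a filename, a dash, or an already-open stream.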

sgillies · 2015-08-15T23:18:39Z

Sold. I'm game to start with one class and add an open().

Draft.

ea323e0

geowurster added 2 commits August 18, 2015 20:36

Remove geojseq.open().

06cf91e

Properties and dict-like syntax for Feature() and Geometry().

b0b2564

This was referenced Aug 19, 2015

Feature .id, .geometry, .properties, .type properties #4

Open

What about properties of collections? #3

Open

geowurster mentioned this pull request Dec 3, 2015

Datasets API support mapbox/mapbox-sdk-py#80

Merged

geowurster mentioned this pull request Mar 1, 2016

compare to cligj #5

Open

geowurster closed this Jul 21, 2016
