tableschema-pandas-py

Generate and load Pandas data frames Table Schema descriptors.

Features

implements tableschema.Storage interface

Getting Started

Installation

The package use semantic versioning. It means that major versions could include breaking changes. It's highly recommended to specify package version range in your setup/requirements file e.g. package>=1.0,<2.0.

$ pip install tableschema-pandas

Documentation

# pip install datapackage tableschema-pandas
from datapackage import Package

# Save to Pandas

package = Package('http://data.okfn.org/data/core/country-list/datapackage.json')
storage = package.save(storage='pandas')

print(type(storage['data']))
#  <class 'pandas.core.frame.DataFrame'>

print(storage['data'].head())
#               Name   Code
#  0     Afghanistan   AF
#  1   Åland Islands   AX
#  2         Albania   AL
#  3         Algeria   DZ
#  4  American Samoa   AS

# Load from Pandas

package = Package(storage=storage)
print(package.descriptor)
print(package.resources[0].read())

Storage works as a container for Pandas data frames. You can define new data frame inside storage using storage.create method:

>>> from tableschema_pandas import Storage

>>> storage = Storage()

>>> storage.create('data', {
...     'primaryKey': 'id',
...     'fields': [
...         {'name': 'id', 'type': 'integer'},
...         {'name': 'comment', 'type': 'string'},
...     ]
... })

>>> storage.buckets
['data']

>>> storage['data'].shape
(0, 0)

Use storage.write to populate data frame with data:

>>> storage.write('data', [(1, 'a'), (2, 'b')])

>>> storage['data']
id comment
1        a
2        b

Also you can use tabulator to populate data frame from external data file. As you see, subsequent writes simply appends new data on top of existing ones:

>>> import tabulator

>>> with tabulator.Stream('data/comments.csv', headers=1) as stream:
...     storage.write('data', stream)

>>> storage['data']
id comment
1        a
2        b
1     good

API Reference

`Storage`

Storage(self, dataframes=None)

Pandas storage

Package implements Tabular Storage interface (see full documentation on the link):

Only additional API is documented

Arguments

dataframes (object[]): list of storage dataframes

Contributing

The project follows the Open Knowledge International coding standards.

Recommended way to get started is to create and activate a project virtual environment. To install package and development dependencies into active environment:

$ make install

To run tests with linting and coverage:

$ make test

Changelog

Here described only breaking and the most important changes. The full changelog and documentation for all released versions could be found in nicely formatted commit history.

v1.1

Added support for composite primary keys (loading to pandas)

v1.0

Initial driver implementation

Name		Name	Last commit message	Last commit date
Latest commit History 71 Commits
.github		.github
data		data
examples		examples
tableschema_pandas		tableschema_pandas
tests		tests
.gitignore		.gitignore
.travis.yml		.travis.yml
LEAD.md		LEAD.md
LICENSE.md		LICENSE.md
MANIFEST.in		MANIFEST.in
Makefile		Makefile
README.md		README.md
pylama.ini		pylama.ini
pytest.ini		pytest.ini
setup.cfg		setup.cfg
setup.py		setup.py
tox.ini		tox.ini

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

tableschema-pandas-py

Features

Contents

Getting Started

Installation

Documentation

API Reference

`Storage`

Contributing

Changelog

v1.1

v1.0

About

Releases 10

Packages

Contributors 6

Languages

License

frictionlessdata/tableschema-pandas-py

Folders and files

Latest commit

History

Repository files navigation

tableschema-pandas-py

Features

Contents

Getting Started

Installation

Documentation

API Reference

Storage

Contributing

Changelog

v1.1

v1.0

About

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases 10

Packages 0

Contributors 6

Languages

`Storage`

Packages