Indicate original source of data (and via what aggregator) #40

Open
nvkelso opened this issue Jan 7, 2017 · 13 comments

@nvkelso
Member

nvkelso commented Jan 7, 2017

Right now we have data from Quattroshapes, which actually originates from multiple different sources. Each source needs to be credited, so we need a consistent WOF property to deal with this.

I propose a new property like src:via (was src_via originally) where the src should state the original source, and then we should credit the data aggregator in src:via as well.

Examples:

  • Quattroshapes:
    • The city of San Francisco has a "qs:source" value of "AUS Census" (should just be US Census, oops) and "src:geom" of quattroshapes.
    • Propose that the "src:geom" should be uscensus instead, with "src_via" set to quattroshapes
  • Mesoshapes:
    • The county feature of Samba has a "meso:source" value of "EDP", though no EDP.json file is currently in the sources folder.
    • Propose that the "src:geom" should be edp instead, with "src_via" set to meso
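
A minimal sketch of the re-crediting proposed in these examples, assuming a hand-maintained lookup from an aggregator's free-text label to a sources name. `LABEL_TO_SOURCE`, `recredit_geom`, and the `via_key` default are hypothetical names for illustration, not existing WOF tooling:

```python
# Hypothetical lookup from an aggregator's free-text source label to a
# whosonfirst-sources name. These entries are illustrative, not real data.
LABEL_TO_SOURCE = {
    "US Census": "uscensus",
    "AUS Census": "uscensus",  # the mislabelled value mentioned above
}


def recredit_geom(props, label_key, via_key="src:via"):
    """Return a copy of props that credits the original source in src:geom
    and the aggregator in via_key. Unknown labels leave the copy unchanged."""
    original = LABEL_TO_SOURCE.get(props.get(label_key))
    if original is None:
        return dict(props)
    updated = dict(props)
    updated[via_key] = props["src:geom"]  # the aggregator moves to "via"
    updated["src:geom"] = original        # the original source takes the credit
    return updated


san_francisco = {"src:geom": "quattroshapes", "qs:source": "AUS Census"}
recredited = recredit_geom(san_francisco, "qs:source")
```

Here `recredited` would carry `uscensus` as its "src:geom" and `quattroshapes` under the via key, while the input record is left untouched.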
@nvkelso
Member Author

nvkelso commented Jan 7, 2017

Related: #39.

@thisisaaronland
Member

I would only change this to be src:via or an equivalent prefix + ":" + key pair, to be consistent with everything else.

@nvkelso
Member Author

nvkelso commented Jan 7, 2017

Works for me :)

@stepps00 stepps00 added bug and removed bug labels May 18, 2018
@nvkelso
Member Author

nvkelso commented May 18, 2018

Seems like most of the above applies to the whosonfirst-data repo.

To give credit to our src:via sources, we'll also need to elevate some of the buried remarks (like those for Quattroshapes) so they are listed directly in the big sources README. That way there is one page listing all the sources, which consumers of Who's On First data can link to from their apps to give proper credit where credit is due.

They should all be printed in a section under https://github.com/whosonfirst/whosonfirst-sources/blob/master/sources/README.md#quattroshapes

After license bullet point, a new paragraph with:

This source includes data from the following organizations:

With bullet points listed below, alphabetically, e.g.:

  • Europe-wide: European Environment Agency (EEA) urban morphological zones 2006
  • France: Institut Géographique National
  • Netherlands: Kadaster
  • Spain: Instituto Geográfico Nacional
  • Switzerland: swisstopo
  • United Kingdom: Contains Ordnance Survey data © Crown copyright and database right [2012]

And that list needs to be from a new JSON list in the quattroshapes.json source.

Ideally it could contain HTML text with hyperlinks (?) since I think we had problems with Markdown before.
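
A rough sketch of how that README paragraph could be generated from such a JSON list. The entry fields (`context`, `source_name`, `source_link`) are borrowed from the meso.json snippet later in this thread; `render_via_section` and the example.org link are hypothetical, and HTML anchors are used for hyperlinks as floated above:

```python
import html


def render_via_section(via_entries):
    """Render the 'includes data from' paragraph plus one bullet per entry,
    sorted alphabetically by context. Links, when present, become HTML anchors."""
    lines = ["This source includes data from the following organizations:", ""]
    for entry in sorted(via_entries, key=lambda e: e["context"].casefold()):
        name = html.escape(entry["source_name"])
        link = entry.get("source_link", "")
        if link:
            name = f'<a href="{html.escape(link, quote=True)}">{name}</a>'
        lines.append(f"* {html.escape(entry['context'])}: {name}")
    return "\n".join(lines)


# Illustrative entries only; the link is a placeholder, not a real source URL.
entries = [
    {"context": "Switzerland", "source_name": "swisstopo", "source_link": ""},
    {"context": "France", "source_name": "Institut Géographique National", "source_link": ""},
    {"context": "Netherlands", "source_name": "Kadaster", "source_link": "https://example.org/kadaster"},
]
section = render_via_section(entries)
print(section)
```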

@nvkelso
Member Author

nvkelso commented May 23, 2018

The textual description part of this here in the sources repo is done.

Leaving this issue open as there is related work to follow up on.

@nvkelso
Member Author

nvkelso commented Jun 22, 2018

For this county in Tanzania:

Let's pretend it has the following properties:

  • "src:geom" = "meso"
  • "src:geom_alt" = ["naturalearth","quattroshapes"]
  • "meso:source" = "TNBS"
  • "qs:source" = "statscan"

We want to track the sources' sources generically, in a predictable, machine-readable way that doesn't need constant reshuffling as default and alt geoms are swapped around, that doesn't require adding more sources JSONs, and that makes use of the existing "src:via" properties we recently added to the sources JSON. In this case Mesoshapes includes data from "TNBS", and let's pretend Quattroshapes includes data from "statscan".

NOTE: This new property would only be added to WOF records whose source is itself composed of multiple sources (e.g. Mesoshapes, Quattroshapes, and other *shapes sources); all of those sources would then be listed in the extended format. Records without such multi-source sources would not change.

We propose adding a new "src_via" prefix that accepts the same property names as src but stores a list of lists (versus a string for geom and a list for geom_alt), because any one source can actually be composed of multiple sources:

  • "src_via:geom" = [["meso:tza_tnbs"],["naturalearth"],["quattroshapes:statscan"]]
    • which links to a new source_code entry in the meso and tracks both default geoms and alt geoms in one big list.

Another example:

  • "src_via:population" = [["statoids:othercensus"],["uscensus"]]
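
One way the "src_via:geom" value above could be derived from a record, sketched under the assumption that the per-aggregator source codes are already known. `build_src_via_geom` and the `via_codes` argument are hypothetical names, not part of any WOF library:

```python
def build_src_via_geom(props, via_codes):
    """Build the list-of-lists value: one inner list per geometry source,
    default geom first, then alt geoms in their listed order.

    via_codes maps an aggregator name to the source codes it draws on for
    this record (an assumed input, e.g. looked up from "meso:source")."""
    sources = [props["src:geom"]] + list(props.get("src:geom_alt", []))
    result = []
    for source in sources:
        codes = via_codes.get(source, [])
        if codes:
            # Aggregator: credit each underlying source as "aggregator:code".
            result.append([f"{source}:{code}" for code in codes])
        else:
            # Plain source with nothing behind it: credit it directly.
            result.append([source])
    return result


record = {
    "src:geom": "meso",
    "src:geom_alt": ["naturalearth", "quattroshapes"],
}
src_via_geom = build_src_via_geom(
    record, {"meso": ["tza_tnbs"], "quattroshapes": ["statscan"]}
)
```

With the pretend properties from this comment, `src_via_geom` comes out as the same list of lists shown in the first bullet.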

Then in the sources repo (this repo), modify the meso.json:

From:

"src:via" : {
			"context": "Tanzania",
			"source_link": "",
			"source_name": "Tanzania National Bureau of Statistics (TNBS)",
			"source_note": ""
		},

Add: "source_code": "tza_tnbs"

"src:via" : {
			"context": "Tanzania",
			"source_link": "",
			"source_name": "Tanzania National Bureau of Statistics (TNBS)",
			"source_code": "tza_tnbs",
			"source_note": ""
		},
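
With a "source_code" on each entry, a reference such as "meso:tza_tnbs" can be resolved back to its credit line. A minimal sketch, assuming each sources JSON holds its "src:via" entries as a list (the snippet above shows only a single entry); `resolve_via` and the in-memory `sources` dict are hypothetical:

```python
def resolve_via(ref, sources):
    """Resolve an "aggregator:code" reference to its src:via entry.

    Returns None for plain sources (no code part) and for unknown codes."""
    aggregator, _, code = ref.partition(":")
    for entry in sources.get(aggregator, {}).get("src:via", []):
        if entry.get("source_code") == code:
            return entry
    return None


# Stand-in for the parsed meso.json; the list wrapper is an assumption.
sources = {
    "meso": {
        "src:via": [
            {
                "context": "Tanzania",
                "source_link": "",
                "source_name": "Tanzania National Bureau of Statistics (TNBS)",
                "source_code": "tza_tnbs",
                "source_note": "",
            }
        ]
    }
}
entry = resolve_via("meso:tza_tnbs", sources)
```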

@nvkelso
Member Author

nvkelso commented Jun 22, 2018

Does this need to be a different structure?

"src_via:geom"={  
   "meso":[  
      "tza_tnbs"
   ],
   "naturalearth":[  
      "naturalearth"
   ],
   "quattroshapes":[  
      "statscan"
   ]
}
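
For comparison, the list-of-lists form from the previous comment can be converted into this keyed structure mechanically. A small sketch, where `via_list_to_mapping` is a hypothetical helper and the "aggregator:code" split follows the examples in this thread:

```python
def via_list_to_mapping(via_lists):
    """Convert [["meso:tza_tnbs"], ["naturalearth"], ...] into a dict keyed
    by source name. A plain source with no code maps to its own name,
    mirroring the "naturalearth" entry above."""
    mapping = {}
    for group in via_lists:
        for ref in group:
            aggregator, sep, code = ref.partition(":")
            mapping.setdefault(aggregator, []).append(code if sep else aggregator)
    return mapping


keyed = via_list_to_mapping(
    [["meso:tza_tnbs"], ["naturalearth"], ["quattroshapes:statscan"]]
)
```

Note that the keyed form no longer records which source is the default geom and which are alternates, since that ordering lived in the outer list.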

And should we riff on "src:via" ala "src_via" instead of "src_src"? (updated to src_via).

@stepps00
Member

@nvkelso - the example in #40 (comment) makes more sense.

@nvkelso
Member Author

nvkelso commented Jun 22, 2018

Flagging @thisisaaronland for comments. We'd like to make this change next week.

@thisisaaronland
Member

With regards to the source_code key I would change it to source_prefix since that's what it is.

Likewise I would consider changing all the source_* keys to be src:* since the src prefix has historically been used as a pointer to "whosonfirst-sources".

src_via seems fine but I am not sure I understand why some of the examples have lists of lists, like this:

"src_via:geom" = [["meso:tza_tnbs"],["naturalearth"],["quattroshapes:statscan"]]

Like why wouldn't it just be:

"src_via:geom" = ["meso:tza_tnbs","naturalearth","quattroshapes:statscan"]

@nvkelso
Member Author

nvkelso commented Jun 26, 2018

> With regards to the source_code key I would change it to source_prefix since that's what it is.

👍

> src_via seems fine but I am not sure I understand why some of the examples have lists of lists, like this:

That's because some sources include multiple sources so they need to be lists of lists.
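
To make the two shapes concrete, here is a small sketch with a made-up second Mesoshapes code ("meso:example_b" is hypothetical). It shows that the nested form keeps one inner list per geometry source, and that a flat list can be regrouped only because every entry carries its aggregator prefix:

```python
from itertools import chain, groupby


def flatten(via_lists):
    """Collapse the list of lists into one flat list of references."""
    return list(chain.from_iterable(via_lists))


def regroup(flat_refs):
    """Rebuild inner lists by grouping consecutive entries that share a prefix.

    This only recovers the nesting when neighbouring sources differ."""
    return [
        list(group)
        for _, group in groupby(flat_refs, key=lambda ref: ref.partition(":")[0])
    ]


nested = [
    ["meso:tza_tnbs", "meso:example_b"],  # second code is hypothetical
    ["naturalearth"],
    ["quattroshapes:statscan"],
]
flat = flatten(nested)
```

In the nested form, `len(nested)` equals the number of geometry sources on the record, whereas the flat form needs the prefixes to be parsed before that grouping is visible.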

@thisisaaronland
Member

Okay.

@nvkelso
Member Author

nvkelso commented Jun 27, 2018

> Likewise I would consider changing all the source_* keys to be src:* since the src prefix has historically been used as a pointer to "whosonfirst-sources".

@stepps00
