Data mapping

Introduction

The Unified Model ("UM" hereafter) is highly normalized and may seem overwhelming at first. That is understandable. Remember that the UM is meant to be a comprehensive representation that accommodates all use cases. It may not seem the simplest way to represent the data you'll be mapping, but that is because it has to cover other perspectives as well. Please also keep in mind that the purpose of this exercise is to test the capacity of the UM to faithfully represent the data in collection management systems in aggregate, not to determine a least-common-denominator publishing model, as is the case with Darwin Core archives.

Your task is to populate a PostgreSQL database that uses the UM structure we have provided in the creation script schema.sql, with your own database as the data source. This will require "mapping" between your structure and that of the UM.

General considerations

In this document we will use figures to illustrate the structure of the UM. These figures take the form of Entity-Relationship (ER) diagrams. In these diagrams, concepts (implemented as tables for this exercise) are denoted by boxes with labels in UpperCamelCase. The properties (fields) of each concept are listed within its box and are in lowerCamelCase. The figures do not necessarily show the full set of fields for the tables they represent, nor do they show data types and other constraints. At times we will show snippets of the schema (such as table definitions) for reference. The definitive version of the tables to populate is in schema.sql. The term names in the figures (e.g., eventType) correspond to their equivalents in lower_snake_case in the database (e.g., event_type).
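The naming correspondence can be applied mechanically when writing mapping scripts. The following is a minimal Python sketch, not part of the UM tooling. The function name `to_snake_case` is hypothetical, and the sketch assumes every term follows the camelCase pattern described above; schema.sql remains the authority on the actual column names.

```python
import re


def to_snake_case(term: str) -> str:
    """Convert a UM term as shown in the figures (lowerCamelCase or
    UpperCamelCase) to its lower_snake_case database name."""
    # Put an underscore wherever a lowercase letter or digit is directly
    # followed by an uppercase letter, then lowercase the whole string.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", term).lower()


print(to_snake_case("eventType"))       # event_type
print(to_snake_case("identifiedByID"))  # identified_by_id
print(to_snake_case("MaterialEntity"))  # material_entity
```

Runs of capitals such as the trailing "ID" are kept together, which matches columns like identified_by_id in the Identification table.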

You will not be expected to parse the data in your database to make it fit into the UM, but you will be asked in some cases to provide explicit data in the UM that are only implicit in your data. It is likely that the source data won't have all tables needed and some will need to be "invented". For example, a source database may have collecting events and locations merged into a single table. This will require the table to be split to map correctly into Events, Locations, and Georeferences.

For example, you may have database records based on material in your collection, but no field that identifies the event during which that material was collected. In the UM those would be non-overlapping and required concepts, and each MUST be identified separately.

The use of record identifiers for concepts in the UM is ubiquitous, and required whenever you have data that correspond to a given concept. For this exercise, when creating tables in the UM, use resolvable globally unique identifiers for the ID fields if you have them. If you don't, use non-resolvable globally unique identifiers if you have them. If you have neither, generate UUIDs as identifiers in place of the identifiers that are unique only within the scope of your database. In cases where your database does not have identifiers for records that can be inferred for the UM, generate UUIDs for these identifiers as well. For every identifier you have to create in place of a local one in your database, you MAY also create an Identifier record that translates between your local identifier and the one you created for sharing via the UM. If you do this, set the identifierType to local. Here is the statement from schema.sql that creates the structure of the Identifier table.

CREATE TABLE identifier (
  identifier_target_id TEXT NOT NULL,
  identifier_target_type COMMON_TARGETS NOT NULL,
  identifier_type TEXT NOT NULL,
  identifier_value TEXT NOT NULL,
  PRIMARY KEY (identifier_target_id, identifier_target_type, identifier_type, identifier_value)
);

The Identifier and other "common model" tables are described in the GBIF Common Models document and will be discussed in context as we proceed through the Suggested steps for data mapping.
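The order of preference for identifiers described above can be sketched as a small helper. This is an illustrative Python sketch under stated assumptions: the function name `choose_identifier` and its parameters are hypothetical, and the returned dictionary simply mirrors the four columns of the identifier table shown above.

```python
import uuid


def choose_identifier(target_type, local_id=None,
                      resolvable_guid=None, other_guid=None):
    """Return (um_id, identifier_row) for one source record.

    identifier_row is None unless a UUID had to be generated in place
    of a local identifier.
    """
    if resolvable_guid:          # first choice: resolvable global identifier
        return resolvable_guid, None
    if other_guid:               # second choice: non-resolvable global identifier
        return other_guid, None
    um_id = str(uuid.uuid4())    # last resort: generate a UUID
    if local_id is None:         # nothing local to translate from
        return um_id, None
    identifier_row = {
        "identifier_target_id": um_id,
        "identifier_target_type": target_type,  # a COMMON_TARGETS value
        "identifier_type": "local",
        "identifier_value": str(local_id),
    }
    return um_id, identifier_row


# Hypothetical source record: an agent known only by local key 42.
agent_id, row = choose_identifier("AGENT", local_id=42)
print(agent_id, row)
```

Keeping the optional Identifier row next to the generated UUID makes it possible to trace a UM record back to the source database later.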

Some AgentRoles are currently made explicit in the UM. Most of these are simply fields for the name of the Agent fulfilling the role (e.g., georeferencedBy), while others are fields for an Agent identifier (e.g., recordedByID). Following are lists of explicit AgentRole fields in the UM, grouped by the concept in which they are found. Separate AgentRole records for these are not necessary.

Assertion: assertionByAgentName, assertionByAgentID
DigitalEntity: rightsHolder, creator, nameAccordingTo, taxonAuthority, recordedBy, recordedByID
Identification: typeDesignatedBy, identifiedByID
Location: locationAccordingTo, georeferencedBy
MaterialEntity: institutionCode, institutionID, collectionCode, collectionID, ownerCollectionCode, recordedBy, recordedByID, chronometricAgeDeterminedBy
Organism: identifiedBy
Taxon: nameAccordingToID

Most of the tables in the UM have fields that benefit from using controlled vocabularies. Some of these fields MUST use values from a specific controlled vocabulary. In the database creation script these can be found as ENUMs where the values are in UPPER_SNAKE_CASE. Following is a simple example for the strictly controlled vocabulary for the entity_type field in the entity table (no other values are valid):

CREATE TYPE ENTITY_TYPE AS ENUM (
  'DIGITAL_ENTITY',
  'MATERIAL_ENTITY'
);

CREATE TABLE entity (
  entity_id TEXT PRIMARY KEY,
  entity_type ENTITY_TYPE NOT NULL,
  dataset_id TEXT NOT NULL,
  entity_name TEXT,
  entity_remarks TEXT
);

Most "type" fields in the UM are not controlled by an ENUM. For these type fields, and any other fields for which a controlled vocabulary is suggested, any requirements that exist will be given in the Suggested steps as they are encountered. Beyond those requirements, feel free to use values that make sense for your data.

Part of what this exercise will reveal is the diversity of data that are being stored in collection management systems. The aggregation of vocabulary values that are actually in use locally will be a very interesting outcome of this project that may help to inform future work on vocabularies of values that help us all translate to concepts we have in common with different labels.

The UM provides four special tables (AgentRole, Assertion, Citation, and Identifier) to supplement the core information of other tables (e.g., Organism AgentRole, Event Assertion, GeneticSequence Citation, Agent Identifier). The document GBIF Common Models describes how these concepts fit into the UM.

Each of the "common model" tables can be linked to the set of tables given in the COMMON_TARGETS enumeration, which is defined in schema.sql as shown below. How to use the COMMON_TARGETS enumeration for the various targetType fields of the "common model" tables will be explained in context throughout the Suggested steps.

CREATE TYPE COMMON_TARGETS AS ENUM (
  'ENTITY',
  'MATERIAL_ENTITY',
  'MATERIAL_GROUP',
  'ORGANISM',
  'DIGITAL_ENTITY',
  'GENETIC_SEQUENCE',
  'EVENT',
  'OCCURRENCE',
  'LOCATION',
  'GEOREFERENCE',
  'GEOLOGICAL_CONTEXT',
  'PROTOCOL',
  'AGENT',
  'COLLECTION',
  'ENTITY_RELATIONSHIP',
  'IDENTIFICATION',
  'TAXON',
  'REFERENCE',
  'AGENT_GROUP',
  'ASSERTION',
  'CHRONOMETRIC_AGE'
);
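Because the database will reject any targetType value outside this enumeration, it can be convenient to check values in the mapping script before loading. The following Python sketch copies the enumeration values from the statement above; the helper name `require_common_target` is hypothetical.

```python
# Values copied from the COMMON_TARGETS enumeration in schema.sql.
COMMON_TARGETS = frozenset({
    "ENTITY", "MATERIAL_ENTITY", "MATERIAL_GROUP", "ORGANISM",
    "DIGITAL_ENTITY", "GENETIC_SEQUENCE", "EVENT", "OCCURRENCE",
    "LOCATION", "GEOREFERENCE", "GEOLOGICAL_CONTEXT", "PROTOCOL",
    "AGENT", "COLLECTION", "ENTITY_RELATIONSHIP", "IDENTIFICATION",
    "TAXON", "REFERENCE", "AGENT_GROUP", "ASSERTION", "CHRONOMETRIC_AGE",
})


def require_common_target(value: str) -> str:
    """Return value unchanged if it is a valid targetType, else raise."""
    if value not in COMMON_TARGETS:
        raise ValueError(f"{value!r} is not a COMMON_TARGETS value")
    return value


print(require_common_target("GEOLOGICAL_CONTEXT"))
```

Note that the "common model" tables themselves are mostly absent from the list: only ASSERTION appears, so for example a Citation cannot be the target of an Identifier.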

Suggested steps

Below is a list of the steps we suggest following to map your collection management system data to the UM. Each step has a link to a more detailed description of what to do. The order of these steps is designed to ensure that you will already have records for concepts that are linked to in subsequent steps of the mapping process.

1. Agents

2. References

3. Assertions, Citations, and Identifiers for Agents

4. Protocols

5. MaterialEntities

6. AgentRoles, Assertions, Citations, Identifiers and ChronometricAges for MaterialEntities and their subtypes

7. DigitalEntities

8. AgentRoles, Assertions, Citations, and Identifiers for DigitalEntities

9. EntityRelationships

10. Locations, Georeferences, and GeologicalContexts

11. AgentRoles, Assertions, Citations, and Identifiers for Locations, Georeferences, and GeologicalContexts

12. Occurrences and other Events

13. AgentRoles, Assertions, Citations, and Identifiers for Occurrences and other Events

14. Taxa

15. AgentRoles, Assertions, Citations, and Identifiers for Taxa

16. Identifications

17. AgentRoles, Assertions, Citations, and Identifiers for Identifications

1. Agents

NOTE: Skip this step if your Agents are identified only by name (i.e., not with a separate agent identifier).

We recommend mapping Agents (e.g., people, groups of people, organizations, collections; see Figure 1) first, if you have them, because their identifiers will be used in the construction of many of the other tables in the UM. If you don't track agents separately in your database, don't worry about it; they can be designated by their names where appropriate in the UM.


Figure 1. Agents and their relationships in the Unified Model

agentType vocabulary

If an Agent is a Collection or an AgentGroup, the agentType MUST be COLLECTION or AGENT_GROUP respectively. However, the agentType field is not controlled by an ENUM, because there are other possible values that are not subtypes of Agent, such as ORGANIZATION, PERSON, and even ORGANISM. If you need to use an agentType we haven't mentioned here, please create it in UPPER_SNAKE_CASE.

collectionType vocabulary

For this exercise, we suggest values such as MUSEUM, HERBARIUM, BOTANICAL_GARDEN, ZOO. If you need to use a collectionType we haven't mentioned here, please create it in UPPER_SNAKE_CASE.

agentGroupType vocabulary

An AgentGroup is a way to refer to a single Agent entity that is composed of multiple other Agents. Thus, a group of Collections might be a CONSORTIUM, a group of university students might be a CLASS. If you need to create an agentGroupType, please use UPPER_SNAKE_CASE.

agentRelationshipType vocabulary

The range of possible relationships between Agents is vast. Note that the relationship has directionality. The subjectAgentID is related to the objectAgentID in the direction expressed in the agentRelationshipType, thus it helps to express the directionality in the agentRelationshipType term itself, for example, DOCTORAL_ADVISOR_OF instead of DOCTORAL_ADVISOR, which would be ambiguous to interpret. If you need to create an agentRelationshipType, please use UPPER_SNAKE_CASE.

2. References

NOTE: Skip this step if you do not have References in your data or if your References are identified only by bibliographic citations.

In the UM, a Reference, like an Agent, has the potential to be related to many different kinds of things (e.g., MaterialEntity, Event, Taxon) through Citations. If you track references with identifiers, create Reference records for them so that they can be connected in later steps when the other tables they are related to are created. If you don't track references separately in your database, don't worry about it; they can be designated by their bibliographic citations where appropriate in the UM.


Figure 2. Citations of References in the Unified Model

referenceType vocabulary

Here are some suggestions for values of referenceType, but feel free to use others if none of these suffices: JOURNAL_ARTICLE, BOOK, BOOK_SECTION, DISSERTATION, FIELD_NOTEBOOK, WEB_PAGE, OTHER.

3. Assertions, Citations, and Identifiers for Agents

NOTE: Skip this step if you created no Agent records in Step 1.

It is possible to create Assertions, Citations, and Identifiers for Agents. See GBIF Common Models for general discussions about how to map to these three types of tables and considerations when developing the vocabularies for assertionType and assertionUnit.

assertionTargetType vocabulary

The value for this term MUST be one of AGENT, AGENT_GROUP, or COLLECTION and MUST match the table to which the Assertion applies.

4. Protocols

NOTE: Skip this step if your Protocols are identified only by simple strings (names or descriptions) or if you do not have Protocols mentioned in your data.

A Protocol can be used by the classes Event, ChronometricAge, and the various Assertions. If you track protocols with identifiers, create Protocol records for them so that they can be connected when the tables they are related to are created.

5. MaterialEntities


Figure 3. Entities and their relationships in the Unified Model

A MaterialEntity can be any physical object (same as bco:material entity and dcterms:PhysicalResource). In the UM there can be many types of MaterialEntitys, which are distinguished by the value of materialEntityType. These can be as specific as desired, but there are two MaterialEntity subtype classes to distinguish two important concepts, MaterialGroup and Organism. For each MaterialEntity record you create, also create an Entity record using the same identifier for the entityID as for the materialEntityID. The entityType for the Entity MUST be MATERIAL_ENTITY.

A MaterialGroup is any set of MaterialEntitys and the utility of this concept is to be able to make Assertions about the group as a whole, distinct from Assertions about its individual members (e.g., the weight of an entire catch as opposed to the weights of selected individuals in the catch). A MaterialGroup record MUST have a corresponding MaterialEntity record, which in turn MUST have MATERIAL_GROUP as its materialEntityType. Potential vocabulary terms for materialGroupType are HAUL and LOT. Feel free to create others as needed.

An Organism (same as dwc:Organism) is modeled in the UM as a MaterialEntity, even if none of the material remains accessible (such as in the case of some observations, or the case of a specimen that was lost or destroyed). Even though an Organism might also act as an Agent, we do not currently model it in this way. In the most basic case, a cataloged item consists of the entire existing accessible material remains of a single Organism. These may be separated into "parts" in a database, which may or may not be tracked separately. When they are tracked separately, the Entity that unites them is the Organism. The derivation of the "parts" from the Organism (or from each other) is expressed through EntityRelationships. An Organism record MUST have a corresponding MaterialEntity record with its materialEntityID the same as the organismID. The materialEntityType of the MaterialEntity record MUST be ORGANISM. The MaterialEntity record for the Organism MUST in turn have a corresponding Entity record with its entityID the same as the organismID and materialEntityID. The entityType of the Entity record MUST be MATERIAL_ENTITY.
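The three linked records required for an Organism can be sketched as follows. This is a minimal Python illustration, not a definitive loader: the function name `organism_rows` is hypothetical, the entity columns come from the entity table definition shown earlier, and the material_entity and organism column names are assumed from the lower_snake_case naming rule, so check them against schema.sql.

```python
def organism_rows(organism_id, dataset_id):
    """Build the Entity, MaterialEntity, and Organism rows for one Organism.

    All three rows share the same identifier value.
    """
    entity = {
        "entity_id": organism_id,
        "entity_type": "MATERIAL_ENTITY",   # required ENTITY_TYPE value
        "dataset_id": dataset_id,           # NOT NULL in the entity table
    }
    material_entity = {
        "material_entity_id": organism_id,  # assumed column name
        "material_entity_type": "ORGANISM", # assumed column name
    }
    organism = {"organism_id": organism_id} # assumed column name
    # Returned in parent-first order, which is the order to insert them.
    return entity, material_entity, organism


# Hypothetical identifiers for illustration only.
for row in organism_rows("org-0001", "dataset-01"):
    print(row)
```

A MaterialGroup follows the same pattern, with MATERIAL_GROUP as the materialEntityType of the intermediate MaterialEntity record.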

6. AgentRoles, Assertions, Citations, Identifiers and ChronometricAges for MaterialEntities and their subtypes

Figure 4 shows the relationships between MaterialEntity and associated tables, including the "common model" tables. The relationships between MaterialEntity and other Entity tables were shown in Figure 3. Each of the Entity tables can be connected to the common model tables. The important thing is to make sure that the connections happen at the appropriate, most specific level in the hierarchy. For example, suppose a blood sample was taken from an Organism and its volume was measured. The blood sample is a MaterialEntity (NOT an Organism). There should be an EntityRelationship showing that the subject MaterialEntity had the relationship extractedFrom the object Organism. The blood sample volume should result in an Assertion for the MaterialEntity, not an Assertion for the corresponding parent Entity record, nor for the related Organism record. Specifically, the assertionTargetID should be the same as the materialEntityID for the blood sample, the assertionTargetType MUST be MATERIAL_ENTITY, the assertionType should be VOLUME, the assertionValue should be left empty, the assertionValueNumeric should have the numerical value of the volume, and the assertionUnit should have an appropriate unit (e.g., ml). The same principles apply to relationships to the Citation, AgentRole and Identifier tables: they should be associated with the correct level of the Entity hierarchy.
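The blood sample example can be written out as two records. The following Python sketch is illustrative only: the function name `blood_sample_rows` is hypothetical, the entity_relationship columns come from the table definition in schema.sql, and the assertion column names are assumed from the field names used in the paragraph above. Any other required columns (such as an identifier for the Assertion itself) are omitted.

```python
def blood_sample_rows(sample_id, organism_id, volume_ml, relationship_id):
    """Build the EntityRelationship and Assertion rows for a blood sample."""
    relationship = {
        "entity_relationship_id": relationship_id,
        "subject_entity_id": sample_id,        # the blood sample
        "entity_relationship_type": "extractedFrom",
        "object_entity_id": organism_id,       # the Organism it came from
    }
    assertion = {
        "assertion_target_id": sample_id,      # the sample, not the Organism
        "assertion_target_type": "MATERIAL_ENTITY",
        "assertion_type": "VOLUME",
        "assertion_value": None,               # left empty for numeric values
        "assertion_value_numeric": volume_ml,
        "assertion_unit": "ml",
    }
    return relationship, assertion


# Hypothetical identifiers and measurement for illustration only.
relationship, assertion = blood_sample_rows("sample-7", "org-0001", 2.5, "rel-1")
print(relationship)
print(assertion)
```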

A ChronometricAge MUST be related directly only to a MaterialEntity.

The following AgentRoles related to MaterialEntitys are currently made explicit in the UM; these roles do not require separate AgentRole records: institutionCode, institutionID, collectionCode, collectionID, ownerCollectionCode, recordedBy, recordedByID, chronometricAgeDeterminedBy.


Figure 4. MaterialEntities and their "common model" tables in the Unified Model

7. DigitalEntities

In the UM there can be many types of DigitalEntity. These are distinguished by the digitalEntityType field, which has a strictly controlled vocabulary consisting of the values in the following enumeration:

CREATE TYPE DIGITAL_ENTITY_TYPE AS ENUM (
  'DATASET',
  'INTERACTIVE_RESOURCE',
  'MOVING_IMAGE',
  'SERVICE',
  'SOFTWARE',
  'SOUND',
  'STILL_IMAGE',
  'TEXT',
  'GENETIC_SEQUENCE'
);

One of these, GENETIC_SEQUENCE, corresponds to GeneticSequence, a formal subtype of DigitalEntity (see Figure 3). This means that when a GeneticSequence record is created, a corresponding DigitalEntity record MUST also be created, and the digitalEntityType for it MUST be GENETIC_SEQUENCE. For each DigitalEntity, also create an Entity record using the same unique identifier for the entityID as for the digitalEntityID. The entityType for the Entity MUST be DIGITAL_ENTITY.
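Since GeneticSequence is a subtype of DigitalEntity, a genetic sequence needs records at three levels, while other digital entities need two. The Python sketch below illustrates this under stated assumptions: the function name `digital_entity_rows` is hypothetical, and the digital_entity and genetic_sequence column names are assumed from the lower_snake_case naming rule rather than copied from schema.sql.

```python
# Values copied from the DIGITAL_ENTITY_TYPE enumeration in schema.sql.
DIGITAL_ENTITY_TYPES = frozenset({
    "DATASET", "INTERACTIVE_RESOURCE", "MOVING_IMAGE", "SERVICE",
    "SOFTWARE", "SOUND", "STILL_IMAGE", "TEXT", "GENETIC_SEQUENCE",
})


def digital_entity_rows(digital_entity_id, digital_entity_type, dataset_id):
    """Return (table_name, row) pairs in parent-first insertion order."""
    if digital_entity_type not in DIGITAL_ENTITY_TYPES:
        raise ValueError(f"unknown digitalEntityType: {digital_entity_type!r}")
    rows = [
        ("entity", {
            "entity_id": digital_entity_id,
            "entity_type": "DIGITAL_ENTITY",
            "dataset_id": dataset_id,
        }),
        ("digital_entity", {                       # assumed column names
            "digital_entity_id": digital_entity_id,
            "digital_entity_type": digital_entity_type,
        }),
    ]
    if digital_entity_type == "GENETIC_SEQUENCE":
        # The subtype record reuses the same identifier value.
        rows.append(("genetic_sequence",
                     {"genetic_sequence_id": digital_entity_id}))
    return rows


# Hypothetical identifiers for illustration only.
for table, row in digital_entity_rows("seq-1", "GENETIC_SEQUENCE", "dataset-01"):
    print(table, row)
```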

8. AgentRoles, Assertions, Citations, and Identifiers for DigitalEntities

The same kinds of "common model" associations shown in Figure 4 for MaterialEntitys can be made for DigitalEntitys, except that each targetID MUST be the same as the identifier (digitalEntityID or geneticSequenceID) for the DigitalEntity or GeneticSequence it is directly associated with. The values for the targetType fields MUST be DIGITAL_ENTITY or GENETIC_SEQUENCE, depending on the table they are to be directly related to.

The following AgentRoles related to DigitalEntitys are currently made explicit in the UM; these roles do not require separate AgentRole records: rightsHolder, creator, nameAccordingTo, taxonAuthority, recordedBy, recordedByID.

9. EntityRelationships

At this stage in the process, all of the Entity records will have been created, providing the prerequisite for creating the relationships between them. The supertype/subtype relationships between Entity tables were shown above in Figure 3, and should already have been created at this point. The "common model" associations will also already have been made. Here we will concentrate on other associations, ones that should be captured in the EntityRelationship table, the definition of which is:

CREATE TABLE entity_relationship (
  entity_relationship_id TEXT PRIMARY KEY,
  depends_on_entity_relationship_id TEXT REFERENCES entity_relationship ON DELETE CASCADE,
  subject_entity_id TEXT REFERENCES entity ON DELETE CASCADE,
  entity_relationship_type TEXT NOT NULL,
  object_entity_id TEXT REFERENCES entity ON DELETE CASCADE,
  object_entity_iri TEXT,
  entity_relationship_date TEXT,
  entity_relationship_order SMALLINT NOT NULL DEFAULT 0 CHECK (entity_relationship_order >= 0) 
);

The EntityRelationship table is a powerful way to make just about any connection between Entitys in the UM. Any Entity can be related to any other one with any relationship you care to create in the entityRelationshipType. There are two things to keep in mind here. The first is that the subtype relationships should be strictly relegated to the correspondence of the values of identifier fields (e.g., entityID and materialEntityID for a MaterialEntity) and should already have been done in previous steps. This would be the equivalent of an EntityRelationship stating that a particular Entity isA MaterialEntity, which would be superfluous. The second thing to keep in mind is that the semantics of the relationships is entirely dependent on the clear understanding of the predicate (the entityRelationshipType) and the correct assignment of Entitys to the subject and object roles. The relationships should always be read as subject->predicate->object - that is, the relationship has a direction. Each relationship can have a complementary one where the subject/object roles are reversed and the predicate shows what the relationship looks like from the opposite direction. For example, if Organism 'A' was eaten by another Organism 'B', it follows that Organism 'B' ate Organism 'A'. There are certainly cases in which reverse roles might be necessary. For example, if 'B' was a parasitoid of 'A', it isn't enough to understand this by saying 'A' was a host of 'B'. Of course, there are alternative ways to express the relationship in the predicate to solve this issue, such as 'A' was a parasitoid host of 'B'. We leave it to your discretion which relationships to capture from your original data, but be aware that the semantics are tied up entirely in the predicates, and care should be taken when developing these vocabulary terms.
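The complementary relationship described above can be derived by swapping the subject and object and replacing the predicate with its inverse. The following Python sketch shows one way to do this. The function name `complementary` and the predicate terms in `INVERSE_PREDICATES` are hypothetical; they are not a UM vocabulary.

```python
# Hypothetical predicate pairs; replace with your own vocabulary terms.
INVERSE_PREDICATES = {"eatenBy": "ate", "ate": "eatenBy"}


def complementary(relationship, new_id, inverses=INVERSE_PREDICATES):
    """Return the reverse-direction EntityRelationship row, or None when
    no inverse predicate is known for this relationship type."""
    predicate = relationship["entity_relationship_type"]
    if predicate not in inverses:
        return None
    return {
        "entity_relationship_id": new_id,
        "subject_entity_id": relationship["object_entity_id"],  # roles swapped
        "entity_relationship_type": inverses[predicate],
        "object_entity_id": relationship["subject_entity_id"],
    }


# Organism 'A' was eaten by Organism 'B'.
a_eaten_by_b = {
    "entity_relationship_id": "rel-1",
    "subject_entity_id": "A",
    "entity_relationship_type": "eatenBy",
    "object_entity_id": "B",
}
print(complementary(a_eaten_by_b, "rel-2"))
```

Returning None for unknown predicates avoids inventing an inverse whose meaning is unclear, as in the parasitoid example.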

10. Locations, Georeferences, and GeologicalContexts

Locations in the UM are used to provide both textual and geospatial context to Events. Location can be expressed as a denormalized (flattened) construct with the (Darwin Core part of a) geographic classification in the same record, or as a normalized construct with the geography built of parent/child relationships of successive administrative regions. Figure 5 shows the structural relationships between the Location-related tables in the UM.

Georeferences are special assertions of the geospatial interpretation of a Location. As assertions, the model supports zero, one, or multiple georeferences per Location, whether current or historical, accepted or disputed. The UM also supports the designation of zero or one accepted Georeference by populating acceptedGeoreferenceID in the Location table with the georeferenceID of the corresponding accepted Georeference, if any.
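Designating the accepted Georeference can be sketched as a small check in the mapping script. This Python sketch rests on assumptions: the function name `set_accepted_georeference` is hypothetical, the column names are derived from the lower_snake_case naming rule, and it assumes each Georeference row carries the location_id of the Location it interprets, which should be confirmed against schema.sql.

```python
def set_accepted_georeference(location, georeferences, accepted_id=None):
    """Return a copy of the Location row with accepted_georeference_id set.

    accepted_id may be None (no accepted Georeference); otherwise it must
    be one of the Georeferences made for this Location.
    """
    if accepted_id is not None:
        own_ids = {
            g["georeference_id"]
            for g in georeferences
            if g["location_id"] == location["location_id"]
        }
        if accepted_id not in own_ids:
            raise ValueError(f"{accepted_id!r} does not georeference this Location")
    return {**location, "accepted_georeference_id": accepted_id}


# Hypothetical records: one Location with two competing Georeferences.
location = {"location_id": "loc-1"}
georeferences = [
    {"georeference_id": "geo-1", "location_id": "loc-1"},
    {"georeference_id": "geo-2", "location_id": "loc-1"},
]
print(set_accepted_georeference(location, georeferences, "geo-2"))
```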

GeologicalContext is modeled similarly to a Georeference, but with an acceptedGeologicalContextID in the Location table that MUST match the geologicalContextID of the corresponding GeologicalContext, if any.


Figure 5. Locations, Georeferences and GeologicalContexts in the Unified Model

11. AgentRoles, Assertions, Citations, and Identifiers for Locations, Georeferences, and GeologicalContexts

The "common model" tables associated with the three Location-related tables can be populated at this point. The values for the targetType fields of the "common model" tables MUST be LOCATION, GEOREFERENCE or GEOLOGICAL_CONTEXT, and their targetIDs MUST correspond to the locationID, georeferenceID or geologicalContextID, depending on the table they are to be directly related to.

12. Occurrences and other Events

An Event is something that happens within a place during a period of time. The spatial scale and temporal duration of the Event may be as specific or vague as necessary, and may or may not be provided. Events are hierarchical in the UM, with a parent Event containing all of its child Events both spatially and temporally. A project (or any other higher organizational initiative) might be a parent-most Event, the spatial and temporal limits of which encompass all of the Events within it. The next level down might consist of collecting expeditions launched as part of the parent project, for example. Each Event can likewise encompass sub-Events to an arbitrary hierarchical depth, each with the same or distinct Location and temporal bounds as its parent (under the limitation of being contained).


Figure 6. Events in the Unified Model

In the UM, an Occurrence is a subtype of Event in which the activity (observing, collecting, sampling) established the existence of an Organism within a spatiotemporal context, usually with accompanying evidence. The OccurrenceEvidence table serves to connect the Occurrence with digital and/or material evidence, such as images, material samples or whole organisms, and genetic sequences. In collections, an Organism is often effectively the Entity that gets cataloged, with an accompanying list of preparations that represent the "parts" of the Organism that are or were present in the collection. If you do not track "parts" separately with their own characteristics, the Organism record should be the one used for the OccurrenceEvidence. Note that the organismID is not an occurrenceID: the former is an identifier for an Organism (a MaterialEntity), while the latter is an identifier for the Occurrence (an Event), and MaterialEntitys are not Events. In the absence of unique (and distinct) identifiers for Organisms and Occurrences, they will have to be generated to populate the UM correctly, as described in the General considerations section.

The Occurrence carries with it the ephemeral characteristics of the Organism at the place and time of the Event. Thus, for example, an Organism that had blood samples taken multiple times over its lifetime may have had a reproductiveCondition of juvenile in an early Occurrence and a reproductiveCondition of adult in a later one.

Each Occurrence has its own occurrenceID. The Occurrences associated with a given Organism can be discovered by the organismID they have in common. Every Occurrence MUST have a corresponding Event record in which the eventID is the same as the occurrenceID, and the eventType for the Event record MUST be OCCURRENCE.
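The distinction between organismID and occurrenceID can be illustrated with two Occurrences of the same Organism. The following Python sketch assumes the source database has no occurrence identifiers, so UUIDs are generated. The function names are hypothetical, and the column names are assumed from the lower_snake_case naming rule; other Event and Occurrence fields are omitted.

```python
import uuid


def occurrence_rows(organism_id, reproductive_condition=None):
    """Build the Event and Occurrence rows for one Occurrence of an Organism."""
    occurrence_id = str(uuid.uuid4())   # generated, distinct from organism_id
    event = {"event_id": occurrence_id, "event_type": "OCCURRENCE"}
    occurrence = {
        "occurrence_id": occurrence_id, # same value as the Event's event_id
        "organism_id": organism_id,
        "reproductive_condition": reproductive_condition,
    }
    return event, occurrence


def occurrences_of(organism_id, occurrences):
    """Find the Occurrence rows that share the given organism_id."""
    return [o for o in occurrences if o["organism_id"] == organism_id]


# Hypothetical Organism sampled twice during its lifetime.
_, early = occurrence_rows("org-0001", "juvenile")
_, late = occurrence_rows("org-0001", "adult")
_, other = occurrence_rows("org-0002")
print(len(occurrences_of("org-0001", [early, late, other])))
```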


Figure 7. Occurrences and their evidence in the Unified Model

13. AgentRoles, Assertions, Citations, and Identifiers for Occurrences and other Events

With Locations, Protocols, and Events now in place, the "common model" tables associated with the Event-related tables can be populated. The values for the targetType fields of the "common model" tables MUST be EVENT or OCCURRENCE and their targetIDs MUST correspond to the eventID or occurrenceID, depending on the table they are to be directly related to. Remember that Assertions about ephemeral characteristics of the Organism should be attached to the Occurrence record rather than to the Organism record.

14. Taxa

In the UM, a Taxon can be expressed as a denormalized (flattened) construct with the (Darwin Core part of a) taxonomic classification in the same record, or as a normalized construct with the classification built of parent/child relationships of taxa of successive ranks. Feel free to use the construct that best matches how your data are structured. The table definition for Taxon from schema.sql is:

CREATE TABLE taxon (
  -- common to all
  taxon_id TEXT PRIMARY KEY,
  scientific_name TEXT NOT NULL,
  scientific_name_authorship TEXT,
  name_according_to TEXT,
  taxon_rank TEXT,
  taxon_source TEXT, -- From what taxonomic authority is the information taken
  scientific_name_id TEXT,
  taxon_remarks TEXT,  
  
  -- normalized view
  parent_taxon_id TEXT REFERENCES taxon ON DELETE CASCADE,
  taxonomic_status TEXT,

  -- denormalized
  kingdom TEXT,
  phylum TEXT,
  class TEXT,
  "order" TEXT,
  family TEXT,
  subfamily TEXT,
  genus TEXT,
  subgenus TEXT,
  accepted_scientific_name TEXT -- populated only when scientific name is a synonym
);
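The two constructs carry the same information, and one can be derived from the other. As an illustration, the Python sketch below fills the denormalized rank columns by walking the parent/child chain of a normalized classification. The function name `denormalize` is hypothetical, and the sketch assumes that taxon_rank values are rank names that match the column names once lowercased.

```python
RANK_COLUMNS = ("kingdom", "phylum", "class", "order",
                "family", "subfamily", "genus", "subgenus")


def denormalize(taxon_id, taxa_by_id):
    """Return the denormalized rank columns for one taxon by following
    parent_taxon_id links upward, starting with the taxon itself."""
    flat = {column: None for column in RANK_COLUMNS}
    seen = set()                        # guards against cyclic parent links
    current = taxa_by_id.get(taxon_id)
    while current is not None and current["taxon_id"] not in seen:
        seen.add(current["taxon_id"])
        rank = (current.get("taxon_rank") or "").lower()
        if rank in flat:
            flat[rank] = current["scientific_name"]
        current = taxa_by_id.get(current.get("parent_taxon_id"))
    return flat


# Hypothetical taxon_id values; only some ranks are present in this chain.
taxa = {
    "t1": {"taxon_id": "t1", "scientific_name": "Animalia",
           "taxon_rank": "kingdom", "parent_taxon_id": None},
    "t2": {"taxon_id": "t2", "scientific_name": "Canidae",
           "taxon_rank": "family", "parent_taxon_id": "t1"},
    "t3": {"taxon_id": "t3", "scientific_name": "Canis",
           "taxon_rank": "genus", "parent_taxon_id": "t2"},
    "t4": {"taxon_id": "t4", "scientific_name": "Canis latrans",
           "taxon_rank": "species", "parent_taxon_id": "t3"},
}
print(denormalize("t4", taxa))
```

Ranks missing from the chain are left as None, which corresponds to NULL in the nullable rank columns.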

15. AgentRoles, Assertions, Citations, and Identifiers for Taxa

The "common model" tables associated with Taxon can now be populated. The values for the targetType fields of the "common model" tables MUST be TAXON and their targetIDs MUST correspond to the taxonID of the Taxon they are to be directly related to.

16. Identifications

In the UM, an Identification applies to an Organism, though the IdentificationEvidence may consist of any number of MaterialEntitys and/or DigitalEntitys. An Organism can also have multiple Identifications, though only zero or one of these can be marked as 'accepted'. The Identification record itself consists of the verbatimIdentification string applied to the Organism and a taxonFormula from a controlled vocabulary that indicates the pattern of taxon names mixed with qualifiers in the verbatimIdentification. This allows for Identifications that are not strictly scientific names, but that can point to all of the real scientific names involved. For example, the hybrid verbatimIdentification Canis latrans x Canis lupus familiaris (see example below) isn't a scientificName, but its component parts Canis latrans and Canis lupus familiaris are. For reference, here is the statement to create the Identification table:

CREATE TABLE identification (
  identification_id TEXT PRIMARY KEY,
  identification_type TEXT NOT NULL,
  taxon_formula TEXT NOT NULL,
  verbatim_identification TEXT,
  type_status TEXT,
  identified_by TEXT,
  identified_by_id TEXT,
  date_identified TEXT,
  identification_references TEXT,
  identification_verification_status TEXT,
  identification_remarks TEXT,
  type_designation_type TEXT,
  type_designated_by TEXT
);


Figure 8. Identifications in the Unified Model

taxonFormula vocabulary

The recommended controlled vocabulary for taxonFormula can be found in the Arctos taxa_formula code table documentation, repeated here for convenience:

A
A / B intergrade
A ?
A aff.
A and B
A cf.
A or B
A ssp.
A x B
A {string}

Identification Example

For the hybrid verbatimIdentification Canis latrans x Canis lupus familiaris, the taxonFormula would be A x B. There are two taxon_ids involved, one for Canis latrans (the A in the taxonFormula) and one for Canis lupus familiaris (the B in the taxonFormula). We would expect to find Taxon records for these two taxa, and their taxonIDs would be used in two records of TaxonIdentification. The TaxonIdentification record corresponding to Canis latrans would include the identificationID for the Identification record that has verbatimIdentification Canis latrans x Canis lupus familiaris and taxonFormula A x B. That same TaxonIdentification record would have the taxonID for Canis latrans and the taxonOrder would be 1 (because it is the first taxon that appears in the formula). The TaxonIdentification record corresponding to Canis lupus familiaris would include the identificationID for the Identification record that has verbatimIdentification Canis latrans x Canis lupus familiaris and taxonFormula A x B. That same TaxonIdentification record would have the taxonID for Canis lupus familiaris and the taxonOrder would be 2 (because it is the second taxon that appears in the formula).
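The two TaxonIdentification records in this example can be generated from the list of taxonIDs in formula order. This is a minimal Python sketch: the function name `taxon_identification_rows` and the identifier values are hypothetical, and the column names are assumed from the field names identificationID, taxonID, and taxonOrder.

```python
def taxon_identification_rows(identification_id, taxon_ids_in_formula_order):
    """Build one TaxonIdentification row per taxon in the taxonFormula.

    taxon_order starts at 1 for the first taxon (the A in the formula).
    """
    return [
        {
            "identification_id": identification_id,
            "taxon_id": taxon_id,
            "taxon_order": order,
        }
        for order, taxon_id in enumerate(taxon_ids_in_formula_order, start=1)
    ]


# Hypothetical identifiers: "A x B" with A = Canis latrans and
# B = Canis lupus familiaris.
rows = taxon_identification_rows(
    "ident-1", ["taxon-canis-latrans", "taxon-canis-lupus-familiaris"]
)
for row in rows:
    print(row)
```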

17. AgentRoles, Assertions, Citations, and Identifiers for Identifications

The final modeling step is to populate the "common model" tables associated with Identification. The values for the targetType fields of the "common model" tables MUST be IDENTIFICATION and their targetIDs MUST correspond to the identificationID of the Identification they are to be directly related to.