2.1. Software Configuration

Most of the software configuration described in this section is done by editing files inside the HADatAc distribution. If the installation is on a local computer, the HADatAc distribution folder is in the filesystem of that local computer. If the installation is on a remote computer, the HADatAc distribution folder is in the filesystem of the remote computer, which may require a utility such as SSH to gain access to the distribution folder. The "HADatAc distribution folder" is what is called "[HADatAc_folder]" in this documentation. The actual value of [HADatAc_folder] is a path in a filesystem of the machine hosting HADatAc, and it may look something like /home/data/git/hadatac/conf (this location is set during HADatAc's installation).

Assuming that HADatAc is installed at [HADatAc_folder], configuration files are located at [HADatAc_folder]/conf. The way content in [HADatAc_folder]/conf is used is different when using HADatAc in development mode and production mode.

In production, a software upgrade DOES NOT replace the content of [HADatAc_folder]/conf in the distribution, because the upgrade should preserve the current configuration. To change a configuration file, change it both in the current distribution and in "/data/conf", which should be located outside "[HADatAc_folder]/conf".

In development mode, you may only need to update a few files, including hadatac.conf, and that change can be made directly in the local copy of the master branch. If you contribute code back to the master branch, please do not add or commit changes to [HADatAc_folder]/conf.

2.1.1. Setting up hadatac.conf

This is the main configuration file and tells the system important information about how the webapp connects to SOLR and Blazegraph repositories, and what is going to be the URL of the webapp once it is deployed.

If you are a developer, using a local copy of HADatAc on your machine, and are able to call 'http://localhost:9000' to invoke HADatAc, you may not need to change this part of the configuration.

If you are deploying HADatAc on a server and you expect users to access HADatAc over the web, you will need to set the parameters below according to your domain name and firewall/port restrictions.

HADatAc URL Configuration

- the application's base host URL

  default value: host="http://localhost:9000"
- the URL that the application is deployed on

  default value: host_deploy="http://localhost:9000"
- the base URL that the application uses to send and receive email. This is the value prefixed to HADatAc services used to communicate with users during email authentication or password reset. Emails may be sent out if this value is not set properly, but the confirmation of any action embedded in email messages may not have any effect on HADatAc.

  default value: base_url="127.0.0.1:9000"
- the kb's base host URL -- usually, the application's base host URL without any port information. This value is used to build absolute URLs from internal relative URLs, and it is used to inter-connect HADatAc functionalities including the URLs of internal proxies used by javascript code.

  default value: kb="http://localhost"
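
The relationship between host and kb described above (kb is usually the base host URL without the port) can be sketched in Python. This is only an illustration, not part of HADatAc: the function name kb_from_host is hypothetical, and the example domain is a placeholder.

```python
from urllib.parse import urlsplit


def kb_from_host(host: str) -> str:
    """Return the scheme and host name of `host`, dropping port and path.

    Hypothetical helper: it mirrors the rule of thumb in this section,
    it is not a function provided by HADatAc.
    """
    parts = urlsplit(host)
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"not an absolute URL: {host!r}")
    return f"{parts.scheme}://{parts.hostname}"


print(kb_from_host("http://localhost:9000"))  # http://localhost
```

For a server deployment, the same rule would turn a host value such as "https://example.org:8443" into a kb value of "https://example.org".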

SOLR Configuration

- HOME: the path in the file system where the SOLR instances are located

  default: home=/../hadatac/solr
- URL for data collections

  default: data="http://127.0.0.1:8983/solr"

Blazegraph Configuration

URL used to retrieve content from a Blazegraph repository.
- For Blazegraph on the local machine

  default: triplestore="http://127.0.0.1:9999/blazegraph/namespace/store"
- For Blazegraph in the VM

  default: triplestore="http://127.0.0.1:8080/bigdata/namespace"
- URL for user management collection

  default: users="http://127.0.0.1:8983/solr"
- URL for user permission management collection
  - For Blazegraph on the local machine

    default: permissions="http://127.0.0.1:9999/blazegraph/namespace/store_users"
  - For Blazegraph in the VM

    default: permissions="http://127.0.0.1:8080/bigdata/namespace"

Activity flags are used to verify whether the HADatAc knowledge base contains concepts essential for supported scientific activities:
- use true for empirical activities involving the use of sensors
  
  empirical=true
- use true for computational activities involving computational simulations
  
  computational=false

Properties about the community using the current HADatAc installation. These properties are used for project customization of HADatAc installations:
  
  default: fullname="Child Health Exposure Analysis Repository"
  
  default: shortname="CHEAR"
  
  default: description=""
  
  default: ont_prefix="chear"

Setting up BioPortal Searching

The SDD-editor can perform ontology class searches from within the browser using the BioPortal search engine if a user adds their api-key to the search.config file within HADatAc. This section covers acquiring an api-key and adding it to HADatAc.

Log in or sign up for a BioPortal account: https://bioportal.bioontology.org/login

To sign up, provide your name, desired account name, email, and desired password.

Copy the api-key from your BioPortal account settings: https://bioportal.bioontology.org/account

After signing in, go to the account page above. Under your account information you will see your API key, which is a 36-character string.

Add your BioPortal api-key to HADatAc

Open (hadatac install directory)/conf/hadatac.config in your favorite text editor, paste your api-key in the quotes of the line that says bioportal_api_key="", and save. After the addition the line should look like:

bioportal_api_key="########-####-####-####-############"
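
Before restarting HADatAc, it can be useful to confirm that the pasted value has the expected shape. The Python sketch below assumes the key follows the 8-4-4-4-12 grouping suggested by the placeholder line above (36 characters in total) and that each group contains only letters and digits. The function name is hypothetical, and the check says nothing about whether BioPortal will accept the key.

```python
import re

# Assumed shape: groups of 8-4-4-4-12 letters or digits separated by hyphens,
# which adds up to the 36 characters mentioned in this section.
API_KEY_SHAPE = re.compile(r"[0-9A-Za-z]{8}(-[0-9A-Za-z]{4}){3}-[0-9A-Za-z]{12}")


def looks_like_api_key(value: str) -> bool:
    """Return True when `value` has the assumed 36-character key shape."""
    return API_KEY_SHAPE.fullmatch(value) is not None


# The unedited placeholder is rejected, so a forgotten edit is easy to spot.
print(looks_like_api_key("########-####-####-####-############"))  # False
```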

2.1.2. Setting up email configuration

You may not need to set up email configuration if you are using HADatAc for development purposes. This configuration is essential if you are planning to create users with authenticated access to the system. In this case, the email configuration enables users to verify their email addresses and to request a password reset.

Authentication is done through email verification, which requires HADatAc to communicate with users through emails. The configuration file is smtp.conf under [HADatAc_folder]/conf/play-authenticate. Instructions for filling in the configuration file are given inside the file itself. The email account should be one created for your system (not the Gmail account shown in the example below). It can be any account that HADatAc can communicate with. The account name and password in the snippet below are JUST EXAMPLES THAT NEED TO BE REPLACED.

play.mailer {
    # TODO: Disable this in production
    mock=false
    # SMTP server
    # (mandatory)
    # defaults to gmail
    host=smtp.gmail.com

    # SMTP port
    # defaults to 25
    port=465

    # Use SSL
    # for GMail, this should be set to true
    ssl=true

    # authentication user
    # Optional, comment this line if no auth
    # defaults to no auth
    user="hadatac1234@gmail.com"

    # authentication password
    # Optional, comment this line to leave password blank
    # defaults to no password
    password="password"
}

2.1.3. Setting up application.conf

This is the main configuration file of the Play Framework behind HADatAc. From this configuration file, Play Framework can locate the other configuration files. The session configuration parameters and the Java configuration parameters are the parameters that may be changed.

- Deadbolt (do not change)

  required value: "play-authenticate/deadbolt.conf"
- SMTP (do not change)

  required value: "play-authenticate/smtp.conf"
- Play authenticate (do not change)

  required value: "play-authenticate/mine.conf"
- HADatAc (do not change)

  required value: "hadatac.conf"
- Session conf: specifies how long a session can last without any activity from the user.

  session.maxAge=1h
  play.http.session.maxAge=12h
- Java config: specifies the amount of memory allocated to a HADatAc instance.

  jvm.memory=-Xmx2048M -Xms2048M

2.1.4. Setting up autoccsv.config

autoccsv.config specifies the folders where files are stored when they are initially uploaded into HADatAc, and where they are stored after HADatAc has processed them (i.e., after HADatAc has "ingested" the contents of these files). Data files and metadata files that are uploaded into HADatAc are managed as part of the overall content of the app.

IMPORTANT: the folder paths identified in this configuration file MUST BE OUTSIDE of the HADatAc distribution folder so that existing files are not affected during HADatAc's software upgrade.

- The path of processed files. The path can be absolute (when starting with "/") or relative to HADatAc's deployment location.

  path_proc=processed_csv/
- The path of unprocessed files. The path can be absolute (when starting with "/") or relative to HADatAc's deployment location.

  path_unproc=unprocessed_csv/
- Identifies whether the autoannotator is on or off by default.

  auto=on
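
The two rules in this subsection (a path is absolute when it starts with "/", otherwise it is relative to the deployment location, and the resulting folder must lie outside the distribution folder) can be checked with a short Python sketch. The function names and the example folder locations are assumptions made for illustration. The check is purely textual and does not resolve ".." segments or symbolic links.

```python
from pathlib import PurePosixPath


def resolve(value: str, deploy_dir: str) -> PurePosixPath:
    """Interpret a path_proc/path_unproc value as described in this section."""
    path = PurePosixPath(value)
    # Absolute values are used as-is; relative ones hang off the deployment location.
    return path if path.is_absolute() else PurePosixPath(deploy_dir) / path


def is_outside(path: PurePosixPath, hadatac_folder: str) -> bool:
    """Return True when `path` is neither the distribution folder nor inside it."""
    return PurePosixPath(hadatac_folder) not in (path, *path.parents)


# Example locations are placeholders, not values required by HADatAc.
folder = "/home/data/git/hadatac"
print(is_outside(resolve("/data/processed_csv/", folder), folder))  # True
print(is_outside(resolve("processed_csv/", folder), folder))        # False
```

In this sketch, a relative value ends up inside the distribution folder whenever the deployment location is the distribution folder itself, which is the situation the IMPORTANT note warns about.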

2.1.5. Setting up namespaces.properties

This configuration file has a list of all the namespaces used in HADatAc's knowledge base. This list is cached inside of HADatAc when it is running, and HADatAc needs to be restarted for any change to this configuration file to take effect.

This list has two roles:

- to indicate which ontologies should be loaded into HADatAc's knowledge graph
- to indicate which namespaces may be used as a prefix during the execution of any SPARQL query against the repository

Every loaded ontology should be included in the list of SPARQL prefixes, but the content inside the knowledge graph may refer to terms from namespaces that are not necessarily loaded into the knowledge graph.

Each namespace has the following attributes:

- definition (mandatory): it is of the form [abbreviation]=[reference_URI], like xsd=http://www.w3.org/2001/XMLSchema# where [abbreviation] is xsd and [reference_URI] is http://www.w3.org/2001/XMLSchema#
- mime type (only for loaded ontologies): this attribute identifies the encoding format of the ontology to be loaded. If the ontology is encoded in RDF/XML, the value for this attribute should be application/rdf+xml. If the ontology is encoded in Turtle, the value for this attribute should be text/turtle.
- loading URL (only for loaded ontologies): this is a URL that should be resolvable on the web (it is suggested that the URL be tested in a browser to verify that it is resolvable).
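
To illustrate the second role of the list, the Python sketch below turns a definition of the form [abbreviation]=[reference_URI] into a SPARQL PREFIX declaration. It handles only the definition attribute. How the mime type and loading URL are written in the file is not covered by this sketch, and the function name is hypothetical rather than part of HADatAc.

```python
def sparql_prefix(definition: str) -> str:
    """Build a SPARQL PREFIX line from '[abbreviation]=[reference_URI]'."""
    # Split on the first '=' only, since the URI itself may contain '='.
    abbreviation, separator, reference_uri = definition.strip().partition("=")
    if not separator or not abbreviation or not reference_uri:
        raise ValueError(f"expected [abbreviation]=[reference_URI], got {definition!r}")
    return f"PREFIX {abbreviation}: <{reference_uri}>"


print(sparql_prefix("xsd=http://www.w3.org/2001/XMLSchema#"))
# PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
```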

2.1.6. Setting up labkey.config

HADatAc uses LabKey to store some content of its knowledge graph. Content stored in LabKey is used to debug some contents of the knowledge graph. This configuration file uses the site parameter to identify the main URL of the LabKey to be used for HADatAc. The folder parameter is used to identify the project inside of LabKey where HADatAc's knowledge graph may be stored.

- Configure the LabKey server URL

  site=[URL OF YOUR LABKEY APP]
  folder=[NAME OF YOUR LABKEY PROJECT]
- Configure the key for encryption on LabKey authority

  encryption_key=yourkey

2.1.7. Setting up template.conf

This file defines the names of the column headers of HADatAc's metadata file templates. Each of these template files is composed of a fixed, ordered list of properties, and every column header in a template must be a column header defined in the vocabulary of this configuration file.

2.1.8. Creating Master User

Each HADatAc installation has one Master user. This user is created once, and the password used for the master user needs to be carefully noted.

Why the Master User is Unique
The Master User is unique because of the following:

- every functional HADatAc installation has one Master User only;
- it is expected to be created once (it can be recreated, but this is not desirable);
- it is a user who is assumed to be authenticated without the need for email verification (and thus, without the need to set up an emailer to work with HADatAc);
- it is created with ADMIN permission.

In summary, the Master User is everything needed for ONE USER to start using HADatAc's functionalities.

It is assumed that a development installation of HADatAc does not have any user other than the Master User.

How to Create a Master User
The user is created by selecting the "Sign Up" option on HADatAc's main page (at the very top of the page, in the black navigation bar). The process is the same as the sign-up of any other user. Once the Master User is created, the account is ready to be used. Once a Master User exists, there is no way to create a new Master User, and the Sign Up button will only work for pre-registered users (see Section 6.2).

How to Reset the Password for the Master User
Resetting the password consists of erasing the following:

- In Blazegraph: the content of store_users. One easy way of erasing the content of the store_users namespace in Blazegraph is to erase the namespace itself and create it again.
- In Solr: the content of the following tables under solr/solr-home: users, linked_account, and token_action. Each table in Solr is composed of two folders and one file: conf, data and core.properties. The content of the table is inside the data folder, so it is possible to erase the data folder, which is recreated the next time you start Solr. Another option is to replace the entire solr/solr_home folder with its content from GitHub.
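
As a precaution before erasing anything, the Solr data folders named above can be listed first. The Python sketch below only builds the folder paths and deletes nothing. The function name is hypothetical, and the Solr home location must be supplied by you.

```python
from pathlib import PurePosixPath

# The three Solr tables named in this section.
USER_TABLES = ("users", "linked_account", "token_action")


def data_folders(solr_home: str) -> list:
    """Return the data folder of each user-related table under `solr_home`."""
    return [str(PurePosixPath(solr_home) / table / "data") for table in USER_TABLES]


# Dry run: print what would be erased; nothing is removed here.
for folder in data_folders("solr/solr-home"):
    print(folder)
```

Reviewing this list before removing the folders makes it less likely that the conf folder or core.properties file of a table is erased by mistake.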

2.1.9. Setting up Google Sign-In for Gmail Users

HADatAc currently supports authenticating users with their Google accounts through Google Sign-In. To enable this feature, add the following snippet to play-authenticate/mine.conf:

google {
    redirectUri {
        # Whether the redirect URI scheme should be HTTP or HTTPS (HTTP by default)
        secure=false

        # You can use this setting to override the automatic detection
        # of the host used for the redirect URI (helpful if your service is running behind a CDN for example)
        # host=yourdomain.com
    }

    # Google credentials
    # These are mandatory for using OAuth and need to be provided by you,
    # if you want to use Google as an authentication provider.
    # Get them here: https://code.google.com/apis/console
    clientId="974495099172-adsr39dq03fq875ti67p8tupfokr5jqe.apps.googleusercontent.com"
    clientSecret="xxxxx-xxxxxxxxxxxxxxxxxx"
}

and change the clientId and clientSecret accordingly. Please refer to https://code.google.com/apis/console for further information. Specifically, add http://127.0.0.1:9000/hadatac/authenticate/google to the Authorized redirect URIs.
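
For a deployment that is not reached at 127.0.0.1:9000, the redirect URI to register changes with the host and with the secure flag. The Python sketch below assumes the path /hadatac/authenticate/google from the example above stays the same for other hosts, which should be verified against your own deployment. The function name is hypothetical.

```python
def google_redirect_uri(host: str, secure: bool = False) -> str:
    """Build the redirect URI to register in the Google console.

    `secure` mirrors the redirectUri.secure setting: HTTPS when true,
    HTTP otherwise. The path is taken from the example in this section.
    """
    scheme = "https" if secure else "http"
    return f"{scheme}://{host}/hadatac/authenticate/google"


print(google_redirect_uri("127.0.0.1:9000"))
# http://127.0.0.1:9000/hadatac/authenticate/google
```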
