2.1. Software Configuration

Paulo Pinheiro edited this page Mar 15, 2019 · 45 revisions

Assuming that HADatAc is installed at "[HADatAc]", configuration files are located at [HADatAc]/conf. The way content in [HADatAc]/conf is used is different when using HADatAc in development mode and production mode.

In production, a software upgrade DOES NOT replace the content of hadatac/conf in the distribution, because the upgrade should preserve the current configuration. To change any configuration file, one needs to change it both in the current distribution and in "/data/conf", a folder that should be located outside "[HADatAc]/conf".

In development mode, you may just need to update a few files, including hadatac.conf, and that change can be made directly in the local copy of what is in the master branch. If code is contributed back to the master branch, please do not add and commit changes to [HADatAc]/conf.

2.1.1. Setting up hadatac.conf

This is the main configuration file and tells the system important information about how the webapp connects to SOLR and Blazegraph repositories, and what is going to be the URL of the webapp once it is deployed.

If you are a developer using a local copy of HADatAc on your machine, and you do not have any restriction on calling 'http://localhost:9000' to invoke HADatAc, you may not need to change this part of the configuration.

If you are deploying HADatAc on a server and you expect users to access HADatAc over the web, you will need to set the parameters below according to your domain name and to firewall/port restrictions.

HADatAc URL Configuration

  • the application's base host URL

    default value: host="http://localhost:9000"

  • the URL at which the application is deployed

    default value: host_deploy="http://localhost:9000"

  • the base URL that the application uses to send and receive email. This value is prefixed to the HADatAc services used to communicate with users during email authentication or password reset. Emails may still be sent out when this value is not set properly, but confirming any action embedded in those email messages may then have no effect on HADatAc.

    default value: base_url="127.0.0.1:9000"

  • the kb's base host URL: usually, the application's base host URL without any port information. This value is used to build absolute URLs from internal relative URLs, and it is used to interconnect HADatAc functionalities, including the URLs of internal proxies used by JavaScript code.

    default value: kb="http://localhost"
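The relationship between host and kb described above can be checked with a few lines of Python. This is only a sketch: the helper names strip_port and kb_matches_host are hypothetical and are not part of HADatAc. Since kb is only "usually" the host URL without the port, a mismatch is a hint to double-check the configuration rather than an error.

```python
from urllib.parse import urlsplit


def strip_port(url: str) -> str:
    """Return scheme://hostname for a URL, dropping any port, path, or query."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.hostname}"


def kb_matches_host(host: str, kb: str) -> bool:
    """True when kb equals host with the port information removed."""
    return strip_port(host) == kb.rstrip("/")


# Default values from hadatac.conf as listed above.
print(strip_port("http://localhost:9000"))                           # http://localhost
print(kb_matches_host("http://localhost:9000", "http://localhost"))  # True
```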

SOLR Configuration

  • HOME: the path in the file system where the SOLR instances are located

    default: home=/../hadatac/solr

  • URL for data collections

    default: data="http://127.0.0.1:8983/solr"

Blazegraph Configuration

  • URL used to retrieve content from a Blazegraph repository.

  • activity flags are used to verify whether the HADatAc knowledge base contains the concepts essential for supported scientific activities

    • use true for empirical activities involving the use of sensors

      empirical=true

    • use true for computational activities involving computational simulations

      computational=false

  • properties about the community using the current HADatAc installation

    • these properties are used for project customization of HADatAc installations

      default: fullname="Child Health Exposure Analysis Repository"

      default: shortname="CHEAR"

      default: description=""

      default: ont_prefix="chear"

2.1.2. Setting up email configuration

You may not need to set up email configuration if you are using HADatAc for development purposes. This configuration is essential if you are planning to create users with authenticated access to the system. In this case, the email configuration enables users to verify their email addresses and to request a password reset.

Authentication is done through email verification, which requires HADatAc to communicate with users through emails. The configuration file is smtp.conf under /conf/play-authenticate. Instructions for filling in the configuration file are given inside the file itself. The email account to be used should be one created for your system (not the Gmail account shown in the example below).

play.mailer {
    # TODO: Disable this in production
    mock=false
    # SMTP server
    # (mandatory)
    # defaults to gmail
    host=smtp.gmail.com

    # SMTP port
    # defaults to 25
    port=465

    # Use SSL
    # for GMail, this should be set to true
    ssl=true

    # authentication user
    # Optional, comment this line if no auth
    # defaults to no auth
    user="hadatac1234@gmail.com"

    # authentication password
    # Optional, comment this line to leave password blank
    # defaults to no password
    password="password"
}
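Before relying on these settings inside HADatAc, it can help to confirm them independently. The sketch below uses Python's standard smtplib to send one test message over implicit SSL, matching the ssl=true and port=465 values in the example. The function names and the message text are assumptions made for illustration, and nothing here is part of HADatAc or of Play's mailer.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender: str, recipient: str) -> EmailMessage:
    """Build a short plain-text message used only to test the SMTP settings."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "HADatAc SMTP test"
    msg.set_content("If you can read this, the SMTP settings are usable.")
    return msg


def send_test_message(host: str, port: int, user: str, password: str,
                      recipient: str) -> None:
    """Log in over implicit SSL and send one test message."""
    msg = build_test_message(user, recipient)
    with smtplib.SMTP_SSL(host, port, timeout=30) as server:
        server.login(user, password)
        server.send_message(msg)


# Example call, with values taken from smtp.conf (not executed here):
# send_test_message("smtp.gmail.com", 465, "user@example.org", "password",
#                   "someone@example.org")
```

If the call raises an authentication or connection error, the same values are unlikely to work in smtp.conf.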

2.1.3. Setting up application.conf

This is the main configuration file of the Play Framework behind HADatAc. From this configuration file, the Play Framework can locate the other configuration files. The session configuration parameters and the Java configuration parameters are the only parameters that may be changed.

  • Deadbolt (do not change)

    required value: "play-authenticate/deadbolt.conf"

  • SMTP (do not change)

    required value: "play-authenticate/smtp.conf"

  • Play authenticate (do not change)

    required value: "play-authenticate/mine.conf"

  • HADatAc (do not change)

    required value: "hadatac.conf"

  • Session conf: specifies how long a session can last without any activity from the user.

    session.maxAge=1h
    play.http.session.maxAge=12h

  • java config: specifies the amount of memory allocated to a HADatAc instance.

    jvm.memory=-Xmx2048M -Xms2048M

2.1.4. Setting up autoccsv.config

autoccsv.config specifies the folders where files are stored when they are initially uploaded into HADatAc, and where they are stored after HADatAc has processed them (i.e., after HADatAc has "ingested" the contents of these files). Data files and metadata files that are uploaded into HADatAc are managed as part of the overall content of the app.

IMPORTANT: the folder paths identified in this configuration file MUST BE OUTSIDE of the HADatAc distribution folder so that existing files are not affected during HADatAc's software upgrade.

  • The path of processed files. The path can be absolute (when starting with "/") or relative to HADatAc's deployment location.
    path_proc=processed_csv/

  • The path of unprocessed files. The path can be absolute (when starting with "/") or relative to HADatAc's deployment location.
    path_unproc=unprocessed_csv/

  • identifies whether the autoannotator is on or off by default.
    auto=on
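The path rule above (absolute when starting with "/", otherwise relative to the deployment location) and the requirement that the folders sit outside the distribution can be sketched as follows. The helper names and the example folders /opt/hadatac and /data/processed_csv are hypothetical, and the sketch assumes that the deployment location and the distribution folder are the same directory.

```python
import posixpath


def resolve_path(configured: str, deploy_dir: str) -> str:
    """Resolve a configured path: absolute if it starts with '/',
    otherwise relative to the deployment location."""
    if configured.startswith("/"):
        return posixpath.normpath(configured)
    return posixpath.normpath(posixpath.join(deploy_dir, configured))


def is_outside(path: str, folder: str) -> bool:
    """True when path is not located inside folder (both absolute)."""
    path = posixpath.normpath(path)
    folder = posixpath.normpath(folder)
    return posixpath.commonpath([path, folder]) != folder


deploy_dir = "/opt/hadatac"  # hypothetical deployment location

relative = resolve_path("processed_csv/", deploy_dir)
absolute = resolve_path("/data/processed_csv/", deploy_dir)

print(relative, is_outside(relative, deploy_dir))  # /opt/hadatac/processed_csv False
print(absolute, is_outside(absolute, deploy_dir))  # /data/processed_csv True
```

Under that assumption, a relative value resolves inside the deployment folder, so an absolute path is the straightforward way to satisfy the IMPORTANT note above.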

2.1.5. Setting up namespaces.properties

This configuration file has a list of all the namespaces used in HADatAc's knowledge base. This list is cached inside of HADatAc when it is running, and HADatAc needs to be restarted for any change to this configuration file to take effect.

This list has two roles:

  • to indicate which ontologies should be loaded into HADatAc's knowledge graph
  • to indicate which namespaces may be used as a prefix during the execution of any SPARQL query against the repository

Every loaded ontology should be included in the list of SPARQL prefixes, but the content inside the knowledge graph may refer to terms from namespaces that are not necessarily loaded into the knowledge graph.

Each namespace is described by the following attributes:

  • definition (mandatory): It is of the form [abbreviation]=[reference_URI] like xsd=http://www.w3.org/2001/XMLSchema# where [abbreviation] is xsd and [reference_URI] is http://www.w3.org/2001/XMLSchema#
  • mime type (only for loaded ontologies): this attribute identifies the encoding format of the ontology to be loaded. If the ontology is encoded in RDF/XML, the value for this attribute should be application/rdf+xml. If the ontology is encoded in Turtle, the value for this attribute should be text/turtle.
  • loading URL (only for loaded ontologies): this is a URL that should be resolvable on the web (it is suggested that the URL be tested in a browser to verify that it is resolvable).
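To illustrate the definition attribute and its role as a SPARQL prefix, the sketch below splits a [abbreviation]=[reference_URI] entry and prints the corresponding PREFIX declaration. The function names are hypothetical. How HADatAc itself reads the file, and how the mime type and loading URL are laid out on a line, are not covered here.

```python
from typing import Tuple


def parse_definition(line: str) -> Tuple[str, str]:
    """Split '[abbreviation]=[reference_URI]' at the first '=' only,
    since the URI itself may contain '=' characters."""
    abbreviation, reference_uri = line.strip().split("=", 1)
    return abbreviation.strip(), reference_uri.strip()


def sparql_prefix(abbreviation: str, reference_uri: str) -> str:
    """Format a SPARQL PREFIX declaration for one namespace."""
    return f"PREFIX {abbreviation}: <{reference_uri}>"


abbr, uri = parse_definition("xsd=http://www.w3.org/2001/XMLSchema#")
print(sparql_prefix(abbr, uri))  # PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
```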

2.1.6. Setting up labkey.config

HADatAc uses LabKey to store some content of its knowledge graph. Content stored in LabKey is used to debug some contents of the knowledge graph. This configuration file uses the site parameter to identify the main URL of the LabKey to be used for HADatAc. The folder parameter is used to identify the project inside of LabKey where HADatAc's knowledge graph may be stored.

  • Configure the labkey server url

    site=[URL OF YOUR LABKEY APP]
    folder=[NAME OF YOUR LABKEY PROJECT]

  • Configure the key for encryption on LabKey authority

    encryption_key=yourkey

2.1.7. Setting up template.conf

This file is used to define the names of the column headers of HADatAc's metadata file templates. Each of these template files is composed of a fixed, ordered list of properties, and every column header of a template must be a column header defined in the vocabulary of this configuration file.

2.1.8. Creating Master User

Each HADatAc installation has one Master user. This user is created once, and the password used for the master user needs to be carefully noted.

Why the Master User is Unique
The Master user is unique because of the following:

  • every functional HADatAc installation has one Master User only;
  • it is expected to be created once (it is possible to be recreated, but it is not desirable to be recreated);
  • it is a user who is assumed to be authenticated without the need of email verification (and thus, without the need of setting up an emailer to work with HADatAc);
  • it is already created with ADMIN permission.

In summary, the Master User is everything needed for ONE USER to start using HADatAc functionalities.

It is assumed that a development installation of HADatAc does not have any user other than the Master User.

How to Create a Master User
The user is created by selecting the "Sign Up" option on HADatAc's main page (at the very top of the page, in the black navigation bar). The process is the same as the sign-up of any other user. Once the Master User is created, the account is ready to be used. Once a Master User exists, there is no way to create a new Master User, and the Sign Up button can only be used by pre-registered users (see Section 6.2).

How to Reset the Password for the Master User
Resetting the password consists of erasing the following:

  • In Blazegraph: the content of store_users. One easy way of erasing the content of the store_users namespace in Blazegraph is to erase the namespace itself and create it again.
  • In Solr: the content of the following tables under solr/solr-home: users, linked_account, and token_action. Each table in Solr is composed of two folders and one file: conf, data, and core.properties. The content of the table is inside the data folder, so it is possible to erase the data folder, which is recreated the next time you start Solr. Another option is to replace the entire solr/solr_home folder with its content from GitHub.
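The Solr part of the reset can be scripted. The sketch below removes only the data folder of the three tables and leaves conf and core.properties untouched. The function name is hypothetical, the location of the Solr home folder is passed in by the caller, and the sketch assumes that Solr is stopped while it runs. The deletion is irreversible, so a backup beforehand is advisable.

```python
import shutil
from pathlib import Path

# Tables named in the reset instructions above.
USER_TABLES = ("users", "linked_account", "token_action")


def clear_table_data(solr_home: str, tables=USER_TABLES) -> list:
    """Delete the 'data' folder of each listed table under solr_home.

    Returns the names of the tables whose data folder was removed.
    The 'conf' folder and 'core.properties' are left in place.
    """
    removed = []
    for table in tables:
        data_dir = Path(solr_home) / table / "data"
        if data_dir.is_dir():
            shutil.rmtree(data_dir)
            removed.append(table)
    return removed


# Example call (not executed here); the path is an assumption:
# clear_table_data("/path/to/hadatac/solr/solr-home")
```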
