This directory stores the custom generic tests that we use to define our test suite.
test_accepted_range
test_accepted_values
test_column_is_subset_of_external_column
test_column_length
test_columns_match
test_count_is_consistent
test_expression_is_false
test_expression_is_true
test_is_null
test_no_extra_whitespace
test_not_accepted_values
test_not_null
test_relationships
test_res_class_matches_pardat
test_row_count
test_row_values_match_after_join
test_sequential_values
test_unique_combination_of_columns
test_value_is_present
test_accepted_range

Asserts that a column's values fall inside an expected range. Any combination of min_value and max_value is allowed, and the range can be inclusive or exclusive.

Parameters:
min_value (optional number): Lower bound for the range. Defaults to no lower bound.
max_value (optional number): Upper bound for the range. Defaults to no upper bound.
inclusive (optional boolean): Whether the range is inclusive. Defaults to true.
additional_select_columns (optional list of strings): Additional columns to select for failure output. The column the test is defined on will always be selected regardless of this value.
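
As a rough illustration, the range test might be attached to a column in a dbt schema file as sketched here. The model name my_model and the column sale_price are hypothetical, and the sketch assumes the test is invoked as accepted_range, without the test_ prefix, as is usual for dbt generic tests.

```yaml
models:
  - name: my_model  # hypothetical model name
    columns:
      - name: sale_price  # hypothetical column name
        data_tests:  # written as `tests:` on dbt versions before 1.8
          - accepted_range:  # assumed invocation name (no test_ prefix)
              min_value: 0
              inclusive: false  # exclusive range, so a value of exactly 0 would fail
```

Omitting max_value leaves the range unbounded above, per the parameter defaults.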
test_accepted_values

Asserts that a column's values are all present in a canonical list of values. The opposite of test_not_accepted_values.

Parameters:
values (required list of any value): Canonical list of allowed values.
quote (optional boolean): Whether to single-quote all elements of values, i.e. whether to convert them to strings. Defaults to true.
additional_select_columns (optional list of strings): Additional columns to select for failure output. The column the test is defined on will always be selected regardless of this value.
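
A minimal sketch of an accepted-values check with extra failure output. The model name and the listed values are hypothetical, and the invocation name accepted_values (without the test_ prefix) is an assumption based on the usual dbt convention.

```yaml
models:
  - name: my_model  # hypothetical model name
    columns:
      - name: class
        data_tests:  # written as `tests:` on dbt versions before 1.8
          - accepted_values:  # assumed invocation name
              values: ["211", "212"]  # illustrative values only
              quote: true  # the default: elements are single-quoted as strings
              additional_select_columns: [parid, taxyr]
```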
test_column_is_subset_of_external_column

Asserts that a column is a subset of a column in an external relation. Rows that have no match in the external relation's column will be flagged as failures.

Parameters:
external_model (required string): The external relation to use for comparison. Use a ref() or source() call to specify this relation so that the DAG can understand the relationship.
external_column (required string): The name of the column on external_model to use for comparison.
additional_select_columns (optional list of strings): Additional columns to select for failure output. The column the test is defined on will always be selected regardless of this value.
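
The ref() or source() call is passed as the value of external_model, roughly as in this sketch. The model name my_model and the source name iasworld with table pardat are assumptions for illustration, as is the invocation name without the test_ prefix.

```yaml
models:
  - name: my_model  # hypothetical model name
    columns:
      - name: parid
        data_tests:  # written as `tests:` on dbt versions before 1.8
          - column_is_subset_of_external_column:  # assumed invocation name
              # source name and table are assumed for illustration
              external_model: source('iasworld', 'pardat')
              external_column: parid
```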
test_column_length

Asserts that all columns in a list of column_names have the correct length. Returns columns with the naming pattern len_<column_name> representing the length of each column for all rows where one of the columns has an incorrect length, e.g. if column_names = ["foo", "bar"] the test will return two additional columns named len_foo and len_bar.

Since this test operates on a list of column_names instead of a scalar column_name, it must be defined on the table level rather than on the column level.

Parameters:
column_names (required list of strings): The list of columns to check for proper length.
length (required integer): The length that the column values should be.
additional_select_columns (optional list of strings): Additional columns to select for failure output. The column the test is defined on and the autogenerated len_<column_name> columns will always be selected regardless of this value.
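
Because the test is table-level, it sits under the model rather than under a column. A sketch using the foo and bar columns from the example above; the model name and the length of 14 are hypothetical, and the invocation name column_length (no test_ prefix) is assumed.

```yaml
models:
  - name: my_model  # hypothetical model name
    data_tests:  # table-level; written as `tests:` on dbt versions before 1.8
      - column_length:  # assumed invocation name
          column_names: [foo, bar]
          length: 14  # illustrative value
```

Under the behavior described above, failing rows would include len_foo and len_bar columns.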
test_columns_match

Asserts that two or more columns in the same relation have the same value for each row.

Parameters:
matching_column_names (required list of strings): The list of columns to check for identical values.
additional_select_columns (optional list of strings): Additional columns to select for failure output. The column the test is defined on and the columns defined in matching_column_names will always be selected regardless of this value.
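
A short sketch of comparing one column against others in the same model. All names here (my_model, class, class_from_other_source) are hypothetical, and the invocation name columns_match (no test_ prefix) is assumed.

```yaml
models:
  - name: my_model  # hypothetical model name
    columns:
      - name: class  # the column the test is defined on
        data_tests:  # written as `tests:` on dbt versions before 1.8
          - columns_match:  # assumed invocation name
              matching_column_names: [class_from_other_source]  # hypothetical column
```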
test_count_is_consistent

Asserts that the count of a given column is the same when grouped by another column, for example that the number of distinct township codes is the same across years. Returns the grouping column and a column called count with the count of rows for that group.

Parameters:
group_column (required string): The column to use for grouping.
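
The township example above could be configured roughly as follows. The model and column names are hypothetical, and the sketch assumes the test is invoked as count_is_consistent, without the test_ prefix.

```yaml
models:
  - name: my_model  # hypothetical model name
    columns:
      - name: township_code  # hypothetical column whose count is compared
        data_tests:  # written as `tests:` on dbt versions before 1.8
          - count_is_consistent:  # assumed invocation name
              group_column: year  # hypothetical grouping column
```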
test_expression_is_false

Asserts that a valid SQL expression is false for all rows. In other words, filters for rows where a given expression is true. Often useful for idiosyncratic comparisons across columns that are not easily generalized into generic tests.

Parameters:
expression (required string): A valid SQL expression to apply to the column or table.
additional_select_columns (optional list of strings): Additional columns to select for failure output. The column the test is defined on will always be selected regardless of this value.
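
A sketch of a table-level cross-column check. The model name and the columns start_year and end_year are hypothetical, the invocation name expression_is_false (no test_ prefix) is assumed, and the sketch does not cover how the expression is combined with a column when the test is defined at the column level.

```yaml
models:
  - name: my_model  # hypothetical model name
    data_tests:  # written as `tests:` on dbt versions before 1.8
      - expression_is_false:  # assumed invocation name
          # rows where this evaluates to true are reported as failures
          expression: "end_year < start_year"
          additional_select_columns: [start_year, end_year]
```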
test_expression_is_true

Asserts that a valid SQL expression is true for all rows. In other words, filters for rows where a given expression is false. Often useful for idiosyncratic comparisons across columns that are not easily generalized into generic tests.

Parameters:
expression (required string): A valid SQL expression to apply to the column or table.
additional_select_columns (optional list of strings): Additional columns to select for failure output. The column the test is defined on will always be selected regardless of this value.
test_is_null

Asserts that a column contains only null values. The opposite of test_not_null.

Parameters:
additional_select_columns (optional list of strings): Additional columns to select for failure output. The column the test is defined on will always be selected regardless of this value.
test_no_extra_whitespace

Asserts that one or more string columns do not contain extraneous whitespace. Returns all of the columns that are configured in column_names.

Since this test operates on a list of column_names instead of a scalar column_name, it must be defined on the table level rather than on the column level.

Parameters:
column_names (required list of strings): The list of columns to check for extra whitespace.
allow_interior_space (optional boolean): If true, checks only for leading and trailing whitespace. If false, also checks for multiple consecutive spaces in the interior of the string. Defaults to false.
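
A table-level sketch that relaxes the interior-space check. The model name and the address columns are hypothetical, and the invocation name no_extra_whitespace (no test_ prefix) is assumed.

```yaml
models:
  - name: my_model  # hypothetical model name
    data_tests:  # table-level; written as `tests:` on dbt versions before 1.8
      - no_extra_whitespace:  # assumed invocation name
          column_names: [addr1, addr2]  # hypothetical string columns
          allow_interior_space: true  # only leading/trailing whitespace is checked
```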
test_not_accepted_values

Asserts that there are no rows that match the given values. The opposite of test_accepted_values.

Parameters:
values (required list of any value): Canonical list of disallowed values.
quote (optional boolean): Whether to single-quote all elements of values, i.e. whether to convert them to strings. Defaults to true.
additional_select_columns (optional list of strings): Additional columns to select for failure output. The column the test is defined on will always be selected regardless of this value.
test_not_null

Asserts that there are no null values present in a column. The opposite of test_is_null.

Parameters:
additional_select_columns (optional list of strings): Additional columns to select for failure output. The column the test is defined on will always be selected regardless of this value.
test_relationships

Asserts that all of the records in a child table have a corresponding record in a parent table. This property is referred to as "referential integrity".

Parameters:
to (required string): The external relation to use for comparison. Use a ref() or source() call to specify this relation so that the DAG can understand the relationship.
field (required string): The name of the column on to to use for comparison.
additional_select_columns (optional list of strings): Additional columns to select for failure output. The column the test is defined on will always be selected regardless of this value.
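
A sketch of a referential-integrity check from a child model to a parent model. The model names child_model and parent_model are hypothetical, and the invocation name relationships (no test_ prefix) is assumed.

```yaml
models:
  - name: child_model  # hypothetical child model
    columns:
      - name: parid
        data_tests:  # written as `tests:` on dbt versions before 1.8
          - relationships:  # assumed invocation name
              to: ref('parent_model')  # hypothetical parent model
              field: parid
              additional_select_columns: [taxyr]
```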
test_res_class_matches_pardat

For all residential parcels in a given model, test that there is at least one class code that matches a class code for that parcel in iasworld.pardat.

The test filters for residential parcels by anti-joining the model against comdat using parid and taxyr; as a result, it filters out mixed-use parcels as well.

Parameters:
major_class_only (optional boolean): Compare only the first digit of classes. When set to false, compare the first three digits instead. Defaults to false.
parid_column_name (optional string): The name of the column on the base model that corresponds to pardat.parid, in case the model uses a different name scheme. Defaults to parid.
taxyr_column_name (optional string): The name of the column on the base model that corresponds to pardat.taxyr, in case the model uses a different name scheme. Defaults to taxyr.
join_type (optional string): The type of join to use when joining to pardat, e.g. "inner" or "left". Defaults to "left".
additional_pardat_filter (optional string): A SQL string representing additional conditions to apply in the WHERE clause of the subquery that selects from pardat to join to the model, e.g. "class != 'EX' AND class != 'RR'". Note that cur = 'Y' and deactivat IS NULL are already set prior to this parameter being applied, hence the "additional" in the param name.
additional_select_columns (optional list of dictionaries): Additional columns to select for failure output. The column the test is defined on and pardat.class will always be selected regardless of this value. Columns must be represented as dictionaries with the following attributes:
  column (required string): The name of the column to select.
  agg_func (required string): The aggregation function to use for aggregating column values, like max or array_agg. Necessary because results are automatically grouped by parid and taxyr.
  alias (optional string): The name of the column to use for output. Necessary because aggregation functions as represented by agg_func require aliases in SQL. Defaults to <agg_func>_<column_name>.
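
The dictionary form of additional_select_columns is easiest to see in a config. In this sketch the model name, the pin and year column names, and the township_code column are hypothetical, and the invocation name res_class_matches_pardat (no test_ prefix) is assumed.

```yaml
models:
  - name: my_model  # hypothetical model name
    columns:
      - name: class
        data_tests:  # written as `tests:` on dbt versions before 1.8
          - res_class_matches_pardat:  # assumed invocation name
              major_class_only: true  # compare only the first digit of each class
              parid_column_name: pin  # hypothetical: model names the column pin
              taxyr_column_name: year  # hypothetical: model names the column year
              additional_pardat_filter: "class != 'EX' AND class != 'RR'"
              additional_select_columns:
                - column: township_code  # hypothetical column
                  agg_func: max
                  alias: township_code  # without this, defaults to <agg_func>_<column_name>
```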
test_row_count

Asserts that row counts for a model or column are above a certain value.

Parameters:
above (required integer): The minimum row count (inclusive) for the model or column.
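
A table-level sketch of a row count floor. The model name and the threshold are hypothetical, and the invocation name row_count (no test_ prefix) is assumed.

```yaml
models:
  - name: my_model  # hypothetical model name
    data_tests:  # written as `tests:` on dbt versions before 1.8
      - row_count:  # assumed invocation name
          above: 1000  # illustrative threshold
```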
test_row_values_match_after_join

Asserts that row values match after joining two tables.

Row values can be a subset of the values in the joined table, e.g. if a PIN in table A has one row with class = "212" and the same PIN in table B has two rows, one with class = "212" and one with class = "211", then table A passes the test.

Parameters:
external_model (required string): The name of the model to join to.
external_column_name (required string): The name of the column in external_model to join to.
join_condition (required string): The ON (or USING) portion of a JOIN clause, represented as a string. Note that in the case where ON is used, columns in the base model should be formatted like model.<column> while columns in the external model should be formatted like external_model.<column>, e.g. ON model.pin = external_model.parid. This is not necessary in the case of a USING expression, since USING does not need to refer to table names for the purposes of namespacing columns.
group_by (optional list of strings): The columns from the base model to pass to the GROUP BY clause used in the test query. Unlike join_condition, these column names do not have to be prefixed with model.*, since they are assumed to come from the base model for the test and not the external model.
join_type (optional string): The type of join to use, e.g. "inner" or "left". Defaults to "inner".
column_alias (optional string): An alias to use when selecting the column from the base model for output. An alias is required in this case because the column must be aggregated. Defaults to "model_col".
external_column_alias (optional string): An alias to use when selecting the column from the external model for output. Defaults to "external_model_col".
additional_select_columns (optional list of dictionaries): Additional columns to select for failure output. model.<column_name>, external_model.<external_column_name>, and the columns specified in the group_by parameter will always be selected regardless of this value. Columns must be represented as dictionaries with the following attributes:
  column (required string): The name of the column to select.
  agg_func (required string): The aggregation function to use for aggregating column values, like max or array_agg. Necessary because results are automatically grouped by the columns specified in the group_by parameter.
  alias (optional string): The name of the column to use for output. Necessary because aggregation functions as represented by agg_func require aliases in SQL. Defaults to <agg_func>_<column_name>.
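
The PIN and class example above might be configured along these lines. The model name, the source name iasworld, and the group_by column are hypothetical; the sketch also assumes that external_model accepts a source() call like the other tests in this directory and that the test is invoked as row_values_match_after_join, without the test_ prefix.

```yaml
models:
  - name: my_model  # hypothetical model name (table A in the example above)
    columns:
      - name: class
        data_tests:  # written as `tests:` on dbt versions before 1.8
          - row_values_match_after_join:  # assumed invocation name
              # assumption: a source() call is accepted here
              external_model: source('iasworld', 'pardat')
              external_column_name: class
              # base-model columns use model.*, external columns use external_model.*
              join_condition: "ON model.pin = external_model.parid"
              group_by: [pin]  # no model. prefix needed here
              join_type: inner  # the documented default
```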
test_sequential_values

Asserts that a column contains sequential values. Can be used for both numeric values and datetime values.

Parameters:
interval (optional integer): The expected gap in units between two sequential values. Defaults to 1.
datepart (optional string): When present, indicates that values are datetimes and describes the unit of dates that should be used by interval to establish expected gaps, e.g. "hour" or "day".
group_by_columns (optional list of strings): The group of columns to use for partitioning in the window function that is used to lag the base column. Defaults to an empty list.
additional_select_columns (optional list of strings): Additional columns to select for failure output. The column the test is defined on, the columns in group_by_columns, and the value of the column in the preceding row of the sequence (aliased to previous_<column_name>) will always be selected regardless of this value.
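
A sketch of a datetime sequence check partitioned by a grouping column. The model and column names are hypothetical, and the invocation name sequential_values (no test_ prefix) is assumed.

```yaml
models:
  - name: my_model  # hypothetical model name
    columns:
      - name: reading_date  # hypothetical datetime column
        data_tests:  # written as `tests:` on dbt versions before 1.8
          - sequential_values:  # assumed invocation name
              interval: 1
              datepart: day  # expect a gap of one day between consecutive values
              group_by_columns: [sensor_id]  # hypothetical partition column
```

For a plain numeric sequence, datepart would simply be left out.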
test_unique_combination_of_columns

Asserts that the combination of columns always produces unique rows in a relation. For example, the combination of parid and taxyr might produce unique rows even though neither column is unique in isolation.

Since this test operates on a combination_of_columns list instead of a scalar column_name, it must be defined on the table level rather than on the column level.

Parameters:
combination_of_columns (required list of strings): One or more columns that are unique as a group.
allowed_duplicates (optional integer): The maximum number of duplicates that is considered acceptable for the purposes of uniqueness. Defaults to 1.
additional_select_columns (optional list of strings): Additional columns to select for failure output. Regardless of this value, the columns defined by combination_of_columns along with an automatically generated column num_duplicates will always be selected.
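
The parid and taxyr example above could be written as a table-level test like this. The model name is hypothetical, and the invocation name unique_combination_of_columns (no test_ prefix) is assumed.

```yaml
models:
  - name: my_model  # hypothetical model name
    data_tests:  # table-level; written as `tests:` on dbt versions before 1.8
      - unique_combination_of_columns:  # assumed invocation name
          combination_of_columns: [parid, taxyr]
```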
test_value_is_present

Asserts that a given expression returns a non-zero number of rows.

Since this test operates on an expression instead of a column_name, it must be defined on the table level rather than on the column level.

Parameters:
expression (required string): A valid SQL string representing the expression that should return a non-zero number of rows.
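
A table-level sketch that checks for the presence of at least one matching row. The model name and the expression are hypothetical, and the invocation name value_is_present (no test_ prefix) is assumed.

```yaml
models:
  - name: my_model  # hypothetical model name
    data_tests:  # table-level; written as `tests:` on dbt versions before 1.8
      - value_is_present:  # assumed invocation name
          # illustrative expression; the test passes if any row satisfies it
          expression: "taxyr = '2024'"
```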