This section illustrates how to extract/infer/predict unseen forms with the Unsupervised version of Form Recognizer. At this stage, the relevant model associated with the form to extract has been identified and we are now ready to run our prediction.
In summary, this script downloads the files that represent the Test set, i.e. files not used in the training process, from a Storage Container and performs a prediction on each one. The corresponding Ground Truth (GT) value is also retrieved. Formatting is applied, and the extracted value, the formatted value and the GT value are written to a JSON file.
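As a rough illustration of the output described above, the sketch below pairs each extracted value with its formatted form and the GT value, then writes the records for one issuer to a JSON file. The record field names, the helper names (format_value, build_record, write_predictions) and the whitespace-only formatting rule are assumptions made for this example, not the accelerator's actual schema or code.

```python
import json
import os


def format_value(raw):
    """Hypothetical formatting step: trim and collapse runs of whitespace."""
    return " ".join(str(raw).split())


def build_record(key, extracted, ground_truth):
    """Pair the raw and formatted prediction with the GT value for one key."""
    return {
        "key": key,
        "extracted": extracted,
        "formatted": format_value(extracted),
        "ground_truth": ground_truth,
    }


def write_predictions(records, working_dir, issuer_name):
    """Write all records for one issuer to a JSON file and return its path."""
    file_name = "unsupervised_predict_" + issuer_name + ".json"
    path = os.path.join(working_dir, file_name)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, indent=2)
    return path
```

Keeping the raw extracted value next to the formatted one makes it easier to tell whether a mismatch against the GT comes from the prediction itself or from the formatting step.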
This script will:
- Iterate through every container whose name contains the ENV VAR CONTAINER_SUFFIX + the ENV VAR TRAIN_TEST. If the ENV VAR RUN_FOR_SINGLE_ISSUER is set, only this vendor/issuer will be processed.
- Load the corresponding ground truth (GT) record for a form. The GT file is specified by the ENV VAR GROUND_TRUTH_PATH.
- Download the files to a local directory specified by the ENV VAR LOCAL_WORKING_DIR. Note that the files are randomly sampled for evaluation, and the number sampled is specified by the ENV VAR SAMPLE_NUMBER.
- Retrieve the values from the GT for the keys we want to extract/tag/label. The keys to be extracted are specified by the ENV VAR KEY_FIELD_NAMES.
- Call Read Layout (OCR) for the invoice if no OCR file exists for the form. The endpoint for the OCR is specified by the ENV VAR ANALYZE_END_POINT, with the Cognitive Services subscription key specified by the ENV VAR SUBSCRIPTION_KEY.
- Invoke the prediction/evaluation of Form Recognizer, apply some basic formatting and write the results to the file specified by the ENV VAR LOCAL_WORKING_DIR + '/unsupervised_predict_' + [issuername].json
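The container selection and sampling steps in the list above could be sketched as follows. This is a minimal illustration using only the standard library: the function names are hypothetical, and matching RUN_FOR_SINGLE_ISSUER by checking whether the issuer name appears in the container name is an assumption, since the actual matching rule is not described here.

```python
import random


def select_containers(container_names, suffix, train_test, single_issuer=None):
    """Keep containers whose name contains suffix + train_test.

    If single_issuer is given, keep only containers that also contain
    that issuer name (assumed matching rule).
    """
    marker = suffix + train_test
    selected = [name for name in container_names if marker in name]
    if single_issuer:
        selected = [name for name in selected if single_issuer in name]
    return selected


def sample_files(file_names, sample_number, seed=None):
    """Randomly sample up to sample_number files without replacement."""
    rng = random.Random(seed)
    count = min(sample_number, len(file_names))
    return rng.sample(file_names, count)
```

In the script, the arguments would come from the ENV VARs named above, for example os.environ["CONTAINER_SUFFIX"] and int(os.environ["SAMPLE_NUMBER"]). Passing a seed is optional and only serves to make an evaluation run repeatable.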
Have a look at the accelerator Predict Form Recognizer Unsupervised
With the Unsupervised version of Form Recognizer, we can save post-processing developer time by filtering only the keys that are needed for extraction, and by correlating the keys identified by Form Recognizer with the Taxonomy values.
- Step 1 - We load the Taxonomy file.
- Step 2 - We get the keys from Form Recognizer. Have a look at the function form_recognizer_get_keys.
- Step 3 - Map the keys we want to extract to the Taxonomy values from Step 2. Have a look at the function build_unsupervised_filter_keys.
- Step 4 - Build the querystring for prediction. Have a look at the function build_unsupervised_filter_keys.
- Step 5 - Map the Taxonomy values back to the key fields we want to extract. Have a look at the function get_synonym_key_from_value.
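To make Steps 3 to 5 more concrete, here is a small sketch of the mapping idea. It assumes the Taxonomy is a dictionary from each key field to a list of synonym values, and that the prediction request accepts a comma-separated keys query parameter. Both are assumptions for illustration. The function names below are hypothetical and are not the accelerator's build_unsupervised_filter_keys or get_synonym_key_from_value.

```python
from urllib.parse import urlencode


def normalise(text):
    """Compare keys case-insensitively, ignoring padding and a trailing colon."""
    return text.strip().rstrip(":").strip().lower()


def match_taxonomy_to_model_keys(taxonomy, model_keys, wanted_fields):
    """Step 3: keep the model keys that match a synonym of a wanted field."""
    synonyms = {
        normalise(synonym)
        for field in wanted_fields
        for synonym in taxonomy.get(field, [])
    }
    return [key for key in model_keys if normalise(key) in synonyms]


def build_keys_querystring(filter_keys):
    """Step 4: URL-encode the filter keys (parameter name is an assumption)."""
    return urlencode({"keys": ",".join(filter_keys)})


def lookup_key_field(taxonomy, value):
    """Step 5: map a key returned by the model back to its key field."""
    target = normalise(value)
    for field, synonyms in taxonomy.items():
        if target in (normalise(synonym) for synonym in synonyms):
            return field
    return None
```

For example, with a Taxonomy of {"Total": ["Total Due", "Amount Due"]} and model keys ["Ship To:", "Total Due"], only "Total Due" would be kept for the request, and lookup_key_field would map it back to "Total" when reading the response.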