Overview
Field Identification (Field ID) tasks are specific to Semi-structured documents. These tasks are created when the system is unsure whether it has correctly located a field, or when a field is specified for manual identification in the Layout Editor.
The task begins with an instruction screen that states best practices for completing Field ID tasks. You may choose to dismiss this screen after its first appearance.
Documents without tables
On the Field Identification page, a list of the layout's possible fields appears in the right-hand sidebar.
If you have a trained model for the layout, we may have predictions for the placement of some fields. As part of Field Identification, you will need to:
- confirm or correct our predictions, and
- manually identify any fields we don't have predictions for.
Before you begin…
To help us transcribe each field accurately, review our Best Practices for Field ID Supervision and QA and keep them in mind as you identify each field's content.
You can complete Field ID Supervision by following the steps below.
- Select a field in the right-hand sidebar by clicking on it or using keyboard shortcuts. The selected field will be highlighted.
  - If we have a prediction for the field, its icon in the right-hand sidebar appears in blue. The prediction appears on the image as a blue bounding box, and you can move to step 3.
- If we don't have a prediction for the field, you will need to manually identify the field's content on the page.
  - Click on the field's content. A bounding box appears around the field's content, which is highlighted in blue. The size of the bounding box is automatically adjusted depending on the field’s type (e.g., text field, checkbox, or signature).
- Do one of the following:
If... | Then... |
The bounding box includes all of the field's content. | Move on to the next step. |
The box is in the right place but doesn't include all of the field's content (e.g., parts of letters fall outside of the box). | Adjust the edges of the box until it contains all of the content that should be transcribed. |
Neighboring text segments should also be included in the field's transcription. | With a click-and-drag motion, draw a bounding box that includes all of the field's content. |
The box doesn’t include any of the field’s content OR no bounding box appears around the field’s content when hovering over it. | Press the spacebar, and with a click-and-drag motion, draw a bounding box that includes all of the field's content. |
- Repeat steps 1-3 until you’ve identified all of the fields that appear in the document, confirming or correcting our predictions or manually identifying fields we don’t have predictions for.
- When you’ve finished identifying fields in a document, click Confirm All and Continue (Enter), or press Return or Enter.
  - If any bounding boxes overlap, you will be asked to tighten them around the area of each field.
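The overlap check in the final step can be pictured with a short sketch. It assumes each bounding box is an axis-aligned rectangle given as (left, top, right, bottom) page coordinates; the `Box` type and both function names are hypothetical and only illustrate the idea, not how the application implements it.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box in page coordinates (hypothetical type)."""
    left: float
    top: float
    right: float
    bottom: float


def boxes_overlap(a: Box, b: Box) -> bool:
    """Return True when the two boxes share a region of non-zero area."""
    return (
        a.left < b.right
        and b.left < a.right
        and a.top < b.bottom
        and b.top < a.bottom
    )


def overlapping_pairs(boxes: dict[str, Box]) -> list[tuple[str, str]]:
    """List every pair of field names whose boxes overlap."""
    names = sorted(boxes)
    return [
        (first, second)
        for i, first in enumerate(names)
        for second in names[i + 1:]
        if boxes_overlap(boxes[first], boxes[second])
    ]
```

Under these assumptions, boxes that only touch along an edge are not treated as overlapping, so tightening a box until it meets its neighbor's edge is enough to clear the pair.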
Documents with tables
If your document contains both fields and tables:
- You will first complete Field Identification, as outlined above.
- Then, you will be brought to the Table Identification workflow.
For more detailed information about table extraction, see Table Identification.
Fields with multiple occurrences
The Multiple Occurrences (MOs) feature helps you identify multiple instances of a field.
An occurrence is defined as one of the distinct values from a sequence of values for a field. For example, if two people own a bank account, then the “Account Owner” field needs two values to be extracted, one for each name.
Do NOT use MOs to select the same value multiple times. If the same value appears twice, you should only label the first occurrence in the natural reading order.
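The rule above amounts to keeping one occurrence per distinct value, in reading order. Here is a minimal sketch, assuming the candidate values have already been read off the page in natural reading order as plain strings. The function name is hypothetical, and comparing values by exact string match is an assumption.

```python
def distinct_occurrences(values_in_reading_order: list[str]) -> list[str]:
    """Keep the first appearance of each value and drop later repeats."""
    seen: set[str] = set()
    occurrences: list[str] = []
    for value in values_in_reading_order:
        if value not in seen:
            seen.add(value)
            occurrences.append(value)
    return occurrences
```

For an "Account Owner" field that reads "Ana Ruiz", "Ben Ode", "Ana Ruiz" down the page, this keeps two occurrences: the first "Ana Ruiz" and "Ben Ode".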
Identify multiple occurrences of a field
- Create a bounding box for the first occurrence of the field.
- Click on Add another [field’s name]. You can also use the shortcut Command + Option + + on Mac or Control + Alt + + on Windows.
- Create a bounding box for the second occurrence of the field.
Use the Add another text segment option when a given occurrence cannot be annotated with only one bounding box. For more information on multiple bounding boxes, see Multiple bounding boxes for fields.
- Repeat steps 2 and 3 for each additional occurrence of the field.
Multiple Occurrences model
The default Field ID model can only predict one occurrence per field. If a field can contain a sequence of distinct values, you may need to use the Multiple Occurrences model instead. For more information on training a new Field ID model, see Training a New Field Identification Model.
If you want to process documents with multiple occurrences of fields for a specific layout, you need to select the Multiple Occurrence Field ID model for the layout before model training. To do so, follow the steps below:
- Go to the admin page by adding “/admin/form_extraction/template/” to the end of the application URL (e.g., production.example.com/admin/form_extraction/template/).
- Click on the UUID of the layout you’d like to train a Multiple Occurrence model for.
- In the Flex engine type for training setting, select MULTIPLE_OCCURRENCES from the drop-down menu.
- Click Save.
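The admin URL in the first step is the application URL with a fixed path appended. As a small sketch, the helper below builds it; the function name is hypothetical, and stripping a trailing slash before appending is an assumption made to avoid a doubled "/".

```python
ADMIN_TEMPLATE_PATH = "/admin/form_extraction/template/"


def admin_template_url(application_url: str) -> str:
    """Append the layout admin path to the application URL."""
    # Drop any trailing slash so the result never contains "//" at the join.
    return application_url.rstrip("/") + ADMIN_TEMPLATE_PATH
```

Calling `admin_template_url("production.example.com")` gives the same address as the example in the first step.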
If you have a trained MULTIPLE_OCCURRENCES model for the layout, we may have predictions for the placement of some of the fields’ occurrences. As part of Field Identification, you will need to:
- confirm or correct our predictions, and
- manually identify any occurrences we don't have predictions for.
Note that if your model has low confidence in identifying some of the field’s occurrences, the entire field will be sent to Field ID Supervision, not individual occurrences.
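The routing rule in the note above can be sketched as follows. This assumes each predicted occurrence carries a confidence score between 0 and 1 and that "low confidence" means falling below some threshold. The function name, the score representation, and the threshold value are all hypothetical.

```python
def field_needs_supervision(
    occurrence_confidences: list[float], threshold: float
) -> bool:
    """Return True if any single occurrence falls below the threshold.

    The decision is made for the field as a whole: one low-confidence
    occurrence sends every occurrence of the field to supervision.
    """
    return any(score < threshold for score in occurrence_confidences)
```

With a hypothetical threshold of 0.9, a field whose occurrences score 0.98 and 0.62 is sent to Field ID Supervision in full, including the 0.98 occurrence.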
If you previously used a layout to extract multiple occurrences in v32, v33, or v34 but still do not have a trained MULTIPLE_OCCURRENCES model created in v35 or v36, you can always send your documents to Field ID Supervision by doing one of the following:
- Enable Identification Supervision in the Layout Editor for fields with possible multiple occurrences. To learn more, see the “Defining field metadata” section in the Creating Semi-structured Layouts article.
- Using a custom code block, set the value of the manual_identification_processing_type property to FORCE for the Manual Identification block. The machine will then try to predict the first occurrence of a field on a page, and your keyers can identify the additional occurrences manually. To learn more about Custom Code Blocks, see the “Custom Code Blocks” section in Flow Blocks.
Multiple bounding boxes for fields
To annotate values across line and page breaks that can’t be captured by a single bounding box, you can draw multiple bounding boxes for fields. For example, if you have a field whose value spans across two pages, you can draw two bounding boxes for the same field.
To create multiple bounding boxes for a single field, follow the steps below:
- Select a field in the right-hand sidebar by clicking on it or using keyboard shortcuts.
- Draw a bounding box for the first text segment of the field.
- Click Add another text segment.
- Draw a bounding box for the second text segment of the field.
Note that you can repeat steps 3 and 4 until you’ve identified all text segments of a field.
If a field has an occurrence with multiple bounding boxes, and you delete this occurrence, all of the occurrence’s bounding boxes will also be cleared.
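One way to picture how fields, occurrences, and bounding boxes relate is as a nested structure: a field holds occurrences, and each occurrence holds one or more boxes. The sketch below is an illustration only. The class names, the box tuple layout, and the method name are hypothetical and do not describe the application's internal data model.

```python
from dataclasses import dataclass, field

# (page number, left, top, right, bottom): an assumed layout for illustration.
BoxOnPage = tuple[int, float, float, float, float]


@dataclass
class Occurrence:
    """One value of a field, annotated with one or more bounding boxes."""
    boxes: list[BoxOnPage] = field(default_factory=list)


@dataclass
class FieldAnnotation:
    """A field and all of its occurrences."""
    name: str
    occurrences: list[Occurrence] = field(default_factory=list)

    def delete_occurrence(self, index: int) -> None:
        """Remove an occurrence; its boxes go with it because it owns them."""
        del self.occurrences[index]
```

In this picture, a value that spans two pages is a single occurrence with two boxes, and deleting that occurrence clears both boxes at once.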
Unstructured extraction
Unstructured extraction allows you to extract data points from long documents with unstructured text. For example, you can identify fields of interest in documents such as title deeds and 10-Ks, and then use them downstream to generate actionable insights.
Unstructured extraction also automates part of the identification work, so data keyers can complete their tasks more efficiently. This automation is provided by the UNSTRUCTURED_EXTRACTION ID model. To learn how to enable and train UNSTRUCTURED_EXTRACTION ID models, see Training a New Field Identification Model.
You can use data-point extraction in conjunction with Named Entity Recognition Blocks.
Limitations in v36
The following limitations apply to unstructured extraction in v36:
- Unstructured extraction is available in SaaS deployments only.
- There is no Guided Data Labeling feature for UNSTRUCTURED_EXTRACTION models.
- You can extract long fields of up to roughly 50-100 words.
- Unstructured extraction cannot identify checkboxes, signatures, or fields with multiple occurrences.
- Unstructured extraction is tested only with English text and may not perform well with other languages.
Searching for text segments
To search for text segments across all pages of a document, you can:
- Click the search bar at the top of the Field ID page and type a keyword. Note that the search bar supports single-word searches. We recommend searching for the most relevant keyword.
- Press Command + Option + F for Mac or Control + Alt + F for Windows and type a keyword.
Pressing Return on Mac or Enter on Windows returns all text segments that match your search. You can navigate through the search results using the Previous Segment and Next Segment buttons.
All search results are highlighted with orange rectangles. The currently selected search result is highlighted with a dark orange rectangle while all other search results are highlighted with light orange rectangles.
Note that the search is not case-sensitive.
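The search behavior described above (a single keyword, matched without regard to case) can be sketched as follows. The function name is hypothetical, and matching the keyword anywhere inside a segment's text, rather than only as a whole word, is an assumption.

```python
def find_matching_segments(segments: list[str], keyword: str) -> list[int]:
    """Return the indexes of segments that contain the keyword, ignoring case."""
    word = keyword.strip()
    if len(word.split()) != 1:
        raise ValueError("the search supports a single word only")
    needle = word.casefold()
    return [
        index
        for index, text in enumerate(segments)
        if needle in text.casefold()
    ]
```

The returned indexes follow document order, which is the order you would step through with the Previous Segment and Next Segment buttons.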
Addressing incorrect layouts
The Field ID task provides a Mark Layout Variation Incorrect button under the "Document Details" dropdown in the right panel. This action allows the user to skip further extraction on Semi-structured documents which have been matched to the wrong layout.
Once the Mark Layout Variation Incorrect action has been confirmed, all pages in the document will be marked as No Layout Found, and no further extraction work will be performed.
Multi-page documents & page re-ordering
If the document has more than one page, you can navigate between pages by clicking on the preview images in the left-side column. If the pages are out of order, you can also re-arrange them while in the Field ID task view.
To do so, hover over the image you'd like to move, then click and drag the page to its desired position.
When locating fields, you should look across all pages before concluding that a field is not present. If the field is present on multiple pages, you only have to draw the box once. Work with your team to determine whether there's a preferred page to draw the box on.
Keyboard shortcuts
Field identification
Task | Mac Shortcuts | Windows Shortcuts |
Change label location | F4 | F4 |
Clear bounding box | Backspace | Backspace |
Next field in list | E or ⬇ | E or ⬇ |
Previous field in list | W or ⬆ | W or ⬆ |
Free draw a bounding box | Spacebar + click and drag | Spacebar + click and drag |
Add an additional occurrence of a field | Command + Option + + | Control + Alt + + |
Remove an additional occurrence of a field | Command + Option + - | Control + Alt + - |
Add another text segment | Option + click and drag | Alt + click and drag |
Add another text segment and free draw a bounding box (Only supported for text fields) | Option + Spacebar + click and drag | Alt + Spacebar + click and drag |
Focus segment search bar | Command + Option + F | Control + Alt + F |
Complete Task | Command + Return | Control + Enter |
All tasks
Task | Mac Shortcuts | Windows Shortcuts |
Zoom in | Option + + | Alt + + |
Zoom out | Option + - | Alt + - |
Next page | Fn + ⬇ | Page down |
Previous page | Fn + ⬆ | Page up |
Keyboard shortcuts | F2 | F2 |
Close task | Option + Command + X | Alt + Control + X |