36.0.22 (30 May 2024)
Languages
Fixed
Machine transcription of Korean multiline fields – We’ve fixed an issue that caused machine transcription to fail on multiline Korean fields. Examples of incorrect transcriptions and their corrected versions appear below.
- Incorrect: 원전세배당소득 | Correct: 이자.배당소득 원천세
- Incorrect: 이장소백당소득 | Correct: 이자.배당소득 지방소득세
Flows
Updated
Floating-point values for SDM_BLOCKS_TASK_POLL_INTERVAL and HYPERFLOW_ENGINE_TASKS_POLL_INTERVAL_SECONDS – In addition to integer values, you can also enter values of type float for the SDM_BLOCKS_TASK_POLL_INTERVAL and HYPERFLOW_ENGINE_TASKS_POLL_INTERVAL_SECONDS ".env" file variables. This update gives you flexibility when customizing your submission-processing latency, particularly if low latency levels are desired.
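As a rough illustration of what accepting both integer and float values means in practice, the sketch below reads the two variables from the environment and converts them with float(). The helper name read_poll_interval, the fallback default of 1.0, and the sample values are assumptions made for this example. They are not taken from the application.

```python
import os


def read_poll_interval(name, default=1.0):
    """Return the poll interval stored in the environment variable `name`.

    Hypothetical helper: accepts integer strings ("2") and float strings
    ("0.25"). The default of 1.0 is an assumption for this sketch only.
    """
    raw = os.environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    value = float(raw)  # float("2") gives 2.0, float("0.25") gives 0.25
    if value <= 0:
        raise ValueError(f"{name} must be positive, got {raw!r}")
    return value


# Sample values as they might appear in the ".env" file (illustrative only)
os.environ["SDM_BLOCKS_TASK_POLL_INTERVAL"] = "0.25"
os.environ["HYPERFLOW_ENGINE_TASKS_POLL_INTERVAL_SECONDS"] = "2"

sdm_interval = read_poll_interval("SDM_BLOCKS_TASK_POLL_INTERVAL")
engine_interval = read_poll_interval("HYPERFLOW_ENGINE_TASKS_POLL_INTERVAL_SECONDS")
```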
Machine Identification
Fixed
Asterisk as custom character for splitting segments – We've resolved an issue with text-segment detection that resulted in IndexError: list index out of range during Machine Identification. The issue occurred when custom_char_for_splitting_segments was set to * in /admin.
Quality Assurance
Fixed
Logic for automatic QA sampling rates – We’ve fixed a dereferencing issue that caused silent failures for flows with specific default settings. The logic now handles dereferencing correctly for these flows. The issue affected flows created in Hyperscience versions that preceded the current application version.
Installation
Updated
Increased post-install timeout – We’ve increased the post-install timeout from 3 to 4 minutes, providing more time for the system to load. Users will now see the following error message if the system’s timeout is reached before the application starts:
“The application does not appear to have started within 240 seconds, however startup times can vary due to environmental factors such as proxies and firewalls. If you are unable to access the login screen after an additional 2-3 minutes, please reach out to Hyperscience Support for additional troubleshooting assistance.”
36.0.21 (14 Feb 2024)
Submissions
Updated
Support for EML files and their attachments – You can now extract data from EML files and their attachments. When an EML file is ingested, the system creates a PDF file from the email's body and processes each of the file's attachments as a separate document in the submission.
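To make the body-and-attachments split concrete, here is a minimal sketch that uses Python's standard email package to separate an EML file into its body text and its attachments. It is not the application's ingestion code: the split_eml helper and the sample message are hypothetical, and the PDF rendering of the body is left out.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Build a small sample email in memory so the sketch is self-contained.
sample = EmailMessage()
sample["Subject"] = "Invoice"
sample["From"] = "sender@example.com"
sample["To"] = "receiver@example.com"
sample.set_content("Please find the invoice attached.")
sample.add_attachment(
    b"%PDF-1.4 sample bytes",
    maintype="application",
    subtype="pdf",
    filename="invoice.pdf",
)
eml_bytes = sample.as_bytes()


def split_eml(raw):
    """Return (body_text, [(filename, payload_bytes), ...]) for raw EML bytes."""
    msg = message_from_bytes(raw, policy=policy.default)
    # Prefer the plain-text body; fall back to HTML if no plain part exists.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body_text = body_part.get_content() if body_part is not None else ""
    attachments = [
        (part.get_filename(), part.get_payload(decode=True))
        for part in msg.iter_attachments()
    ]
    return body_text, attachments


body, attachments = split_eml(eml_bytes)
```

In this sketch, body would be the input for the PDF created from the email's body, and each entry in attachments would become its own document in the submission.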
Submission Processing
Fixed
Logging of errors during submission pre-processing – We've fixed an issue that allowed personally identifiable information to be stored in the error logs created during submission preprocessing.
Training Data Management
Fixed
Saving annotations before viewing another document – We've resolved a race condition that sometimes caused annotations to be saved to the next-viewed document when a keyer clicked the Previous or Next arrow buttons as annotations were being saved.
Flows
Fixed
Viewing restricted flows without the required permissions – We've fixed an issue that caused the page to repeatedly reload when a user attempted to view a flow that they did not have the required permissions to access. As part of this update, an error message appears in these situations, notifying the user that they are not authorized to view the flow.
Reporting
Fixed
Calculating the time taken to complete Flexible Extraction and Custom Supervision tasks – We've fixed an issue that caused completion-time calculations for Flexible Extraction and Custom Supervision tasks to depend on the contents of the tasks (e.g., fields, tables, decisions).
OpenID Connect (OIDC)
New
Redirecting users during ID token renewal – By default, when renewing OIDC ID tokens, the application redirects users to the identity provider. To allow this step to be bypassed, we have introduced the HS_OIDC_RENEW_ID_TOKEN_WITH_REFRESH_TOKEN “.env” file variable. When this variable is set to true, the renewal takes place without redirecting users out of the application, so their work is not interrupted. See OpenID Connect (OIDC) for more details.
36.0.20 (23 Nov 2023)
Languages
Fixed
Processing text in Turkish submissions – We've resolved a string-parsing issue that caused Turkish submissions to halt if they contained a colon (":").
Flows
Updated
Default timeout for blocks – We've increased the default timeout for block requests from 60 seconds to 180 seconds.
Training Data Management
Fixed
Columns not present in current layout – We’ve fixed an issue where Table Identification model training failed due to the presence of annotations for columns that were not in the latest live layout version.
Task Queue
Fixed
Effects of changing filters – We’ve fixed an issue where changing the filters in the Task Queue applied only the new changes and deselected any filters that had been applied previously. This issue led to incorrect filtering when choosing both a date range and a filter from the Filters list.
OpenTelemetry
Updated
Emitting metrics after time-outs – The system now emits OpenTelemetry metrics for tasks after they exhaust their allotted retry attempts and time out.
36.0.19 (9 Nov 2023)
Input Connections
Updated
Number of subfolders scanned by Email Listener for Microsoft 365 Outlook – We've increased the number of subfolders scanned by the Email Listener in Microsoft 365 Outlook accounts from 10 to 100.
36.0.18 (26 Oct 2023)
Submissions
Fixed
Responsiveness when viewing documents – We’ve fixed a query-plan issue in deployments using MSSQL databases that caused delays when opening the document viewer in some instances.
Reporting
Fixed
Accounting for differences between browser and server times – We've updated our task-completion-time calculations to account for the difference between the server's timestamp and the browser's timestamp.
36.0.17 (13 Oct 2023)
Table Identification
Fixed
Completing submissions containing nested tables with orphan rows – We’ve fixed an issue that allowed submissions with a “Confirm that all pages are reviewed” warning message to be completed with orphan rows, leading to those submissions halting.
Transcription Supervision
Fixed
ResizeObserver loop exceeded error in Chrome on Macs – We've fixed an issue that caused ResizeObserver loop exceeded errors to occur during Transcription Supervision in Mac Chrome browsers in some instances.
Security
Fixed
Addressing security vulnerabilities – To ensure security, we've updated:
- sentry-sdk to 1.14.0,
- scipy to 1.10.0, and
- mpmath to 1.3.0.
36.0.16 (14 Sep 2023)
Languages
Updated
Improvements to Korean-English transcriptions – We've optimized the transcription model used for Korean-English documents.
Data Types
New
Enhanced Korean Freeform – We've added an Enhanced Korean Freeform data type that leverages a Korean large language model for transcriptions. It increases transcription accuracy on Korean documents, particularly on degraded images.
Layouts
Fixed
Sorting layouts by "Last Updated" – We've fixed an issue that caused the table on the Layouts page (Library > Layouts) to become empty when it was sorted by the contents of the Last Updated column.
36.0.15 (31 Aug 2023)
Data Types
New
Capitalized Names – We've added a Capitalized Names data type that expects names in which the first letter of each part (e.g., first name and last name) is capitalized.
Training
Fixed
Field Identification training and bounding-box coordinates – We've fixed an issue that caused Field Identification training to fail if a document's bounding boxes had certain coordinates.
Releases
Fixed
Loading of Releases page – We've optimized the loading of the Releases page (Library > Releases), resolving an issue that prevented the page from being displayed in some instances.
Flows
Fixed
Timeouts for flow-block polling requests – We've added timeouts for flow blocks' polling requests, preventing failed requests from continuing indefinitely.
UiPath Notifier
Fixed
Default authentication method – We've fixed an issue that caused OAuth to be the default authentication method for UiPath Notifier connections. The issue caused flows that used Basic Authentication for these connections to fail.
Reporting
Updated
Definition of dt_started – We've changed dt_started from the time the task was first assigned to the time when the task was opened. This update creates a more accurate measurement of the time taken to complete tasks.
Security
Fixed
Updating Django – To address security vulnerabilities, we've updated Django to 3.2.20.
36.0.14 (10 Aug 2023)
Submission Processing
Fixed
Duplicate submission-processing tasks – We've fixed a race condition in our task-synchronization manager that sometimes caused internal tasks to be executed more than once for a submission, resulting in data corruption.
Flows
Fixed
Subprocesses from pagination – Previously, pagination tasks sometimes created subprocesses that wouldn't time out if they couldn't be completed. To resolve this issue, we've added timeouts to these subprocesses.
Security
Fixed
Addressing security vulnerabilities – To ensure security, we've updated certifi to 2023.7.22 and pyJWT to 2.7.0.
36.0.13 (27 Jul 2023)
Languages
Fixed
Text segmentation for non-Latin languages – We’ve fixed the language parameters used when segmenting text in languages outside of the Latin language family. The system now uses the language of the submission during segmentation rather than defaulting to the Latin language family. For example, Latin segmentation is not used if you upload Korean or Korean-English documents.
Classification
Fixed
Loading pre-computed Structured Classification data for releases – We've reduced the amount of time required to load pre-computed Classification data for Structured documents when a new release is deployed. This update increases the efficiency of submission processing when multiple versions of a Structured layout are included in a release.
Transcription Supervision
Fixed
ResizeObserver loop exceeded error in Chrome on Macs – We've fixed an issue that caused ResizeObserver loop exceeded errors to occur during Transcription Supervision in Mac Chrome browsers in some instances.
Security
Fixed
Updating paddlepaddle – To address security vulnerabilities, we've updated paddlepaddle to 2.4.2.
36.0.12 (20 Jul 2023)
Trainer
Fixed
Displayed task statuses for completed tasks – We’ve fixed a UI issue that displayed the trainer’s task status as “Running” after completion on the Trainer page (Administration > Trainer).
Document Classification
Updated
Displaying Submission ID – We’ve added the Submission ID to the top of the page for easier traceability of your uploads (“Document Classification: Submission <submission_id>”).
36.0.11 (10 Jul 2023)
Machine Identification
Fixed
Consistency in field-location predictions – We’ve fixed an issue with non-deterministic behavior during field grouping, which caused different predictions for the same documents.
Databases
Fixed
Notifications and deadlocks – We've resolved an issue that caused database deadlocks to occur if the user and system made changes to notifications at the same time.
36.0.10 (3 Jul 2023)
Manual Transcription
Fixed
Normalization of Date table columns with column-specific languages – We've resolved an issue related to normalization after Manual Transcription. Date table columns with a different language from the one assigned to the layout are now normalized correctly. For example:
- Before: MM/DD/YYYY was normalized as YYYY/MM/DD.
- After: MM/DD/YYYY is normalized as MM/DD/YYYY.
Security
Fixed
HS_TLS_VERIFY_ENABLED and requests – A recent upgrade to requests caused SSL certificate validation errors to occur even when HS_TLS_VERIFY_ENABLED was set to false. To resolve this issue, we've downgraded requests to 2.27.1.
36.0.9 (23 Jun 2023)
User Experience
Updated
Maximum number of files per upload – We've increased the default maximum number of files per upload from 100 to 1000. This value can be customized with the DATA_UPLOAD_MAX_NUMBER_FILES ".env" file variable. The maximum applies to both training-data pages for models and submission pages.
Flows
Fixed
Deploying flows with validation errors – We’ve fixed an issue that allowed users to deploy flows that contained validation errors.
Manual Transcription
Fixed
Normalization of Date fields with field-specific languages – We've resolved an issue related to normalization after Manual Transcription. Date fields with a different language from the one assigned to the layout are now normalized correctly. For example:
- Before: MM/DD/YYYY was normalized as YYYY/MM/DD.
- After: MM/DD/YYYY is normalized as MM/DD/YYYY.
File Storage
Fixed
Sanitizing filename headers – We've fixed a data-sanitization issue in HTTP filename headers that prevented files from being downloaded to the file store.
36.0.8 (17 Jun 2023)
Machine Classification
Fixed
Classifying Structured documents with extreme aspect ratios – We've resolved an issue that caused out-of-memory errors to occur when the machine attempted to classify Structured documents with extreme aspect ratios (e.g., 600 x 2 pixels). As part of this update, the system pre-calculates classification data for releases containing Structured layouts. These calculations may increase the time required to process the release the first time it is used.
Security
Fixed
Addressing security vulnerabilities – To ensure security, we've updated:
- Pillow to 9.5.0 and
- requests to 2.31.0.
36.0.7 (9 Jun 2023)
Transcription Supervision
Fixed
Reviewing blank table cells in documents with low-confidence transcriptions – We've resolved an issue that caused blank table cells to be sent to Transcription Supervision when the system transcribed text elsewhere in the document with low confidence.
As part of this update, we've renamed the "Send Blank Cells to Manual Transcription" setting in the Manual Transcription Block to "Create Manual Transcription Task for Tables with Blank Cells."
Output Blocks
Fixed
Retrieving OAuth tokens for HTTP Notifier Output Block health checks – We've fixed an issue that caused the system to request a new OAuth token for each automatic health check performed by HTTP Notifier Output Blocks.
Security
Fixed
Addressing security vulnerabilities – To ensure security, we've updated:
- sqlparse to 0.4.4 and
- the version of Golang used to compile Filebeat to 1.20.4.
36.0.6 (17 May 2023)
Languages
Fixed
Recognition of text segments by the Korean and English language model – We've resolved a text-processing issue that sometimes prevented the machine from recognizing text segments in their entirety in documents whose language was "Korean and English." The same issue also prevented some text segments that contained only one digit from being detected.
Keyer Data Management
Fixed
Deleting training documents that are being processed – We've fixed an issue that caused errors to occur when training documents that were being processed were deleted.
Machine Identification
Fixed
Target accuracy when "Manual Identification Supervision" is disabled – We've fixed an issue that prevented target accuracy values from being set to 0 when the Manual Identification Supervision flow setting was disabled. This issue caused thresholding to be applied to machine-only identification, which affected the machine's predictions.
Comparing a layout's columns to columns in training data – We've resolved an issue that prevented the Machine Identification Block from comparing the columns in the latest layout version to the columns that the layout's model was trained on. This issue caused documents with columns that were not included in the model's training to be automatically sent to Identification Supervision, reducing automation.
File Storage
Updated
Enhancements to directory structure – We've updated the directory structure in file stores for faster data retrieval. Files are now stored in a directory with six levels to minimize the number of files stored in any single directory, preventing performance issues that may occur in high-volume instances.
As part of this update, we've added the following as valid values of the FORMS_STORAGE_MODE “.env” file variable:
- FILE_EX
- S3_EX
- AZURE_BB_EX
These values replace FILE, S3, and AZURE_BB, respectively, as valid values of FORMS_STORAGE_MODE. Instances that currently use FILE, S3, or AZURE_BB will have their file stores migrated to the new structure upon upgrading to v37. If you do not want your file store to use the new structure, set FORMS_STORAGE_MODE to FILE_LEGACY, S3_LEGACY, or AZURE_BB_LEGACY.
36.0.5 (9 May 2023)
Layouts
Updated
"Not in <layout language>" option for table columns – The Not in <layout language> option that has been available for fields can now also be applied to table columns in Semi-structured layouts. This option allows you to assign languages on a per-column basis, giving keyers the ability to enter transcriptions that are not in the language assigned to the document's layout.
To learn how to use the Not in <layout language> option, see Creating Semi-structured Layouts.
Training
Fixed
list index out of range in get_ground_truth_pages error when training Field Identification models – We've fixed a page-indexing issue that caused an IndexError: list index out of range in get_ground_truth_pages error to occur when training Field Identification models.
36.0.4 (4 May 2023)
Flow Blocks
Updated
Generating checksums for individual blocks – We now generate checksums of each block's command file, which are used to identify the blocks in the database and prevent duplicate blocks from being uploaded.
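The idea of identifying blocks by a checksum of their command file can be sketched as follows. This is an illustration only: the use of SHA-256, the BlockRegistry class, and the in-memory dictionary standing in for the database are all assumptions, not details of the application's implementation.

```python
import hashlib


def block_checksum(command_file_bytes):
    """Return a hex digest of the command file's contents.

    SHA-256 is an assumption; the release note does not name the algorithm.
    """
    return hashlib.sha256(command_file_bytes).hexdigest()


class BlockRegistry:
    """Hypothetical in-memory stand-in for the database table of blocks."""

    def __init__(self):
        self._blocks = {}  # maps checksum -> block name

    def upload(self, name, command_file_bytes):
        """Store the block; return False if an identical block already exists."""
        checksum = block_checksum(command_file_bytes)
        if checksum in self._blocks:
            return False  # duplicate command file, upload rejected
        self._blocks[checksum] = name
        return True


registry = BlockRegistry()
first = registry.upload("my_block", b"print('hello')\n")
duplicate = registry.upload("my_block_copy", b"print('hello')\n")
changed = registry.upload("my_block_v2", b"print('hello again')\n")
```

Because the checksum depends only on the file's contents, a block uploaded under a different name is still recognized as a duplicate, while any change to the command file produces a new checksum.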
36.0.3 (26 Apr 2023)
Releases
Fixed
Exporting locked releases – We've resolved an issue that prevented users from exporting locked releases. Attempting to do so resulted in a "Could not export archived layout release with UUID: <release_uuid>" error message.
Keyer Data Management
Fixed
Tooltips for "Previous / Next document in list." buttons for documents sorted in descending order – We've fixed an issue that caused the "Sorted by:" portion of the Previous document in list. and Next document in list. buttons' tooltips on the Annotations page to be incomplete when documents were sorted in descending order. For example, if the documents were sorted by Pages in descending order in the Training Documents table, the tooltip read "Sorted by:" instead of "Sorted by: Pages".
Editing annotations for documents that are being deleted – We've resolved an issue that allowed users to edit annotations for documents that were being deleted (e.g., by other users, as part of PII data deletion).
SaaS
Fixed
/admin access for users with permitted email-address domains in deployments without AWS ALB – We've fixed an issue that prevented users with permitted email-address domains from accessing /admin in deployments without AWS ALB authentication.
Security
Fixed
Addressing security vulnerabilities – To ensure security, we've updated:
- Pillow to 9.3.0,
- json5 to 2.2.3,
- esplint to 0.10.1,
- mocha to 10.2.0, and
- webpack to 5.77.0.
36.0.2 (6 Apr 2023)
Submission Processing
Fixed
Splitting a page's text into segments when "0" is a segment's only character – We've fixed an issue that prevented a page's text from being split correctly into text segments when segments contained only the "0" character. This issue caused processing delays and excessive memory usage.
Machine Classification
Fixed
Storing pre-calculations for classifying Structured documents – We've resolved an issue that caused invalid memory alloc request errors when the system attempted to store pre-calculated values for the release's Structured layout variations in the database. The issue affected instances with PostgreSQL databases.
Classification Supervision
Fixed
User interface for Classification Supervision tasks – We've made the following fixes to the Classification Supervision user interface:
- We've widened the right-hand panel, enlarging the image of the page being categorized.
- We've fixed an issue that caused the screen to flicker each time a keyer clicked on a thumbnail in the left-hand panel.
- We've resolved an issue that caused the right-hand panel to be hidden when a keyer clicked on a page group in the middle panel.
Reporting
Fixed
Counting time spent on Classification Supervision tasks where pages are classified as "Other" – We've fixed an issue that prevented time spent on Classification Supervision tasks from being included in Document Classification Supervision Time Spent (Seconds) when keyers classified all pages as "Other" during the tasks. The data for Document Classification Supervision Time Spent (Seconds) appears in the KeyerPerformance.csv file in the Keyer Projection Report.
Security
Fixed
Updating com.fasterxml.jackson.core:jackson-databind – To address security vulnerabilities, we've updated com.fasterxml.jackson.core:jackson-databind to 2.14.2.
SaaS
Fixed
"API Access" tab in the Users section for deployments without AWS ALB – We've fixed an issue that caused the API Access tab to appear in the Users section of the application in deployments that did not use AWS ALB authentication.
API
Updated
Restricting access to /api/v5/audit_logs – We've revoked access to the /api/v5/audit_logs endpoint from all users except System Admins.
36.0.1 (17 Mar 2023)
Submissions
Fixed
Retrieving blank thumbnail images of pages – We've fixed an issue that prevented blank thumbnail images from being retrieved when no thumbnail images existed for a submission. This issue prevented the application from being initialized in some situations.
Machine Classification
Fixed
Classifying Structured documents written in Japanese or Simplified Chinese – We've resolved an issue that caused submissions to halt at the Machine Classification step if they contained Structured documents written in Japanese or Simplified Chinese.
Classification Supervision
Fixed
"Perform Tasks" link in Submissions table – We've fixed an issue that prevented the Perform Tasks link from appearing in the Submissions table for submissions with Classification Supervision tasks. The issue affected submissions whose first page was classified by the machine.
Keyer Data Management
Fixed
Duplicate pages after training – We've fixed an issue that caused pages to be duplicated after their documents were used for training. The issue affected documents that contained at least one empty page.
"Latest version not live" after uploading releases and training data – We've resolved a timestamping issue that caused a "Latest version is not live" warning message to appear after uploading a release and the training data for its models.
Artifacts
Updated
Logging of artifact-export events – We've changed the severity of the following events from exceptions to warnings in the logs:
- Missing artifacts list
- Missing storage type
- Missing destination
Permissions
Fixed
Logging in without assigned user groups or permissions – Previously, if a user was not assigned to a user group in an identity provider (IdP), or if they were assigned to an IdP user group that did not have any permissions, they could log in to the application, but they could not log out. There was also no messaging to let the user know what they needed to do to resolve the issue. A fix for these issues is included in v36.0.1.
Upgrades
Fixed
Indexing training-data records – We've fixed an indexing issue that caused duplicates of training-data records to be found during the upgrade process, which prevented instances from being updated to v36.
SaaS
Fixed
Authentication and SaaS features when AWS ALB is not used – We've resolved an issue that prevented users from authenticating in some situations when a method other than AWS ALB was used. This issue also caused some SaaS-specific features to be disabled in affected instances.
Application recovery after database failovers – We've fixed an issue that prevented the application from recovering quickly after database failovers. The issue sometimes caused the application to be unresponsive for long periods of time.
36.0.0 (17 Mar 2023)
There is an issue in v36.0.0 that prevents version information from appearing in the UI. For this reason, we recommend using v36.0.1 rather than v36.0.0.
Languages
New
New languages – We've added support for submissions written in the following languages:
- Bulgarian
- Czech
- Estonian
- Hebrew
- Kazakh
- Latvian
- Lithuanian
- Russian
- Slovak
- Thai
- Turkish
We support automation on Structured and Semi-structured submissions in these languages, regardless of whether they contain handwritten or printed data.
To learn more about the languages we support, see Supported Languages.
Updated
Improvements to the Korean language model – We've enhanced the system's ability to accurately recognize and transcribe Korean words in fields with Generic Text, Address, Company Name, and Name data types.
Flows
Updated
Improvements to flow management – In an effort to provide more information about flows and the potential results of certain actions, we've made the following updates to flow-management tasks:
- Option to deploy subflows when deploying flows – When you deploy a flow, a confirmation dialog box appears, which includes an option to deploy all connected subflows.
- Indicators that distinguish subflows from each other – Each subflow shown in Flow Studio has its own highlight color, making it easier to determine whether subflows are identical or different. We've also allowed more characters in a subflow's title to be shown in Flow Studio.
- Applying changes to a specific instance of a subflow – You can now choose to save changes to a specific instance of a subflow without impacting the other instances of that subflow in the main flow or in other flows.
- Visual explanation of options when saving changes to a subflow – We've added diagrams explaining the options available when saving changes to a subflow, which illustrate where the changes will be applied.
More details on these updates can be found in Connecting Flow Blocks to Other Flows.
Defining retry policies for flow blocks – You can now define retry policies for flow blocks at the system, flow, or block level. Defining these policies gives you more control over the execution of flows and may prevent submissions from halting when temporary failures occur.
For each policy you create, you can specify the total number of retry attempts that the applicable blocks should have after their initial failure, along with the amount of time that should pass between attempts. For example, you can have a retry policy in which the system retries the block up to three times, with the wait time increasing between attempts.
You can define system-level policies in /admin/hyperflow/wfeconfig/. Flow- and block-specific policies can be defined in the flow settings and block settings, respectively.
For more information, see Defining Automatic Block-Retry Policies.
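The retry behavior described above (a fixed number of attempts after the initial failure, with a growing wait between them) can be sketched as follows. The RetryPolicy class, its field names, and the doubling backoff are hypothetical choices for this example. They do not reflect the actual settings in /admin/hyperflow/wfeconfig/ or in the flow and block settings.

```python
import time
from dataclasses import dataclass


@dataclass
class RetryPolicy:
    """Hypothetical policy; field names are illustrative, not product settings."""

    max_retries: int = 3         # retry attempts after the initial failure
    initial_delay: float = 1.0   # seconds to wait before the first retry
    backoff_factor: float = 2.0  # multiplier applied to the wait after each retry

    def delays(self):
        """Return the wait before each retry, e.g. [1.0, 2.0, 4.0]."""
        return [
            self.initial_delay * self.backoff_factor ** i
            for i in range(self.max_retries)
        ]


def run_with_retries(task, policy, sleep=time.sleep):
    """Call task(); on failure, retry per policy and re-raise the last error."""
    try:
        return task()
    except Exception as exc:
        last_error = exc
    for delay in policy.delays():
        sleep(delay)
        try:
            return task()
        except Exception as exc:
            last_error = exc
    raise last_error


# Usage: a block that fails twice with a temporary error, then succeeds.
attempts = []


def flaky_block():
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("temporary failure")
    return "done"


waits = []  # record the waits instead of actually sleeping
result = run_with_retries(flaky_block, RetryPolicy(), sleep=waits.append)
```

Passing sleep as a parameter keeps the example fast to run. With the default time.sleep, the same call would pause between attempts.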
Flow Blocks
New
Custom Entity Detection Block (beta) – This block provides complementary functionality to the capabilities of the Named Entity Recognition Block. The Custom Entity Detection Block can automatically identify a variety of entities including date, SSN, address, policy number, loan number, credit card number, customer ID, account number, employee ID, employer ID, passport number, driver license number, case number, phone number, application number, routing number, and “other.” In general, the block can be configured to locate and identify:
- single words, and
- word patterns that can be described with a combination of regular expressions and keywords.
You need to use Custom Entity Detection Blocks in conjunction with Full Page Transcription Blocks. For example, you can build a redaction flow that processes documents through full-page transcription, then detects all custom entities that are defined in the Custom Entity Detection Block, and at the end uses a Custom Code Block to place black boxes over the detected entities.
To learn more, see the "Custom Entity Detection Block (Beta)" section of Flow Blocks.
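To illustrate what "word patterns that can be described with a combination of regular expressions and keywords" might look like, here is a small text-only sketch. The rule format, the ENTITY_RULES dictionary, the keyword window, and the redact helper are all hypothetical. They are not the block's configuration schema, and replacing characters stands in for placing black boxes over an image.

```python
import re

# Hypothetical entity definitions: each combines a regular expression with
# optional keywords that must appear shortly before the match.
ENTITY_RULES = {
    "ssn": {"pattern": r"\b\d{3}-\d{2}-\d{4}\b", "keywords": []},
    "policy_number": {"pattern": r"\b[A-Z]{2}\d{6}\b", "keywords": ["policy"]},
}
KEYWORD_WINDOW = 30  # characters before a match to search for keywords (assumption)


def detect_entities(text):
    """Return (entity_type, matched_text, start, end) tuples sorted by position."""
    found = []
    for entity_type, rule in ENTITY_RULES.items():
        for match in re.finditer(rule["pattern"], text):
            window_start = max(0, match.start() - KEYWORD_WINDOW)
            context = text[window_start:match.start()].lower()
            # Skip matches whose required keyword is not nearby.
            if rule["keywords"] and not any(k in context for k in rule["keywords"]):
                continue
            found.append((entity_type, match.group(), match.start(), match.end()))
    return sorted(found, key=lambda item: item[2])


def redact(text, entities):
    """Replace each detected span with block characters of the same length."""
    chars = list(text)
    for _, _, start, end in entities:
        chars[start:end] = "█" * (end - start)
    return "".join(chars)


page_text = "Policy no. AB123456 belongs to SSN 123-45-6789. Ref XY654321."
entities = detect_entities(page_text)
redacted = redact(page_text, entities)
```

In this sample, XY654321 matches the policy-number pattern but is left untouched because the keyword "policy" does not appear within the window before it.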
Flow Executions
New
Flow Executions page – To make it easier for users outside of the System Admin permission group to access flow-execution information, we’ve added a Flow Executions page to the Flows section of the application. Using the filters on this page, users can view a list of failed flow executions, which cause halted submissions, and retry the halted submissions that meet the filter’s criteria. Clicking the ID of a flow execution opens its Flow Run page, which contains a diagram of the flow and information about the progress of the flow’s execution.
Users need the View Flow Executions permission to access this page. By default, this permission is given to users in the System Admin and Business Admin permission groups.
For more information, see Flow Executions.
Updated
Flow Run enhancements – To provide more troubleshooting information about flows and their blocks, we've made the following improvements to the Flow Run page (formerly known as the Flow Execution page):
- New "Code" tab for Custom Code and Python Code Blocks – When you click on a Custom Code Block or a Python Code Block in a flow-execution diagram, a Code tab appears in the bottom panel, containing the Python code for that block.
- Viewing flow inputs, outputs, and errors – You can now view flow-level inputs, outputs, and errors in the Flow Input, Flow Output, and Flow Runtime Errors tabs, respectively, on the Flow Run page.
To learn more about these updates, see Testing and Debugging Flows.
Flows SDK
Updated
Configuring Custom Code Blocks to accept specific file types as input – With the addition of the File input type within the Parameter class in v36, you can now configure Custom Code Blocks to accept any file type as input, including CSV and JSON files. Users can import and update files of the type expected by the Custom Code Block via the Flows settings sidebar in Hyperscience.
Submissions Table
New
Downloading submission-activity logs – In v36, we’ve added support for downloading submission-activity logs. The submission-activity logs provide you with information about how your submissions progressed through their flows.
To download submission-activity logs, go to the Submissions table, click the menu icon, and then click Download Submission Activity Logs.
The downloaded submission-activity file is in CSV format.
For more details, see Navigating the Submissions Table.
Classification
Updated
Text Classification improvements – In v34, we introduced a "preview" version of the Text Classification feature. In v36, we've made updates to the application that streamline the use of Text Classification and make the model-management experience similar to that of other models:
- Text Classification models are now included on the Models page (Library > Models).
- A Model Details page is available for each Text Classification model, which shows the model's projected automation, training data, and information on each set of samples used for training.
- From the Model Details page, you can run training for a model, monitor the status of the training, and deploy the model after training is complete.
- You can also import and export the training data for Text Classification models, as well as the models themselves.
To learn more about Text Classification in v36, see Text Classification.
Layout Variation Alerting – With Layout Variation Alerting, users are notified if pages marked as “No Layout Found” are matched to existing layout variations that are not included in the flow's release. When Layout Variation Alerting is enabled, the system attempts to find layout variations for pages marked as “No Layout Found” on a nightly basis.
Note that Layout Variation Alerting is not available in SaaS deployments of Hyperscience. Also, we do not recommend enabling it in instances that process more than a million pages per day.
To learn more about Layout Variation Alerting and how to enable it, see Layout Variation Alerting.
Field Identification
New
Extracting data points from unstructured documents – Unstructured extraction allows you to extract data points from long documents with unstructured text. To leverage the automation capabilities of unstructured extraction, you need to select the new field ID model called UNSTRUCTURED_EXTRACTION under Flex Engine Type for Training at /admin/form_extraction/template/ for a given layout before training.
With the introduction of this new ID model, you can achieve automation based on the threshold you specify in the Field Identification Target Accuracy flow setting.
To upload and annotate unstructured documents, you can use the Keyer Data Management functionalities in the Model Details page.
Note that unstructured extraction is available in SaaS deployments only.
To learn more about the new UNSTRUCTURED_EXTRACTION model, see Training a New Field Identification Model.
Field Anomaly Detection – As your keyers identify field values in documents, they may sometimes select different instances of the same value across documents. These inconsistencies lead to decreased performance in Field Identification models over time. The Field Anomaly Detection feature analyzes training data before it is used in model training and flags potential mistakes and inconsistencies in field identification. You can then review these annotations and verify or edit them before they are used in training.
Field Anomaly Detection runs as part of Training Data Analysis, which you can run from the Model Details page.
The system highlights documents that contain potential anomalies, and you can review each document's annotations by clicking the Edit annotations link for that document.
For more information about Field Anomaly Detection, see Detecting and Correcting Anomalies in Field Annotations.
Enhanced signature detection – With the updates made in v36, the system can better detect signature fields, improving automation in the processing of signatures.
Transcription
Updated
Separate thresholds for the Automatic QA Sample Rate flow setting – To give you more flexibility when using automatic QA sampling, the Automatic QA Sample Rate flow setting is now split into the following separate thresholds:
- Structured Text Transcription QA Sample Rate
- Structured Checkbox Transcription QA Sample Rate
- Structured Signature Transcription QA Sample Rate
- Semi-structured Transcription QA Sample Rate
Note that the Automatic QA Sample Rate feature is optional, and you can still manually set QA sample rates.
To learn more, see Flow Settings.
Improved, faster transcription of PDFs – Previously, if a submission contained a PDF file, the system would convert each of the file's pages into an image before extracting data from the file. In v36, you can choose to extract data from the PDF directly rather than from images of its pages. This update improves machine transcription in PDFs and increases the speed of transcription.
To enable this feature, select the Faster PDF Transcription option in the Machine Classification Block's settings.
Note that you cannot enable both Image Correction and Faster PDF Transcription in the same Machine Classification Block. Also, PDFs must be oriented correctly for the Faster PDF Transcription feature to perform as intended.
For more information on the Faster PDF Transcription option, see the "Machine Classification" section of Flow Blocks.
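The restriction on combining Image Correction and Faster PDF Transcription can be pictured as a simple validation check. The following is only a minimal sketch: the function name and the setting keys `image_correction` and `faster_pdf_transcription` are hypothetical names chosen for illustration, not the block's actual configuration keys.

```python
def validate_classification_block(settings: dict) -> list[str]:
    """Return a list of problems found in a block's settings (empty if none)."""
    problems = []
    # Hypothetical keys; the real setting identifiers may differ.
    if settings.get("image_correction") and settings.get("faster_pdf_transcription"):
        problems.append(
            "Image Correction and Faster PDF Transcription cannot both be "
            "enabled in the same Machine Classification Block."
        )
    return problems
```

A check like this returns an empty list when only one of the two options is enabled.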
Optimizations for full-page transcription – We've enhanced full-page transcription to make it faster and more accurate, particularly when processing long lines of text.
Transcription automation for fields with multiple bounding boxes – We've added support for transcription automation for fields identified with multiple bounding boxes in Semi-structured layouts.
Custom Supervision
New
Decision dependencies – The addition of decision dependencies for Custom Supervision tasks provides you with more flexibility for configuring available options in decision drop-down menus. In v36, Custom Supervision tasks support both decision dependencies within a single document and decision dependencies across multiple documents.
Configuring decision dependencies within a single document allows you to present different options in decision drop-down menus based on user input. For example, you can configure Custom Decision 1 with two possible answers. Based on the selected answer for Custom Decision 1, you will receive different possible answers for Custom Decision 2.
The newly added support for decision dependencies across multiple documents also allows you to present different options in decision drop-down menus based on user input. The difference is that your answers in one document affect the possible answers in other documents. For example, you can configure Custom Decision 1 in Document 1 with two possible answers. Based on the selected answer for Custom Decision 1, you will receive different possible answers for Custom Decision 2 in Document 2.
Note that you can configure decision dependencies only for documents and cases.
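To make the idea of a decision dependency concrete, here is a minimal sketch of how the answer to one decision could determine the options offered for another. The table layout, the answer labels, and the `options_for` helper are all hypothetical; they do not reflect how Hyperscience stores or configures dependencies.

```python
# Hypothetical dependency table: the answer chosen for "Custom Decision 1"
# determines which options are offered for "Custom Decision 2".
DEPENDENCIES = {
    "Custom Decision 1": {
        "Approve": {"Custom Decision 2": ["Standard processing", "Expedited processing"]},
        "Reject": {"Custom Decision 2": ["Missing information", "Illegible document"]},
    }
}


def options_for(decision: str, answers: dict[str, str]) -> list[str]:
    """Return the options available for `decision`, given answers already selected."""
    for parent, by_answer in DEPENDENCIES.items():
        selected = answers.get(parent)
        if selected is None:
            continue  # the parent decision has not been answered yet
        dependent = by_answer.get(selected, {})
        if decision in dependent:
            return dependent[decision]
    return []  # no answered parent resolves options for this decision
```

For dependencies across multiple documents, the same lookup could be keyed by a document and decision pair instead of the decision name alone.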
Mandatory decisions – We’ve added support for mandatory decisions in Custom Supervision tasks. You can now mark decisions that are critical for downstream processing and data quality as mandatory. Users must assign a value to these decisions before they can move on to the next Supervision task.
To learn more about these updates, see Custom Supervision.
Keyer Data Management
New
Changing training statuses of multiple documents simultaneously – With the introduction of the Edit training status option in v36, you can now edit the training statuses of multiple documents in bulk. To take advantage of this functionality, select documents from the Training Documents table on a model's Model Details page, and then click the Edit training status option that is located in the Actions drop-down menu.
You can only change the statuses of annotated documents. If any of the selected documents are not annotated, a warning message will appear in the Edit Training Status dialog box.
Annotation suggestions for fields with multiple bounding boxes – To expand the capabilities of the Guided Data Labeling feature, the annotation suggestions now provide you with predictions about where all bounding boxes of a field might be located.
For more information on these updates, see Keyer Data Management.
Reporting
Updated
Hourly breakdown of System Throughput report – We’ve added support for downloading hourly breakdowns of the System Throughput report (Reporting > Overview).
Note that hourly data is available only from the date v36 begins running in your instance. To manage database size, hourly data is retained for up to 30 days. After 30 days have accumulated, the oldest day’s data is deleted as each new day passes.
To learn more about the System Throughput report, see System Throughput.
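The 30-day retention of hourly data behaves like a rolling window. The sketch below illustrates that idea only; the function name and the exact boundary handling (whether the cutoff day itself is kept) are assumptions, not a description of the application's cleanup job.

```python
from datetime import date, timedelta

RETENTION_DAYS = 30  # retention window stated in the note above


def prune_hourly_data(days_with_data: list[date], today: date) -> list[date]:
    """Keep only the days that fall within the retention window ending today."""
    # Assumption: the window covers today and the 29 days before it.
    cutoff = today - timedelta(days=RETENTION_DAYS)
    return [d for d in days_with_data if d > cutoff]
```

With 31 consecutive days of data, this sketch drops the oldest day and keeps the most recent 30.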
“Supervision” column in the Usage report – We’ve added a Supervision column to the following CSV files in the Usage report (Reporting > Usage):
- signature_machine_transcriptions_report.csv
- checkbox_machine_transcriptions_report.csv
- supervision_transcriptions_report.csv
- machine_transcriptions_report.csv
The Supervision column indicates what the Transcription Supervision setting is for each field in the report. The Transcription Supervision settings are defined in the Layout Editor for each field in a layout variation.
For more information, see Usage Report.
Filtering by field type in the All Users Performance Summary report – We’ve added a Field Type filter to the All Users Performance Summary report (Reporting > User Performance). You can now choose to filter the report by one of the following field types:
- Text
- Checkbox
- Signature
Note that each download of the report includes only the data that meets the filter criteria.
More details can be found in All Users Performance Summary.
Settings
New
Importing and exporting system settings – To let you move system settings between multiple instances that are on the same major version, we’ve added support for importing and exporting the settings found in Administration > System Settings. You can find the import and export functionality at Administration > Import/Export. The export functionality lets you select which settings you want to export.
After making your selections, you can export your system settings to a JSON file. You can then use this JSON file to import your system settings to other instances.
More details can be found in Importing & Exporting System Settings.
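The export-then-import workflow can be sketched as a round trip through JSON. This is an illustration only: the function names, the setting names used in the test data, and the flat key-value layout are hypothetical, and the actual structure of the exported file is not described in these notes.

```python
import json


def export_settings(all_settings: dict, selected: list[str]) -> str:
    """Serialize only the selected settings to a JSON string."""
    chosen = {name: all_settings[name] for name in selected if name in all_settings}
    return json.dumps(chosen, indent=2, sort_keys=True)


def import_settings(current: dict, exported_json: str) -> dict:
    """Return a copy of `current` with the exported settings applied on top."""
    merged = dict(current)
    merged.update(json.loads(exported_json))
    return merged
```

Settings that were not selected for export are left unchanged on the target instance in this sketch.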
Databases
Notice
End of support for PostgreSQL 10.x in Hyperscience v38 – Beginning in v38, the Hyperscience application will no longer support PostgreSQL 10.x. The PostgreSQL project ended support for 10.x on November 10, 2022.
The following databases will be supported in v38:
- PostgreSQL 12.x, 13.x, and 14.x
- Amazon RDS for PostgreSQL
- Oracle 19c with DBMS_ALERT privileges
- Amazon RDS for Oracle
- Microsoft SQL Server (MSSQL) 2016, 2017, and 2019 with Service Broker enabled
- Amazon RDS for SQL Server
- Azure SQL Managed Instance
For more information on database requirements, see Infrastructure Requirements (Production).
Kubernetes
New
Support for Red Hat OpenShift – To enhance the deployment experience, we now support the use of Red Hat OpenShift. With this enhancement, you can now deploy the Hyperscience application on your own Red Hat OpenShift instance.
API
New
Flows endpoints – With the /api/v5/flows endpoints, you can manage your flows programmatically and create scripts to automate frequently performed tasks.
You can complete the following actions with these endpoints:
- List all flows in your instance
- Retrieve information about a specific flow
- Import a flow from a JSON or ZIP file
- Deploy or disable a flow
- Archive or restore a flow
More information about these endpoints can be found in our API documentation.
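As a starting point for scripting against these endpoints, the sketch below builds (but does not send) a request to list flows. Only the `/api/v5/flows` path comes from these notes; the helper name, the base URL, the token value, and the `Token` authorization scheme are assumptions that should be checked against the API documentation.

```python
import urllib.request


def build_list_flows_request(base_url: str, api_token: str) -> urllib.request.Request:
    """Build a GET request for the flows listing endpoint without sending it."""
    url = f"{base_url.rstrip('/')}/api/v5/flows"
    return urllib.request.Request(
        url,
        # Assumed header scheme; confirm it in the API documentation.
        headers={"Authorization": f"Token {api_token}"},
        method="GET",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request to the instance at `base_url`.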
Updated
Importing and exporting models with Artifacts endpoints – We've added functionality to the /api/v5/artifacts endpoints that allows you to import and export Field Identification, Table Identification, and Classification models without logging in to the application.
Imported models do not replace live models. If the imported model matches a live model that doesn't already have a candidate model, the system saves the imported model as a candidate model. If a candidate model already exists, the import fails.
To learn more about importing and exporting models with the Artifacts endpoints, see our API documentation.
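The rule for how imported models are handled can be summarized as a small decision function. This is a sketch of the behavior described above, not the application's code; the names are hypothetical, and the case where the imported model matches no live model is not covered by these notes, so it is reported as such.

```python
from enum import Enum


class ImportOutcome(Enum):
    SAVED_AS_CANDIDATE = "saved as candidate"
    FAILED = "import failed"
    NOT_COVERED = "not described in these notes"


def import_outcome(matches_live_model: bool, candidate_exists: bool) -> ImportOutcome:
    """Classify what happens to an imported model, per the rule stated above."""
    if candidate_exists:
        # A candidate model already exists, so the import fails.
        return ImportOutcome.FAILED
    if matches_live_model:
        # The live model is never replaced; the import becomes its candidate.
        return ImportOutcome.SAVED_AS_CANDIDATE
    return ImportOutcome.NOT_COVERED
```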