V36 Release Notes

36.0.22 (30 May 2024)

Languages

Fixed

Machine transcription of Korean multiline fields – We’ve fixed an issue that caused machine transcription to fail on multiline Korean fields. Examples of incorrect transcriptions and their corrected versions appear below.

  • Incorrect: 원전세배당소득 | Correct: 이자.배당소득 원천세
  • Incorrect: 이장소백당소득 | Correct: 이자.배당소득 지방소득세

Flows

Updated

Floating-point values for SDM_BLOCKS_TASK_POLL_INTERVAL and HYPERFLOW_ENGINE_TASKS_POLL_INTERVAL_SECONDS – In addition to integer values, you can now enter floating-point values for the SDM_BLOCKS_TASK_POLL_INTERVAL and HYPERFLOW_ENGINE_TASKS_POLL_INTERVAL_SECONDS ".env" file variables. This update gives you more flexibility when tuning your submission-processing latency, particularly if you need low latency.
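As an illustration of what accepting both integer and floating-point values means for a setting like this, the sketch below parses a raw ".env" string into a number of seconds. The function name parse_poll_interval and the sample values are hypothetical, and this is not a description of how the application itself reads these variables.

```python
def parse_poll_interval(raw: str) -> float:
    """Parse a poll interval, given as an integer or float string, into seconds."""
    value = float(raw.strip())  # float() accepts both "5" and "0.25"
    if value <= 0:
        raise ValueError(f"poll interval must be positive, got {raw!r}")
    return value


# Hypothetical values, e.g. SDM_BLOCKS_TASK_POLL_INTERVAL=5 or =0.25
print(parse_poll_interval("5"))
print(parse_poll_interval("0.25"))
```

Shorter intervals generally mean tasks are picked up sooner at the cost of more frequent polling, so sub-second values are best tested before they are used in production.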

Machine Identification

Fixed

Asterisk as custom character for splitting segments – We've resolved an issue with text-segment detection that resulted in IndexError: list index out of range during Machine Identification. The issue occurred when custom_char_for_splitting_segments was set to * in /admin.

Quality Assurance

Fixed

Logic for automatic QA sampling rates – We’ve fixed a dereferencing issue that caused silent failures for flows with specific default settings. The logic now dereferences these settings correctly, so affected flows are handled properly. The issue affected flows created in Hyperscience versions that preceded the current application version.

Installation

Updated

Increased post-install timeout – We’ve increased the post-install timeout from 3 to 4 minutes, providing more time for the system to load. Users will now see the following error message if the system’s timeout is reached before the application starts: 

“The application does not appear to have started within 240 seconds, however startup times can vary due to environmental factors such as proxies and firewalls. If you are unable to access the login screen after an additional 2-3 minutes, please reach out to Hyperscience Support for additional troubleshooting assistance.”
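The timeout behavior described above can be pictured as a polling loop with a deadline. The following is a minimal sketch under stated assumptions: wait_for_startup and the is_up callback are hypothetical names, and the installer's actual check is not described in these notes.

```python
import time
from typing import Callable


def wait_for_startup(is_up: Callable[[], bool], timeout: float = 240.0,
                     interval: float = 5.0) -> bool:
    """Poll is_up() until it returns True or the timeout (in seconds) elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_up():
            return True
        time.sleep(interval)  # wait before checking again
    return False  # the caller would show the timeout message at this point
```

Using time.monotonic() rather than wall-clock time keeps the deadline unaffected by system clock adjustments during installation.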

36.0.21 (14 Feb 2024)

Submissions

Updated

Support for EML files and their attachments – You can now extract data from EML files and their attachments. When an EML file is ingested, the system creates a PDF file from the email's body and processes each of the file's attachments as a separate document in the submission.  
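To show how an EML file breaks down into a body and separate attachments, the sketch below uses Python's standard email package. It only illustrates the structure of an EML file; split_eml is a hypothetical helper, and the PDF conversion and document creation performed by the application are not reproduced here.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def split_eml(raw: bytes) -> tuple[str, list[tuple[str, bytes]]]:
    """Return the email's body text and a list of (filename, content) attachments."""
    msg = message_from_bytes(raw, policy=policy.default)
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""
    attachments = [
        (part.get_filename() or "unnamed", part.get_payload(decode=True) or b"")
        for part in msg.iter_attachments()
    ]
    return body, attachments


# Build a small sample message so the sketch is self-contained.
sample = EmailMessage()
sample["Subject"] = "Invoice"
sample.set_content("Please see the attached invoice.")
sample.add_attachment(b"%PDF-1.4 sample", maintype="application",
                      subtype="pdf", filename="invoice.pdf")

body, attachments = split_eml(sample.as_bytes())
print(body.strip())
print([name for name, _ in attachments])
```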

Submission Processing

Fixed

Logging of errors during submission pre-processing – We've fixed an issue that allowed personally identifiable information to be stored in the error logs created during submission preprocessing.

Training Data Management

Fixed

Saving annotations before viewing another document – We've resolved a race condition that sometimes caused annotations to be saved to the next-viewed document when a keyer clicked the Previous or Next arrow buttons as annotations were being saved. 

Flows

Fixed

Viewing restricted flows without the required permissions – We've fixed an issue that caused the page to repeatedly reload when a user attempted to view a flow that they did not have the required permissions to access. As part of this update, an error message appears in these situations, notifying the user that they are not authorized to view the flow.

Reporting

Fixed

Calculating the time taken to complete Flexible Extraction and Custom Supervision tasks – We've fixed an issue that caused completion-time calculations for Flexible Extraction and Custom Supervision tasks to depend on the contents of the tasks (e.g., fields, tables, decisions).

OpenID Connect (OIDC)

New

Redirecting users during ID token renewal – By default, when renewing OIDC ID tokens, the application redirects users to the identity provider’s token endpoint. To allow this step to be bypassed, we have introduced the HS_OIDC_RENEW_ID_TOKEN_WITH_REFRESH_TOKEN “.env” file variable. When this variable is set to true, the renewal transaction occurs without redirecting users out of the application, enhancing the overall user experience. See OpenID Connect (OIDC) for more details.
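For context, renewing tokens without a browser redirect typically relies on the standard OAuth 2.0 refresh-token grant, in which the client posts a form-encoded request directly to the identity provider. The sketch below builds such a request body. It assumes the standard grant defined in RFC 6749; build_refresh_request and the sample values are hypothetical, and the notes do not describe the exact request the application sends.

```python
from urllib.parse import urlencode


def build_refresh_request(refresh_token: str, client_id: str) -> str:
    """Build the form-encoded body for an OAuth 2.0 refresh-token grant."""
    return urlencode({
        "grant_type": "refresh_token",   # grant type defined by RFC 6749, section 6
        "refresh_token": refresh_token,
        "client_id": client_id,
    })


# Placeholder values for illustration only.
print(build_refresh_request("example-refresh-token", "example-client"))
```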

36.0.20 (23 Nov 2023)

Languages

Fixed

Processing text in Turkish submissions – We've resolved a string-parsing issue that caused Turkish submissions to halt if they contained a colon (":").

Flows

Updated

Default timeout for blocks – We've increased the default timeout for block requests from 60 seconds to 180 seconds.

Training Data Management

Fixed

Columns not present in current layout – We’ve fixed an issue where Table Identification model training failed due to the presence of annotations for columns that were not in the latest live layout version.

Task Queue 

Fixed

Effects of changing filters – We’ve fixed an issue where changing the filters in the Task Queue applied only the new changes and deselected any filters that had already been applied. This issue led to incorrect filtering when choosing both a date range and a filter from the Filters list.

OpenTelemetry

Updated

Emitting metrics after time-outs – The system now emits OpenTelemetry metrics for tasks after they exhaust their allotted retry attempts and time out.

36.0.19 (9 Nov 2023)

Input Connections

Updated

Number of subfolders scanned by Email Listener for Microsoft 365 Outlook – We've increased the number of subfolders scanned by the Email Listener in Microsoft 365 Outlook accounts from 10 to 100.

36.0.18 (26 Oct 2023)

Submissions

Fixed

Responsiveness when viewing documents – We’ve fixed a query-plan issue in deployments using MSSQL databases that caused delays when opening the document viewer in some instances.

Reporting

Fixed

Accounting for differences between browser and server times – We've updated our task-completion-time calculations to account for the difference between the server's timestamp and the browser's timestamp.
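One way to picture this kind of correction: if a task's start time comes from the server and its end time comes from the browser, the browser's timestamp can be shifted by the measured difference between the two clocks before the duration is computed. The sketch below assumes that approach; the function and parameter names are hypothetical, and the application's actual calculation is not described in these notes.

```python
from datetime import datetime


def adjusted_completion_seconds(server_start: datetime, browser_end: datetime,
                                server_now: datetime, browser_now: datetime) -> float:
    """Task duration in seconds, with the browser timestamp shifted to server time."""
    offset = server_now - browser_now  # positive when the browser clock is behind
    return ((browser_end + offset) - server_start).total_seconds()


# Example: the browser clock runs 90 seconds behind the server clock.
start = datetime(2023, 10, 26, 12, 0, 0)           # server time when the task started
end_browser = datetime(2023, 10, 26, 12, 3, 30)    # browser time when the task ended
print(adjusted_completion_seconds(
    start, end_browser,
    server_now=datetime(2023, 10, 26, 12, 5, 0),
    browser_now=datetime(2023, 10, 26, 12, 3, 30),
))
```

Without the offset, this example would report 210 seconds rather than the 300 seconds that elapsed on the server's clock.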

36.0.17 (13 Oct 2023)

Table Identification

Fixed

Completing submissions containing nested tables with orphan rows – We’ve fixed an issue that allowed submissions with a “Confirm that all pages are reviewed” warning message to be completed with orphan rows, leading to those submissions halting.  

Transcription Supervision

Fixed

ResizeObserver loop exceeded error in Chrome on Macs – We've fixed an issue that caused ResizeObserver loop exceeded errors to occur during Transcription Supervision in Mac Chrome browsers in some instances. 

Security

Fixed

Addressing security vulnerabilities – To ensure security, we've updated:

  • sentry-sdk to 1.14.0,
  • scipy to 1.10.0, and
  • mpmath to 1.3.0.

36.0.16 (14 Sep 2023)

Languages

Updated

Improvements to Korean-English transcriptions – We've optimized the transcription model used for Korean-English documents.

Data Types

New

Enhanced Korean Freeform – We've added an Enhanced Korean Freeform data type that leverages a Korean large language model for transcriptions. It increases transcription accuracy on Korean documents, particularly on degraded images.

Layouts

Fixed

Sorting layouts by "Last Updated" – We've fixed an issue that caused the table on the Layouts page (Library > Layouts) to become empty when it was sorted by the contents of the Last Updated column.

36.0.15 (31 Aug 2023)

Data Types

New

Capitalized Names – We've added a Capitalized Names data type that expects names that have the first letter of each name (e.g., first name and last name) capitalized.

Training

Fixed

Field Identification training and bounding-box coordinates – We've fixed an issue that caused Field Identification training to fail if a document's bounding boxes had certain coordinates.

Releases

Fixed

Loading of Releases page – We've optimized the loading of the Releases page (Library > Releases), resolving an issue that prevented the page from being displayed in some instances.

Flows

Fixed

Timeouts for flow-block polling requests – We've added timeouts for flow blocks' polling requests, preventing failed requests from continuing indefinitely.

UiPath Notifier

Fixed

Default authentication method – We've fixed an issue that caused OAuth to be the default authentication method for UiPath Notifier connections. The issue caused flows that used Basic Authentication for these connections to fail.

Reporting

Updated

Definition of dt_started – We've changed dt_started from the time the task was first assigned to the time when the task was opened. This update creates a more accurate measurement of the time taken to complete tasks.

Security

Fixed

Updating Django – To address security vulnerabilities, we've updated Django to 3.2.20.

36.0.14 (10 Aug 2023)

Submission Processing

Fixed

Duplicate submission-processing tasks – We've fixed a race condition in our task-synchronization manager that sometimes caused internal tasks to be executed more than once for a submission, resulting in data corruption.

Flows

Fixed

Subprocesses from pagination – Previously, pagination tasks sometimes created subprocesses that wouldn't time out if they couldn't be completed. To resolve this issue, we've added timeouts to these subprocesses.

Security

Fixed

Addressing security vulnerabilities – To ensure security, we've updated certifi to 2023.7.22 and pyJWT to 2.7.0.

36.0.13 (27 Jul 2023)

Languages

Fixed 

Text segmentation for non-Latin languages – We’ve fixed the language parameters for segmentation for languages outside of the Latin language family. We’ll use the language of the submissions during segmentation rather than linking them to the Latin language family by default. For example, we won’t use Latin segmentation if you upload Korean or Korean-English documents.

Classification

Fixed

Loading pre-computed Structured Classification data for releases – We've reduced the amount of time required to load pre-computed Classification data for Structured documents when a new release is deployed. This update increases the efficiency of submission processing when multiple versions of a Structured layout are included in a release.

Transcription Supervision

Fixed

ResizeObserver loop exceeded error in Chrome on Macs – We've fixed an issue that caused ResizeObserver loop exceeded errors to occur during Transcription Supervision in Mac Chrome browsers in some instances.

Security

Fixed

Updating paddlepaddle – To address security vulnerabilities, we've updated paddlepaddle to 2.4.2.

36.0.12 (20 Jul 2023)

Trainer 

Fixed

Displayed task statuses for completed tasks – We’ve fixed a UI issue that displayed the trainer’s task status as “Running” after completion on the Trainer page (Administration > Trainer).

Document Classification

Updated

Displaying Submission ID – We’ve added the Submission ID to the top of the page for easier traceability of your uploads (“Document Classification: Submission <submission_id>”).

36.0.11 (10 Jul 2023)

Machine Identification

Fixed

Consistency in field-location predictions – We’ve fixed an issue with non-deterministic behavior during field grouping, which caused different predictions for the same documents. 

Databases

Fixed

Notifications and deadlocks – We've resolved an issue that caused database deadlocks to occur if the user and system made changes to notifications at the same time.

36.0.10 (3 Jul 2023)

Manual Transcription

Fixed

Normalization of Date table columns with column-specific languages – We've resolved an issue related to normalization after Manual Transcription. Date table columns with a different language from the one assigned to the layout are now normalized correctly. For example:

  • Before: MM/DD/YYYY was normalized as YYYY/MM/DD.
  • After: MM/DD/YYYY is normalized as MM/DD/YYYY.

Security

Fixed

HS_TLS_VERIFY_ENABLED and requests – A recent upgrade to requests caused SSL certificate validation errors to occur even when HS_TLS_VERIFY_ENABLED was set to false. To resolve this issue, we've downgraded requests to 2.27.1.

36.0.9 (23 Jun 2023)

User Experience 

Updated

Maximum number of files per upload – We've increased the default maximum number of files per upload from 100 to 1000. This value can be customized with the DATA_UPLOAD_MAX_NUMBER_FILES ".env" file variable. The maximum applies to both training-data pages for models and submission pages.

Flows

Fixed

Deploying flows with validation errors – We’ve fixed an issue that allowed users to deploy flows that contained validation errors.

Manual Transcription

Fixed

Normalization of Date fields with field-specific languages – We've resolved an issue related to normalization after Manual Transcription. Date fields with a different language from the one assigned to the layout are now normalized correctly. For example:

  • Before: MM/DD/YYYY was normalized as YYYY/MM/DD.
  • After: MM/DD/YYYY is normalized as MM/DD/YYYY.

File Storage

Fixed

Sanitizing filename headers – We've fixed a data-sanitization issue in HTTP filename headers that prevented files from being downloaded to the file store.
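As a general illustration of filename sanitization, the sketch below removes directory components, control characters, and quotes from a filename before it is used in an HTTP header or written to storage. The notes do not say which characters caused the original issue, so the rules here, and the name sanitize_filename, are assumptions.

```python
import re


def sanitize_filename(name: str, fallback: str = "download") -> str:
    """Return a filename that is safe to place in an HTTP header and on disk."""
    name = name.replace("\\", "/").split("/")[-1]   # drop any directory parts
    name = re.sub(r'[\x00-\x1f\x7f"]', "", name)    # strip control characters and quotes
    name = name.strip(" .")                         # no leading/trailing dots or spaces
    return name or fallback


print(sanitize_filename("../../etc/passwd"))
print(sanitize_filename("report\r\n.pdf"))
```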

36.0.8 (17 Jun 2023)

Machine Classification

Fixed

Classifying Structured documents with extreme aspect ratios – We've resolved an issue that caused out-of-memory errors to occur when the machine attempted to classify Structured documents with extreme aspect ratios (e.g., 600 x 2 pixels). As part of this update, the system pre-calculates classification data for releases containing Structured layouts. These calculations may increase the time required to process the release the first time it is used.

Security

Fixed

Addressing security vulnerabilities – To ensure security, we've updated:

  • Pillow to 9.5.0 and
  • requests to 2.31.0.

36.0.7 (9 Jun 2023)

Transcription Supervision

Fixed

Reviewing blank table cells in documents with low-confidence transcriptions – We've resolved an issue that caused blank table cells to be sent to Transcription Supervision when the system transcribed text elsewhere in the document with low confidence. 

As part of this update, we've changed the name of the Send Blank Cells to Manual Transcription setting in the Manual Transcription Block to Create Manual Transcription Task for Tables with Blank Cells.

Output Blocks

Fixed

Retrieving OAuth tokens for HTTP Notifier Output Block health checks – We've fixed an issue that caused the system to request a new OAuth token for each automatic health check performed by HTTP Notifier Output Blocks.

Security

Fixed

Addressing security vulnerabilities – To ensure security, we've updated:

  • sqlparse to 0.4.4 and
  • the version of Golang used to compile Filebeat to 1.20.4.

36.0.6 (17 May 2023)

Languages

Fixed

Recognition of text segments by the Korean and English language model – We've resolved a text-processing issue that sometimes prevented the machine from recognizing text segments in their entirety in documents whose language was "Korean and English." The same issue also prevented some text segments that contained only one digit from being detected.

Keyer Data Management

Fixed

Deleting training documents that are being processed – We've fixed an issue that caused errors to occur when training documents that were being processed were deleted.

Machine Identification 

Fixed

Target accuracy when "Manual Identification Supervision" is disabled – We've fixed an issue that prevented target accuracy values from being set to 0 when the Manual Identification Supervision flow setting was disabled. This issue caused thresholding to be applied to machine-only identification, which affected the machine's predictions. 

Comparing a layout's columns to columns in training data – We've resolved an issue that prevented the Machine Identification Block from comparing the columns in the latest layout version to the columns that the layout's model was trained on. This issue caused documents with columns that were not included in the model's training to be automatically sent to Identification Supervision, reducing automation.

File Storage

Updated

Enhancements to directory structure – We've updated the directory structure in file stores for faster data retrieval. Files are now stored in a directory with six levels to minimize the number of files stored in any single directory, preventing performance issues that may occur in high-volume instances.

As part of this update, we've added the following as valid values of the FORMS_STORAGE_MODE “.env” file variable:

  • FILE_EX
  • S3_EX
  • AZURE_BB_EX

These values replace FILE, S3, and AZURE_BB, respectively, as valid values of FORMS_STORAGE_MODE. Instances with these values will have their file stores migrated to the new structure upon upgrading to v37. If you do not want your file store to use the new structure, set FORMS_STORAGE_MODE to FILE_LEGACY, S3_LEGACY, or AZURE_BB_LEGACY.
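To illustrate why a multi-level directory structure keeps any single directory small, the sketch below spreads files across six nested directories derived from a hash of the file's identifier. The hash-based naming, the two-character directory names, and the function name sharded_path are assumptions made for illustration; the notes do not describe how the application names its directory levels.

```python
import hashlib
from pathlib import PurePosixPath


def sharded_path(file_id: str, levels: int = 6, width: int = 2) -> PurePosixPath:
    """Spread files across nested directories derived from a hash of the ID."""
    digest = hashlib.sha256(file_id.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(*parts, file_id)


path = sharded_path("example-document.pdf")
print(len(path.parts) - 1)  # number of directory levels above the file
print(path.name)
```

Because the path is computed from the identifier alone, a file can be located later without scanning any directory.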

36.0.5 (9 May 2023)

Layouts

Updated

"Not in <layout language>" option for table columns – The Not in <layout language> option that has been available for fields can now also be applied to table columns in Semi-structured layouts. This option allows you to assign languages on a per-column basis, giving keyers the ability to enter transcriptions that are not in the language assigned to the document's layout.

To learn how to use the Not in <layout language> option, see Creating Semi-structured Layouts.

Training

Fixed 

list index out of range in get_ground_truth_pages error when training Field Identification models – We've fixed a page-indexing issue that caused an IndexError: list index out of range in get_ground_truth_pages error to occur when training Field Identification models.

36.0.4 (4 May 2023)

Flow Blocks

Updated

Generating checksums for individual blocks – We now generate checksums of each block's command file, which are used to identify the blocks in the database and prevent duplicate blocks from being uploaded.
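The idea of identifying blocks by checksum can be sketched as follows. The choice of SHA-256, the in-memory dictionary standing in for the database, and the names block_checksum and register_block are all assumptions for illustration; the notes do not specify the algorithm or storage used.

```python
import hashlib


def block_checksum(command_file: bytes) -> str:
    """Return a hex digest that identifies a block's command file."""
    return hashlib.sha256(command_file).hexdigest()


def register_block(registry: dict[str, str], name: str, command_file: bytes) -> bool:
    """Add the block unless one with the same checksum exists. Return True if added."""
    checksum = block_checksum(command_file)
    if checksum in registry:
        return False  # duplicate upload: same command-file contents
    registry[checksum] = name
    return True


registry: dict[str, str] = {}
print(register_block(registry, "my_block", b"print('hello')"))
print(register_block(registry, "my_block_copy", b"print('hello')"))
```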

36.0.3 (26 Apr 2023)

Releases

Fixed

Exporting locked releases – We've resolved an issue that prevented users from exporting locked releases. Attempting to do so resulted in a Could not export archived layout release with UUID: <release_uuid> error message.

Keyer Data Management

Fixed

Tooltips for "Previous / Next document in list." buttons for documents sorted in descending order – We've fixed an issue that caused the "Sorted by:" portion of the Previous document in list. and Next document in list. buttons' tooltips on the Annotations page to be incomplete when documents were sorted in descending order. For example, if the documents were sorted by Pages in descending order in the Training Documents table, the tooltip read "Sorted by:" instead of "Sorted by: Pages".

Editing annotations for documents that are being deleted – We've resolved an issue that allowed users to edit annotations for documents that were being deleted (e.g., by other users, as part of PII data deletion).

SaaS

Fixed

/admin access for users with permitted email-address domains in deployments without AWS ALB – We've fixed an issue that prevented users with permitted email-address domains from accessing /admin in deployments without AWS ALB authentication.

Security

Fixed

Addressing security vulnerabilities – To ensure security, we've updated:

  • Pillow to 9.3.0,
  • json5 to 2.2.3,
  • esplint to 0.10.1,
  • mocha to 10.2.0, and
  • webpack to 5.77.0.

36.0.2 (6 Apr 2023)

Submission Processing

Fixed

Splitting a page's text into segments when "0" is a segment's only character – We've fixed an issue that prevented a page's text from being split correctly into text segments when segments contained only the "0" character. This issue caused processing delays and excessive memory usage.

Machine Classification

Fixed

Storing pre-calculations for classifying Structured documents – We've resolved an issue that caused invalid memory alloc request errors when the system attempted to store pre-calculated values for the release's Structured layout variations in the database. The issue affected instances with PostgreSQL databases. 

Classification Supervision

Fixed

User interface for Classification Supervision tasks – We've made the following fixes to the Classification Supervision user interface:

  • We've widened the right-hand panel, enlarging the image of the page being categorized.
  • We've fixed an issue that caused the screen to flicker each time a keyer clicked on a thumbnail in the left-hand panel.
  • We've resolved an issue that caused the right-hand panel to be hidden when a keyer clicked on a page group in the middle panel.

Reporting 

Fixed

Counting time spent on Classification Supervision tasks where pages are classified as "Other" – We've fixed an issue that prevented time spent on Classification Supervision tasks from being included in Document Classification Supervision Time Spent (Seconds) when keyers classified all pages as "Other" during the tasks. The data for Document Classification Supervision Time Spent (Seconds) appears in the KeyerPerformance.csv file in the Keyer Performance Report.

Security

Fixed

Updating com.fasterxml.jackson.core:jackson-databind – To address security vulnerabilities, we've updated com.fasterxml.jackson.core:jackson-databind to 2.14.2.

SaaS

Fixed

"API Access" tab in the Users section for deployments without AWS ALB – We've fixed an issue that caused the API Access tab to appear in the Users section of the application in deployments that did not use AWS ALB authentication.

API

Updated

Restricting access to /api/v5/audit_logs – We've revoked access to the /api/v5/audit_logs endpoint from all users except System Admins.

36.0.1 (17 Mar 2023)

Submissions

Fixed

Retrieving blank thumbnail images of pages – We've fixed an issue that prevented blank thumbnail images from being retrieved when no thumbnail images existed for a submission. This issue prevented the application from being initialized in some situations.

Machine Classification

Fixed

Classifying Structured documents written in Japanese or Simplified Chinese – We've resolved an issue that caused submissions to halt at the Machine Classification step if they contained Structured documents written in Japanese or Simplified Chinese.

Classification Supervision

Fixed

"Perform Tasks" link in Submissions table – We've fixed an issue that prevented the Perform Tasks link from appearing in the Submissions table for submissions with Classification Supervision tasks. The issue affected submissions whose first page was classified by the machine. 

Keyer Data Management

Fixed

Duplicate pages after training – We've fixed an issue that caused pages to be duplicated after their documents were used for training. The issue affected documents that contained at least one empty page.

"Latest version not live" after uploading releases and training data – We've resolved a timestamping issue that caused a "Latest version is not live" warning message to appear after uploading a release and the training data for its models.

Artifacts

Updated

Logging of artifact-export events – We've changed the severity of the following events from exceptions to warnings in the logs:

  • Missing artifacts list
  • Missing storage type
  • Missing destination

Permissions

Fixed

Logging in without assigned user groups or permissions – Previously, if a user was not assigned to a user group in an identity provider (IdP), or if they were assigned to an IdP user group that did not have any permissions, they could log in to the application, but they could not log out. There was also no messaging to let the user know what they needed to do to resolve the issue. A fix for these issues is included in v36.0.1.

Upgrades

Fixed

Indexing training-data records – We've fixed an indexing issue that caused duplicates of training-data records to be found during the upgrade process, which prevented instances from being updated to v36.

SaaS

Fixed

Authentication and SaaS features when AWS ALB is not used – We've resolved an issue that prevented users from authenticating in some situations when a method other than AWS ALB was used. This issue also caused some SaaS-specific features to be disabled in affected instances.

Application recovery after database failovers – We've fixed an issue that prevented the application from recovering quickly after database failovers. The issue sometimes caused the application to be unresponsive for long periods of time.

36.0.0 (17 Mar 2023)

There is an issue in v36.0.0 that prevents version information from appearing in the UI. For this reason, we recommend using v36.0.1 rather than v36.0.0.

Languages

New

New languages – We've added support for submissions written in the following languages:

  • Bulgarian
  • Czech
  • Estonian
  • Hebrew
  • Kazakh
  • Latvian
  • Lithuanian
  • Russian
  • Slovak
  • Thai
  • Turkish

We support automation on Structured and Semi-structured submissions in these languages, regardless of whether they contain handwritten or printed data.

To learn more about the languages we support, see Supported Languages.

Updated

Improvements to the Korean language model – We've enhanced the system's ability to accurately recognize and transcribe Korean words in fields with Generic Text, Address, Company Name, and Name data types.

Flows

Updated

Improvements to flow management – In an effort to provide more information about flows and the potential results of certain actions, we've made the following updates to flow-management tasks:

  • Option to deploy subflows when deploying flows – When you deploy a flow, a confirmation dialog box appears, which includes an option to deploy all connected subflows.
  • Indicators that distinguish subflows from each other – Each subflow shown in Flow Studio has its own highlight color, making it easier to determine whether subflows are identical or different. We've also allowed more characters in a subflow's title to be shown in Flow Studio.
  • Applying changes to a specific instance of a subflow – You can now choose to save changes to a specific instance of a subflow without impacting the other instances of that subflow in the main flow or in other flows.
  • Visual explanation of options when saving changes to a subflow – We've added diagrams explaining the options available when saving changes to a subflow, which illustrate where the changes will be applied.

More details on these updates can be found in Connecting Flow Blocks to Other Flows.

Defining retry policies for flow blocks – You can now define retry policies for flow blocks at the system, flow, or block level. Defining these policies gives you more control over the execution of flows and may prevent submissions from halting when temporary failures occur.

For each policy you create, you can specify the total number of retry attempts that the applicable blocks should have after their initial failure, along with the amount of time that should pass between attempts. For example, you can have a retry policy in which the system retries the block up to three times, with the wait time increasing between attempts.

You can define system-level policies in /admin/hyperflow/wfeconfig/. Flow- and block-specific policies can be defined in the flow settings and block settings, respectively.

For more information, see Defining Automatic Block-Retry Policies.
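The retry behavior described above, an initial attempt followed by a fixed number of retries with increasing wait times, can be sketched in a few lines. The function name run_with_retries, the delay values, and the injectable sleep parameter are hypothetical and are not the application's configuration format.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def run_with_retries(block: Callable[[], T], delays: Sequence[float] = (1, 5, 30),
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Run block once, then retry after each delay until it succeeds."""
    for delay in (*delays, None):
        try:
            return block()
        except Exception:
            if delay is None:  # no retries left: surface the failure
                raise
            sleep(delay)
    raise AssertionError("unreachable")
```

With the default delays, a block is attempted up to four times in total: the initial run plus three retries, waiting 1, 5, and 30 seconds before each retry.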

Flow Blocks

New

Custom Entity Detection Block (beta) – This block provides complementary functionality to the capabilities of the Named Entity Recognition Block. The Custom Entity Detection Block can automatically identify a variety of entities including date, SSN, address, policy number, loan number, credit card number, customer ID, account number, employee ID, employer ID, passport number, driver license number, case number, phone number, application number, routing number, and “other.” In general, the block can be configured to locate and identify:

  • single words, and
  • word patterns that can be described with a combination of regular expressions and keywords.

You need to use Custom Entity Detection Blocks in conjunction with Full Page Transcription Blocks. For example, you can build a redaction flow that processes documents through full-page transcription, then detects all custom entities that are defined in the Custom Entity Detection Block, and at the end uses a Custom Code Block to place black boxes over the detected entities. 

To learn more, see the "Custom Entity Detection Block (Beta)" section of Flow Blocks.
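As a rough picture of how a word pattern can be described with a combination of a regular expression and keywords, and how detected spans could then be blacked out, consider the sketch below. It is plain Python with hypothetical names (SSN_PATTERN, KEYWORDS, find_ssns, redact) and works on text rather than page images; it is not the block's configuration format or implementation.

```python
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
KEYWORDS = ("ssn", "social security")


def find_ssns(text: str, window: int = 20) -> list[tuple[int, int]]:
    """Return (start, end) spans of SSN-like matches that have a keyword nearby."""
    spans = []
    lowered = text.lower()
    for match in SSN_PATTERN.finditer(text):
        # Only look at the characters immediately before the match.
        context = lowered[max(0, match.start() - window):match.start()]
        if any(keyword in context for keyword in KEYWORDS):
            spans.append(match.span())
    return spans


def redact(text: str, spans: list[tuple[int, int]]) -> str:
    """Replace each span with block characters of the same length."""
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + "█" * (end - start) + text[end:]
    return text


sample = "SSN: 123-45-6789. Reference number for the order: 987-65-4321"
print(redact(sample, find_ssns(sample)))
```

Requiring a keyword near the match keeps the second number, which has the same shape but no nearby keyword, from being treated as an SSN.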

Flow Executions

New

Flow Executions page – To make it easier for users outside of the System Admin permission group to access flow-execution information, we’ve added a Flow Executions page to the Flows section of the application. Using the filters on this page, users can view a list of failed flow executions, which cause halted submissions, and retry the halted submissions that meet the filter’s criteria. Clicking the ID of a flow execution opens its Flow Run page, which contains a diagram of the flow and information about the progress of the flow’s execution.

Users need the View Flow Executions permission to access this page. By default, this permission is given to users in the System Admin and Business Admin permission groups.

For more information, see Flow Executions.

Updated

Flow Run enhancements – To provide more troubleshooting information about flows and their blocks, we've made the following improvements to the Flow Run page (formerly known as the Flow Execution page):

  • New "Code" tab for Custom Code and Python Code Blocks – When you click on a Custom Code Block or a Python Code Block in a flow-execution diagram, a Code tab appears in the bottom panel, containing the Python code for that block.
  • Viewing flow inputs, outputs, and errors – You can now view flow-level inputs, outputs, and errors in the Flow Input, Flow Output, and Flow Runtime Errors tabs, respectively, on the Flow Run page.

To learn more about these updates, see Testing and Debugging Flows.

Flows SDK

Updated

Configuring Custom Code Blocks to accept specific file types as input – With the addition of the File input type within the Parameter class in v36, you can now configure Custom Code Blocks to accept any file type as input, including CSV and JSON files. Users can import and update files of the type expected by the Custom Code Block via the Flows settings sidebar in Hyperscience.

Submissions Table

New

Downloading submission-activity logs – In v36, we’ve added support for downloading submission-activity logs. The submission-activity logs provide you with information about how your submissions progressed through their flows. 

To download submission-activity logs, go to the Submissions table, click the three-dot menu, and then click Download Submission Activity Logs.

The downloaded submission-activity file is in CSV format. 

For more details, see Navigating the Submissions Table.

Classification 

Updated

Text Classification improvements – In v34, we introduced a "preview" version of the Text Classification feature. In v36, we've made updates to the application that streamline the use of Text Classification and make the model-management experience similar to that of other models:

  • Text Classification models are now included on the Models page (Library > Models).
  • A Model Details page is available for each Text Classification model, which shows the model's projected automation, training data, and information on each set of samples used for training.
  • From the Model Details page, you can run training for a model, monitor the status of the training, and deploy the model after training is complete.
  • You can also import and export training data for Text Classification models, as well as the models themselves.

To learn more about Text Classification in v36, see Text Classification.

Layout Variation Alerting – With Layout Variation Alerting, users are notified if pages marked as “No Layout Found” are matched to existing layout variations that are not included in the flow's release. When Layout Variation Alerting is enabled, the system attempts to find layout variations for pages marked as “No Layout Found” on a nightly basis.

Note that Layout Variation Alerting is not available in SaaS deployments of Hyperscience. Also, we do not recommend enabling it in instances that process more than a million pages per day.

To learn more about Layout Variation Alerting and how to enable it, see Layout Variation Alerting.

Field Identification

New

Extracting data points from unstructured documents – Unstructured extraction allows you to extract data points from long documents with unstructured text. To leverage the automation capabilities of unstructured extraction, you need to select the new field ID model called UNSTRUCTURED_EXTRACTION under Flex Engine Type for Training at /admin/form_extraction/template/ for a given layout before training.

With the introduction of this new ID model, you can achieve automation based on the threshold you specify in the Field Identification Target Accuracy flow setting. 

To upload and annotate unstructured documents, you can use the Keyer Data Management features on the Model Details page.

Note that unstructured extraction is available in SaaS deployments only.

To learn more about the new UNSTRUCTURED_EXTRACTION model, see Training a New Field Identification Model.

Field Anomaly Detection – As your keyers identify field values in documents, they may sometimes select different instances of the same value across documents. These inconsistencies lead to decreased performance in Field Identification models over time. The Field Anomaly Detection feature analyzes training data before it is used in model training and flags potential mistakes and inconsistencies in field identification. You can then review these annotations and verify or edit them before they are used in training.

Field Anomaly Detection runs as part of Training Data Analysis, which you can run from the Model Details page.

FieldAnomalyModelDetails.png

The system highlights documents that contain potential anomalies, and you can review each document's annotations by clicking the Edit annotations link for that document.

FieldAnomalyReviewAnnotations.png

For more information about Field Anomaly Detection, see Detecting and Correcting Anomalies in Field Annotations.

Enhanced signature detection – With the updates made in v36, the system can better detect signature fields, improving automation in the processing of signatures.

Transcription

Updated

Separate thresholds for the Automatic QA Sample Rate flow setting – To give you more flexibility when using automatic QA sampling, you now have separate thresholds for the Automatic QA Sample Rate flow setting. We’ve separated the thresholds in the following way:

  • Structured Text Transcription QA Sample Rate
  • Structured Checkbox Transcription QA Sample Rate
  • Structured Signature Transcription QA Sample Rate
  • Semi-structured Transcription QA Sample Rate

Note that the Automatic QA Sample Rate feature is optional, and you can still manually set QA sample rates. 

AutomaticQASampleRateList.png

To learn more, see Flow Settings.

Improved, faster transcription of PDFs – Previously, if a submission contained a PDF file, the system would convert each of the file's pages into an image before extracting data from the file. In v36, you can choose to extract data from the PDF directly rather than from images of its pages. This update improves machine transcription in PDFs and increases the speed of transcription.

To enable this feature, select the Faster PDF Transcription option in the Machine Classification Block's settings.

FasterPDFTranscription.png

Note that you cannot have both Image Correction and Faster PDF Transcription enabled in the same Machine Classification Block, and PDFs must be oriented correctly in order for the Faster PDF Transcription feature to perform as intended.

For more information on the Faster PDF Transcription option, see the "Machine Classification" section of Flow Blocks.

Optimizations for full-page transcription – We've enhanced full-page transcription to make it faster and more accurate, particularly when processing long lines of text.

Transcription automation for fields with multiple bounding boxes – We've added support for transcription automation for fields identified with multiple bounding boxes in Semi-structured layouts.

Custom Supervision

New

Decision dependencies – The addition of decision dependencies for Custom Supervision tasks provides you with more flexibility for configuring available options in decision drop-down menus. In v36, Custom Supervision tasks support both decision dependencies within a single document and decision dependencies across multiple documents. 

Configuring decision dependencies within a single document allows you to present different options in decision drop-down menus based on user input. For example, you can configure Custom Decision 1 with two possible answers. Based on the selected answer for Custom Decision 1, you will receive different possible answers for Custom Decision 2.

CustomSupervisionDecisionDependencies.gif

The newly added support for decision dependencies across multiple documents also allows you to present different options in decision drop-down menus based on user input. The difference here is that your answers in one of the documents affect the possible answers in other documents. For example, you can configure Custom Decision 1 in Document 1 with two possible answers. Based on the selected answer for Custom Decision 1, you will receive different possible answers for Custom Decision 2 in Document 2.

Note that you can configure decision dependencies only for documents and cases.
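
Conceptually, a decision dependency is a lookup from the answer given for one decision to the options offered for another. The following sketch models that idea in plain Python. The answer and option values are invented for illustration, and the dictionary is not the format Hyperscience uses to configure Custom Supervision tasks.

```python
# Hypothetical mapping: the options offered for "Custom Decision 2"
# depend on the answer selected for "Custom Decision 1".
DEPENDENT_OPTIONS = {
    "Approve": ["Standard processing", "Expedited processing"],
    "Reject": ["Missing information", "Illegible document"],
}


def options_for_decision_2(decision_1_answer: str) -> list:
    """Return the options to show for Custom Decision 2.

    An answer with no configured dependency yields an empty list.
    """
    return DEPENDENT_OPTIONS.get(decision_1_answer, [])


print(options_for_decision_2("Reject"))
```

The same lookup applies across documents: the answer would come from Document 1, and the returned options would populate the menu in Document 2.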

Mandatory decisions – We’ve added support for mandatory decisions in Custom Supervision tasks. You can mark decisions that are critical for downstream processing and data quality as mandatory. Users must assign a value to each mandatory decision before they can move on to the next Supervision task.

CustomSupervisionMandatoryDecision.png

To learn more about these updates, see Custom Supervision.

Keyer Data Management

New

Changing training statuses of multiple documents simultaneously – With the introduction of the Edit training status option in v36, you can now edit the training statuses of multiple documents in bulk. To take advantage of this functionality, select documents from the Training Documents table on a model's Model Details page, and then click the Edit training status option that is located in the Actions drop-down menu. 

You can only change the statuses of annotated documents. If any of the selected documents are not annotated, a warning message will appear in the Edit Training Status dialog box. 

KDMBulkEditTrainingStatus.png

Annotation suggestions for fields with multiple bounding boxes – To expand the capabilities of the Guided Data Labeling feature, the annotation suggestions now provide you with predictions about where all bounding boxes of a field might be located.

GuidedDataLabelingMBB.png

For more information on these updates, see Keyer Data Management.

Reporting

Updated

Hourly breakdown of System Throughput report – We’ve added support for downloading hourly breakdowns of the System Throughput report (Reporting > Overview).

Note that hourly data is available only from the date v36 begins running in your instance. To manage database size, the system retains hourly data for up to 30 days. After 30 days of data have accumulated, the oldest day’s data is deleted as each new day passes.
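
The 30-day retention described above behaves like a rolling window. The sketch below shows one way to model it. The function name is hypothetical, and the exact boundary (whether the current day counts toward the 30 days) is an assumption, since the notes do not specify it.

```python
from datetime import date, timedelta

RETENTION_DAYS = 30  # retention period stated in the release note


def retained_days(days_with_data, today):
    """Return the days whose hourly data would still be kept.

    Assumption: the window covers the 30 most recent calendar days,
    including `today`.
    """
    cutoff = today - timedelta(days=RETENTION_DAYS)
    return [d for d in sorted(days_with_data) if d > cutoff]


start = date(2024, 1, 1)
days = [start + timedelta(days=i) for i in range(40)]
kept = retained_days(days, today=days[-1])
print(len(kept), kept[0])
```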

DownloadCSVHourlyBreakdown.png

To learn more about the System Throughput report, see System Throughput.

“Supervision” column in the Usage report – We’ve added a Supervision column to the following CSV files in the Usage report (Reporting > Usage):

  • signature_machine_transcriptions_report.csv
  • checkbox_machine_transcriptions_report.csv
  • supervision_transcriptions_report.csv
  • machine_transcriptions_report.csv

The Supervision column indicates what the Transcription Supervision setting is for each field in the report. The Transcription Supervision settings are defined in the Layout Editor for each field in a layout variation.

For more information, see Usage Report.

Filtering by field type in the All Users Performance Summary report – We’ve added a Field Type filter to the All Users Performance Summary report (Reporting > User Performance). You can now choose to filter the report by one of the following field types:

  • Text
  • Checkbox
  • Signature

Note that each download of the report includes only the data that meets the filter criteria.

AllUsersPerformanceSummaryAllFieldTypes.png

More details can be found in All Users Performance Summary.

Settings

New

Importing and exporting system settings – To let you move system settings between multiple instances that are on the same major version, we’ve added support for importing and exporting the settings found in Administration > System Settings. You can find the import and export functionality at Administration > Import/Export. The export functionality lets you select which settings you want to export. 

ExportSettingsSelectSettings.png

After making your selections, you can export your system settings to a JSON file. You can then use this JSON file to import your system settings to other instances. 

ImportSystemSettingsDialog.png

More details can be found in Importing & Exporting System Settings.

Databases

Notice

End of support for PostgreSQL 10.x in Hyperscience v38 – Beginning in v38, the Hyperscience application will no longer support PostgreSQL 10.x. The PostgreSQL project ended support for 10.x on November 10, 2022.

The following databases will be supported in v38:

  • PostgreSQL 12.x, 13.x, and 14.x
  • Amazon RDS for PostgreSQL
  • Oracle 19c with DBMS_ALERT privileges
  • Amazon RDS for Oracle
  • Microsoft SQL Server (MSSQL) 2016, 2017, and 2019 with Service Broker enabled
  • Amazon RDS for SQL Server
  • Azure SQL Managed Instance

For more information on database requirements, see Infrastructure Requirements (Production).

Kubernetes

New

Support for Red Hat OpenShift – To enhance the deployment experience, we now support the use of Red Hat OpenShift. With this enhancement, you can now deploy the Hyperscience application on your own Red Hat OpenShift instance.

API

New

Flows endpoints – With the /api/v5/flows endpoints, you can manage your flows programmatically and create scripts to automate frequently performed tasks.

You can complete the following actions with these endpoints:

  • List all flows in your instance
  • Retrieve information about a specific flow
  • Import a flow from a JSON or ZIP file
  • Deploy or disable a flow
  • Archive or restore a flow

More information about these endpoints can be found in our API documentation.
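
As a minimal sketch of scripting against these endpoints, the following builds a request to list flows using Python's standard library. Only the /api/v5/flows path comes from these notes. The base URL, the token value, and the `Authorization: Token …` header scheme are assumptions; confirm the authentication method and response format in the API documentation before relying on them.

```python
import urllib.request


def build_list_flows_request(base_url: str, api_token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the flows list."""
    url = base_url.rstrip("/") + "/api/v5/flows"
    request = urllib.request.Request(url, method="GET")
    # Assumption: token-based authentication via the Authorization header.
    request.add_header("Authorization", f"Token {api_token}")
    return request


# Hypothetical host and token, for illustration only.
req = build_list_flows_request("https://hyperscience.example.com/", "my-token")
print(req.get_method(), req.full_url)
```

To send the request, pass it to `urllib.request.urlopen`. The structure of the response is not described in these notes, so inspect it before parsing.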

Updated

Importing and exporting models with Artifacts endpoints – We've added functionality to the /api/v5/artifacts endpoints that allows you to import and export Field Identification, Table Identification, and Classification models without logging in to the application.

Imported models do not replace live models. If the imported model matches a live model that doesn't already have a candidate model, the system saves the imported model as a candidate model. If a candidate model already exists, the import fails.
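
The import rule above can be summarized as a small decision function. This is a sketch of the rule as described in these notes, not the application's implementation. The function name and return strings are hypothetical, and the case where the imported model matches no live model is left undefined because the notes do not describe it.

```python
from typing import Optional


def import_outcome(matches_live_model: bool, candidate_exists: bool) -> Optional[str]:
    """Describe what happens to an imported model under the stated rule."""
    if not matches_live_model:
        return None  # behavior not described in these release notes
    if candidate_exists:
        return "import fails"
    # The live model is never replaced; the import becomes a candidate.
    return "saved as candidate"


print(import_outcome(matches_live_model=True, candidate_exists=False))
```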

To learn more about importing and exporting models with the Artifacts endpoints, see our API documentation.
