Apryse Announces Arrival of Intelligent Document Processing (IDP) to Automate Accurate Data Extraction From PDF

Thursday, February 23, 2023


Don't "Dig for Data" — Automate Accurate Data Extraction from PDF with Apryse IDP

Intelligent Document Processing, featuring a Data Extraction capability, unlocks information trapped in any PDF. Automatically convert content elements and structure to JSON or XLSX with leading accuracy and at scale.

Vancouver, BC, February 23, 2023--(T-Net)--Apryse (formerly PDFTron Systems) has announced the arrival of its Intelligent Document Processing (IDP), a significant step forward in enabling IDP processes and efficiently extracting content from stored documents into a client's database at any scale.

Apryse IDP includes powerful PDF data extraction that recognizes and extracts any document layout along with content elements, such as tabular data and text, to structured JSON and Excel right out of the box. As a result, it gives organizations scalability and leading accuracy in PDF data extraction, and it eliminates costs associated with extensive templating, rules, and data entry.

The new Apryse Intelligent Document Processing solution, including extraction, enables efficient PDF content extraction with leading accuracy, without the need for extensive upfront training or templates.

Try out the new IDP features in your environment (no trial key required). Visit the documentation for samples and feature details. Also, visit our page on JSON for details on the output structure and tags.

PDF data extraction works right out of the box, without training the model on every type of document used across your organization, creating rules, or checking for errors post-conversion.

Apryse IDP does just that for any structured or semi-structured data in PDF while offering a choice of conversion formats for downstream processing. It reliably recognizes tables, accurately extracts text and tabular data, and detects and understands articles of text in a document.

The results should speak for themselves. So, visit the documentation to learn more about Apryse IDP and what it can do in your environment. There's no trial key required to get started.

Getting set up is straightforward. New Intelligent Data Extraction features are part of the IDP add-on to the Apryse Server SDK, meaning you can use your language of choice to embed the API into your application. Developers get complete control over extracted data and the workflow itself. Apryse IDP provides greater reliability, performance, and cost-effective scalability compared to external extraction services and on-demand processing.

Apryse's Intelligent Data Extraction capability looks like this:

No rules or templates — Avoids extensive upfront work associated with templating and drives accuracy and cost-effective scalability.
Handles multi-modal content — Text, tables, and forms, including fields in informal forms, such as scanned PDFs or forms specified in text without interactive form data in the file.
Layout awareness — Preserves natural reading order, logical relationships between elements, and contiguous blocks of text in JSON, for error-free extraction.
Conversion to different file formats — JSON and Excel allow flexible data consumption.
Table and cell recognition — Extracts tabular data into Excel, enabling easy analysis or further processing. Provides coordinates to tables in the JSON.

Table Detection and Tabular Data Extraction

Table Recognition detects table boundaries, rows, and columns, as well as challenging features such as spanning cells that trip up some extraction tools.

A PDF Table in a SEC 10-Q Report

Our Excel output capturing a 10-Q table

The Table Data Extractor works within the recognizer workflow but is a separate function. It extracts tabular data into an Excel spreadsheet file, with one table per sheet, and to a JSON companion file.

You can use the extractor to pull out just one table element — from one PDF or many PDFs at once.
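
To illustrate why the spanning cells mentioned above are a challenge, here is a minimal Python sketch that expands cells with row and column spans into a plain rectangular grid, the shape a spreadsheet consumer typically expects. The cell layout used here (`row`, `col`, `rowSpan`, `colSpan`, `text`) is a hypothetical schema invented for illustration, not the documented Apryse JSON output, and the sample values are made up.

```python
from typing import Any, Dict, List, Optional

Cell = Dict[str, Any]


def expand_spanning_cells(
    cells: List[Cell], n_rows: int, n_cols: int
) -> List[List[Optional[str]]]:
    """Copy each cell's text into every grid position it covers.

    Assumes a hypothetical cell schema: zero-based 'row' and 'col',
    optional 'rowSpan' and 'colSpan' (default 1), and 'text'.
    """
    grid: List[List[Optional[str]]] = [
        [None] * n_cols for _ in range(n_rows)
    ]
    for cell in cells:
        row_start, col_start = cell["row"], cell["col"]
        for r in range(row_start, row_start + cell.get("rowSpan", 1)):
            for c in range(col_start, col_start + cell.get("colSpan", 1)):
                grid[r][c] = cell["text"]
    return grid


# Illustrative table: a header spanning two columns and a label spanning two rows.
cells = [
    {"row": 0, "col": 0, "rowSpan": 2, "text": "Item"},
    {"row": 0, "col": 1, "colSpan": 2, "text": "Three Months Ended"},
    {"row": 1, "col": 1, "text": "2022"},
    {"row": 1, "col": 2, "text": "2021"},
    {"row": 2, "col": 0, "text": "Revenue"},
    {"row": 2, "col": 1, "text": "100"},
    {"row": 2, "col": 2, "text": "90"},
]

grid = expand_spanning_cells(cells, n_rows=3, n_cols=3)
for row in grid:
    print(row)
```

Repeating the spanned text in every covered position is only one possible design choice. A consumer that needs to preserve merged cells would instead keep the span information and apply it when writing the spreadsheet.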

Structure Recognition — PDF to JSON

Structure Recognition refers to awareness of how the content elements in a PDF are positioned on a page and in relation to each other, rather than simply what these elements are (such as text and tables). As part of parsing the PDF, the Intelligent Data Extraction component reconstructs the formatting and layout (structure) of content elements and what they look like on screen into JSON.

The component recognizes headers, footers, paragraphs, and reading order and puts table and image coordinates into the JSON file. Along with the JSON instructions on their placement, this makes it easy to repurpose content and reflow the document, for example, if you want to republish content in another application, such as a mobile viewer.

Formatted text in our 10-Q

Image of the document structure in JSON
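
The reflow use case described above can be sketched in a few lines of Python. The example drops headers and footers and joins the remaining paragraphs in reading order, as one might do before republishing content in a mobile viewer. The JSON layout (`pages`, `elements`, `type`, `order`, `text`) is an assumed, hypothetical schema for illustration only. Consult the Apryse documentation on JSON output for the actual structure and tags.

```python
import json

# Hypothetical sample output; the field names are assumptions, not the real schema.
SAMPLE_JSON = """
{"pages": [{"elements": [
    {"type": "footer", "order": 3, "text": "Page 1"},
    {"type": "paragraph", "order": 2, "text": "Second paragraph."},
    {"type": "header", "order": 0, "text": "Quarterly Report"},
    {"type": "paragraph", "order": 1, "text": "First paragraph."}
]}]}
"""


def reflow_text(document_json: str) -> str:
    """Return body paragraphs in reading order, page by page."""
    document = json.loads(document_json)
    blocks = []
    for page in document["pages"]:
        # Keep only body text; headers and footers are page furniture.
        body = [e for e in page["elements"] if e["type"] == "paragraph"]
        # Sort by the (assumed) reading-order index within the page.
        body.sort(key=lambda element: element["order"])
        blocks.extend(element["text"] for element in body)
    return "\n\n".join(blocks)


print(reflow_text(SAMPLE_JSON))
```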

About Apryse

Apryse, previously known as PDFTron, takes document solutions to the next level, making work better and life simpler. As a global leader in document processing technology, Apryse gives developers, enterprise customers, and small businesses the tools they need to reach their document goals faster and easier. Our product portfolio includes Apryse SDK, Fluent, iText, and XODO. Apryse technology works with all major platforms and a wide variety of unique file types.

Company Snapshot

Apryse (formerly PDFTron Systems Inc.)

Vancouver, BC (InfoTech)
80 Employees In BC (290 Total)
Founded: 1998
