Business-to-business communication via PDF

The PDF was created to make the dream of the paperless office a reality.
Developed by Adobe in the early 1990s, the format allows text and
graphics to be sent electronically in the same document. These days a
PDF can be viewed on virtually any device, password-protected and
printed locally. Another useful feature is the ability to add forms,
which can be completed and returned digitally.

PDFs are now easier than ever to produce. For example, a PDF can be
created from within Microsoft Office 2007 products or OpenOffice.
Various powerful free third-party PDF applications that integrate with
older Microsoft Office products are available. Web pages can be
converted to PDF using extensions and plugins that work with popular
browsers such as Chrome and Firefox.

Using tags

Another useful feature is the ability to create a tagged PDF. The tags
give meaning to the content and allow for the extraction of data.
However, it is not easy to add tags. You need additional software, and
the tags have to be added manually, which is a repetitive task.
Furthermore, if the PDF is produced by an internal system, it may not be
possible to add the tags when the PDF is generated.

The advantages of the format mean it is regularly used to send
information from business to business. However, as the content of an
untagged PDF carries no machine-readable meaning, it usually needs to be
copied and pasted from the document into the business's internal system.
The task is laborious and prone to error, wasting time and effort.

The solution is to send the data with meaning, usually in XML or JSON
format. The file can then be dropped into an internal system that has
suitable code to recognise and accurately extract all the information
from the file. The problem with XML and JSON is that they are aimed at
programmers and developers, not the average computer user. The following
figures show how a PDF and an XML file may differ for an order for a
pair of shoes:

PDF File:

Brand: Fly London

Design: Shard

Size: 39

Colour: black

XML File:

<?xml version="1.0" encoding="UTF-8"?>
<shoe_order>
  <brand>Fly London</brand>
  <design>Shard</design>
  <size>39</size>
  <colour>Black</colour> ...
</shoe_order>
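
To illustrate what "suitable code" on the receiving side might look like, here is a minimal Python sketch that reads the shoe order from the XML figure and also produces a JSON equivalent. It uses only the standard library. The function name parse_shoe_order and the sample constant are hypothetical, and the fields elided by "..." in the figure are simply left out; a real internal system would add validation and error handling.

```python
import json
import xml.etree.ElementTree as ET

# Sample order mirroring the XML figure (the fields elided by "..." are omitted).
ORDER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<shoe_order>
  <brand>Fly London</brand>
  <design>Shard</design>
  <size>39</size>
  <colour>Black</colour>
</shoe_order>"""


def parse_shoe_order(xml_text: str) -> dict:
    """Return the child elements of <shoe_order> as a tag -> text mapping."""
    # Encode to bytes so the parser honours the encoding declaration.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return {child.tag: (child.text or "").strip() for child in root}


order = parse_shoe_order(ORDER_XML)
print(order["brand"], order["size"])

# The same data expressed as JSON, the other format mentioned above.
order_json = json.dumps(order)
print(order_json)
```

Because every value sits inside a named element, the code does not have to guess which piece of text is the brand and which is the size. That is the "meaning" an untagged PDF lacks.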

It is unlikely that the average user would feel comfortable creating an
XML file; most would rather continue to use PDFs, which have the added
benefit of password protection should the document contain sensitive
data. Consequently, there needs to be a way to extract data from PDFs
other than direct copying and pasting.
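
As a rough illustration of the idea, the Python sketch below turns the "Label: value" lines from the PDF figure into structured data. It assumes the text has already been pulled out of the PDF by a separate tool (not shown), and the names FIELD_PATTERN and parse_fields are hypothetical. It is a simplified example, not a description of how any particular product works.

```python
import re

# Matches one "Label: value" line, for example "Brand: Fly London".
FIELD_PATTERN = re.compile(r"^\s*([A-Za-z ]+?)\s*:\s*(.+?)\s*$")


def parse_fields(pdf_text: str) -> dict:
    """Collect every "Label: value" line into a lower-cased label -> value mapping."""
    fields = {}
    for line in pdf_text.splitlines():
        match = FIELD_PATTERN.match(line)
        if match:  # blank lines and free text are skipped
            label, value = match.groups()
            fields[label.lower()] = value
    return fields


# Text as it might come out of the PDF shown in the figure.
PDF_TEXT = """Brand: Fly London

Design: Shard

Size: 39

Colour: black
"""

fields = parse_fields(PDF_TEXT)
print(fields)
```

A sketch like this only works while every sender keeps to the same layout, which is why real extraction systems combine several techniques.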

How SwiftCase helps

Fortunately, there are a number of systems designed to automate
extracting information from PDFs. The best ones combine different
technologies to recognise the data. These technologies include OCR tools
and word pattern recognition. Word pattern recognition is important if,
for example, you want to capture a block of text of varying length that
sits between two headings.
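
One simple form of pattern recognition is a regular expression that captures whatever lies between two known headings, however long it is. The Python sketch below shows the idea. The function name text_between and the headings "Order notes" and "Delivery address" are made up for the example, and the text is assumed to have been extracted from the PDF already (by OCR or another tool).

```python
import re
from typing import Optional


def text_between(text: str, start_heading: str, end_heading: str) -> Optional[str]:
    """Return the text found between two headings, or None if either is missing."""
    pattern = re.compile(
        # re.escape makes the headings match literally; the lazy group (.*?)
        # stops at the first occurrence of the end heading.
        re.escape(start_heading) + r"\s*(.*?)\s*" + re.escape(end_heading),
        re.DOTALL,  # let "." span line breaks so multi-line blocks are captured
    )
    match = pattern.search(text)
    return match.group(1) if match else None


# Hypothetical text extracted from an incoming PDF.
SAMPLE = """Order notes
Please deliver after 5pm.
Leave with a neighbour if nobody is in.
Delivery address
1 Example Street"""

notes = text_between(SAMPLE, "Order notes", "Delivery address")
print(notes)
```

The same call works whether the notes run to one line or twenty, because the pattern is anchored on the headings rather than on the length of the text.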

Systems designed to extract information from PDFs automatically can be
used by SwiftCase to automate your business workflows. As soon as a PDF
is sent in, SwiftCase can pick up the relevant information, extract the
data and insert it into the system. No more manual entry is required,
and incoming work gets an instant response.

SwiftCase can automate a wide range of data-import processes, helping you focus on providing an excellent service to clients.

If you’re interested in a free, no-obligation demonstration, get in touch today.