Automating PDF Document Processing

IoT | MES | SCADA | InDriver | Industry40 | Automation | SmartFactory | DataIntegration

Modern organizations handle thousands of documents: invoices, batch records, production reports, and more.

These PDFs often contain critical data that must be analyzed, archived, and visualized. But manual processing is slow, error-prone, and hard to scale.

💡 InDriver changes that.

With InDriver’s scripting engine (JavaScript-based) and its powerful PDF & File APIs, you can:

✅ Automatically detect new files using the built-in FileWatcher
✅ Extract data from text, tables, headers, and fields
✅ Convert values to structured formats (e.g. JSON)
✅ Archive results directly in an SQL database
✅ Analyze trends, flag anomalies, and create dashboards in Grafana, Metabase, Looker, or Power BI
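Flagging anomalies does not have to wait for the dashboard. A simple range check on each parsed record is often enough to start. The sketch below is plain JavaScript with no InDriver APIs. The record shape mirrors the batch example that follows, while the 60 to 80 °C limits and the extra batch records are made up for illustration.

```javascript
// Hypothetical acceptance range for the batch temperature, in °C.
const TEMP_MIN_C = 60.0;
const TEMP_MAX_C = 80.0;

// Return the records whose temperature is missing or outside [minC, maxC].
function flagTemperatureAnomalies(records, minC = TEMP_MIN_C, maxC = TEMP_MAX_C) {
  return records.filter(
    (r) =>
      // Number.isFinite rejects undefined and NaN (e.g. parseFloat of a missing value),
      // which would otherwise slip through because every comparison with NaN is false.
      !Number.isFinite(r.tempC) || r.tempC < minC || r.tempC > maxC
  );
}

// Made-up records for illustration; only the first one comes from the example text.
const records = [
  { batchId: "ABC-1234", tempC: 72.5 },
  { batchId: "ABC-1235", tempC: 85.1 },
  { batchId: "ABC-1236", tempC: NaN },
];

console.log(flagTemperatureAnomalies(records).map((r) => r.batchId));
```

Only ABC-1234 passes: ABC-1235 is too hot and ABC-1236 has no usable reading. Treating a missing value as an anomaly is a deliberate choice, so a failed extraction gets noticed instead of silently passing.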

🔍 A simple example: PDF parsing with RegExp

Let’s say you receive batch record PDFs with values like:

Batch ID: ABC-1234  
Produced: 2024-12-15  
Operator: John Doe  
Temperature: 72.5 °C

Here’s how you can extract this data and log it to SQL in just a few lines.

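InDriver.import("PdfApi");

// Decode the PDF text using the ISO 8859-2 (Latin-2) character set
PdfApi.setCodec("ISO 8859-2");

// Get the text of the first page (page index 0)
let content = PdfApi.pageText(0);

// Extract each field with a regular expression;
// ?.[1] yields undefined instead of throwing when a label is missing
let data = {
  batchId: content.match(/Batch ID:\s*(\S+)/)?.[1],
  date: content.match(/Produced:\s*([\d-]+)/)?.[1],
  operator: content.match(/Operator:\s*(.+)/)?.[1]?.trim(), // drop trailing spaces
  tempC: parseFloat(content.match(/Temperature:\s*([\d.]+)/)?.[1]) // NaN (stored as null in JSON) if missing
};

// Store the record as JSON; $$...$$ is PostgreSQL dollar quoting,
// so quotes inside the JSON text need no escaping
InDriver.sqlExecute("azureserver",
  "insert into public.batch_records (source, ts, data) values ('Machine1', '" +
  new Date().toISOString() + "', $$" + JSON.stringify(data) + "$$);");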
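Because the extraction logic is plain JavaScript, it can be checked against sample text before it ever touches a real PDF. The sketch below wraps the same regular expressions in a hypothetical parseBatchRecord function (not part of InDriver) and runs it on the sample values shown above. It assumes the page text arrives with one label per line.

```javascript
// Same regular expressions as the InDriver script, wrapped in a pure function
// so they can be tested outside InDriver.
function parseBatchRecord(content) {
  // Capture group 1 of the first match, or undefined when the label is absent.
  const grab = (re) => content.match(re)?.[1];
  return {
    batchId: grab(/Batch ID:\s*(\S+)/),
    date: grab(/Produced:\s*([\d-]+)/),
    operator: grab(/Operator:\s*(.+)/)?.trim(), // "." stops at the line break
    tempC: parseFloat(grab(/Temperature:\s*([\d.]+)/)), // NaN when missing
  };
}

// Sample page text built from the example values.
const sample = [
  "Batch ID: ABC-1234",
  "Produced: 2024-12-15",
  "Operator: John Doe",
  "Temperature: 72.5 °C",
].join("\n");

console.log(JSON.stringify(parseBatchRecord(sample)));
```

Running a handful of real page texts through a function like this is a cheap way to catch a label that changed spelling or a value format the pattern does not cover.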

That’s it. In just one small script:

  • The file is read using PdfApi

  • Values are extracted via RegExp

  • Parsed data is sent to SQL
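Dollar quoting keeps the statement short, but it breaks if the JSON text itself ever contains "$$". One alternative, sketched below as a hypothetical helper, is a standard SQL string literal in which every single quote is doubled. It assumes the server treats backslashes in ordinary string literals literally (PostgreSQL's default with standard_conforming_strings = on); if your database driver supports bound parameters, those remain the safer choice.

```javascript
// Quote a value as a standard SQL string literal by doubling single quotes.
function sqlString(value) {
  return "'" + String(value).replace(/'/g, "''") + "'";
}

// Build the INSERT from the example; table and column names are taken from it.
function buildBatchInsert(source, ts, data) {
  return (
    "insert into public.batch_records (source, ts, data) values (" +
    sqlString(source) + ", " +
    sqlString(ts.toISOString()) + ", " +
    sqlString(JSON.stringify(data)) +
    ");"
  );
}

// Illustrative values; the apostrophe shows the escaping at work.
const stmt = buildBatchInsert(
  "Machine1",
  new Date("2024-12-15T08:00:00Z"),
  { batchId: "ABC-1234", operator: "Jane O'Neil" }
);
console.log(stmt);
```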

 

The script can be fully automated with the FileWatcher from FileAPI, which triggers the job every time a new file appears in a folder, with no human intervention.
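The core idea behind such a trigger is "process only what is new". The sketch below illustrates it with simple polling: compare the current folder listing with the previous one and keep the unseen PDFs. It is plain JavaScript and is not InDriver's FileWatcher API; the function name and file names are hypothetical.

```javascript
// Return the PDF file names in `current` that were not in `previous`, sorted.
function newPdfFiles(previous, current) {
  const known = new Set(previous);
  return current
    .filter((name) => !known.has(name) && /\.pdf$/i.test(name)) // case-insensitive extension
    .sort();
}

// Hypothetical folder listings from two consecutive scans.
let lastScan = ["batch_001.pdf", "notes.txt"];
const listing = ["batch_001.pdf", "batch_002.PDF", "notes.txt", "batch_003.pdf"];

for (const file of newPdfFiles(lastScan, listing)) {
  // In a real job, each new file would be parsed and stored here.
  console.log("process", file);
}
lastScan = listing; // remember this scan for the next poll
```

An event-driven watcher avoids the polling interval, but the bookkeeping is the same: each file should be handled exactly once.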

📊 This approach works for:

  • Production reports

  • Lab records

  • Invoices

  • Shipping docs

  • Pharmaceutical batch data

...and more.
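Supporting another document type mostly means swapping the patterns. The sketch below keeps the labels and regular expressions in a table and applies them generically; the invoice layout (Invoice No, Due Date, Total) is hypothetical and would need adjusting to real documents.

```javascript
// Apply a table of label patterns to extracted text; capture group 1 of each
// pattern becomes the field value (undefined when the label is not found).
function extractFields(text, patterns) {
  const result = {};
  for (const [field, re] of Object.entries(patterns)) {
    result[field] = text.match(re)?.[1]?.trim();
  }
  return result;
}

// Hypothetical invoice layout.
const invoicePatterns = {
  invoiceNo: /Invoice No\.?:\s*(\S+)/,
  dueDate: /Due Date:\s*(\d{4}-\d{2}-\d{2})/,
  total: /Total:\s*([\d.,]+)/,
};

const invoiceText = "Invoice No: INV-0042\nDue Date: 2025-01-31\nTotal: 1,250.00 EUR";
console.log(extractFields(invoiceText, invoicePatterns));
```

Values come back as strings on purpose: number formats such as "1,250.00" vary by locale, so converting them is left to a step that knows the document's conventions.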

🚀 Whether you're processing 10 or 10,000 documents daily, InDriver gives you full control, speed, and flexibility.
