Apache Tika

Updated 9 months ago by Blitline Support

APACHE TIKA

The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).

Blitline supports information retrieval from documents such as PDF and XLS. Not only can Blitline rasterize documents into images, you can now also retrieve the data stored within those documents. This lets you retrieve the text of various documents (such as PDF, Word, or EPUB) along with the thumbnails.

A common use case for this would be to get the text from PDF documents while thumbnailing them, and then push that text and metadata into an Elasticsearch system for indexing.
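
The indexing half of that use case could look roughly like the sketch below. It only builds a request body in the newline-delimited format that the Elasticsearch bulk API expects. The shape of the input records (the "id", "text", and "metadata" keys), the index name, and the to_bulk_body helper are all assumptions made for illustration; they are not the format Blitline returns, so map the fields from your actual job results accordingly.

```python
import json


def to_bulk_body(index_name, documents):
    """Build an Elasticsearch bulk-API body (NDJSON) from extracted documents.

    Each item in `documents` is a HYPOTHETICAL record with "id", "text",
    and "metadata" keys; adapt this to the real shape of your Tika output.
    """
    if not documents:
        return ""
    lines = []
    for doc in documents:
        # Action line: tells Elasticsearch which index and id to write to.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc["id"]}}))
        # Source line: the document body that gets indexed.
        lines.append(json.dumps({"text": doc["text"], "metadata": doc["metadata"]}))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


# Illustrative input only; these values are made up for the example.
docs = [
    {
        "id": "contoso-1",
        "text": "Quarterly report",
        "metadata": {"Content-Type": "application/pdf"},
    }
]
body = to_bulk_body("documents", docs)
print(body)
```

The resulting string would be sent to the cluster's bulk endpoint with the content type application/x-ndjson. Keeping the body construction separate from the HTTP call makes it easy to check the payload before anything is indexed.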

HOW TO USE IT:

Add the get_tika option, set to true, to the root of your job JSON:


          {
              "application_id":"YOUR_APP_ID",
              "src":"https://s3.amazonaws.com/blitdoc/docx/Contoso.xlsx",
              "get_tika" : "true",
              "v" : 1.22,
              "functions":[
                  {
                      "name":"crop",
                      "params":{
                          "gravity": "NorthGravity",
                          "width":100
                      },
                      "save":{
                          "image_identifier":"MY_CLIENT_ID"
                      }
                  }
              ]
          }
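
For readers who build jobs programmatically, here is a minimal Python sketch that assembles the same job as the JSON above. The build_tika_job and submit_job helpers are hypothetical names introduced for this example. The submit step assumes the job is POSTed as a form field named "json" to an endpoint URL that you supply; confirm both against the Getting Started documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request


def build_tika_job(application_id, src, image_identifier):
    """Return a job dict mirroring the JSON example above, with get_tika set."""
    return {
        "application_id": application_id,
        "src": src,
        "get_tika": "true",  # same value as in the JSON example above
        "v": 1.22,
        "functions": [
            {
                "name": "crop",
                "params": {"gravity": "NorthGravity", "width": 100},
                "save": {"image_identifier": image_identifier},
            }
        ],
    }


def submit_job(job, endpoint):
    """POST the job to `endpoint` (ASSUMED request shape: form field "json")."""
    body = urllib.parse.urlencode({"json": json.dumps(job)}).encode("utf-8")
    request = urllib.request.Request(endpoint, data=body, method="POST")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


job = build_tika_job(
    "YOUR_APP_ID",
    "https://s3.amazonaws.com/blitdoc/docx/Contoso.xlsx",
    "MY_CLIENT_ID",
)
# Print the payload for inspection; submit_job is not called here.
print(json.dumps(job, indent=2))
```

Printing the payload first makes it easy to confirm that get_tika sits at the root of the job rather than inside a function's params.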

See an example here:

Example: Tika Example
