PDF

Updated 11 months ago by Blitline Support

PDFS AS IMAGE

By default, Blitline automatically tries to convert a source PDF into a single large image. You do not need to do anything special to make this happen.

                {
                    "application_id": "YOUR_APP_ID",
                    "src": "https://s3.amazonaws.com/bltemp/non_stock_bulk_sell_sheet.pdf",
                    "functions": [
                        {
                            "name": "resize_to_fit",
                            "params": {
                                "width": 200
                            },
                            "save": {
                                "image_identifier": "external_sample_1"
                            }
                        }
                    ]
                }

AS IMAGE/PAGE

You can process each page on its own, pushing each page as a separate image to your S3 bucket or Azure storage. Just add the extra JSON field "src_type": "multi_page".

                {
                    "application_id": "YOUR_APP_ID",
                    "src": "https://s3.amazonaws.com/blitdoc/pdfs/multi_page_sample.pdf",
                    "src_type": "multi_page",
                    "functions": [
                        {
                            "name": "resize_to_fit",
                            "params": {
                                "width": 200,
                                "height": 200
                            },
                            "save": {
                                "image_identifier": "external_sample_1"
                            }
                        }
                    ]
                }

SPECIFIC PAGES

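You can pick individual pages, using the same functionality as above, by making “src_type” a JSON object that holds "name": "multi_page" together with a "pages" array, for example "pages": [0, x, y]. The example below selects pages 0 and 1.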


                {
                    "application_id": "YOUR_APP_ID",
                    "src": "https://s3.amazonaws.com/blitdoc/pdfs/multi_page_sample.pdf",
                    "src_type": {
                        "name": "multi_page",
                        "pages": [0, 1]
                    },
                    "v": 1.21,
                    "functions": [
                        {
                            "name": "resize_to_fit",
                            "params": {
                                "width": 200,
                                "height": 200
                            },
                            "save": {
                                "image_identifier": "external_sample_1"
                            }
                        }
                    ]
                }
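
If you generate these jobs from code, the two "src_type" shapes above can be produced by one small helper. The following Python is only a sketch: the function name build_pdf_job is hypothetical (it is not part of any Blitline library), and adding "v": 1.21 only when pages are selected simply mirrors the sample above, it is an assumption rather than a documented requirement.

```python
import json


def build_pdf_job(application_id, src, functions, pages=None):
    """Build a job dict that processes a PDF page by page.

    Hypothetical helper: it only assembles the JSON shown in the samples.
    """
    job = {
        "application_id": application_id,
        "src": src,
        "functions": functions,
    }
    if pages is None:
        # Every page becomes its own image.
        job["src_type"] = "multi_page"
    else:
        # Only the listed pages are processed.
        job["src_type"] = {"name": "multi_page", "pages": list(pages)}
        # Assumption: mirror the "v" value used in the specific-pages sample.
        job["v"] = 1.21
    return job


resize = {
    "name": "resize_to_fit",
    "params": {"width": 200, "height": 200},
    "save": {"image_identifier": "external_sample_1"},
}

job = build_pdf_job(
    "YOUR_APP_ID",
    "https://s3.amazonaws.com/blitdoc/pdfs/multi_page_sample.pdf",
    [resize],
    pages=[0, 1],
)
print(json.dumps(job, indent=4))
```

Leaving out the pages argument produces the plain "src_type": "multi_page" form from the previous section.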



BURSTING


Bursting lets you explode a PDF into its individual pages and process them all in parallel on Blitline’s image processing cloud. This allows huge PDFs to be processed in a fraction of the time it would take on your own machine or in a linear fashion.

Here is what happens behind the scenes:

  • Blitline downloads the src PDF.
  • Blitline breaks the PDF into individual pages and uploads these pages to a temporary storage location.
  • For each page of the PDF, Blitline automatically creates a new job, copying over the functions and data you specified in the “burst_job” and renaming the output files to have a “__X” suffix (that is 2 underscores, not 1), where X is the page number.
  • Blitline tracks the jobs and, when they are all completed, issues a postback to your postback_url or puts the item in the long polling cache.

How to submit a PDF for bursting?

To tell Blitline that you wish to burst a PDF, you must set the src_type to “burst_pdf”.


An example job would look like this:

          {
              "application_id": "YOUR_APP_ID",
              "src": "https://s3.amazonaws.com/bltemp/non_stock_bulk_sell_sheet.pdf",
              "src_type": "burst_pdf",
              "v": 1.2,
              "src_data": {
                  "dpi": 200
              },
              "functions": [
                  {
                      "name": "resize_to_fit",
                      "params": {
                          "width": 500
                      },
                      "save": {
                          "image_identifier": "external_sample_1"
                      }
                  }
              ]
          }
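
The same burst job can be assembled programmatically. This is a minimal Python sketch under stated assumptions: build_burst_job is a hypothetical name, treating "src_data" as optional is a guess (the sample above always includes it), and the "v": 1.2 value is copied from the sample rather than taken from any documented rule.

```python
import json


def build_burst_job(application_id, src, functions, dpi=None):
    """Assemble a burst job dict matching the sample above (hypothetical helper)."""
    job = {
        "application_id": application_id,
        "src": src,
        # "burst_pdf" is what tells Blitline to burst the PDF.
        "src_type": "burst_pdf",
        # Copied from the sample job.
        "v": 1.2,
        "functions": functions,
    }
    if dpi is not None:
        # Assumption: src_data can be omitted when no dpi is requested.
        job["src_data"] = {"dpi": dpi}
    return job


burst_job = build_burst_job(
    "YOUR_APP_ID",
    "https://s3.amazonaws.com/bltemp/non_stock_bulk_sell_sheet.pdf",
    [
        {
            "name": "resize_to_fit",
            "params": {"width": 500},
            "save": {"image_identifier": "external_sample_1"},
        }
    ],
    dpi=200,
)
print(json.dumps(burst_job, indent=4))
```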


The resulting RESPONSE will look like this:

          {
              "results": {
                  "images": [
                      {
                          "image_identifier": "MY_CLIENT_ID",
                          "s3_url": "https://s3.amazonaws.com/dev.blitline/2011111513/1/fDIFJQVNlO6IeDZwXlruYg.jpg"
                      }
                  ],
                  "job_id": "4ec2e057c29aba53a5000001",
                  "group_completion_job_id": "B734Hasd23423llasda"
              }
          }


Notice there is a group_completion_job_id, which is a “virtual job_id” indicating the completion of the group of jobs. You can poll this group_completion_job_id just as you would a regular job. It will also be the job_id of the postback when ALL the individual jobs are completed.
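
To poll or match the final postback you first need to pull the group_completion_job_id out of the response. The Python below is a small sketch of that step only; the function name is hypothetical, and it assumes the response body has the shape shown in the sample above. It deliberately stops short of the polling call itself.

```python
import json


def get_group_completion_job_id(response_text):
    """Return the group_completion_job_id from a job response, or None.

    Assumes the response JSON has a top-level "results" object,
    as in the sample response above.
    """
    results = json.loads(response_text).get("results", {})
    return results.get("group_completion_job_id")


sample_response = """
{
    "results": {
        "images": [],
        "job_id": "4ec2e057c29aba53a5000001",
        "group_completion_job_id": "B734Hasd23423llasda"
    }
}
"""

group_id = get_group_completion_job_id(sample_response)
print(group_id)
```

A job that was not submitted as a burst may have no such field, in which case this helper returns None instead of raising.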

When writing the output files for a burst PDF, Blitline automatically appends two underscores (“__”) and a page number to the filename. When you specify an s3_destination, Blitline outputs to your key + “__” + (page number). This is the canonical format that Blitline uses.
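
If you need to locate the burst outputs afterwards, the naming rule can be mirrored on your side. This Python sketch follows the rule exactly as stated (key, then two underscores, then the page number); both function names are hypothetical, and whether page numbers start at 0 or 1 is not stated here, so the page value is passed in as-is.

```python
def burst_output_key(key, page):
    """Return the output key for one burst page: key + "__" + page number."""
    return f"{key}__{page}"


def split_burst_output_key(output_key):
    """Split an output key back into (original key, page number).

    Splits on the LAST "__" so keys that already contain double
    underscores are still handled.
    """
    key, _, page = output_key.rpartition("__")
    return key, int(page)


out = burst_output_key("pdfs/sell_sheet", 3)
print(out)
print(split_burst_output_key(out))
```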


