Basic PDF Manipulation with Fileforge API
Monday, July 8, 2024
tl;dr: In this article, you will learn how to merge and split PDFs, extract pages from them, and insert pages into them with the Fileforge API.
Introduction
PDF manipulation is a common task in document processing. Whether you need to merge multiple PDFs into a single document, split a large PDF into smaller files, extract specific pages from a PDF, or insert pages into one, the Fileforge API provides a simple and efficient solution. We aim to simplify working with PDF files programmatically, so that developers can focus on building great applications. In this tutorial, we will demonstrate how to perform basic PDF manipulation operations using the Fileforge Node SDK. Let's get started.
Prerequisites
Before we get started, you will need a free Fileforge account and an API key. You can sign up and create your API key from the dashboard.
I also encourage you to take a look at the documentation for the Fileforge API.
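Every snippet in this tutorial reads the API key from the FILEFORGE_API_KEY environment variable. As a minimal sketch, a small guard can fail fast with a clear message when that variable is missing. The helper name requireApiKey is hypothetical and is not part of the Fileforge SDK.

```typescript
// Hypothetical helper (not part of the Fileforge SDK): returns the API key
// from an environment-like object, or throws a descriptive error.
function requireApiKey(env: Record<string, string | undefined>): string {
  const key = env["FILEFORGE_API_KEY"];
  if (key === undefined || key.trim() === "") {
    throw new Error(
      "FILEFORGE_API_KEY is not set. Create an API key in the dashboard first."
    );
  }
  // Trim stray whitespace that often sneaks in when copying keys.
  return key.trim();
}

// Usage in a Node.js script:
//   const apiKey = requireApiKey(process.env);
```

The returned value can then be passed as the apiKey field when constructing the client, as in the examples that follow.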
Manipulation Operations
We will cover four basic PDF manipulation operations in this tutorial:
- Merging PDFs
- Splitting PDFs
- Extracting Pages from PDFs
- Inserting Pages into PDFs
In a future update, we will also cover more operations, such as deleting pages, rotating pages, and more.
Merging PDFs
The behavior of the merge operation is straightforward: it combines multiple PDF files into a single PDF document. The files parameter is an array of file paths or readable streams representing the PDF files to merge. PDFs are merged in the order they are provided in the array. Here is a simple example of how to merge two PDF files using the Fileforge API in Node.js:
    import { FileforgeClient } from "@fileforge/client";
    import * as fs from "fs";

    (async () => {
      // Create the client with the API key from the environment.
      const ff = new FileforgeClient({
        apiKey: process.env.FILEFORGE_API_KEY,
      });
      try {
        // The PDFs are merged in array order: pdf1.pdf first, then pdf2.pdf.
        const pdfFiles = [
          fs.createReadStream(__dirname + "/pdf1.pdf"),
          fs.createReadStream(__dirname + "/pdf2.pdf"),
        ];
        const mergedPdfStream = await ff.pdf.merge(
          pdfFiles,
          {
            options: {
              // Specify merge options if any
            },
          },
          {
            timeoutInSeconds: 60,
          }
        );
        // Write the merged PDF to disk.
        mergedPdfStream.pipe(fs.createWriteStream("./result_merge.pdf"));
        console.log("PDF merge successful. Stream ready.");
      } catch (error) {
        console.error("Error during PDF merge:", error);
        throw error;
      }
    })();
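Because the merge follows the order of the array, it can help to sort the input file names before opening the streams. The following is only a sketch: orderForMerge is a hypothetical helper, not part of the Fileforge SDK, and it assumes the files are named with a numeric suffix such as part1.pdf, part2.pdf, part10.pdf.

```typescript
// Hypothetical helper: sort file names so that numbered parts come out in
// their natural order (part2 before part10), since merge follows array order.
function orderForMerge(fileNames: string[]): string[] {
  // Copy before sorting so the caller's array is not mutated.
  return [...fileNames].sort((a, b) =>
    a.localeCompare(b, "en", { numeric: true, sensitivity: "base" })
  );
}

const ordered = orderForMerge(["part10.pdf", "part2.pdf", "part1.pdf"]);
console.log(ordered.join(", ")); // part1.pdf, part2.pdf, part10.pdf
```

The sorted names can then be mapped to fs.createReadStream calls to build the array passed to the merge operation.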
Splitting PDFs
The split operation divides a PDF document into two smaller PDF files. The splitPage parameter specifies the page number at which to split the PDF. It is one-based, meaning the first page is 1, the second page is 2, and so on. Here is an example of how to split a PDF file at page 3 using the Fileforge API in Node.js:
    import { FileforgeClient } from "@fileforge/client";
    import * as fs from "fs";
    import { pipeline } from "stream/promises";

    (async () => {
      const ff = new FileforgeClient({
        apiKey: process.env.FILEFORGE_API_KEY,
      });

      try {
        const splitRequest = {
          options: {
            // One-based page number at which to split the document.
            splitPage: 3,
          },
        };
        const requestOptions = {
          timeoutInSeconds: 60,
          maxRetries: 3,
        };
        // `File` is available as a global in Node.js 20 and later.
        const splitArchiveStream = await ff.pdf.split(
          new File([fs.readFileSync(__dirname + "/samples/form.pdf")], "form.pdf", {
            type: "application/pdf",
          }),
          splitRequest,
          requestOptions
        );

        // The response is a ZIP archive; wait until it is fully written.
        await pipeline(
          splitArchiveStream,
          fs.createWriteStream("./result_split.zip")
        );
        console.log("Split successful. Zip Stream ready.");
      } catch (error) {
        console.error("Error during PDF splitting:", error);
      }
    })();
Note that the output is a ZIP file containing the two split PDFs. We chose this design to keep the split operation efficient and the output easily accessible, especially for large PDFs.
We are considering an option to split a document at multiple pages, so that one document can be split into several documents instead of only two. This feature is planned for a future update. If you are interested, please reach out to us.
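Until a multi-page split exists, one possible workaround is to call the extract operation (covered in the next section) once per part. The sketch below only computes the one-based, inclusive start and end values for each part. chunkRanges and PageRange are hypothetical names, and the total page count is assumed to be known by the caller, since this article does not show how to query it.

```typescript
// One-based, inclusive page range, matching the start/end options of extract.
interface PageRange {
  start: number;
  end: number;
}

// Hypothetical helper: cut a document of `pageCount` pages into consecutive
// ranges of at most `pagesPerPart` pages each.
function chunkRanges(pageCount: number, pagesPerPart: number): PageRange[] {
  if (!Number.isInteger(pageCount) || pageCount < 1) {
    throw new RangeError("pageCount must be a positive integer");
  }
  if (!Number.isInteger(pagesPerPart) || pagesPerPart < 1) {
    throw new RangeError("pagesPerPart must be a positive integer");
  }
  const ranges: PageRange[] = [];
  for (let start = 1; start <= pageCount; start += pagesPerPart) {
    // The last range may be shorter than pagesPerPart.
    ranges.push({ start, end: Math.min(start + pagesPerPart - 1, pageCount) });
  }
  return ranges;
}

// A 7-page document in parts of 3 pages gives pages 1-3, 4-6 and 7-7.
const parts = chunkRanges(7, 3);
console.log(parts.map((r) => `${r.start}-${r.end}`).join(", ")); // 1-3, 4-6, 7-7
```

Each range could then be sent as the options of a separate extract request, at the cost of one API call per part.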
Extracting Pages from PDFs
The extract operation allows you to extract a specific range of pages from a PDF document. It requires two parameters: start and end. These parameters specify the range of pages to extract and are one-based. Here is a simple demonstration that extracts only the first page of a document:
    import { FileforgeClient } from "@fileforge/client";
    import * as fs from "fs";
    import { pipeline } from "stream/promises";

    (async () => {
      const ff = new FileforgeClient({
        apiKey: process.env.FILEFORGE_API_KEY,
      });

      try {
        const extractRequest = {
          options: {
            // One-based range; start and end both set to 1 selects the first page only.
            start: 1,
            end: 1,
          },
        };
        const requestOptions = {
          timeoutInSeconds: 60,
          maxRetries: 3,
        };
        // `File` is available as a global in Node.js 20 and later.
        const extractStream = await ff.pdf.extract(
          new File([fs.readFileSync(__dirname + "/samples/form.pdf")], "form.pdf", {
            type: "application/pdf",
          }),
          extractRequest,
          requestOptions
        );

        // Wait until the extracted PDF is fully written to disk.
        await pipeline(extractStream, fs.createWriteStream("./result_extract.pdf"));
        console.log("Extraction successful. Stream ready.");
      } catch (error) {
        console.error("Error during PDF extraction:", error);
      }
    })();
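Off-by-one mistakes are easy to make with one-based page numbers, so it can be worth validating the range before sending the request. The following is a minimal sketch: buildExtractRequest is a hypothetical helper, not part of the Fileforge SDK, and it assumes the caller already knows the page count of the document. It returns an object with the same shape as extractRequest in the example above.

```typescript
// Shape of the request object used in the extract example above.
interface ExtractRequest {
  options: { start: number; end: number };
}

// Hypothetical helper: validate a one-based, inclusive page range before
// sending it. `pageCount` is assumed to be known by the caller.
function buildExtractRequest(
  start: number,
  end: number,
  pageCount: number
): ExtractRequest {
  if (![start, end, pageCount].every((n) => Number.isInteger(n))) {
    throw new RangeError("start, end and pageCount must be integers");
  }
  if (start < 1) {
    throw new RangeError("Pages are one-based: start must be at least 1");
  }
  if (end < start) {
    throw new RangeError("end must not be smaller than start");
  }
  if (end > pageCount) {
    throw new RangeError(`end (${end}) exceeds the page count (${pageCount})`);
  }
  return { options: { start, end } };
}

// First page only of a 10-page document.
const firstPage = buildExtractRequest(1, 1, 10);
console.log(JSON.stringify(firstPage)); // {"options":{"start":1,"end":1}}
```

Rejecting a bad range locally avoids spending an API call on a request that cannot succeed.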
Inserting Pages into PDFs
Finally, the insert operation allows you to insert a PDF document into another PDF document at a specific page. The insertPage parameter specifies the page number at which to insert the PDF. It is one-based. Here is an example of how to insert a PDF file at page 2 of another PDF file using the Fileforge API in Node.js:
    import { FileforgeClient } from "@fileforge/client";
    import * as fs from "fs";

    (async () => {
      const ff = new FileforgeClient({
        apiKey: process.env.FILEFORGE_API_KEY,
      });

      try {
        // Both documents are passed together in a single array.
        const pdfFiles = [
          fs.createReadStream(__dirname + "/pdf1.pdf"),
          fs.createReadStream(__dirname + "/pdf2.pdf"),
        ];
        const insertPDFStream = await ff.pdf.insert(
          pdfFiles,
          {
            options: {
              // One-based page number at which to insert the PDF.
              insertPage: 2,
            },
          },
          {
            timeoutInSeconds: 60,
          }
        );
        // Write the resulting PDF to disk.
        insertPDFStream.pipe(fs.createWriteStream("./result_insert.pdf"));
        console.log("PDF inserted successfully. Stream ready.");
      } catch (error) {
        console.error("Error during PDF insertion:", error);
        throw error;
      }
    })();
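To reason about where pages end up, it can help to model the insertion locally on plain arrays of page labels. This is only an illustration, not the API's implementation: insertPages is a hypothetical function, and it assumes that the first page of the inserted document becomes page insertPage of the result. Check the API documentation for the exact behavior.

```typescript
// Local model only (not the Fileforge API): insert the pages of one document
// into another so that the first inserted page lands at `insertPage`.
// ASSUMPTION: `insertPage` is one-based and the existing page at that
// position is pushed back, not replaced.
function insertPages<T>(base: T[], inserted: T[], insertPage: number): T[] {
  if (
    !Number.isInteger(insertPage) ||
    insertPage < 1 ||
    insertPage > base.length + 1
  ) {
    throw new RangeError(
      `insertPage must be an integer between 1 and ${base.length + 1}`
    );
  }
  const index = insertPage - 1; // convert to a zero-based array index
  return [...base.slice(0, index), ...inserted, ...base.slice(index)];
}

const combined = insertPages(["A1", "A2", "A3"], ["B1", "B2"], 2);
console.log(combined.join(" ")); // A1 B1 B2 A2 A3
```

Under this assumption, an insertPage of 1 puts the inserted document at the very beginning, and a value one past the last page appends it at the end.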
Conclusion
In this tutorial, we have demonstrated how to perform basic PDF manipulation operations using the Fileforge API. We covered merging PDFs, splitting PDFs, extracting pages from PDFs, and inserting pages into PDFs. These operations are essential for document processing and can be easily integrated into your applications using the Fileforge Node SDK. If you have any questions or need further assistance, feel free to reach out to us. We are here to help you build great applications with the Fileforge API. Happy coding!
PS: all the code snippets are available in the API Documentation.