Basic PDF Manipulation with Fileforge API

Monday, July 8, 2024

Auguste Lefevre Grunewald

Studied Computer Science and Quantitative Finance. Co-founder @ Fileforge. I'm passionate about technology, politics, history and nature. I love to share my thoughts and learn from others.

tl;dr: In this article you will learn how to manipulate PDFs with the Fileforge API. We will show you how to merge, split, extract and insert pages from PDFs with the help of the Fileforge API.

Introduction

PDF manipulation is a common task in document processing. Whether you need to merge multiple PDFs into a single document, split a large PDF into smaller files, or extract or insert specific pages, the Fileforge API provides a simple and efficient solution. We aim to simplify the process of working with PDF files programmatically, so that developers can focus on building great applications. In this tutorial, we will demonstrate how to perform basic PDF manipulation operations using the Fileforge Node SDK. Let's get started!

Prerequisites

Before we get started, you will need to sign up for a free Fileforge account and create an API key. You can sign up for an account and get your API key from the dashboard.

I also encourage you to take a look at the documentation for the Fileforge API.
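
All the examples in this tutorial read the API key from the FILEFORGE_API_KEY environment variable. A minimal sketch of a guard that fails fast when that variable is missing is shown below. requireApiKey is a hypothetical helper written for this article, not part of the Fileforge SDK.

```typescript
// Hypothetical helper (not part of the Fileforge SDK): returns the API key
// from an environment-like object, or throws a clear error if it is absent.
function requireApiKey(
  env: Record<string, string | undefined>,
  name: string = "FILEFORGE_API_KEY"
): string {
  const raw = env[name];
  const value = raw === undefined ? "" : raw.trim();
  if (value === "") {
    throw new Error(`Missing environment variable ${name}`);
  }
  return value;
}

// Usage in a Node script: const apiKey = requireApiKey(process.env);
```

Calling the guard once at startup gives a readable error message instead of a failed request later on.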

Manipulation Operations

We will cover four basic PDF manipulation operations in this tutorial:

  • Merging PDFs
  • Splitting PDFs
  • Extracting Pages from PDFs
  • Inserting Pages into PDFs

In a future update, we will also cover more operations, such as deleting pages, rotating pages, and more.

Merging PDFs

The behavior of the merge operation is straightforward: it combines multiple PDF files into a single PDF document. The files parameter is an array of file paths or readable streams representing the PDF files to merge. PDFs are merged in the order they are provided in the array. Here is a simple example of how to merge two PDF files using the Fileforge API in Node.js:

import { FileforgeClient } from "@fileforge/client";
import * as fs from "fs";

(async () => {
  const ff = new FileforgeClient({
    apiKey: process.env.FILEFORGE_API_KEY,
  });
  try {
    const pdfFiles = [
      fs.createReadStream(__dirname + "/pdf1.pdf"),
      fs.createReadStream(__dirname + "/pdf2.pdf"),
    ];
    const mergedPdfStream = await ff.pdf.merge(
      pdfFiles,
      {
        options: {
          // Specify merge options if any
        },
      },
      {
        timeoutInSeconds: 60,
      }
    );
    mergedPdfStream.pipe(fs.createWriteStream("./result_merge.pdf"));
    console.log("PDF merge successful. Stream ready.");
  } catch (error) {
    console.error("Error during PDF merge:", error);
    throw error;
  }
})();
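
Since PDFs are merged in the order of the array, it can help to picture each PDF as a list of pages and the merge as a plain concatenation. The sketch below models that behavior locally with arrays of page labels. mergePageOrder is an illustrative name, and the function does not call the Fileforge API.

```typescript
// Illustrative model only: each document is an array of page labels.
// Merging keeps documents in array order and pages in their original order.
function mergePageOrder(documents: string[][]): string[] {
  const result: string[] = [];
  for (const pages of documents) {
    for (const page of pages) {
      result.push(page);
    }
  }
  return result;
}

const firstPdf = ["pdf1-p1", "pdf1-p2"];
const secondPdf = ["pdf2-p1"];

// Swapping the inputs swaps the order of the pages in the result.
const mergedOrder = mergePageOrder([firstPdf, secondPdf]);
const swappedOrder = mergePageOrder([secondPdf, firstPdf]);
console.log(mergedOrder, swappedOrder);
```

In practice this means that the order of the streams in pdfFiles decides the page order of result_merge.pdf.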

Splitting PDFs

The split operation divides a PDF document into two smaller PDF files. The splitPage parameter specifies the page number at which to split the PDF. It is one-based, meaning the first page is 1, the second page is 2, and so on. Here is an example of how to split a PDF file at page 3 using the Fileforge API in Node.js:

import { FileforgeClient } from "@fileforge/client";
import * as fs from "fs";
// pipeline() from stream/promises resolves once the output file is fully written.
import { pipeline } from "stream/promises";

(async () => {
  const ff = new FileforgeClient({
    apiKey: process.env.FILEFORGE_API_KEY,
  });

  try {
    const splitRequest = {
      options: {
        splitPage: 3,
      },
    };
    const requestOptions = {
      timeoutInSeconds: 60,
      maxRetries: 3,
    };
    // File is a global in Node.js 20+; on older versions, import it from "buffer".
    const splitArchiveStream = await ff.pdf.split(
      new File([fs.readFileSync(__dirname + "/samples/form.pdf")], "form.pdf", {
        type: "application/pdf",
      }),
      splitRequest,
      requestOptions
    );

    await pipeline(
      splitArchiveStream,
      fs.createWriteStream("./result_split.zip")
    );
    console.log("Split successful. Zip Stream ready.");
  } catch (error) {
    console.error("Error during PDF splitting:", error);
  }
})();

Note that the output is a ZIP file containing the two split PDFs. We chose this design so that the split operation stays efficient and the output remains easy to access, especially for large PDFs.

We are considering an option to split a document at multiple pages, which would turn one document into several documents instead of only two. This feature is planned for a future update. If you are interested, please reach out to us.
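
Because splitPage is one-based, a value of 0 or a fractional value can never be a valid page number. The sketch below checks this before a request is sent. assertOneBasedPage and its optional pageCount argument are hypothetical, and the exact limits the API enforces (for example, whether splitting at the first or last page is accepted) are not stated in this article, so the upper bound is an assumption.

```typescript
// Hypothetical pre-flight check (not part of the Fileforge SDK).
// Returns the page number unchanged when it is a valid one-based page.
function assertOneBasedPage(page: number, pageCount?: number): number {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError(`Page numbers are one-based; got ${page}`);
  }
  // Assumption: a page number beyond the end of the document is invalid.
  if (pageCount !== undefined && page > pageCount) {
    throw new RangeError(`Page ${page} exceeds the document length (${pageCount})`);
  }
  return page;
}

// Example: build the split options only after the value has been checked.
const checkedSplitOptions = { options: { splitPage: assertOneBasedPage(3, 10) } };
console.log(checkedSplitOptions);
```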

Extracting Pages from PDFs

The extract operation allows you to extract a specific range of pages from a PDF document. It requires two parameters: start and end. These parameters specify the range of pages to extract and are one-based. Here is a simple demonstration that extracts only the first page of a document:

import { FileforgeClient } from "@fileforge/client";
import * as fs from "fs";
// pipeline() from stream/promises resolves once the output file is fully written.
import { pipeline } from "stream/promises";

(async () => {
  const ff = new FileforgeClient({
    apiKey: process.env.FILEFORGE_API_KEY,
  });

  try {
    const extractRequest = {
      options: {
        start: 1,
        end: 1,
      },
    };
    const requestOptions = {
      timeoutInSeconds: 60,
      maxRetries: 3,
    };
    // File is a global in Node.js 20+; on older versions, import it from "buffer".
    const extractStream = await ff.pdf.extract(
      new File([fs.readFileSync(__dirname + "/samples/form.pdf")], "form.pdf", {
        type: "application/pdf",
      }),
      extractRequest,
      requestOptions
    );

    await pipeline(extractStream, fs.createWriteStream("./result_extract.pdf"));
    console.log("Extraction successful. Stream ready.");
  } catch (error) {
    console.error("Error during PDF extraction:", error);
  }
})();
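
The example above uses start: 1 and end: 1 to get exactly one page, which suggests that both ends of the range are included. If your own code tracks pages with zero-based array indices, a small conversion helper can prevent off-by-one mistakes. The sketch below is one such helper. toZeroBasedIndices is a hypothetical name and is not part of the SDK.

```typescript
// Hypothetical helper: converts a one-based, inclusive page range
// (as used by start and end) into zero-based array indices.
function toZeroBasedIndices(start: number, end: number): number[] {
  if (!Number.isInteger(start) || !Number.isInteger(end) || start < 1 || end < start) {
    throw new RangeError(`Invalid one-based page range: ${start}-${end}`);
  }
  const indices: number[] = [];
  for (let page = start; page <= end; page++) {
    indices.push(page - 1);
  }
  return indices;
}

// Pages 2 to 4 of a document correspond to array indices 1, 2 and 3.
const middlePages = toZeroBasedIndices(2, 4);
console.log(middlePages);
```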

Inserting Pages into PDFs

Finally, the insert operation allows you to insert a PDF document into another PDF document at a specific page. The insertPage parameter specifies the page number at which to insert the PDF. It is one-based. Here is an example of how to insert a PDF file at page 2 of another PDF file using the Fileforge API in Node.js:

import { FileforgeClient } from "@fileforge/client";
import * as fs from "fs";

(async () => {
  const ff = new FileforgeClient({
    apiKey: process.env.FILEFORGE_API_KEY,
  });

  try {
    const pdfFiles = [
      fs.createReadStream(__dirname + "/pdf1.pdf"),
      fs.createReadStream(__dirname + "/pdf2.pdf"),
    ];
    const insertPDFStream = await ff.pdf.insert(
      pdfFiles,
      {
        options: {
          // Specify insert options if any
          insertPage: 2,
        },
      },
      {
        timeoutInSeconds: 60,
      }
    );
    insertPDFStream.pipe(fs.createWriteStream("./result_insert.pdf"));
    console.log("PDF inserted successfully. Stream ready.");
  } catch (error) {
    console.error("Error during PDF insertion:", error);
    throw error;
  }
})();
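
The effect of insertPage can be pictured with arrays of page labels. The sketch below assumes that the first page of the inserted document ends up at position insertPage in the result, so the pages of the target document from that position onward are pushed back. This interpretation is an assumption and should be checked against the API documentation. insertPagesAt is an illustrative local function that does not call the Fileforge API.

```typescript
// Illustrative model only. Assumption: the inserted pages start at the
// one-based position insertPage, and later target pages shift back.
function insertPagesAt(target: string[], inserted: string[], insertPage: number): string[] {
  if (!Number.isInteger(insertPage) || insertPage < 1 || insertPage > target.length + 1) {
    throw new RangeError(`insertPage must be between 1 and ${target.length + 1}; got ${insertPage}`);
  }
  const index = insertPage - 1; // convert the one-based page to an array index
  return target.slice(0, index).concat(inserted, target.slice(index));
}

// Inserting a two-page document at page 2 of a three-page document.
const insertedOrder = insertPagesAt(["a1", "a2", "a3"], ["b1", "b2"], 2);
console.log(insertedOrder);
```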

Conclusion

In this tutorial, we have demonstrated how to perform basic PDF manipulation operations using the Fileforge API. We covered merging PDFs, splitting PDFs, extracting pages from PDFs, and inserting pages into PDFs. These operations are essential for document processing and can be easily integrated into your applications using the Fileforge Node SDK. If you have any questions or need further assistance, feel free to reach out to us. We are here to help you build great applications with the Fileforge API. Happy coding!

PS: all the code snippets are available in the API Documentation.
