Merging, splitting and creating PDF files with PowerShell

By Przemyslaw Klys

We're in the last days of 2019, and this will be my last blog post this year. What better way to end a good year than with the release of a new PowerShell module? If the title hasn't given it away already, today I want to share a PowerShell module called PSWritePDF that can help you create and modify (split and merge) PDF documents. It joins my other PowerShell modules for creating different types of documents: PSWriteWord, PSWriteExcel and PSWriteHTML. I know that PSWriteExcel is relatively basic, but both PSWriteHTML and PSWriteWord offer robust document-building capabilities.

PSWritePDF is by no means a finished product. Like with most of my modules, I build a concept that matches my view of how I would like it to look, and over the next months I will probably update its functionality to match my expectations. But just because it isn't finished doesn't mean it's not functional.

PSWritePDF is based on the .NET version of the iText 7 library, and its licensing is bound by the requirements of that library, which means it's licensed under AGPL. I would be more than happy to make the PowerShell part MIT, but I am no licensing expert. So for now (or forever), it will stay licensed the same way iText 7 is. Since PSWritePDF is built on iText 7, it should be possible, with some work, to bring all of iText 7's functionality into PowerShell. That gives this module excellent potential when it comes to possible use cases.

For now, I've divided the module functionality into two categories:

  • Standalone functions such as Split-PDF, Merge-PDF or Convert-PDFtoText
  • Bundled functions that work like PSWriteHTML: they aren't meant to be used separately, and for now they're mainly for creating PDF files

Installing / Updating PSWritePDF

Like with all my PowerShell modules, PSWritePDF is published to the PowerShell Gallery. That means all you have to do to start working with it is install it:

Install-Module PSWritePDF -Force

And when I release an update, all you have to do is run:

Update-Module PSWritePDF

The module should work on PowerShell 5.1, PowerShell 6 and PowerShell 7, on Windows, Linux and macOS. However, I noticed some issues with certain PDF files on PowerShell Core. They seem to be related either to iText 7 or to my implementation of it. I'm not sure yet what the problem is, but iText 7 seems a bit more stable on PowerShell 5.1.

PSWritePDF - Development

Since PSWritePDF, like most of my modules, is under development most of the time, all sources are published on GitHub. If you want to contribute to this project or take a peek at the sources, you can do so there. Please keep in mind that the PowerShell Gallery version is optimized and better suited for production use. If you find any issues, bugs or missing features, please submit them on GitHub.

Merging PDF with PowerShell

Once PSWritePDF is installed, merging two or more PDF files is as easy as running a single command, Merge-PDF, with two parameters: -InputFile and -OutputFile.

$FilePath1 = "$PSScriptRoot\Input\OutputDocument0.pdf"
$FilePath2 = "$PSScriptRoot\Input\OutputDocument1.pdf"

$OutputFile = "$PSScriptRoot\Output\OutputDocument.pdf" # Shouldn't exist / will be overwritten

Merge-PDF -InputFile $FilePath1, $FilePath2 -OutputFile $OutputFile

That's it.
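If you have a whole folder of PDFs to merge, you don't have to list every file by hand. Below is a minimal sketch that assumes Merge-PDF accepts any array of paths for -InputFile; the example above passes two. Get-PDFMergeInput is a hypothetical helper of my own and not part of PSWritePDF. It collects the PDFs in a folder, sorted by file name so the merge order is predictable. Sorting is by string, so number your files with leading zeros (01, 02, ... 10). Keep the output file outside the input folder, as the example above does with its Input and Output folders.

```powershell
# Hypothetical helper (not part of PSWritePDF): build the -InputFile list for Merge-PDF
function Get-PDFMergeInput {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [string] $InputFolder
    )
    # All *.pdf files in the folder, sorted by name, returned as full paths
    Get-ChildItem -LiteralPath $InputFolder -Filter '*.pdf' -File |
        Sort-Object -Property Name |
        ForEach-Object { $_.FullName }
}

# Usage (requires PSWritePDF):
# Merge-PDF -InputFile (Get-PDFMergeInput -InputFolder "$PSScriptRoot\Input") -OutputFile "$PSScriptRoot\Output\OutputDocument.pdf"
```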

Splitting PDF with PowerShell

Now that you know how to merge PDF files, it's time to learn how to split them. Right now, I've only implemented splitting by pages: given a file, Split-PDF creates one output file per page, so a PDF with X pages is split into X files.

Split-PDF -FilePath "$PSScriptRoot\SampleToSplit.pdf" -OutputFolder "$PSScriptRoot\Output"

That's it.
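Split-PDF works on one file at a time. If you need to split a batch of PDFs, a loop like the sketch below could do it. Split-PDFFolder is a hypothetical helper name. The only PSWritePDF call it makes is Split-PDF, with the -FilePath and -OutputFolder parameters shown above. Each source document gets its own subfolder, named after the file, so the pages of different PDFs stay apart. If Split-PDF isn't available, the helper warns and only creates the folders.

```powershell
# Hypothetical helper (not part of PSWritePDF): split every PDF in a folder,
# giving each source document its own output subfolder
function Split-PDFFolder {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [string] $InputFolder,
        [Parameter(Mandatory)] [string] $OutputRoot
    )
    # Only call Split-PDF if PSWritePDF is actually available
    $CanSplit = [bool] (Get-Command -Name Split-PDF -ErrorAction SilentlyContinue)
    if (-not $CanSplit) {
        Write-Warning 'Split-PDF was not found (is PSWritePDF installed?). Only the output folders will be created.'
    }
    foreach ($File in Get-ChildItem -LiteralPath $InputFolder -Filter '*.pdf' -File) {
        # e.g. Input\Report.pdf -> <OutputRoot>\Report
        $Target = Join-Path -Path $OutputRoot -ChildPath $File.BaseName
        New-Item -ItemType Directory -Path $Target -Force | Out-Null
        if ($CanSplit) {
            # Discard any output so the function only returns folder paths
            Split-PDF -FilePath $File.FullName -OutputFolder $Target | Out-Null
        }
        # Return the folder so the caller knows where each document's pages went
        $Target
    }
}

# Usage:
# Split-PDFFolder -InputFolder "$PSScriptRoot\Input" -OutputRoot "$PSScriptRoot\Output"
```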

Extracting text from PDF

Another standalone function allows you to extract text from a PDF. Of course, the text has to be actual, computer-generated text stored in the PDF. Sadly, the function doesn't do any OCR, so text inside scanned images won't be extracted.

# Get all pages text
Convert-PDFToText -FilePath "$PSScriptRoot\Example04.pdf"

# Get page 1 text only
Convert-PDFToText -FilePath "$PSScriptRoot\Example04.pdf" -Page 1

Using the commands above, you can extract text from all pages at once or from a specific page.
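Once you have the text, regular PowerShell string handling takes over. Here's a minimal sketch that assumes Convert-PDFToText returns the page text as one or more strings. Find-PDFTextLine is a hypothetical helper, not part of PSWritePDF. It splits the text into lines and keeps the non-empty ones that match a regular expression. Matching is case-insensitive, like PowerShell's -match.

```powershell
# Hypothetical helper (not part of PSWritePDF): return trimmed, non-empty lines
# of extracted text that match a regular expression
function Find-PDFTextLine {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)] [AllowEmptyString()] [string[]] $Text,
        [Parameter(Mandatory)] [string] $Pattern
    )
    process {
        foreach ($Block in $Text) {
            # Extracted text usually arrives as one block; handle both CRLF and LF
            foreach ($Line in $Block -split '\r?\n') {
                if ($Line.Trim() -and $Line -match $Pattern) {
                    $Line.Trim()
                }
            }
        }
    }
}

# Usage (requires PSWritePDF):
# Convert-PDFToText -FilePath "$PSScriptRoot\Example04.pdf" -Page 1 | Find-PDFTextLine -Pattern 'Total'
```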

Creating PDF files with PowerShell

Creating new PDF files takes a similar approach to what I built for PSWriteHTML and Documentimo (which will be migrated back to PSWriteWord at some point). It uses a DSL (Domain-Specific Language) to help you build your document in an easy-to-use way. I've created a few basic functions so far, and in the future I'll try to add more of them so that it becomes possible to create feature-rich PDF files.

New-PDF {
    New-PDFText -Text 'Hello ', 'World' -Font HELVETICA, TIMES_ITALIC -FontColor GRAY, BLUE -FontBold $true, $false, $true
    New-PDFText -Text 'Testing adding text. ', 'Keep in mind that this works like array.' -Font HELVETICA -FontColor RED
    New-PDFText -Text 'This text is going by defaults.', ' This will continue...', ' and we can continue working like that.'
    New-PDFList -Indent 3 {
        New-PDFListItem -Text 'Test'
        New-PDFListItem -Text '2nd'
    }

    New-PDFText -Text 'Hello ', 'World' -Font HELVETICA, TIMES_ITALIC -FontColor GRAY, BLUE -FontBold $true, $false, $true
    New-PDFText -Text 'Testing adding text. ', 'Keep in mind that this works like array.' -Font HELVETICA -FontColor RED
    New-PDFText -Text 'This text is going by defaults.', ' This will continue...', ' and we can continue working like that.'
    New-PDFList -Indent 3 {
        New-PDFListItem -Text 'Test'
        New-PDFListItem -Text '2nd'
    }
} -FilePath "$PSScriptRoot\Example01_Simple.pdf" -Show

What we did above is create a PDF document, add a few pieces of text to it using New-PDFText, and create a list with two bullet points. -Text, -Font and -FontColor accept arrays, so each piece of text can get its own styling. What's important here is that iText 7 provides predefined constants for colors, fonts and other types of styling (such as HELVETICA, TIMES_ITALIC, GRAY or BLUE), and those are the values you pass in. Most likely it's possible to go beyond what's built in using a different approach, but I didn't have time to play around with those options. For now, this means styling is very basic.
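If you're wondering how the nested { } blocks work, the idea behind this kind of DSL is fairly simple in PowerShell. The outer function invokes the script block it receives and collects, in order, every object the inner commands output. The toy functions below (New-ToyDocument, New-ToyText, New-ToyList, New-ToyListItem) are hypothetical. They only illustrate that pattern: they aren't PSWritePDF's actual code and they don't write a PDF.

```powershell
# Toy illustration of the script-block DSL pattern (hypothetical, not PSWritePDF code)
function New-ToyText {
    param([string[]] $Text)
    # Like New-PDFText, accept several pieces of text; here they are simply joined
    [PSCustomObject] @{ Type = 'Text'; Value = $Text -join '' }
}

function New-ToyListItem {
    param([string] $Text)
    [PSCustomObject] @{ Type = 'ListItem'; Value = $Text }
}

function New-ToyList {
    param([scriptblock] $ListItems)
    # Run the nested block; every object it outputs becomes a list item
    [PSCustomObject] @{ Type = 'List'; Items = @(& $ListItems) }
}

function New-ToyDocument {
    param([scriptblock] $Content, [string] $FilePath)
    # Collect everything the outer block emits, in order, as the document body
    [PSCustomObject] @{ FilePath = $FilePath; Elements = @(& $Content) }
}

$Doc = New-ToyDocument {
    New-ToyText -Text 'Hello ', 'World'
    New-ToyList {
        New-ToyListItem -Text 'Test'
        New-ToyListItem -Text '2nd'
    }
} -FilePath 'Example01_Toy.pdf'

$Doc.Elements | ForEach-Object { $_.Type }   # outputs Text, then List
```

A real implementation would turn the collected objects into iText 7 elements at the end. The collecting step is what lets you mix text, lists and tables freely inside New-PDF { }.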

New-PDF -MarginTop 100 {
    New-PDFPage -PageSize A5 {
        New-PDFText -Text 'Hello ', 'World' -Font HELVETICA, TIMES_ITALIC -FontColor GRAY, BLUE -FontBold $true, $false, $true
        New-PDFText -Text 'Testing adding text. ', 'Keep in mind that this works like array.' -Font HELVETICA -FontColor RED
        New-PDFText -Text 'This text is going by defaults.', ' This will continue...', ' and we can continue working like that.'
        New-PDFList -Indent 3 {
            New-PDFListItem -Text 'Test'
            New-PDFListItem -Text '2nd'
        }
    }
    New-PDFPage -PageSize A4 -Rotate {
        New-PDFText -Text 'Hello 1', 'World' -Font HELVETICA, TIMES_ITALIC -FontColor GRAY, BLUE -FontBold $true, $false, $true
        New-PDFText -Text 'Testing adding text. ', 'Keep in mind that this works like array.' -Font HELVETICA -FontColor RED
        New-PDFText -Text 'This text is going by defaults.', ' This will continue...', ' and we can continue working like that.'
        New-PDFList -Indent 3 {
            New-PDFListItem -Text 'Test'
            New-PDFListItem -Text '2nd'
        }
    }
} -FilePath "$PSScriptRoot\Example01_WithSections.pdf" -Show

As you can see above, the code gave us two pages with different page sizes and rotations. It's important to understand that while the function is called New-PDFPage, it doesn't create exactly a page. It's more of an area or a section. If you had enough text on the first “page”, it would span multiple pages, and the next New-PDFPage would start a new area on a fresh page. Maybe it should have been called New-PDFArea, but that seemed less intuitive.

There's also a New-PDFOptions function that allows you to define margins for the whole document, but it isn't necessary. Both New-PDF and New-PDFPage have their own margin parameters, which makes it more explicit where the margins get applied. As we have seen above, margins set on New-PDF apply to all pages. However, you can also set margins on New-PDFPage, which lets each “page” have different margins. If you want to control margins for all pages, setting them on New-PDF is the best choice.

New-PDF -MarginLeft 120 -MarginRight 20 -MarginTop 20 -MarginBottom 20 -PageSize B4 -Rotate {
    New-PDFText -Text 'Test ', 'Me', 'Oooh' -FontColor BLUE, YELLOW, RED
    New-PDFList {
        New-PDFListItem -Text 'Test'
        New-PDFListItem -Text '2nd'
    }
} -FilePath "$PSScriptRoot\Example01_MoreOptions.pdf" -Show

Below is another example that sets margins at different levels and shows how they apply.

New-PDF -MarginTop 200 {
    New-PDFPage -PageSize A5 {
        New-PDFText -Text 'Hello ', 'World' -Font HELVETICA, TIMES_ITALIC -FontColor GRAY, BLUE -FontBold $true, $false, $true
        New-PDFText -Text 'Testing adding text. ', 'Keep in mind that this works like array.' -Font HELVETICA -FontColor RED
        New-PDFText -Text 'This text is going by defaults.', ' This will continue...', ' and we can continue working like that.'
        New-PDFList -Indent 3 {
            New-PDFListItem -Text 'Test'
            New-PDFListItem -Text '2nd'
        }
    }
    New-PDFPage -PageSize A4 -Rotate -MarginLeft 10 -MarginTop 50 {
        New-PDFText -Text 'Hello 1', 'World' -Font HELVETICA, TIMES_ITALIC -FontColor GRAY, BLUE -FontBold $true, $false, $true
        New-PDFText -Text 'Testing adding text. ', 'Keep in mind that this works like array.' -Font HELVETICA -FontColor RED
        New-PDFText -Text 'This text is going by defaults.', ' This will continue...', ' and we can continue working like that.'
        New-PDFList -Indent 3 {
            New-PDFListItem -Text 'Test'
            New-PDFListItem -Text '2nd'
        }
    }
} -FilePath "$PSScriptRoot\Example01_WithSectionsMargins.pdf" -Show

$Document = Get-PDF -FilePath "$PSScriptRoot\Example01_WithSections.pdf"
try {
    $Details = Get-PDFDetails -Document $Document
    $Details | Format-List
    $Details.Pages | Format-Table
} finally {
    # Always release the file, even if reading the details fails
    Close-PDF -Document $Document
}

You may also notice the few lines at the end of the example above. They read back one of the PDFs I've just created and display its details with Get-PDFDetails.

Notice how there are additional details for pages. You've probably also noticed that the margins tell a slightly different story. This is a known issue, as I'm not yet sure how to get the margins for each page separately. Hopefully, sooner or later, I'll figure it out and update this.

Adding Tables to PDF files in PowerShell

While there isn't a lot of functionality yet, I've also added the ability to add tables to PDF files. It's as simple as New-PDFTable -DataTable $YourData.

$DataTable1 = @(
    [PSCustomObject] @{ Test = 'Name'; Test2 = 'Name2'; Test3 = 'Name3' }
    [PSCustomObject] @{ Test = 'Name'; Test2 = 'Name2'; Test3 = 'Name3' }
    [PSCustomObject] @{ Test = 'Name'; Test2 = 'Name2'; Test3 = 'Name3' }
    [PSCustomObject] @{ Test = 'Name'; Test2 = 'Name2'; Test3 = 'Name3' }
)

$DataTable2 = @(
    [ordered] @{ Test = 'Name'; Test2 = 'Name2'; Test3 = 'Name3' }
    [ordered] @{ Test = 'Name'; Test2 = 'Name2'; Test3 = 'Name3' }
    [ordered] @{ Test = 'Name'; Test2 = 'Name2'; Test3 = 'Name3' }
    [ordered] @{ Test = 'Name'; Test2 = 'Name2'; Test3 = 'Name3' }
)

New-PDF {
    New-PDFText -Text 'Hello ', 'World' -Font HELVETICA, TIMES_ITALIC -FontColor GRAY, BLUE -FontBold $true, $false, $true
    New-PDFText -Text 'Testing adding text. ', 'Keep in mind that this works like array.' -Font HELVETICA -FontColor RED
    New-PDFText -Text 'This text is going by defaults.', ' This will continue...', ' and we can continue working like that.'

    New-PDFText -Text 'This table is representation of ', 'PSCustomObject', ' or other', ', ', 'standard types' -FontColor BLACK, RED, BLACK, BLACK, RED -FontBold $false, $true, $false, $false, $true

    New-PDFTable -DataTable $DataTable1

    New-PDFText -Text 'This shows how to create a list' -FontColor MAGENTA

    New-PDFList -Indent 3 {
        New-PDFListItem -Text 'Test'
        New-PDFListItem -Text '2nd'
    }

    New-PDFText -Text 'This table is representation of ', 'Hashtable/OrderedDictionary' -FontColor BLACK, BLUE

    New-PDFTable -DataTable $DataTable2

} -FilePath "$PSScriptRoot\Example06.pdf" -Show

As you can see above, I've built the data in the $DataTable1 and $DataTable2 variables manually, but it should work with any other data as well. Again, this functionality is limited, and the example just shows the possible options. It will need further enhancement.
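$DataTable1 and $DataTable2 above describe the same three columns in two different shapes. A PSCustomObject exposes its columns as properties, while a hashtable or ordered dictionary exposes them as keys. The hypothetical Get-RowColumnName function below is my own illustration, not PSWritePDF's code. It reads the column names from either shape, which is a handy way to check which columns your data has before passing it to New-PDFTable. For other data, Select-Object is a simple way to turn objects into PSCustomObject rows that contain only the columns you want.

```powershell
# Hypothetical helper (not part of PSWritePDF): return the column names of one row
function Get-RowColumnName {
    param([Parameter(Mandatory)] $Row)
    if ($Row -is [System.Collections.IDictionary]) {
        # Hashtable / [ordered] dictionary: the keys are the columns
        @($Row.Keys)
    } else {
        # PSCustomObject (or any other object): the properties are the columns
        @($Row.PSObject.Properties.Name)
    }
}

$Row1 = [PSCustomObject] @{ Test = 'Name'; Test2 = 'Name2'; Test3 = 'Name3' }
$Row2 = [ordered] @{ Test = 'Name'; Test2 = 'Name2'; Test3 = 'Name3' }
(Get-RowColumnName -Row $Row1) -join ', '   # Test, Test2, Test3
(Get-RowColumnName -Row $Row2) -join ', '   # Test, Test2, Test3

# Any other data: pick the columns you want; Select-Object emits PSCustomObject rows
$Processes = Get-Process | Select-Object -First 3 -Property Name, Id
(Get-RowColumnName -Row $Processes[0]) -join ', '   # Name, Id
```

Inside New-PDF you could then pass rows like these on with New-PDFTable -DataTable $Processes.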
