Evotec Services sp. z o.o., ul. Drozdów 6, Mikołów, 43-190, Poland

PSParseHTML – Parse HTML PowerShell Module

PSParseHTML

PSParseHTML started as a suite of data-processing cmdlets to support PSWriteHTML, but it has gained enough functionality to stand as its own module. Basic usage instructions are described in this blog post.

PSParseHTML provides the following ten (10) functions:

  • Convert-HTMLToText
  • ConvertFrom-HtmlTable
  • ConvertFrom-HTMLAttributes (aliases: ConvertFrom-HTMLTag, ConvertFrom-HTMLClass)
  • ConvertFrom-HTML
  • Format-CSS
  • Format-HTML
  • Format-JavaScript
  • Optimize-CSS
  • Optimize-HTML
  • Optimize-JavaScript

The expected input is a string, either a literal or content read from a file. The output is either PowerShell objects (of class HtmlNode or AngleSharp.Html.Dom.HtmlElement, depending on the chosen processing engine) or strings written to stdout.
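As a rough illustration of the string-in, objects-or-strings-out flow described above, the sketch below parses an inline HTML fragment. The sample HTML, the `page.html` file name, and the `-Content` parameter name are assumptions rather than details taken from this page; confirm the exact parameters with `Get-Help <function> -Full` for your installed version.

```powershell
Import-Module PSParseHTML

# Sample input: an HTML fragment held in a here-string (made up for this example).
# Reading from a file would work the same way, for instance:
#   $Html = Get-Content -Path .\page.html -Raw    # hypothetical file name
$Html = @'
<html><body>
<h1>Servers</h1>
<table>
  <tr><th>Name</th><th>Status</th></tr>
  <tr><td>srv01</td><td>Online</td></tr>
  <tr><td>srv02</td><td>Offline</td></tr>
</table>
</body></html>
'@

# Object output: the table is converted into PowerShell objects.
# ASSUMPTION: the parameter is named -Content; verify with Get-Help ConvertFrom-HtmlTable.
$Rows = ConvertFrom-HtmlTable -Content $Html
$Rows

# String output: the same markup, reformatted.
# ASSUMPTION: -Content again; verify with Get-Help Format-HTML.
Format-HTML -Content $Html
```

Keeping the HTML in a variable first makes it easy to swap the here-string for file content without touching the rest of the pipeline.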

It may not seem like much, but those ten functions are powerful enough to enable robust HTML processing in the shell.

Installation

Install from PSGallery

Install-Module -Name PSParseHTML -AllowClobber -Force

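Force and AllowClobber aren't necessary, but they help the install go through: Force lets it proceed when a version of the module is already installed, and AllowClobber lets it proceed when command names clash with commands that already exist on the machine.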

Update from PSGallery

Update-Module -Name PSParseHTML

That's it. Whenever there's a new version, you simply run the Update-Module command and enjoy. Remember that you may need to close and reopen your PowerShell session if you used the module before updating it.

As usual, remember that module updates may break your scripts. If your scripts work for you in production, stay on the versions they were tested with until you have tested the new versions in a dev environment. Even a small change on my side can be enough to break scripts that update automatically. For example, I might rename a parameter, and boom, your code stops working! Be responsible!
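One way to act on this advice is to pin the version a production script uses and upgrade only after testing. The sketch below relies on standard PowerShellGet and module cmdlets; the version number `1.0.0` is a placeholder, not a recommendation, so substitute the version you have actually tested.

```powershell
# PLACEHOLDER: replace with the PSParseHTML version you have tested.
$TestedVersion = '1.0.0'

# Install exactly that version; other versions can stay installed side by side.
Install-Module -Name PSParseHTML -RequiredVersion $TestedVersion -Scope CurrentUser

# Load exactly that version in the production script, even if a newer one is installed.
Import-Module -Name PSParseHTML -RequiredVersion $TestedVersion

# Confirm which version is actually loaded in this session.
Get-Module -Name PSParseHTML | Select-Object -Property Name, Version
```

With the version pinned this way, running Update-Module on the same machine adds the newer version alongside the tested one, and the script keeps loading the pinned version until you change `$TestedVersion`.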

3rd party references

This module utilizes several external dependencies to do its work. The authors of those libraries have done fantastic work — I've just added some PowerShell to the mix.