Submitting blogs to web.archive.org using PowerShell

  • Przemyslaw Klys

Since my website went down in the OVH SBG2 fire, I used the occasion to publish my restored website via Cloudflare. It gives me caching, minification, and some additional security. One thing that caught my attention while browsing through the Cloudflare settings was the Always Online feature, which is based on web.archive.org. The concept is simple: whenever the website is down, Cloudflare fetches the content from web.archive.org.

The feature is in beta, and I really hope my website never goes down for longer than a few minutes again. Still, I thought it was worth enabling, as some of the content I host is documentation for my open-source projects that can't be found anywhere else. That left me with one task: making sure that web.archive.org actually has my website covered and up to date. If you go to the website, you can tell archive.org to archive a link, but the problem is that it will only take that one link and nothing else.
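To check whether a given page is already covered, one option is the Wayback Machine availability endpoint. The sketch below is an assumption on my part rather than something the script in this post uses: the helper names Get-WaybackAvailabilityUri and Get-WaybackSnapshot are hypothetical, and the response shape (archived_snapshots.closest) is what I understand the endpoint to return.

```powershell
# Hypothetical helper: builds the availability query URL for a single page.
function Get-WaybackAvailabilityUri {
    param([string] $Url)
    # Escape the page URL so it is safe to use as a query-string value
    'https://archive.org/wayback/available?url=' + [uri]::EscapeDataString($Url)
}

# Hypothetical helper: returns the closest snapshot entry, or $null when none is reported.
# Assumes the JSON response contains archived_snapshots.closest.
function Get-WaybackSnapshot {
    param([string] $Url)
    $Response = Invoke-RestMethod -Uri (Get-WaybackAvailabilityUri -Url $Url) -ErrorAction Stop
    $Response.archived_snapshots.closest
}

# Only builds the query URL; call Get-WaybackSnapshot to actually query archive.org
Get-WaybackAvailabilityUri -Url 'https://evotec.xyz/feed'
```

Calling Get-WaybackSnapshot for a link and checking for $null would tell you whether that link still needs to be submitted.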

Submitting all blogs to web.archive.org

To make sure all my blog posts are added to the Wayback Machine without me spending tons of time, I've written a short script that does it for me. The hardest part is getting all the blog posts from my website. Fortunately, a while back, I wrote a PowerShell module called PSWebToolbox that contains the Get-RSSFeed cmdlet, which can scan an RSS feed and get all blog posts from any website that publishes one.

'https://evotec.xyz/feed' | Get-RSSFeed -Count 12 -Verbose | Format-Table -AutoSize
'https://evotec.xyz/feed' | Get-RSSFeed -Count 2
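For readers who are curious what reading an RSS feed involves, here is a minimal sketch that pulls the title, link, and publish date out of RSS 2.0 text. It is not how Get-RSSFeed is implemented; ConvertFrom-RssXml is a hypothetical helper, and the sample feed is made up for illustration.

```powershell
# Hypothetical helper: extracts basic fields from RSS 2.0 XML text.
function ConvertFrom-RssXml {
    param([string] $Content)
    $Xml = [xml] $Content
    foreach ($Item in $Xml.SelectNodes('/rss/channel/item')) {
        # InnerText also handles values wrapped in CDATA sections
        [PSCustomObject] @{
            Title       = $Item.SelectSingleNode('title').InnerText
            Link        = $Item.SelectSingleNode('link').InnerText
            PublishDate = $Item.SelectSingleNode('pubDate').InnerText
        }
    }
}

# Made-up sample feed with two posts
$Sample = @'
<rss version="2.0"><channel><title>Example</title>
<item><title>First post</title><link>https://example.com/first</link><pubDate>Mon, 01 Mar 2021 10:00:00 GMT</pubDate></item>
<item><title><![CDATA[Second post]]></title><link>https://example.com/second</link><pubDate>Tue, 02 Mar 2021 10:00:00 GMT</pubDate></item>
</channel></rss>
'@

$Posts = @(ConvertFrom-RssXml -Content $Sample)
$Posts | Format-Table -AutoSize
```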

Having PSWebToolbox with its Get-RSSFeed meant that I only needed to submit the blogs using the Invoke-WebRequest cmdlet. Get-RSSFeed provides titles, links, categories, and even descriptions. The script has basic error handling along with a fail-safe: if a submission fails (the archive.org website isn't the most responsive), the script continues with the remaining blog posts. Because the results are kept in the $StatusBlogs variable, rerunning the script in the same PowerShell session pushes only the missing links rather than starting from scratch.

# Get every blog post available in the RSS feed
$Blogs = Get-RSSFeed -Url 'https://evotec.xyz/feed/' -All
# Reuse the status table from a previous run in this session, if there is one
if (-not $StatusBlogs) {
    $StatusBlogs = [ordered] @{}
}
foreach ($Blog in $Blogs) {
    # Skip links that were already submitted successfully
    if ($StatusBlogs[$Blog.Link] -eq $true) {
        continue
    }
    Write-Color "[+] ", "Submitting blog ", $($Blog.Title), " ($($Blog.Link)) ", "published on ", $($Blog.PublishDate) -Color Yellow, White, Yellow, Cyan, White, Yellow, White, Red
    try {
        # Ask web.archive.org to save the link
        $Status = Invoke-WebRequest -Uri "https://web.archive.org/save/$($Blog.Link)" -ErrorAction Stop
        if ($Status.StatusCode -eq 200) {
            $StatusBlogs[$Blog.Link] = $true
            Write-Color "[+] ", "Submitting blog succeeded ", $($Blog.Title), " ($($Blog.Link)) ", "published on ", $($Blog.PublishDate) -Color Yellow, White, Yellow, Green, White, Yellow, White, Red
        } else {
            $StatusBlogs[$Blog.Link] = $false
            Write-Color "[-] ", "Submitting blog failed ", $($Blog.Title), " ($($Blog.Link)) ", "published on ", $($Blog.PublishDate) -Color Yellow, White, Yellow, Red, White, Yellow, White, Red
        }
    } catch {
        # Mark the link as failed and carry on with the next one
        $StatusBlogs[$Blog.Link] = $false
        Write-Color "[-] ", "Submitting blog failed ", $($Blog.Title), " ($($Blog.Link)) ", "published on ", $($Blog.PublishDate), " with error: ", $($_.Exception.Message) -Color Yellow, White, Yellow, Red, White, Yellow, White, Red
    }
}
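The $StatusBlogs table in the script above lives only in the current PowerShell session. If you want a later session to remember what was already submitted, one possible approach is to save the table to a JSON file. This is a sketch of my own and not part of the original script; Export-BlogStatus and Import-BlogStatus are hypothetical helper names, and the file path is up to you.

```powershell
# Hypothetical helper: writes the status table (link -> $true/$false) to a JSON file.
function Export-BlogStatus {
    param(
        [System.Collections.IDictionary] $Status,
        [string] $Path
    )
    ConvertTo-Json -InputObject $Status | Set-Content -LiteralPath $Path -Encoding UTF8
}

# Hypothetical helper: reads the JSON file back into an ordered table.
# Returns an empty table when the file does not exist yet.
function Import-BlogStatus {
    param([string] $Path)
    $Status = [ordered] @{}
    if (Test-Path -LiteralPath $Path) {
        $Saved = Get-Content -LiteralPath $Path -Raw | ConvertFrom-Json
        # ConvertFrom-Json returns an object, so copy its properties into the table
        foreach ($Property in $Saved.PSObject.Properties) {
            $Status[$Property.Name] = [bool] $Property.Value
        }
    }
    $Status
}
```

To use it, set $StatusBlogs = Import-BlogStatus -Path $StatusFile before the loop and call Export-BlogStatus -Status $StatusBlogs -Path $StatusFile after it, where $StatusFile is a path of your choice.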

You may have also noticed that I use the Write-Color cmdlet, which simplifies writing colorful messages. It's not necessary and could easily be replaced by Write-Host or anything else. If you would like to have Write-Color at your disposal, you can install it from the PowerShell Gallery or get the sources from GitHub.
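If you would rather not install another module, a rough stand-in built on Write-Host could look like the sketch below. Write-ColoredLine is a hypothetical name, and the function only covers the basic case of printing text segments in matching colors on one line, not everything Write-Color can do.

```powershell
# Hypothetical stand-in for Write-Color, built only on Write-Host.
function Write-ColoredLine {
    param(
        [string[]] $Text,
        [ConsoleColor[]] $Color = @([ConsoleColor]::White)
    )
    for ($i = 0; $i -lt $Text.Count; $i++) {
        # Reuse the last color when fewer colors than text segments are supplied
        $Current = $Color[[Math]::Min($i, $Color.Count - 1)]
        Write-Host -Object $Text[$i] -ForegroundColor $Current -NoNewline
    }
    # Finish the line
    Write-Host ''
}

Write-ColoredLine -Text '[+] ', 'Submitting blog ', 'Example title' -Color Yellow, White, Green
```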


Required PowerShell Modules

To get the above script up and running, install the following PowerShell modules.

Install-Module PSWebToolbox
Install-Module PSWriteColor
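If you are not sure whether the modules are already present, a small check like the one below can list the ones that still need installing. Get-MissingModule is a hypothetical helper written for this example, and it only reports names without installing anything.

```powershell
# Hypothetical helper: returns the names of modules that are not installed yet.
function Get-MissingModule {
    param([string[]] $Name)
    foreach ($Module in $Name) {
        # Get-Module -ListAvailable returns nothing when the module is not installed
        if (-not (Get-Module -ListAvailable -Name $Module)) {
            $Module
        }
    }
}

$Missing = @(Get-MissingModule -Name 'PSWebToolbox', 'PSWriteColor')
$Missing
```

Each name left in $Missing can then be passed to Install-Module.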