Drupal and the Open Web in the Australian Government - 2022 edition

This is the companion blog post for my DrupalSouth Brisbane 2022 session.

Have you ever wondered how popular Drupal is in your local state and at the Australian Federal Government level? This blog post helps to answer that question using open source tooling. The hope is that you gain some insight into the relative popularity of Drupal, and better appreciate the impact you and Drupal have in Australia.

Existing solutions

A number of websites claim to be able to give you this information. However, they will all likely want you to pay them money at some point.

  • wappalyzer.com
  • semrush.com
  • builtwith.com
  • whatcms.com
  • similartech.com
  • larger.io
“I wanted an open way to do this”

As it turns out, you can plug a few things together to scrape the technologies in use.

Problem #1: How to get a list of all Australian Government domains

If anything, there are too many sources of this information:

A crawling approach would also work, as there are loads of suitable seed sites, e.g.:

The main issue is that a bare list of sites does not convey the importance of one site relative to another.

Enter DomCop, which publishes a list of the top 10 million domains on the internet, including a rank and an Open PageRank.

DomCop's top 10 million websites with a filter of .gov.au applied.

On top of supplying 5,795 Australian Government domains, the list also includes an "Open Page Rank" field. The PageRanks are calculated from the open data provided by Common Crawl and Common Search.
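
The filtering step can be sketched in a few lines of Python. This is only a minimal sketch: the column names ("Rank", "Domain", "Open Page Rank") are an assumption about the layout of the downloaded CSV, so check them against the header row of your copy, and the sample rows are made-up placeholders rather than real data.

```python
import csv
import io

# Made-up placeholder rows; the column names are an assumption about the CSV header.
SAMPLE_CSV = '''"Rank","Domain","Open Page Rank"
"1","example.com","10.00"
"2","example.gov.au","4.50"
"3","notgov.au","3.10"
"4","agency.vic.gov.au","3.20"
'''


def filter_gov_au(csv_file):
    """Yield (domain, open_page_rank) for every .gov.au domain in the CSV."""
    for row in csv.DictReader(csv_file):
        domain = row["Domain"].strip().lower()
        # Match on the ".gov.au" suffix so that "notgov.au" is not picked up.
        if domain == "gov.au" or domain.endswith(".gov.au"):
            yield domain, float(row["Open Page Rank"])


gov_sites = list(filter_gov_au(io.StringIO(SAMPLE_CSV)))
```

For the real file, pass an open file handle instead of the `io.StringIO` sample.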

Problem #2: What is PageRank?

PageRank is a system for ranking web pages that Google's founders developed in 1996. A PageRank score of 0 typically indicates a low-quality website, whereas a score of 10 represents only the most authoritative sites on the web. The scale is logarithmic (this post assumes a base of 5).

A site with PageRank 3 is 5 times more authoritative than a site with PageRank 2.
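
To make the logarithmic scale concrete, here is a small sketch that turns a PageRank into a linear weight. The base of 5 is the assumption used throughout this post, and the function name `score` is only illustrative.

```python
PAGERANK_BASE = 5  # the base assumed in this post, not an official constant


def score(page_rank: float) -> float:
    """Convert a logarithmic PageRank into a linear weight."""
    return PAGERANK_BASE ** page_rank


# One whole PageRank point apart means a factor of 5 in weight.
ratio = score(3) / score(2)
```

With this formula a PageRank of 5.51 works out to roughly 7,101, which matches the Score column in the tables later in this post.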

Australian Government sites are never static; they are constantly evolving. Sometimes several sites merge into one, and sometimes one site splits into several.

DHS Victoria is now closed. 3 sites now replace this 1 site.
DESE is also now closed. 2 sites replace this 1 site.

Just show me the graphs

Disclaimer:

  • This is based on Sept 22, 2022 data
  • The scoring is based off PageRank data, so the percentages are not raw counts of websites, but an approximation of how important the respective sites are compared to others (assumes a logarithmic base of 5).
  • Wappalyzer detection is not perfect (see the end of this blog post for upstreamed PRs), and there is still a fairly large portion of sites where the CMS cannot be identified
  • Machinery of Government changes (MoGs) make this tricky (PageRank relies on incoming links, which break due to MoGs)
  • Only *.gov.au domains considered (some Government sites use other TLDs)
  • Newly created websites are unlikely to be in the top 10 million just yet (due to how PageRank works)
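
The PageRank-weighted percentages described above can be sketched roughly as follows. This is an illustration under the base-5 assumption, not necessarily the exact script behind the graphs. The helper name `weighted_share` and the sample rows are hypothetical.

```python
from collections import defaultdict


def weighted_share(sites, base=5):
    """Return each CMS's percentage share, weighting every site by base ** PageRank.

    `sites` is an iterable of (cms, page_rank) pairs.
    """
    totals = defaultdict(float)
    for cms, page_rank in sites:
        totals[cms] += base ** page_rank
    grand_total = sum(totals.values())
    return {cms: 100 * total / grand_total for cms, total in totals.items()}


# Hypothetical sample: one higher-ranked Drupal site and two lower-ranked sites.
sample = [("drupal", 3.0), ("wordpress", 2.0), ("unknown", 2.0)]
shares = weighted_share(sample)
```

By raw count each CMS in the sample would get a third of the total, but the weighted share gives the PageRank 3 site about 71% because it counts five times as much as each PageRank 2 site.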

All sites (*.gov.au)

All sites (*.gov.au)

Federal sites (not state based domains)

Programmes like GovCMS are having an impact here.

Federal sites (every non-state based domain)
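
One way to split federal from state and territory domains is to look at the label directly in front of gov.au. The sketch below assumes plain hostnames as input (not full URLs). The function name `jurisdiction` is hypothetical.

```python
# State and territory labels used in the breakdowns in this post.
STATE_LABELS = {"vic", "nsw", "sa", "wa", "tas", "qld", "act", "nt"}


def jurisdiction(hostname: str) -> str:
    """Return the state/territory label for a .gov.au hostname, else 'federal'."""
    labels = hostname.lower().rstrip(".").split(".")
    if labels[-2:] != ["gov", "au"]:
        raise ValueError(f"not a .gov.au hostname: {hostname}")
    # e.g. www.service.nsw.gov.au -> the third label from the right is "nsw"
    if len(labels) >= 3 and labels[-3] in STATE_LABELS:
        return labels[-3]
    return "federal"
```

For example, `jurisdiction("www.service.nsw.gov.au")` gives `"nsw"`, while `jurisdiction("www.bom.gov.au")` falls through to `"federal"`.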

Victoria *.vic.gov.au

The Single Digital Presence (SDP) programme makes a mark in Victoria.

Victoria (*.vic.gov.au)

New South Wales *.nsw.gov.au

Large Drupal sites like https://www.nsw.gov.au/ and https://www.service.nsw.gov.au/ help to make Drupal dominant in NSW.

New South Wales (*.nsw.gov.au)

South Australia *.sa.gov.au

South Australia (*.sa.gov.au)

Western Australia *.wa.gov.au

There are a lot of unknown CMSs in WA, including sites like https://ww2.health.wa.gov.au/, where I had no idea which CMS was in use. Edit: a keen-eyed developer has told me that because this URL exists, the WA Health site is generated by Sitecore.

Western Australia (*.wa.gov.au)

Tasmania *.tas.gov.au

The lowest usage of Drupal for any Australian state or territory and the highest percentage of WordPress.

Tasmania (*.tas.gov.au)

Queensland *.qld.gov.au

Queensland (*.qld.gov.au)

Australian Capital Territory *.act.gov.au

The highest percentage of Squiz compared to any other Australian state or territory.

Australian Capital Territory (*.act.gov.au)

Northern Territory *.nt.gov.au

Northern Territory (*.nt.gov.au)

Open Source Software (OSS) CMS vs Proprietary CMS

For the CMSs that can be identified, this splits them into two categories: OSS and proprietary.

Open Source Software (OSS) CMS vs Proprietary CMS

Drupal sites by major version

For sites reporting as Drupal, Drupal 9 and 7 are the most popular.
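
Drupal normally advertises its major version in a generator string such as "Drupal 9 (https://www.drupal.org)", sent in the `X-Generator` response header and the `<meta name="generator">` tag. The sketch below shows one simple way to read the major version from such a string. It is not necessarily how the figures here were produced, sites can strip or alter the string, and the function name is hypothetical.

```python
import re

# Matches the major version in strings like "Drupal 9 (https://www.drupal.org)".
GENERATOR_RE = re.compile(r"Drupal (\d+)")


def drupal_major_version(generator):
    """Return the Drupal major version as an int, or None if it cannot be found."""
    match = GENERATOR_RE.search(generator or "")
    return int(match.group(1)) if match else None
```

A site that hides its generator string simply yields `None`, which is one reason the version breakdown only covers sites that report as Drupal.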

Drupal by major version

Observations and other unusual findings

#1 - Drupal usage

“Drupal powers roughly 27% of all digital experiences that you use in the Australian government”

#2 - Top contender

“Squiz Matrix is the top contender with 15%, and has a clear state led mandate in certain states/territories”

#3 - TLS coverage

TLS coverage is not 100% - 129 domains found with no TLS

| Domain | CMS | Page Rank | Score |
| --- | --- | --- | --- |
| http://www.bom.gov.au/ | unknown | 5.51 | 7,101 |
| http://handle.slv.vic.gov.au/ | unknown | 4.47 | 1,332 |
| http://www.mbsonline.gov.au/internet/mbsonline/publishing.nsf/Content/Home | hcl-notes | 4.45 | 1,289 |
| http://onesearch.slq.qld.gov.au/primo-explore/search?vid=SLQ | unknown | 4.41 | 1,209 |
| http://www.majorprojects.planning.nsw.gov.au/ | unknown | 4.41 | 1,209 |

The second most trafficked site in the Australian Government does not support TLS. Instead this awkward redirect page is used. And a sad face emoji. Sad face indeed.
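
As a rough sketch of how such a list could be pulled out, assume the crawl records the final URL of each site after following redirects (an assumption about the data, not something stated above). Any URL that still uses plain http can then be counted as having no TLS. The function name is hypothetical.

```python
from urllib.parse import urlsplit


def without_tls(final_urls):
    """Return the URLs whose scheme is still plain http."""
    return [url for url in final_urls if urlsplit(url).scheme == "http"]


crawled = [
    "http://www.bom.gov.au/",
    "https://www2.gbrmpa.gov.au/",
    "http://www9.health.gov.au/",
]
no_tls = without_tls(crawled)
```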

#4 - If in doubt, add a number

19 domains found with ww[number] as a subdomain.

| Domain | CMS | Page Rank | Score |
| --- | --- | --- | --- |
| https://www2.gbrmpa.gov.au/ | drupal | 4.85 | 2,455 |
| https://www1.health.gov.au/ | unknown | 4.62 | 1,695 |
| https://ww2.health.wa.gov.au/ | unknown | 4.49 | 1,375 |
| http://www9.health.gov.au/ | unknown | 4.3 | 1,013 |
| https://www0.landgate.wa.gov.au/ | squiz-matrix | 4.25 | 935 |

When you run out of subdomains, just add a number.
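
Spotting these numbered subdomains is a one-line regular expression. The pattern below is my reading of "ww[number]": two or three w characters followed by digits as the first label of the hostname. Treat the pattern and the function name as assumptions.

```python
import re
from urllib.parse import urlsplit

# "ww" or "www" followed by at least one digit, as a whole hostname label.
NUMBERED_WWW_RE = re.compile(r"^w{2,3}\d+$")


def has_numbered_www(url: str) -> bool:
    """Return True if the first hostname label looks like ww2, www1, www9, ..."""
    hostname = urlsplit(url).hostname or ""
    first_label = hostname.split(".")[0]
    return bool(NUMBERED_WWW_RE.match(first_label))
```

This flags https://ww2.health.wa.gov.au/ and https://www0.landgate.wa.gov.au/ but leaves a plain https://www.nsw.gov.au/ alone.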

#5 - You cannot kill Dreamweaver

15 sites found in 2022.

Extending this for the future

  • Crawl other domain spaces, e.g. the New Zealand government domain space *.govt.nz
  • Make a website and publish this data quarterly (DomCop's data updates around this frequency)
  • Measure trends over time

Upstreamed enhancements

These are all intended to make the detection of CMSs and JavaScript frameworks more accurate for Australian Government sites.

Raw data

If you want to do your own analysis, here is a link to a full CSV dump.

Comments

I am keen to hear feedback on this data, and on what can be done to improve the scoring. Also, if you can help fill in some of the 'unknown' data, let me know; I am happy to craft another PR into Wappalyzer.