Advertools - Analyze website content using XML sitemap
Tags: #advertools #xml #sitemap #website #analyze #seo
Author: Elias Dabbas
Last update: 2023-05-23 (Created: 2023-05-09)
Description: This notebook helps you get an overview of a website's content by analyzing and visualizing its XML sitemap. It's also an important SEO audit process that can uncover some potential issues that might affect the website.
Input
Install libraries
If running it on naas, run the code below to install the libraries
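The install cell itself is not shown here. As a minimal sketch, assuming the notebook needs `advertools` and `pandas` (an assumed package list), the hypothetical helpers below report which packages are missing and install them with pip:

```python
import importlib.util
import subprocess
import sys

# Assumed package list; adjust to whatever the notebook actually imports.
REQUIRED_PACKAGES = ["advertools", "pandas"]


def missing_packages(names):
    """Return the packages from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def ensure_installed(names=REQUIRED_PACKAGES):
    """Install any missing packages with pip (hypothetical helper)."""
    missing = missing_packages(names)
    if missing:
        subprocess.check_call([sys.executable, "-m", "pip", "install", *missing])
    return missing
```

Calling `ensure_installed()` once at the top of the notebook would be enough; it does nothing when every package is already available.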
Import libraries
Setup Variables
sitemap_url: URL of the sitemap to analyze, which can be:
- The URL of an XML sitemap
- The URL of an XML sitemapindex
- The URL of a robots.txt file
Normal and zipped formats are supported.
recursive: If this is a sitemapindex, should all the sub-sitemaps also be downloaded, parsed and combined into one DataFrame?
max_workers: Number of concurrent workers to fetch the sitemaps.
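The three variables described above might be set as follows. The values are illustrative assumptions (in particular the `example.com` URL), not the ones used by the author:

```python
# Hypothetical example values; replace them with the site you want to analyze.
sitemap_url = "https://www.example.com/robots.txt"  # sitemap, sitemapindex, or robots.txt URL
recursive = True  # also download and combine the sub-sitemaps of a sitemapindex
max_workers = 8   # number of concurrent workers used to fetch the sitemaps
```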
Model
Analyze website content using XML sitemap
Getting the sitemap(s)
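A minimal sketch of this step, assuming the variables from the setup section and the `sitemap_to_df` function from advertools. The wrapper name `get_sitemaps` is hypothetical and only keeps the import lazy:

```python
def get_sitemaps(sitemap_url, recursive=True, max_workers=8):
    """Download the sitemap(s) and return them as a single DataFrame."""
    import advertools as adv  # imported lazily so the sketch loads without it

    return adv.sitemap_to_df(
        sitemap_url,
        max_workers=max_workers,
        recursive=recursive,
    )
```

A call such as `sitemap_df = get_sitemaps(sitemap_url, recursive, max_workers)` would then hold one row per URL. The result typically includes a `loc` column with the URL and a `sitemap` column naming the sitemap it came from.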
Split URLs into their components for further analysis/understanding
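advertools offers `url_to_df` for splitting a list of URLs into a DataFrame of components. To illustrate the idea without any dependency, here is a standard-library sketch; `split_url` is a hypothetical helper and the URL is made up:

```python
from urllib.parse import urlsplit


def split_url(url):
    """Split a URL into its main components and its directory segments."""
    parts = urlsplit(url)
    # Non-empty path segments, e.g. "/blog/2023/" -> ["blog", "2023"]
    dirs = [segment for segment in parts.path.split("/") if segment]
    return {
        "url": url,
        "scheme": parts.scheme,
        "netloc": parts.netloc,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
        "dirs": dirs,
        "last_dir": dirs[-1] if dirs else "",
    }


example = split_url("https://www.example.com/blog/2023/post-1?utm_source=x#top")
print(example["netloc"], example["dirs"])
```

Grouping URLs by the first entry in `dirs` is often a quick way to see how content is distributed across sections of a site.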
Output
Display results
Errors
Duplicated URLs
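One way to find duplicated URLs, sketched with the standard library on a plain list of URLs. The helper name and the sample data are hypothetical:

```python
from collections import Counter


def duplicated_urls(urls):
    """Return a mapping of each URL that appears more than once to its count."""
    counts = Counter(urls)
    return {url: n for url, n in counts.items() if n > 1}


sample_urls = [
    "https://www.example.com/a",
    "https://www.example.com/b",
    "https://www.example.com/a",
]
duplicates = duplicated_urls(sample_urls)
print(duplicates)
```

With the sitemap in a pandas DataFrame, and assuming the URLs sit in a `loc` column, `sitemap_df[sitemap_df["loc"].duplicated(keep=False)]` would give the equivalent rows.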
URL counts per sitemap and sitemap sizes
Each sitemap should have a maximum of 50,000 URLs, and its size should not exceed 50MB.
URL counts:
Sitemap sizes:
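The two limits can be checked programmatically. This is a minimal sketch, assuming the URL count and size in MB of each sitemap have already been collected into a dictionary; the function name and sample figures are hypothetical:

```python
MAX_URLS = 50_000  # maximum number of URLs per sitemap
MAX_SIZE_MB = 50   # maximum sitemap size in MB


def check_sitemap_limits(sitemaps):
    """Return (sitemap, problem, value) tuples for sitemaps over either limit.

    `sitemaps` maps a sitemap URL to a (url_count, size_mb) pair.
    """
    issues = []
    for sitemap, (url_count, size_mb) in sitemaps.items():
        if url_count > MAX_URLS:
            issues.append((sitemap, "too many URLs", url_count))
        if size_mb > MAX_SIZE_MB:
            issues.append((sitemap, "too large", size_mb))
    return issues


# Made-up figures for illustration only.
sample = {
    "https://www.example.com/sitemap-1.xml": (12_000, 3.2),
    "https://www.example.com/sitemap-2.xml": (61_500, 55.0),
}
issues = check_sitemap_limits(sample)
print(issues)
```

With a DataFrame from `sitemap_to_df`, the counts could come from `sitemap_df["sitemap"].value_counts()`, and the sizes from the `sitemap_size_mb` column if it is present in the result.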