Advertools - Check website pages status code
Tags: #advertools #website #status #code #check #pages
Author: Florent Ravenel
Last update: 2023-08-04 (Created: 2023-08-04)
Description: This notebook crawls your website and checks the status code of all pages. It starts from the home page and discovers URLs by following links within the website. It is a useful tool for quickly checking the status of your website and generating a report to take necessary actions.
Input
Install libraries
If you are running this notebook on naas, run the code below to uninstall and then reinstall the libraries (this works around a bug).
Import libraries
Setup variables
Mandatory
website_url: URL of the website page to check.
cron: Notebooks are scheduled with CRON tasks; find the syntax you need on https://crontab.guru/
email_to: Represents the recipient(s) of the email. By default, your email account on naas will be set.
Optional
output_dir: Represents the output directory for the website crawl.
timestamp: Represents the timestamp when the code is executed.
output_website_crawl: Represents the output file name for the website crawl.
output_website_crawl_log: Represents the output file name for the log file of the website crawl.
output_status_code_ko: Represents the output file name for the status code report.
subject: Represents the subject line for the email.
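As a rough illustration, the variables listed above might be set as in the sketch below. The values are placeholders, not the notebook's actual defaults: the URL, the recipient address, the CRON expression, the timestamp format and the file names are all assumptions. The `.jl` extension is used because the advertools crawler writes its output in JSON lines format.

```python
from datetime import datetime

# Mandatory (placeholder values, replace with your own)
website_url = "https://www.example.com"  # hypothetical site to crawl
cron = "0 8 * * 1"  # assumed schedule: every Monday at 08:00
email_to = ["you@example.com"]  # hypothetical recipient list

# Optional (file names and timestamp format are assumptions)
output_dir = "outputs"
timestamp = datetime.now().strftime("%Y%m%d%H%M%S")
output_website_crawl = f"{timestamp}_website_crawl.jl"
output_website_crawl_log = f"{timestamp}_website_crawl.log"
output_status_code_ko = f"{timestamp}_status_code_ko.csv"
subject = f"Website status code report: {website_url}"
```

Prefixing the file names with the timestamp keeps the results of successive scheduled runs from overwriting each other.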
Model
Define output paths
Create the output directory and define paths for the output files.
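A minimal sketch of this step, using only the standard library, could look like the following. The helper name `build_output_paths`, the dictionary keys and the file-name patterns are hypothetical and chosen for illustration; the notebook's own cell may organise this differently.

```python
import os


def build_output_paths(output_dir, timestamp):
    """Create output_dir if needed and return the paths of the output files.

    The file-name patterns below are assumptions made for this sketch.
    """
    # exist_ok=True makes the call safe when the directory already exists
    os.makedirs(output_dir, exist_ok=True)
    return {
        "crawl": os.path.join(output_dir, f"{timestamp}_website_crawl.jl"),
        "log": os.path.join(output_dir, f"{timestamp}_website_crawl.log"),
        "status_code_ko": os.path.join(output_dir, f"{timestamp}_status_code_ko.csv"),
    }
```

Calling `build_output_paths("outputs", "20230804120000")` would create an `outputs` directory in the current working directory and return the three paths inside it.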