Scraper that evaluates dozens of SEO factors for any website
Important: the code here is for presentation purposes only. I removed about 1,600 lines of code, along with some files.
- Page size – recommended to be less than 300KB
- Page text percentage and text/html ratio – checks if page text size is 20-50% of the page and if the text/html ratio is 25-70%
- Title tag – check if defined for each page and its length; per page and site statistics
- all pages should have a title tag and the length should be between 10 and 70 characters
- H1->H6 tags – number, length and list – per page and site statistics
- there shouldn’t be more than one H1 tag per page
- shows how much you use subheadings to break text into smaller blocks
- Tables – number and nesting level – it is advisable not to use tables, unless absolutely necessary
- Paragraphs number and their length – per page and site statistics
- large blocks of text should be broken in paragraphs for easy reading
- Html header tags:
- meta description – check if defined for each page and its length; per page and site statistics
- meta keywords, meta robots, meta charset, language, doctype, canonical link – check if defined for each page
- favicon, Google authorship, Google publisher
- Excess spaces length – these are spaces in the HTML code that can be removed to decrease page size
- HTML comments length – these can also be removed to decrease page size
- CSS and JS files, inline styles – number per page and site statistics: the fewer, the better
- Pages that have the most internal backlinks – internal SEO: your most important pages are the ones which have the most links pointing to them
- Site maximum depth – how far away your pages are from the home page; the fewer levels, the better; a flat structure is desirable
- Internal and external links list – per page; checks if the number of links exceeds 100
- Links that have title attribute defined – number per page and website statistics; they do make a difference for SEO
- Broken internal and external links – these should be fixed; includes bad anchors – a common issue found when analyzing multiple websites is links like “http:/..” (there should be 2 slashes)
- No follow links – too many do-follow external links can affect page rank
- URLs with underscores – an underscore is not considered a separator, so it’s better to replace underscores with hyphens
- URLs with query strings – better to use URL rewriting to improve the usability and search friendliness of your site, e.g. ‘http://www.mysite.com/myproductdetails.php?id=7’ -> ‘http://www.mysite.com/products/7/’
- Cloaked links
- Links from comments – If you’re curious what websites your most active fans own
- Feeds
- Links to files – PDFs, Excel and Word docs, images, etc – you might want to remove links pointing to images
- Images – checks if alt and title attributes are defined and their number; per page and website statistics. Image alt attribute should be less than 150 characters.
- Facebook shares, likes and comments + total
- Blog comments – per page and website total
- Response time – defines the overall site speed and identifies slow pages
- Whois data – creation, update and expiration dates, registrant name and contact info, registrar, name servers. Domain registration length and domain privacy are ranking factors.
- Domain redirection – Google sees ‘http://www.[domain]’ and ‘http://[domain]’ as 2 separate sites; you should configure redirection from the non-preferred domain to the preferred one (most commonly ‘http://[domain]’)
- Alexa ranks
- Theme – identifies website theme
- Sitemaps
- Emails
- Plugins, services and technologies your (competitor) website is using
- Keywords – identifies whether the keyword is present, in which position (better at the beginning) and how many times in each of the HTML elements: url, title, meta description, H1->H6 tags, body, image filename, title and alt attributes.
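
To make the page size, title tag and H1 rules from the list above concrete, here is a minimal sketch using only the Python standard library. It is not the code removed from this repository: the names `OnPageParser` and `check_page` are hypothetical, and measuring page size as the UTF-8 byte length of the HTML is an assumption.

```python
from html.parser import HTMLParser


class OnPageParser(HTMLParser):
    """Hypothetical helper: collects the <title> text and counts H1-H6 tags."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = None  # stays None when the page has no <title> tag
        self.heading_counts = {f"h{i}": 0 for i in range(1, 7)}

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
            self.title = ""
        elif tag in self.heading_counts:
            self.heading_counts[tag] += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def check_page(html):
    """Apply the thresholds from the list: <300KB, title 10-70 chars, at most one H1."""
    parser = OnPageParser()
    parser.feed(html)
    title = (parser.title or "").strip()
    h1_count = parser.heading_counts["h1"]
    return {
        "title_defined": title != "",
        "title_length_ok": 10 <= len(title) <= 70,
        "h1_count": h1_count,
        "single_h1": h1_count <= 1,
        # Assumption: page size is the UTF-8 size of the HTML document alone.
        "page_size_ok": len(html.encode("utf-8")) < 300 * 1024,
    }
```

Calling `check_page(html)` on a fetched page returns a small dictionary of flags that could then be aggregated into the per-page and site statistics mentioned above.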
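
The URL-related checks in the list (underscores, query strings, bad anchors such as “http:/..”, and more than 100 links per page) can be sketched in a few lines. The function names `check_url` and `too_many_links` are hypothetical, and looking for underscores only in the URL path is an assumption rather than the behavior of the original code.

```python
import re
from urllib.parse import urlsplit

# Matches "http:/" or "https:/" followed by anything other than a second slash.
MALFORMED_SCHEME = re.compile(r"^https?:/(?!/)", re.IGNORECASE)


def check_url(url):
    """Flag the URL issues described in the list above for a single link."""
    parts = urlsplit(url)
    return {
        # Assumption: only underscores in the path are reported.
        "has_underscore": "_" in parts.path,
        "has_query_string": bool(parts.query),
        "malformed_scheme": bool(MALFORMED_SCHEME.match(url)),
    }


def too_many_links(links, limit=100):
    """Return True when a page has more links than the limit from the list."""
    return len(links) > limit
```

For example, `check_url("http://www.mysite.com/myproductdetails.php?id=7")` reports a query string, while the rewritten form `http://www.mysite.com/products/7/` passes all three checks.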
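
The keyword check described in the last item could look roughly like the sketch below. It assumes that "position" means the number of words before the first occurrence and that matching is case-insensitive on whole words; both are assumptions, and `keyword_stats` and `analyze_elements` are hypothetical names.

```python
import re


def keyword_stats(keyword, text):
    """Return presence, first position (in words, 0-based) and count of a keyword."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    matches = list(pattern.finditer(text))
    if not matches:
        return {"present": False, "position": None, "count": 0}
    # Count the words that precede the first match; 0 means the text starts with it.
    position = len(text[: matches[0].start()].split())
    return {"present": True, "position": position, "count": len(matches)}


def analyze_elements(keyword, elements):
    """Run keyword_stats over a dict such as {"title": "...", "h1": "...", "body": "..."}."""
    return {name: keyword_stats(keyword, text) for name, text in elements.items()}
```

A lower `position` corresponds to the keyword appearing closer to the beginning of the element, which the list above describes as preferable.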