Scraper that evaluates tens of SEO factors for any website

abacusadvertising/SEO-Spider

SEO-Spider

Scraper that evaluates tens of SEO factors for any website

Important: The code here is for presentation purposes only. I removed about 1,600 lines of code, along with some files.

Detailed list of features and analyzed data

HTML elements:

  • Page size – recommended to be less than 300KB
  • Page text percentage and text/HTML ratio – checks whether the page text percentage is between 20% and 50% and the text/HTML ratio is between 25% and 70%
  • Title tag – checks whether it is defined on each page and measures its length; per-page and site statistics
    • all pages should have a title tag and the length should be between 10 and 70 characters
  • H1->H6 tags – number, length and list – per page and site statistics
    • there shouldn’t be more than one H1 tag per page
    • shows how much you use subheadings to break text into smaller blocks
  • Tables – number and nesting level – it is advisable not to use tables, unless absolutely necessary
  • Paragraphs number and their length – per page and site statistics
    • large blocks of text should be broken in paragraphs for easy reading
  • HTML head tags:
    • meta description – checks whether it is defined on each page and measures its length; per-page and site statistics
    • meta keywords, meta robots, meta charset, language, doctype, canonical link – checks whether each is defined on each page
    • favicon, Google authorship, Google publisher
  • Excess spaces length – these are spaces in the HTML code that can be removed to decrease page size
  • HTML comments length – these can also be removed to decrease page size
  • CSS and JS files, inline styles – number per page and site statistics: the fewer, the better
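
The thresholds in the list above (page size under 300KB, title length between 10 and 70 characters, at most one H1 tag) can be sketched as a small checker. This is only an illustration built on Python's standard `html.parser`; it is not the code removed from this repository, and the names `HeadingTitleCounter` and `check_page` are hypothetical.

```python
from html.parser import HTMLParser


class HeadingTitleCounter(HTMLParser):
    """Collects the <title> text and counts <h1> tags in one page."""

    def __init__(self):
        super().__init__()
        self.title = None
        self.h1_count = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
            self.title = ""
        elif tag == "h1":
            self.h1_count += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def check_page(html):
    """Return a list of warnings based on the thresholds listed above."""
    parser = HeadingTitleCounter()
    parser.feed(html)
    warnings = []
    # Assumption: page size is measured as the UTF-8 size of the HTML source.
    if len(html.encode("utf-8")) >= 300 * 1024:
        warnings.append("page size is 300KB or more")
    title = (parser.title or "").strip()
    if not title:
        warnings.append("missing title tag")
    elif not 10 <= len(title) <= 70:
        warnings.append("title length outside 10-70 characters")
    if parser.h1_count > 1:
        warnings.append("more than one H1 tag")
    return warnings
```

Calling `check_page` on the HTML of each crawled page gives the per-page results; site statistics would then be an aggregation over those lists.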

Links:

  • Pages that have the most internal backlinks – internal SEO: your most important pages are the ones which have the most links pointing to them
  • Site maximum depth – how far away your pages are from the home page; the fewer levels, the better; a flat structure is desirable
  • Internal and external links list – per page; checks whether the number of links exceeds 100
  • Links that have a title attribute defined – number per page and website statistics; they do make a difference for SEO
  • Broken internal and external links – these should be fixed;
    includes bad anchors – a common issue found when analyzing multiple websites is links such as “http:/..” (there should be 2 slashes)
  • Nofollow links – too many do-follow external links can affect page rank
  • URLs with underscores – an underscore is not considered a separator, so it’s better to replace underscores with hyphens
  • URLs with query strings – better to use URL rewriting to improve the usability and search friendliness of your site
    e.g. ‘http://www.mysite.com/myproductdetails.php?id=7’ -> ‘http://www.mysite.com/products/7/’
  • Cloaked links
  • Links from comments – in case you’re curious which websites your most active fans own
  • Feeds
  • Links to files – PDFs, Excel and Word docs, images, etc – you might want to remove links pointing to images
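
A few of the link checks above are simple enough to sketch with the standard library: flagging URLs with underscores, query strings, or a single slash after the scheme, checking the 100-link limit, and computing the maximum depth from the home page. The function names and the `link_graph` structure (a mapping from each page to the pages it links to) are assumptions made for illustration, not the removed code.

```python
import re
from collections import deque
from urllib.parse import urlsplit

MAX_LINKS_PER_PAGE = 100  # limit quoted in the list above


def lint_url(url):
    """Return the issues from the list above that apply to one URL."""
    issues = []
    # Bad anchors such as "http:/example.com": scheme followed by one slash.
    if re.match(r"^https?:/(?!/)", url, re.IGNORECASE):
        issues.append("malformed scheme")
    parts = urlsplit(url)
    # Assumption: only underscores in the path are reported.
    if "_" in parts.path:
        issues.append("underscore in path")
    if parts.query:
        issues.append("query string")
    return issues


def too_many_links(links):
    """True when a page's link list exceeds the 100-link limit."""
    return len(links) > MAX_LINKS_PER_PAGE


def max_depth(link_graph, home):
    """Breadth-first search: clicks needed to reach the farthest page."""
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, ()):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return max(depth.values())
```

Breadth-first search is used for the depth because it reaches every page by its shortest click path first, which matches the idea of "how far away your pages are from the home page".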

Images:

Checks whether alt and title attributes are defined and counts them, with per-page and website statistics. An image alt attribute should be less than 150 characters.
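
As a rough sketch of the image check, the parser below counts images on a page that lack an alt or title attribute and those whose alt text reaches the 150-character limit. The class name `ImageAudit` and the helper `audit_images` are hypothetical, and treating an empty alt value as missing is an assumption.

```python
from html.parser import HTMLParser

MAX_ALT_LENGTH = 150  # alt text should be shorter than this


class ImageAudit(HTMLParser):
    """Counts <img> tags and their missing or overlong attributes."""

    def __init__(self):
        super().__init__()
        self.total = 0
        self.missing_alt = 0
        self.missing_title = 0
        self.long_alt = 0

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        self.total += 1
        alt = attributes.get("alt")
        # Assumption: an absent or empty alt counts as "not defined".
        if not alt:
            self.missing_alt += 1
        elif len(alt) >= MAX_ALT_LENGTH:
            self.long_alt += 1
        if not attributes.get("title"):
            self.missing_title += 1


def audit_images(html):
    """Parse one page and return the filled-in ImageAudit counters."""
    audit = ImageAudit()
    audit.feed(html)
    return audit
```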

Social engagement:

  • Facebook shares, likes and comments + total
  • Blog comments – per page and website total

Website information:

  • Response time – defines the overall site speed and identifies slow pages
  • Whois data – creation, update and expiration dates, registrant name and contact info, registrar, name servers. Domain registration length and domain privacy are ranking factors.
  • Domain redirection – Google sees ‘http://www.[domain]’ and ‘http://[domain]’ as 2 separate sites; you should configure a redirect from the non-preferred domain to the preferred one (most commonly ‘http://[domain]’)
  • Alexa ranks
  • Theme – identifies website theme
  • Sitemaps
  • Emails
  • Plugins, services and technologies your (competitor) website is using

Keyword usage:

Identifies whether the keyword is present, at which position (the closer to the beginning, the better) and how many times it occurs in each of the HTML elements: URL, title, meta description, H1->H6 tags, body, image filename, title and alt attributes.
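
The keyword report described above could look roughly like the following. It assumes the text of each element has already been extracted into a dictionary (the `fields` argument, a hypothetical structure) and uses plain case-insensitive substring matching, which may differ from how the original tool matched keywords.

```python
def keyword_usage(keyword, fields):
    """Report presence, first position and count of a keyword per field.

    `fields` maps an element name (e.g. "title", "url") to its text.
    Position is a character offset, or -1 when the keyword is absent.
    """
    needle = keyword.lower()
    report = {}
    for name, text in fields.items():
        haystack = text.lower()
        report[name] = {
            "present": needle in haystack,
            "position": haystack.find(needle),
            "count": haystack.count(needle),  # non-overlapping matches
        }
    return report
```

For example, `keyword_usage("seo", {"title": "SEO Spider: an SEO scraper"})` reports the keyword as present in the title at position 0 with a count of 2.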
