Description

This Google Apps Script recursively collects the loc values of all url elements, starting from a sitemap file or a sitemap index file, and writes the results to a pre-formatted Google Sheet that can be used as a page feed in Google Ads.

This is my very first fork. Feel free to correct or add to the information in this README and in the inline comments.

I'd also be happy about any improvements to my very basic functions, especially if you find a way to reduce execution time. For reference: with this script I extracted 620K+ URLs from 72 sitemap files in about 23 minutes. The execution time limit in Google Apps Script is 30 minutes, so this script will probably run into a timeout on bigger websites with 800K+ indexed URLs.
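The 800K+ estimate can be checked with a rough calculation. The sketch below uses only the figures quoted above and assumes throughput stays constant as the number of URLs grows, which is a simplification; the names are hypothetical.

```javascript
// Figures quoted in this README.
const urlsExtracted = 620000;
const minutesTaken = 23;
const limitMinutes = 30;

// Average throughput, assumed constant for larger sites.
const urlsPerMinute = urlsExtracted / minutesTaken;

// Estimated runtime in minutes for a given number of URLs.
function estimateMinutes(urlCount) {
  return urlCount / urlsPerMinute;
}

// Largest URL count that would still finish within the limit.
const maxUrlsWithinLimit = Math.floor(urlsPerMinute * limitMinutes);

console.log(Math.round(urlsPerMinute));          // 26957
console.log(estimateMinutes(800000).toFixed(1)); // 29.7
console.log(maxUrlsWithinLimit);                 // 808695
```

At that rate, a site with 800K URLs would need almost the full 30 minutes, leaving practically no headroom.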

Functions

main()

Sets start and export file urls, prepares cache arrays and calls sub-functions.

fetchSitemaps(url)

Returns sitemap URLs from sitemap index files. Processes sitemap elements only.

fetchXml(url)

Pre-fetches the XML data from all cached sitemap URLs to speed up the subsequent extraction.

extractLocsFromXml(xml)

Returns url locs from cached sitemap XML contents. Processes url elements only.

main() ⇒

Sets start and export file urls, prepares cache arrays and calls sub-functions.

Process steps:

  • Clear existing content in exportSheet
  • Write header to exportSheet
  • Start with url of sitemap or sitemap index file
  • Recursively return all sitemap urls and write them to temp sitemaps array
  • Retrieve the XML content of all sitemaps previously saved in temp sitemaps array
  • Extract all urls (locs) from all cached sitemap contents
  • Finally write all extracted urls (locs) to exportSheet

Kind: global function

| Param | Type / example | Description |
| --- | --- | --- |
| exportSheetUrl | "https://docs.google.com/spreadsheets/d/..." | REQUIRED. The URL of the Google Sheet to export to |
| startUrl | "https://www.yourdomain.com/sitemap.xml" OR "https://www.yourdomain.com/sitemap-index.xml" | REQUIRED. The URL of the sitemap or the sitemap index file |
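The process steps listed for main() can be sketched as a small pipeline. This is a minimal illustration, not the script's actual code: `runExport`, the `deps` object and the in-memory sheet are hypothetical stand-ins so that the flow runs outside Apps Script, and the header names "Page URL" and "Custom label" are an assumption about the page feed layout.

```javascript
// Sketch only: `deps` bundles stand-ins for the sheet and the three
// sub-functions; its shape is an assumption, not the script's real interface.
function runExport(startUrl, deps) {
  deps.sheet.clear();                                     // clear existing content
  deps.sheet.appendRows([["Page URL", "Custom label"]]);  // header (assumed column names)

  const sitemapUrls = deps.fetchSitemaps(startUrl);       // all sitemap URLs, resolved recursively
  const xmlCache = sitemapUrls.map((url) => deps.fetchXml(url)); // pre-fetch every sitemap

  const locs = [];
  for (const xml of xmlCache) {
    for (const loc of deps.extractLocsFromXml(xml)) locs.push(loc);
  }

  deps.sheet.appendRows(locs.map((loc) => [loc, ""]));    // write all locs in one call
  return locs.length;
}

// In-memory stand-ins so the sketch runs outside Apps Script.
const writtenRows = [];
const fakeDeps = {
  sheet: {
    clear: () => { writtenRows.length = 0; },
    appendRows: (rows) => { for (const row of rows) writtenRows.push(row); },
  },
  fetchSitemaps: () => ["a.xml", "b.xml"],
  fetchXml: (url) => "xml-of-" + url,
  extractLocsFromXml: (xml) => [xml + "/page-1", xml + "/page-2"],
};

const exportedCount = runExport("https://www.yourdomain.com/sitemap-index.xml", fakeDeps);
console.log(exportedCount, writtenRows.length); // 4 5
```

Collecting all locs first and writing them in a single call keeps the number of sheet operations small; in Apps Script that final write could map to one `Range.setValues` call.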

fetchSitemaps(url) ⇒

Returns sitemap URLs from sitemap index files. Processes sitemap elements only.

Kind: local variable

| Param | Type / example | Description |
| --- | --- | --- |
| url | "../sitemap.xml" OR "../sitemap-index.xml" | Sitemap URL currently being processed, provided by main() |
| return | Array | Extracted sitemap URLs |
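To illustrate the recursion through sitemap index files, here is a minimal sketch. It is not the script's own fetchSitemaps(): `collectSitemapUrls` and the injected `fetchText` function (URL in, XML string out) are hypothetical, and the regular expression is a simplification that does not handle CDATA sections or namespace prefixes.

```javascript
// Sketch only: resolves a start URL to the list of plain sitemap URLs.
// `fetchText` is an injected, hypothetical function: (url) => XML string.
function collectSitemapUrls(url, fetchText, seen = new Set()) {
  if (seen.has(url)) return [];   // guard against index files referencing each other
  seen.add(url);

  const xml = fetchText(url);
  // Matches the <loc> inside each <sitemap> element; <url> elements are ignored.
  const sitemapRe = /<sitemap\b[^>]*>[\s\S]*?<loc>\s*([\s\S]*?)\s*<\/loc>[\s\S]*?<\/sitemap>/g;
  const children = [];
  let match;
  while ((match = sitemapRe.exec(xml)) !== null) children.push(match[1]);

  // No <sitemap> elements: treat the document as a plain sitemap.
  if (children.length === 0) return [url];

  const result = [];
  for (const child of children) {
    for (const found of collectSitemapUrls(child, fetchText, seen)) result.push(found);
  }
  return result;
}

// Fake documents standing in for real HTTP responses.
const base = "https://www.yourdomain.com/";
const fakeDocs = {
  [base + "sitemap-index.xml"]:
    "<sitemapindex><sitemap><loc>" + base + "a.xml</loc></sitemap>" +
    "<sitemap><loc>" + base + "nested.xml</loc></sitemap></sitemapindex>",
  [base + "nested.xml"]:
    "<sitemapindex><sitemap><loc>" + base + "b.xml</loc></sitemap></sitemapindex>",
  [base + "a.xml"]: "<urlset><url><loc>" + base + "page-a</loc></url></urlset>",
  [base + "b.xml"]: "<urlset><url><loc>" + base + "page-b</loc></url></urlset>",
};

const resolvedSitemaps = collectSitemapUrls(base + "sitemap-index.xml", (u) => fakeDocs[u]);
console.log(resolvedSitemaps);
```

With these fake documents the result contains only the two plain sitemaps, `a.xml` and `b.xml`; the index files themselves are not returned.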

fetchXml(url) ⇒

Pre-fetches the XML data from all cached sitemap URLs to speed up the subsequent extraction.

Kind: local variable

| Param | Type / example | Description |
| --- | --- | --- |
| url | "../sitemap.xml" OR "../sitemap-index.xml" | Sitemap URL currently being processed, provided by main() |
| return | Array | Fetched XML |
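The pre-fetching idea can be sketched as a loop that stores each sitemap's XML in a cache array before any extraction starts. `prefetchXml` and the injected `fetchText` function are hypothetical names; recording failures instead of aborting is an assumption about the desired behavior, not something this README specifies.

```javascript
// Sketch only: fetches every sitemap once and keeps the XML in a cache array.
// `fetchText` is an injected, hypothetical function: (url) => XML string.
// In Apps Script it could wrap UrlFetchApp.fetch(url).getContentText().
function prefetchXml(sitemapUrls, fetchText) {
  const xmlCache = [];
  for (const url of sitemapUrls) {
    try {
      xmlCache.push({ url: url, xml: fetchText(url) });
    } catch (err) {
      // Keep going so one unreachable sitemap does not abort the whole run.
      xmlCache.push({ url: url, xml: null, error: String(err) });
    }
  }
  return xmlCache;
}

// Fake fetcher: the second URL fails on purpose.
const fakeFetchText = (url) => {
  if (url.endsWith("broken.xml")) throw new Error("HTTP 404");
  return "<urlset><!-- content of " + url + " --></urlset>";
};

const prefetched = prefetchXml(["a.xml", "broken.xml", "c.xml"], fakeFetchText);
console.log(prefetched.map((entry) => entry.xml !== null)); // [ true, false, true ]
```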

extractLocsFromXml(xml) ⇒

Returns url locs from cached sitemap XML contents. Processes url elements only.

Kind: local variable

| Param | Type / example | Description |
| --- | --- | --- |
| xml | "<?xml version...>" | Sitemap XML content currently being processed, provided by main() |
| return | Array | Extracted URLs (locs) |
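As an illustration of "url elements only", the following sketch extracts the loc of each url element with regular expressions. It is a simplified stand-in for extractLocsFromXml(), not its actual code: `extractUrlLocs` is a hypothetical name, and the regex approach assumes no CDATA sections or namespace prefixes on the url and loc tags.

```javascript
// Sketch only: returns the <loc> value of every <url> element in a sitemap.
function extractUrlLocs(xml) {
  // \b keeps <urlset> from matching; <sitemap> elements are never touched.
  const urlRe = /<url\b[^>]*>([\s\S]*?)<\/url>/g;
  const locRe = /<loc>\s*([\s\S]*?)\s*<\/loc>/;
  const locs = [];
  let match;
  while ((match = urlRe.exec(xml)) !== null) {
    const loc = locRe.exec(match[1]);
    if (loc) locs.push(decodeXmlEntities(loc[1]));
  }
  return locs;
}

// Decodes the five predefined XML entities; &amp; goes last to avoid double decoding.
function decodeXmlEntities(text) {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&");
}

const sampleXml =
  '<?xml version="1.0" encoding="UTF-8"?>' +
  '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' +
  "<url><loc>https://www.yourdomain.com/</loc></url>" +
  "<url><loc> https://www.yourdomain.com/shop?a=1&amp;b=2 </loc><priority>0.5</priority></url>" +
  "</urlset>";

const sampleLocs = extractUrlLocs(sampleXml);
console.log(sampleLocs);
```

For the sample above this yields the two page URLs, with surrounding whitespace trimmed and `&amp;` decoded to `&`.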