Fixes
- `subTitle` extraction works now
Fixes
- Blocked responses on the search page now properly retry the request (no more unhandled promise rejections)
- Smoother search page pagination
- More informative logs
- Fixed consent approval when the browser crashes
Fixes
- `maxCrawledPlaces` + `exportPlaceUrls` was giving an inconsistent number of results.
Features
- Added `allPlacesNoSearch` to input. This option allows you to scrape all places shown on the map without the need for any search term.
- Added `reviewsStartDate` to input to extract only reviews newer than this date.
- Added `radiusKm` to the `Point` type in `customGeolocation` (see the sketch below).
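A minimal input sketch combining these options (field names are from this entry; the date, the coordinate values, and the exact shape of the `customGeolocation` object are illustrative assumptions, see the Readme):

```json
{
  "allPlacesNoSearch": true,
  "reviewsStartDate": "2022-01-01",
  "customGeolocation": {
    "type": "Point",
    "coordinates": [-0.1278, 51.5074],
    "radiusKm": 5
  }
}
```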
Improvements
- `additionalInfo` extraction is faster now.
- `additionalInfo` extraction for hotels and similar categories is more complete now: data that is not displayed on the Google page but is present in the Google response is also extracted.
- Lowered the default zoom values. The previous setup made the scraping too slow and costly. The new defaults will speed up the scraping a lot while missing only a few places. You can still manually override the `zoom` parameter. New default values are:
  - `country` or `state` -> 12
  - `county` -> 14
  - `city` -> 15
  - `postalCode` -> 16
  - no geolocation -> 12
Fixes
- `location` extraction works in (almost) all cases now (search URLs and URLs with place IDs will always work).
Features
- Added `oneReviewPerRow` to input to expand reviews into one review per output row (see the sketch below)
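For example (a sketch; `maxReviews` is shown only for context and its value is illustrative):

```json
{
  "maxReviews": 100,
  "oneReviewPerRow": true
}
```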
Fixes
- `openingHours` extraction works in almost all cases now (search URLs and URLs with place IDs will always work).
- Start URLs now correctly work from uploaded CSV files or Google Sheets (previously, part of the URL was trimmed off).
- Changed the `polygon` input field to `customGeolocation`
- Added a deeper section to the Readme on how you can provide your own exact coordinates (see the sketch below)
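A sketch of the renamed field, assuming it accepts a GeoJSON-like geometry as the old `polygon` option did (the coordinates are illustrative, in [longitude, latitude] order, with the first and last points equal to close the ring):

```json
{
  "customGeolocation": {
    "type": "Polygon",
    "coordinates": [[
      [-0.151, 51.514],
      [-0.147, 51.522],
      [-0.131, 51.523],
      [-0.128, 51.512],
      [-0.151, 51.514]
    ]]
  }
}
```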
Breaking changes
We decided it is time to change several default parameters to make the user experience smoother. These changes should not have a big effect on current users.
- `city` and other geolocation parameters will take preference over `lat` & `long` if both are used (in 99% of cases, users want the automatic location splitting to get the most results, which doesn't work with direct `lat` & `long`)
- `zoom` will no longer have a default value of 12. Instead, it will change based on the geolocation type like this:
  - `country` or `state` -> 12
  - `county` -> 14
  - `city` -> 17
  - `postalCode` -> 18
  - no geolocation -> 12
Users will still be able to specify `zoom` and override this behavior. See the Readme for more details, and the sketch below.
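For example, to keep the previous behavior for city searches, pin `zoom` explicitly (a sketch; the city value is illustrative):

```json
{
  "city": "Prague",
  "zoom": 12
}
```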
Breaking change
- `reviewsSort` is now set to `newest` by default. This is because some places don't yield all reviews with other sortings (we are not sure if this is a bug or a silent block on Google's side).
Fixes
- `exportPlaceUrls` now properly dedupes the URLs
- Added a `categories` field listing all categories the place is listed in
Fixes
- Fixed `additionalInfo` for hotels
- Fixed `exportPlaceUrls` not checking for correct geolocation
Fixes
- The `website` field now displays the full URL. This fixes the issue of blank `facebook.com` links.
Fixes
- Fixed `additionalInfo` for the new layout
Fixes
- Improved reliability of scraping place details, reviews, and images (improved scrolling and back-button interaction)
Features
- Added `menu` to output
- Added `price` to output
Fixes
- Fixed `popularTimesHistogram`, which caused a crash on some pages
Fixes
- Fixed image extraction and made it optional (it should not crash the whole scrape)
Fixes
- Fixed `temporarilyClosed` and `permanentlyClosed`
- Added a step to normalize input start URLs because those with the wrong format don't contain JSON data
Fixes
- Fixed popular times live and histogram (https://github.com/drobnikj/crawler-google-places/pull/185, https://github.com/drobnikj/crawler-google-places/issues/181)
Fixes
- In about 10% of cases, the reviews are in the wrong order and there are fewer of them. We haven't found the root cause yet, but we retry the page so the output gets corrected.
Breaking fix
- If you did not pass `maxReviews` in the input at all (`undefined`), it scraped 5 reviews by default. That was against the input schema description, so it is now fixed to scrape 0 reviews in those cases.
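If you relied on the old implicit default, you now need to request reviews explicitly, e.g.:

```json
{
  "maxReviews": 5
}
```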
Fixes
- Fixed `placeId` extraction that was broken for some inputs
- Fixed missing `imageUrls`
Features
- Added option to input URLs with CID (Google My Business Listing ID) to start URLs, e.g. https://maps.google.com/?cid=12640514468890456789
- Added `cid` to output (see the sketch below)
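For example (the CID URL is the one from this entry; the `{ "url": ... }` request-object shape is the usual Apify convention):

```json
{
  "startUrls": [
    { "url": "https://maps.google.com/?cid=12640514468890456789" }
  ]
}
```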
Fixes
- Fixed `maxCrawledPlaces` not finishing quickly for large country-wide searches. `maxCrawledPlacesPerSearch` still has this problem.
Fixes
- Fixed a problem where `startUrls` sometimes did not pick up all provided URLs (due to automatic `uniqueKey` resolution)
- Fixed `likesCount` in reviews
Fixes
- `maxCrawledPlaces` now compares to the total sum of all places
Features
- Added `maxCrawledPlacesPerSearch` to limit the maximum number of places per search term or search URL (see the sketch below)
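A sketch distinguishing the two limits (values are illustrative; the search-term field is assumed to be `searchStringsArray`): `maxCrawledPlaces` caps the whole run, while `maxCrawledPlacesPerSearch` caps each individual search.

```json
{
  "searchStringsArray": ["restaurant", "cafe"],
  "maxCrawledPlaces": 500,
  "maxCrawledPlacesPerSearch": 100
}
```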
Fixes
- Address is now parsed correctly into components even when you supply direct place IDs
- Migrated code from `apify` 0.22.5 to 1.3.1
- Added `county` to geolocation options
Fixes (hopefully the last fixes after the layout change)
- Scraping all images per place works again
- Fixed `additionalInfo`
- Fixed `openingHours`
Fixes
- Fixed handling of search pages without results
- Skip empty searches that users sometimes accidentally submit
Features
- Added `orderBy` attribute to scraped results
Fixes
- Fully or partially fixed consent screen issues
- Should also help with `Failed to set the 'innerHTML' property on 'Element': This document requires 'TrustedHTML' assignment.`, which is caused by injecting jQuery into the consent screen
Fixes
- Fixed `reviewsTranslation`
Fixes after Google changed the layout; not everything is fixed yet. The next batch of fixes is coming ASAP!
- Fixed additional data
- Fixed search pagination getting into an infinite loop
- Fixed empty search handling
- Fixed reviews not being scraped
- Fixed `totalScore`
Warning
The next version will be a breaking one, as we will remove personal data from reviews by default. You will have to explicitly enable the fields below.
Features
- Added input fields to selectively pick which personal data fields to scrape: `scrapeReviewerName`, `scrapeReviewerId`, `scrapeReviewerUrl`, `scrapeReviewId`, `scrapeReviewUrl`, `scrapeResponseFromOwnerText` (see the sketch below)
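To keep the personal data after the upcoming breaking change, you would enable the fields explicitly (a sketch using the field names listed above; `maxReviews` is illustrative context):

```json
{
  "maxReviews": 50,
  "scrapeReviewerName": true,
  "scrapeReviewerId": true,
  "scrapeReviewerUrl": true,
  "scrapeReviewId": true,
  "scrapeReviewUrl": true,
  "scrapeResponseFromOwnerText": true
}
```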
Fixes
- Removed duplicate reviews; all reviews are now scraped correctly
- `reviewsSort` finally works correctly
- Review scraping is now significantly faster
- Handled an error that irregularly happened when scraping a huge amount of reviews
Features
- Added `reviewsDistribution`
- Added `publishedAtDate` (exact date), `responseFromOwnerDate` and `responseFromOwnerText` for each review
Fixes:
- `totalScore` and `reviewsCount` are now correctly extracted for all languages
- `startUrls` now correctly work on non-.com domains and on place detail pages
Fixes:
- A search keyword that links only to a single place (like "London Eye") now works correctly
Features:
- Address is parsed into `neighborhood`, `street`, `city`, `postalCode`, `state` and `countryCode` fields
- Added `reviewsTranslation` option to adjust how Google translates reviews from non-English languages
- Parsing ads. This means a bit more results. Those that are ads have the `"isAdvertisement": true` field.
- Added `useCachedPlaces` option to load places from your KV store. Useful if you need to scrape the same places regularly.
- Added `polygon` option to provide your own geolocation polygon.
Fixes:
- This one is big. We removed the infamous `Place is outside of required location (polygon)` error. The location of a place is now checked during pagination and such places are skipped. This means a massive speed-up of the scraper.
Features:
- Automatic screenshots of errors to see what went wrong
- Added `searchPageUrl` to output
- Added a `PLACES-OUT-OF-POLYGON` record to the Key-Value store. You can check which places were excluded.
Fixes:
- Fixed a rare bug with saving stats
- Improved review sorting, but it is still not ideal; more work needs to be done
- Added postal code geolocation to input
- Improved error messages when a location is not found
- Optimization: removed geolocation data from intermediate requests
- Fixed handling of the Google consent screen
- Better input validation and deprecation logs
- Changed the default for `maxImages` to `1` as it doesn't require scrolling for the main image
- `imageUrls` are returned with the highest resolution
- Removed `forceEng` input in favor of `language`
- The default setup now uses `maxImages: 0` and `maxReviews: 0` to improve efficiency
- Added several browser options to input: `maxConcurrency`, `maxPageRetries`, `pageLoadTimeoutSec`, `maxPagesPerBrowser`, `useChrome` (see the sketch below)
- Revamped input schema and Readme
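A sketch of the browser options (names are from the entry above; values are illustrative):

```json
{
  "maxConcurrency": 10,
  "maxPageRetries": 3,
  "pageLoadTimeoutSec": 60,
  "maxPagesPerBrowser": 5,
  "useChrome": false
}
```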
- Added `reviewerNumberOfReviews` and `isLocalGuide` to reviews
- Added a few extra review fields (ID, URL)
- Added an option for caching place location
- Added an option for sorting reviews
- Added stats logging
- Reworked the input search string
- Opening hours parsing (#39)
- Separate `locatedIn` field (#32)
- Updated the Readme
- Extract additional info: Service Options, Highlights, Offerings, ... (#41)
- Added `maxReviews` and `maxImages` (#40)
- Added `temporarilyClosed` and `permanentlyClosed` flags (#33)
- Allow scraping only place URLs (#29)
- Added `forceEnglish` flag to input (#24, #21)
- Added searching in a polygon using nominatim.org
- Added `startUrls`
- Added `maxAutomaticZoomOut` to limit how far Google can zoom out (it naturally zooms out as you press next page in search); see the sketch below
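For example, to stop paginating once Google has zoomed out more than a couple of levels (a sketch; the value is illustrative):

```json
{
  "maxAutomaticZoomOut": 2
}
```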