# Google Scholar

Last version: 2024-09-03

A searchable bibliographic index that provides the full text and metadata of scholarly literature across a wide range of publishing formats and disciplines.

## General information

| Field | Value |
| --- | --- |
| Name | Google Scholar |
| Website | https://scholar.google.com/ |
| Owner | Google |
| Owner type | Private company |
| Owner country | USA |
| Launch year | 2004 |
| Scope | Any |
| Number of items | Above 389 million scholarly documents (Michael Gusenbauer, 2018) |
| Access for index users | Free |
| Access for index data providers | Free |
| Documentation | Technical inclusion guidelines: https://scholar.google.com/intl/fr/scholar/inclusion.html<br>Indexing guidelines: https://scholar.google.com/intl/fr/scholar/inclusion.html#indexing |
| Application form for providers | None |

## Content and service

| Field | Value |
| --- | --- |
| Content type | Scholarly articles (journal papers, conference papers, technical reports, dissertations, pre-prints, post-prints, abstracts, or drafts thereof) |
| Content language | Any |
| Content geographical provenance | Any |
| Indexing level for publications | Articles |
| Full text | Link to the full text when available |
| Index sources | Any website with scholarly articles and proper URLs |
| Supported standards | HTML meta tags supporting: Highwire Press tags, BE Press tags, PRISM tags, Dublin Core tags |
| Contact address for providers | None |
| Bibliodiversity support | No limitation on the language of the content or on the publishing business model |
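For illustration, the same bibliographic fields can be expressed in each of the supported tag vocabularies. The sketch below uses invented article details, and the exact BE Press and PRISM tag names should be checked against the inclusion guidelines before use:

```html
<!-- Hypothetical article metadata in each supported vocabulary -->

<!-- Highwire Press tags -->
<meta name="citation_title" content="An Example Study">
<meta name="citation_author" content="Doe, Jane">
<meta name="citation_publication_date" content="2024/09/03">

<!-- BE Press tags -->
<meta name="bepress_citation_title" content="An Example Study">
<meta name="bepress_citation_author" content="Doe, Jane">

<!-- PRISM tags -->
<meta name="prism.publicationName" content="Journal of Examples">
<meta name="prism.publicationDate" content="2024-09-03">

<!-- Dublin Core tags -->
<meta name="DC.title" content="An Example Study">
<meta name="DC.creator" content="Doe, Jane">
```

In practice a site picks one vocabulary and uses it consistently across all article pages.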

Additional services:

- Google Scholar Metrics (visibility and influence of recent articles in scholarly publications): https://scholar.google.com/intl/fr/scholar/metrics.html
- Google Scholar Profiles (showcase of an author's publications): https://scholar.google.com/intl/fr/scholar/citations.html

## Requirements for academic publications

### Joining process

Google Scholar uses automated software, known as "robots" or "crawlers", to fetch files for inclusion in the search results. The journal's website needs to be structured so that it can be "crawled" in this manner: the crawlers must be able to discover and fetch the URLs of all articles, and to periodically refresh their content from the journal website.
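As a rough illustration of what a crawler looks for, a small script along these lines can check whether an article page exposes citation meta tags. The HTML sample and the `citation_*` tag prefix used here are assumptions for demonstration, not Google Scholar's actual crawler logic:

```python
from html.parser import HTMLParser

class CitationTagParser(HTMLParser):
    """Collect <meta name="citation_*"> tags from an article page."""

    def __init__(self):
        super().__init__()
        self.citation_tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("name", "")
        if name.startswith("citation_"):
            self.citation_tags[name] = attrs.get("content", "")

# Hypothetical article page, used only for demonstration.
sample_page = """
<html><head>
<meta name="citation_title" content="An Example Study">
<meta name="citation_author" content="Doe, Jane">
<meta name="citation_publication_date" content="2024/09/03">
</head><body>...</body></html>
"""

parser = CitationTagParser()
parser.feed(sample_page)
print(parser.citation_tags)
```

A publisher could run a check like this against every article URL listed on the site to verify that the pages a crawler would fetch actually carry the expected metadata.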

### Data collection process

Web crawling

### Minimum requirements

#### Editorial minimum requirements

None.

#### Technical minimum requirements

**Data file format**

HTML or PDF

**Metadata mandatory fields**

For PDF publications:

- Title (in a large font)
- Authors (below the title)
- Bibliographic citation of the paper (in the first-page footer)
- Bibliography (as a separate section)

For HTML publications (metadata included in the meta tags of the HTML file):

- Title
- Authors
- Publication date
- Bibliographic citation of the paper
- Bibliography
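A minimal `<head>` covering the mandatory HTML fields might look like the following sketch. It uses Highwire Press tag names, and all article details are invented; the bibliographic citation is conveyed through the journal, volume, and page tags, while the bibliography itself appears as a section of the article body rather than as a meta tag:

```html
<head>
  <title>An Example Study of Open Access Indexing</title>
  <!-- Title and authors -->
  <meta name="citation_title" content="An Example Study of Open Access Indexing">
  <meta name="citation_author" content="Doe, Jane">
  <meta name="citation_author" content="Smith, John">
  <!-- Publication date -->
  <meta name="citation_publication_date" content="2024/09/03">
  <!-- Bibliographic citation of the paper -->
  <meta name="citation_journal_title" content="Journal of Examples">
  <meta name="citation_volume" content="12">
  <meta name="citation_issue" content="3">
  <meta name="citation_firstpage" content="45">
  <meta name="citation_lastpage" content="67">
</head>
```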

### Additional criteria

#### SEO/UX requirements

Depending on the number of papers on the website:

- Small number: all articles should be listed on a single page
- Thousands: the website should provide a full list of all articles, ordered by publication or entry date
- More than 100,000: a dedicated browsing interface showing the latest updates

If a robots.txt file is used, it should be configured to allow Google Scholar's crawlers.
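For example, a robots.txt that lets Google's crawler reach the article pages while restricting other paths could look like the following sketch. The paths are invented, and since Google Scholar crawls with Google's standard crawler, allowing the Googlebot user-agent is typically what is needed:

```
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /internal/
```

The key point is that no rule may block the crawler from the article URLs or from the pages that list them.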