CI-Research
Popular repositories
- KeywordAnalysis Public
Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends
- spark-Jupyter-AWS Public Forked from PiercingDan/spark-Jupyter-AWS
A guide on how to set up Jupyter with Pyspark painlessly on AWS EC2 clusters, with S3 I/O support
Jupyter Notebook 1
- cdx-index-client Public Forked from ikreymer/cdx-index-client
A command-line tool for using CommonCrawl Index API at http://index.commoncrawl.org/
- commoncrawl-examples Public Forked from commoncrawl/commoncrawl-examples
A library of examples showing how to use the Common Crawl corpus.
Java
- dkpro-c4corpus Public Forked from dkpro/dkpro-c4corpus
DKPro C4CorpusTools is a collection of tools for processing CommonCrawl corpus, including Creative Commons license detection, boilerplate removal, language detection, and near-duplicate removal.
Java
- common_crawl_index Public Forked from trivio/common_crawl_index
Index URLs in Common Crawl
Python 1
Repositories
- KeywordAnalysis Public
Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends
- CommonCrawlDocumentDownload Public Forked from centic9/CommonCrawlDocumentDownload
A small tool which uses the CommonCrawl URL Index to download documents with certain file types or mime-types for mass-testing of frameworks like Apache POI and Apache Tika
- cdx-index-client Public Forked from ikreymer/cdx-index-client
A command-line tool for using CommonCrawl Index API at http://index.commoncrawl.org/
- spark-Jupyter-AWS Public Forked from PiercingDan/spark-Jupyter-AWS
A guide on how to set up Jupyter with Pyspark painlessly on AWS EC2 clusters, with S3 I/O support
- dkpro-c4corpus Public Forked from dkpro/dkpro-c4corpus
DKPro C4CorpusTools is a collection of tools for processing CommonCrawl corpus, including Creative Commons license detection, boilerplate removal, language detection, and near-duplicate removal.
- commoncrawl-examples Public Forked from commoncrawl/commoncrawl-examples
A library of examples showing how to use the Common Crawl corpus.
People
This organization has no public members. You must be a member to see who’s a part of this organization.