From 809b6c3fa32f12edc81c03a27deb2ab5ec19d660 Mon Sep 17 00:00:00 2001 From: Falah Gate Salieh Date: Fri, 21 Jul 2023 19:27:39 +0300 Subject: [PATCH 1/8] add dataset research_papers_dataset The "Research Papers Dataset 2023" contains information related to research papers. It includes the following features: - Title (dtype: string): The title of the research paper. - Abstract (dtype: string): The abstract of the research paper. ### Dataset Splits: The dataset is divided into one split: - Train Split: - Name: train - Number of Bytes: 2,363,569,633 - Number of Examples: 2,311,491 --- .../research_papers_dataset/ReadME.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 data/datasets/research_papers_dataset/ReadME.md diff --git a/data/datasets/research_papers_dataset/ReadME.md b/data/datasets/research_papers_dataset/ReadME.md new file mode 100644 index 0000000000..0411125fbb --- /dev/null +++ b/data/datasets/research_papers_dataset/ReadME.md @@ -0,0 +1,93 @@ +--- +dataset_info: + features: + - name: title + dtype: string + - name: abstract + dtype: string + splits: + - name: train + num_bytes: 2363569633 + num_examples: 2311491 + download_size: 1423881564 + dataset_size: 2363569633 +--- +## Research Paper Dataset 2023 + +### Dataset Information: + +The "Research Paper Dataset 2023" contains information related to research papers. It includes the following features: + +- Title (dtype: string): The title of the research paper. +- Abstract (dtype: string): The abstract of the research paper. 
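As a quick sanity check on the split statistics above, the byte and example counts imply roughly 1 KB of text per paper, a plausible size for a title plus abstract. A stdlib-only sketch (the numbers are copied from the split table above; nothing is downloaded):

```python
# Split statistics quoted from the dataset card above.
num_bytes = 2_363_569_633
num_examples = 2_311_491

avg_bytes = num_bytes / num_examples
print(f"average record size: {avg_bytes:.1f} bytes")  # ~1022.5 bytes per title+abstract
```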
+ +### Dataset Splits: + +The dataset is divided into one split: + +- Train Split: + - Name: train + - Number of Bytes: 2,363,569,633 + - Number of Examples: 2,311,491 + +### Download Information: + +- Download Size: 1,423,881,564 bytes +- Dataset Size: 2,363,569,633 bytes + +### Dataset Citation: + +If you use this dataset in your research or project, please cite it as follows: + +``` +@dataset{Research Paper Dataset 2023, + author = {Falah.G.Salieh}, + title = {Research Paper Dataset 2023,}, + year = {2023}, + publisher = {Hugging Face}, + version = {1.0}, + location = {Online}, + url = {Falah/research_paper2023} +} + + +``` + + ### Apache License: +The "Research Paper Dataset 2023" is distributed under the Apache License 2.0. You can find a copy of the license in the LICENSE file of the dataset repository. + +The specific licensing and usage terms for this dataset can be found in the dataset repository or documentation. +Please make sure to review and comply with the applicable license and usage terms before downloading and using the dataset. + +### Example Usage: + +To load the "Research Paper Dataset 2023" using the Hugging Face Datasets Library in Python, you can use the following code: + +```python +from datasets import load_dataset + +dataset = load_dataset("Falah/research_paper2023") +``` +### Application of "Research Paper Dataset 2023" for NLP Text Classification and Chatbot Models + +The "Research Paper Dataset 2023" can be a valuable resource for various Natural Language Processing (NLP) tasks, including text classification and generating titles for books in the context of chatbot models. Here are some ways this dataset can be utilized for these applications: + +1. **Text Classification**: The dataset's features, such as the title and abstract of research papers, can be used to train a text classification model. 
By assigning appropriate labels to the research papers based on their topics or fields of study, the model can learn to classify new research papers into different categories. For example, the model can predict whether a research paper is related to computer science, biology, physics, etc. This text classification model can then be adapted for other applications that require categorizing text. + +2. **Book Title Generation for Chatbot Models**: By utilizing the research paper titles in the dataset, a natural language generation model, such as a sequence-to-sequence model or a transformer-based model, can be trained to generate book titles. The model can be fine-tuned on the research paper titles to learn patterns and structures in generating relevant and meaningful book titles. This can be a useful feature for chatbot models that recommend books based on specific research topics or areas of interest. + +### Potential Benefits: + +- Improved Chatbot Recommendations: With the ability to generate book titles related to specific research topics, chatbot models can provide more relevant and personalized book recommendations to users. +- Enhanced User Engagement: By incorporating the text classification model, the chatbot can better understand user queries and respond more accurately, leading to a more engaging user experience. +- Knowledge Discovery: Researchers and students can use the text classification model to efficiently categorize large collections of research papers, enabling quicker access to relevant information in specific domains. + +### Considerations: + +- Data Preprocessing: Before training the NLP models, appropriate data preprocessing steps may be required, such as text cleaning, tokenization, and encoding, to prepare the dataset for model input. 
+- Model Selection and Fine-Tuning: Choosing the right NLP model architecture and hyperparameters, and fine-tuning the model on the specific tasks, can significantly impact the model's performance and generalization ability. +- Ethical Use: Ensure that the generated book titles and text classification predictions are used responsibly and ethically, respecting copyright and intellectual property rights. + +### Conclusion: + +The "Research Paper Dataset 2023" holds great potential for enhancing NLP text classification models and chatbot systems. By leveraging the dataset's features and information, NLP applications can be developed to aid researchers, students, and readers in finding relevant research papers and generating meaningful book titles for their specific interests. Proper utilization of this dataset can lead to more efficient information retrieval and improved user experiences in the domain of research and academic literature exploration. From c4c5c877e2890d21b79e1ef28570b620049d1f20 Mon Sep 17 00:00:00 2001 From: Falah Gate Salieh Date: Fri, 21 Jul 2023 19:29:24 +0300 Subject: [PATCH 2/8] Create load_dataset.py --- data/datasets/research_papers_dataset/load_dataset.py | 2 ++ 1 file changed, 2 insertions(+) create mode 100644 data/datasets/research_papers_dataset/load_dataset.py diff --git a/data/datasets/research_papers_dataset/load_dataset.py b/data/datasets/research_papers_dataset/load_dataset.py new file mode 100644 index 0000000000..36ce97e040 --- /dev/null +++ b/data/datasets/research_papers_dataset/load_dataset.py @@ -0,0 +1,2 @@ +from datasets import load_dataset +dataset = load_dataset("Falah/research_paper2023") From fd3f41f008b8d50df1f0a275725ad78be47f5afd Mon Sep 17 00:00:00 2001 From: your name Date: Sat, 22 Jul 2023 15:34:18 +0300 Subject: [PATCH 3/8] add_new_dataset_sentiments_381_classes --- .../sentiments-dataset-381-classes/README.md | 346 ++++++++++++++++++ .../load_dataset.py | 3 + 2 files changed, 349 insertions(+) create mode 100644 
data/datasets/sentiments-dataset-381-classes/README.md create mode 100644 data/datasets/sentiments-dataset-381-classes/load_dataset.py diff --git a/data/datasets/sentiments-dataset-381-classes/README.md b/data/datasets/sentiments-dataset-381-classes/README.md new file mode 100644 index 0000000000..1a18a15e1f --- /dev/null +++ b/data/datasets/sentiments-dataset-381-classes/README.md @@ -0,0 +1,346 @@ +--- +dataset_info: + features: + - name: text + dtype: string + - name: sentiment + dtype: string + splits: + - name: train + num_bytes: 104602 + num_examples: 1061 + download_size: 48213 + dataset_size: 104602 +license: apache-2.0 +task_categories: +- text-classification +language: +- en +pretty_name: sentiments-dataset-381-classes +size_categories: +- 1K Date: Sat, 22 Jul 2023 17:09:48 +0300 Subject: [PATCH 4/8] updata dataset --- data/datasets/medium_articles_posts/README.md | 39 ++++++++ .../medium_articles_posts/load_dataset.py | 3 + .../research_papers_dataset/ReadME.md | 94 ++++++++++++++----- .../research_papers_dataset/load_dataset.py | 1 + .../research_papers_dataset/package-lock.json | 26 +++++ 5 files changed, 138 insertions(+), 25 deletions(-) create mode 100644 data/datasets/medium_articles_posts/README.md create mode 100644 data/datasets/medium_articles_posts/load_dataset.py create mode 100644 data/datasets/research_papers_dataset/package-lock.json diff --git a/data/datasets/medium_articles_posts/README.md b/data/datasets/medium_articles_posts/README.md new file mode 100644 index 0000000000..d355795186 --- /dev/null +++ b/data/datasets/medium_articles_posts/README.md @@ -0,0 +1,39 @@ +# Medium Articles Posts Dataset + +## Description + +The Medium Articles Posts dataset contains a collection of articles published on the Medium platform. Each article entry includes information such as the article's title, main content or text, associated URL or link, authors' names, timestamps, and tags or categories. 
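Since every field in this dataset is a plain string, ordinary Python filtering works on the loaded records. A toy sketch with invented articles (the titles and tags here are made up for illustration, not drawn from the real dataset):

```python
# Toy records mimicking the Medium dataset schema (all fields are strings);
# the articles themselves are invented for illustration.
articles = [
    {"title": "Intro to Transformers", "tags": "nlp, deep-learning"},
    {"title": "Sourdough Basics", "tags": "baking, food"},
    {"title": "Fine-Tuning BERT", "tags": "nlp, bert"},
]

# Select articles whose comma-separated tag string mentions "nlp".
nlp_posts = [a["title"] for a in articles if "nlp" in a["tags"].split(", ")]
print(nlp_posts)
```

The same comprehension applies unchanged to rows yielded by `load_dataset("Falah/medium_articles_posts")`, since each row is a dict with these keys.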
+ +## Dataset Info + +The dataset consists of the following features: + +- **title**: *(string)* The title of the Medium article. +- **text**: *(string)* The main content or text of the Medium article. +- **url**: *(string)* The URL or link to the Medium article. +- **authors**: *(string)* The authors or contributors of the Medium article. +- **timestamp**: *(string)* The timestamp or date when the Medium article was published. +- **tags**: *(string)* Tags or categories associated with the Medium article. + +## Dataset Size + +- **Total Dataset Size**: 1,044,746,687 bytes (approximately 1000 MB) + +## Splits + +The dataset is split into the following part: + +- **Train**: + - Number of examples: 192,368 + - Size: 1,044,746,687 bytes (approximately 1000 MB) + +## Download Size + +- **Compressed Download Size**: 601,519,297 bytes (approximately 600 MB) +### Usage example +```python +from datasets import load_dataset +#Load the dataset +dataset = load_dataset("Falah/medium_articles_posts") + +``` \ No newline at end of file diff --git a/data/datasets/medium_articles_posts/load_dataset.py b/data/datasets/medium_articles_posts/load_dataset.py new file mode 100644 index 0000000000..1cc8027b1d --- /dev/null +++ b/data/datasets/medium_articles_posts/load_dataset.py @@ -0,0 +1,3 @@ +from datasets import load_dataset +#Load the dataset +dataset = load_dataset("Falah/medium_articles_posts") diff --git a/data/datasets/research_papers_dataset/ReadME.md b/data/datasets/research_papers_dataset/ReadME.md index 0411125fbb..f927f272f0 100644 --- a/data/datasets/research_papers_dataset/ReadME.md +++ b/data/datasets/research_papers_dataset/ReadME.md @@ -1,22 +1,26 @@ --- dataset_info: features: - - name: title - dtype: string - - name: abstract - dtype: string + - name: title + dtype: string + - name: abstract + dtype: string splits: - - name: train - num_bytes: 2363569633 - num_examples: 2311491 + - name: train + num_bytes: 2363569633 + num_examples: 2311491 download_size: 1423881564 
dataset_size: 2363569633 --- + ## Research Paper Dataset 2023 +[Check out this website](https://huggingface.co/datasets/Falah/research_paper2023) + ### Dataset Information: -The "Research Paper Dataset 2023" contains information related to research papers. It includes the following features: +The "Research Paper Dataset 2023" contains information related to research +papers. It includes the following features: - Title (dtype: string): The title of the research paper. - Abstract (dtype: string): The abstract of the research paper. @@ -53,41 +57,81 @@ If you use this dataset in your research or project, please cite it as follows: ``` - ### Apache License: -The "Research Paper Dataset 2023" is distributed under the Apache License 2.0. You can find a copy of the license in the LICENSE file of the dataset repository. +### Apache License: + +The "Research Paper Dataset 2023" is distributed under the Apache License 2.0. +You can find a copy of the license in the LICENSE file of the dataset +repository. -The specific licensing and usage terms for this dataset can be found in the dataset repository or documentation. -Please make sure to review and comply with the applicable license and usage terms before downloading and using the dataset. +The specific licensing and usage terms for this dataset can be found in the +dataset repository or documentation. Please make sure to review and comply with +the applicable license and usage terms before downloading and using the dataset. 
### Example Usage: -To load the "Research Paper Dataset 2023" using the Hugging Face Datasets Library in Python, you can use the following code: +To load the "Research Paper Dataset 2023" using the Hugging Face Datasets +Library in Python, you can use the following code: ```python from datasets import load_dataset dataset = load_dataset("Falah/research_paper2023") ``` -### Application of "Research Paper Dataset 2023" for NLP Text Classification and Chatbot Models - -The "Research Paper Dataset 2023" can be a valuable resource for various Natural Language Processing (NLP) tasks, including text classification and generating titles for books in the context of chatbot models. Here are some ways this dataset can be utilized for these applications: -1. **Text Classification**: The dataset's features, such as the title and abstract of research papers, can be used to train a text classification model. By assigning appropriate labels to the research papers based on their topics or fields of study, the model can learn to classify new research papers into different categories. For example, the model can predict whether a research paper is related to computer science, biology, physics, etc. This text classification model can then be adapted for other applications that require categorizing text. +### Application of "Research Paper Dataset 2023" for NLP Text Classification and Chatbot Models -2. **Book Title Generation for Chatbot Models**: By utilizing the research paper titles in the dataset, a natural language generation model, such as a sequence-to-sequence model or a transformer-based model, can be trained to generate book titles. The model can be fine-tuned on the research paper titles to learn patterns and structures in generating relevant and meaningful book titles. This can be a useful feature for chatbot models that recommend books based on specific research topics or areas of interest. 
+The "Research Paper Dataset 2023" can be a valuable resource for various Natural +Language Processing (NLP) tasks, including text classification and generating +titles for books in the context of chatbot models. Here are some ways this +dataset can be utilized for these applications: + +1. **Text Classification**: The dataset's features, such as the title and + abstract of research papers, can be used to train a text classification + model. By assigning appropriate labels to the research papers based on their + topics or fields of study, the model can learn to classify new research + papers into different categories. For example, the model can predict whether + a research paper is related to computer science, biology, physics, etc. This + text classification model can then be adapted for other applications that + require categorizing text. + +2. **Book Title Generation for Chatbot Models**: By utilizing the research paper + titles in the dataset, a natural language generation model, such as a + sequence-to-sequence model or a transformer-based model, can be trained to + generate book titles. The model can be fine-tuned on the research paper + titles to learn patterns and structures in generating relevant and meaningful + book titles. This can be a useful feature for chatbot models that recommend + books based on specific research topics or areas of interest. ### Potential Benefits: -- Improved Chatbot Recommendations: With the ability to generate book titles related to specific research topics, chatbot models can provide more relevant and personalized book recommendations to users. -- Enhanced User Engagement: By incorporating the text classification model, the chatbot can better understand user queries and respond more accurately, leading to a more engaging user experience. 
-- Knowledge Discovery: Researchers and students can use the text classification model to efficiently categorize large collections of research papers, enabling quicker access to relevant information in specific domains. +- Improved Chatbot Recommendations: With the ability to generate book titles + related to specific research topics, chatbot models can provide more relevant + and personalized book recommendations to users. +- Enhanced User Engagement: By incorporating the text classification model, the + chatbot can better understand user queries and respond more accurately, + leading to a more engaging user experience. +- Knowledge Discovery: Researchers and students can use the text classification + model to efficiently categorize large collections of research papers, enabling + quicker access to relevant information in specific domains. ### Considerations: -- Data Preprocessing: Before training the NLP models, appropriate data preprocessing steps may be required, such as text cleaning, tokenization, and encoding, to prepare the dataset for model input. -- Model Selection and Fine-Tuning: Choosing the right NLP model architecture and hyperparameters, and fine-tuning the model on the specific tasks, can significantly impact the model's performance and generalization ability. -- Ethical Use: Ensure that the generated book titles and text classification predictions are used responsibly and ethically, respecting copyright and intellectual property rights. +- Data Preprocessing: Before training the NLP models, appropriate data + preprocessing steps may be required, such as text cleaning, tokenization, and + encoding, to prepare the dataset for model input. +- Model Selection and Fine-Tuning: Choosing the right NLP model architecture and + hyperparameters, and fine-tuning the model on the specific tasks, can + significantly impact the model's performance and generalization ability. 
+- Ethical Use: Ensure that the generated book titles and text classification + predictions are used responsibly and ethically, respecting copyright and + intellectual property rights. ### Conclusion: -The "Research Paper Dataset 2023" holds great potential for enhancing NLP text classification models and chatbot systems. By leveraging the dataset's features and information, NLP applications can be developed to aid researchers, students, and readers in finding relevant research papers and generating meaningful book titles for their specific interests. Proper utilization of this dataset can lead to more efficient information retrieval and improved user experiences in the domain of research and academic literature exploration. +The "Research Paper Dataset 2023" holds great potential for enhancing NLP text +classification models and chatbot systems. By leveraging the dataset's features +and information, NLP applications can be developed to aid researchers, students, +and readers in finding relevant research papers and generating meaningful book +titles for their specific interests. Proper utilization of this dataset can lead +to more efficient information retrieval and improved user experiences in the +domain of research and academic literature exploration. 
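The text-classification application described above can be illustrated without any model library: a keyword-overlap "classifier" over invented paper titles shows the shape of the task. The titles and keyword sets below are assumptions made up for the sketch; a real classifier would learn such associations from labeled examples of the dataset.

```python
# Minimal keyword-overlap "classifier" over invented paper titles.
# A real system would learn these associations from labeled training data.
FIELD_KEYWORDS = {
    "computer science": {"neural", "algorithm", "network", "learning"},
    "biology": {"gene", "protein", "cell", "genome"},
    "physics": {"quantum", "particle", "relativity", "boson"},
}

def classify(title: str) -> str:
    words = set(title.lower().split())
    # Pick the field whose keyword set overlaps the title the most
    # (ties resolve to the first field in insertion order).
    return max(FIELD_KEYWORDS, key=lambda field: len(words & FIELD_KEYWORDS[field]))

print(classify("A Neural Network Approach to Protein Folding"))  # computer science
```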
diff --git a/data/datasets/research_papers_dataset/load_dataset.py b/data/datasets/research_papers_dataset/load_dataset.py index 36ce97e040..4602f0d253 100644 --- a/data/datasets/research_papers_dataset/load_dataset.py +++ b/data/datasets/research_papers_dataset/load_dataset.py @@ -1,2 +1,3 @@ from datasets import load_dataset + dataset = load_dataset("Falah/research_paper2023") diff --git a/data/datasets/research_papers_dataset/package-lock.json b/data/datasets/research_papers_dataset/package-lock.json new file mode 100644 index 0000000000..f370609afd --- /dev/null +++ b/data/datasets/research_papers_dataset/package-lock.json @@ -0,0 +1,26 @@ +{ + "husky": { + "hooks": { + "pre-commit": "lint-staged" + } + }, + "lint-staged": { + "*.{js,jsx,ts,tsx,json,css,scss,md}": [ + "prettier --write", + "git add" + ] + } +} +{ + "husky": { + "hooks": { + "pre-commit": "lint-staged" + } + }, + "lint-staged": { + "*.{js,jsx,ts,tsx,json,css,scss,md}": [ + "prettier --write", + "git add" + ] + } +} From c32b7da2ef8ab3611a734803e6a685ced561b567 Mon Sep 17 00:00:00 2001 From: your name Date: Sun, 23 Jul 2023 08:33:09 +0300 Subject: [PATCH 5/8] updata dataset and add new dataset --- data/datasets/medium_articles_posts/README.md | 4 ++-- data/datasets/medium_articles_posts/__init__.py | 0 data/datasets/medium_articles_posts/requirements.txt | 1 + data/datasets/research_papers_dataset/__init__.py | 0 data/datasets/research_papers_dataset/requirements.txt | 1 + data/datasets/semantics_ws_qna_oa/__init__.py | 0 data/datasets/sentiments-dataset-381-classes/README.md | 2 +- data/datasets/sentiments-dataset-381-classes/__init__.py | 0 data/datasets/sentiments-dataset-381-classes/requirements.txt | 1 + 9 files changed, 6 insertions(+), 3 deletions(-) create mode 100644 data/datasets/medium_articles_posts/__init__.py create mode 100644 data/datasets/medium_articles_posts/requirements.txt create mode 100644 data/datasets/research_papers_dataset/__init__.py create mode 100644 
data/datasets/research_papers_dataset/requirements.txt create mode 100644 data/datasets/semantics_ws_qna_oa/__init__.py create mode 100644 data/datasets/sentiments-dataset-381-classes/__init__.py create mode 100644 data/datasets/sentiments-dataset-381-classes/requirements.txt diff --git a/data/datasets/medium_articles_posts/README.md b/data/datasets/medium_articles_posts/README.md index d355795186..65b8211e5d 100644 --- a/data/datasets/medium_articles_posts/README.md +++ b/data/datasets/medium_articles_posts/README.md @@ -23,14 +23,14 @@ The dataset consists of the following features: The dataset is split into the following part: -- **Train**: +- **Train**: - Number of examples: 192,368 - Size: 1,044,746,687 bytes (approximately 1000 MB) ## Download Size - **Compressed Download Size**: 601,519,297 bytes (approximately 600 MB) -### Usage example +### Usage example ```python from datasets import load_dataset #Load the dataset diff --git a/data/datasets/medium_articles_posts/__init__.py b/data/datasets/medium_articles_posts/__init__.py new file mode 100644 index 0000000000..e69de29bb2 diff --git a/data/datasets/medium_articles_posts/requirements.txt b/data/datasets/medium_articles_posts/requirements.txt new file mode 100644 index 0000000000..e9f023c9e0 --- /dev/null +++ b/data/datasets/medium_articles_posts/requirements.txt @@ -0,0 +1 @@ +datasets==2.9.0 \ No newline at end of file diff --git a/data/datasets/research_papers_dataset/__init__.py b/data/datasets/research_papers_dataset/__init__.py new file mode 100644 index 0000000000..e69de29bb2 diff --git a/data/datasets/research_papers_dataset/requirements.txt b/data/datasets/research_papers_dataset/requirements.txt new file mode 100644 index 0000000000..e9f023c9e0 --- /dev/null +++ b/data/datasets/research_papers_dataset/requirements.txt @@ -0,0 +1 @@ +datasets==2.9.0 \ No newline at end of file diff --git a/data/datasets/semantics_ws_qna_oa/__init__.py b/data/datasets/semantics_ws_qna_oa/__init__.py new file mode 
100644 index 0000000000..e69de29bb2 diff --git a/data/datasets/sentiments-dataset-381-classes/README.md b/data/datasets/sentiments-dataset-381-classes/README.md index 1a18a15e1f..63d712f2af 100644 --- a/data/datasets/sentiments-dataset-381-classes/README.md +++ b/data/datasets/sentiments-dataset-381-classes/README.md @@ -312,7 +312,7 @@ The dataset includes the following sentiment class names as examples: - Whimsical - Intertwining - - and more -## Usage example +## Usage example ```python from datasets import load_dataset #Load the dataset diff --git a/data/datasets/sentiments-dataset-381-classes/__init__.py b/data/datasets/sentiments-dataset-381-classes/__init__.py new file mode 100644 index 0000000000..e69de29bb2 diff --git a/data/datasets/sentiments-dataset-381-classes/requirements.txt b/data/datasets/sentiments-dataset-381-classes/requirements.txt new file mode 100644 index 0000000000..e9f023c9e0 --- /dev/null +++ b/data/datasets/sentiments-dataset-381-classes/requirements.txt @@ -0,0 +1 @@ +datasets==2.9.0 \ No newline at end of file From 6b1cbe02d64503c71ef0b438c62c83c68b9d1fa7 Mon Sep 17 00:00:00 2001 From: your name Date: Sun, 23 Jul 2023 08:38:43 +0300 Subject: [PATCH 6/8] updata dataset add new dataset for research_papers_and_medium_articles+post --- data/datasets/semantics_ws_qna_oa/__init__.py | 0 1 file changed, 0 insertions(+), 0 deletions(-) delete mode 100644 data/datasets/semantics_ws_qna_oa/__init__.py diff --git a/data/datasets/semantics_ws_qna_oa/__init__.py b/data/datasets/semantics_ws_qna_oa/__init__.py deleted file mode 100644 index e69de29bb2..0000000000 From b86354b770a0db46c932c2dd9dcf03e737e33f01 Mon Sep 17 00:00:00 2001 From: your name Date: Sun, 23 Jul 2023 09:06:52 +0300 Subject: [PATCH 7/8] updata_to_medium_post_dataset --- data/datasets/medium_articles_posts/README.md | 24 ++++++---- .../medium_articles_posts/load_dataset.py | 3 +- .../medium_articles_posts/requirements.txt | 2 +- .../research_papers_dataset/package-lock.json | 26 
---------- .../research_papers_dataset/requirements.txt | 2 +- .../sentiments-dataset-381-classes/README.md | 47 ++++++++++++------- .../requirements.txt | 2 +- 7 files changed, 51 insertions(+), 55 deletions(-) delete mode 100644 data/datasets/research_papers_dataset/package-lock.json diff --git a/data/datasets/medium_articles_posts/README.md b/data/datasets/medium_articles_posts/README.md index 65b8211e5d..1b915cc296 100644 --- a/data/datasets/medium_articles_posts/README.md +++ b/data/datasets/medium_articles_posts/README.md @@ -2,18 +2,22 @@ ## Description -The Medium Articles Posts dataset contains a collection of articles published on the Medium platform. Each article entry includes information such as the article's title, main content or text, associated URL or link, authors' names, timestamps, and tags or categories. +The Medium Articles Posts dataset contains a collection of articles published on +the Medium platform. Each article entry includes information such as the +article's title, main content or text, associated URL or link, authors' names, +timestamps, and tags or categories. ## Dataset Info The dataset consists of the following features: -- **title**: *(string)* The title of the Medium article. -- **text**: *(string)* The main content or text of the Medium article. -- **url**: *(string)* The URL or link to the Medium article. -- **authors**: *(string)* The authors or contributors of the Medium article. -- **timestamp**: *(string)* The timestamp or date when the Medium article was published. -- **tags**: *(string)* Tags or categories associated with the Medium article. +- **title**: _(string)_ The title of the Medium article. +- **text**: _(string)_ The main content or text of the Medium article. +- **url**: _(string)_ The URL or link to the Medium article. +- **authors**: _(string)_ The authors or contributors of the Medium article. +- **timestamp**: _(string)_ The timestamp or date when the Medium article was + published. 
+- **tags**: _(string)_ Tags or categories associated with the Medium article. ## Dataset Size @@ -30,10 +34,12 @@ The dataset is split into the following part: ## Download Size - **Compressed Download Size**: 601,519,297 bytes (approximately 600 MB) -### Usage example + +### Usage example + ```python from datasets import load_dataset #Load the dataset dataset = load_dataset("Falah/medium_articles_posts") -``` \ No newline at end of file +``` diff --git a/data/datasets/medium_articles_posts/load_dataset.py b/data/datasets/medium_articles_posts/load_dataset.py index 1cc8027b1d..d8b750a3b8 100644 --- a/data/datasets/medium_articles_posts/load_dataset.py +++ b/data/datasets/medium_articles_posts/load_dataset.py @@ -1,3 +1,4 @@ from datasets import load_dataset -#Load the dataset + +# Load the dataset dataset = load_dataset("Falah/medium_articles_posts") diff --git a/data/datasets/medium_articles_posts/requirements.txt b/data/datasets/medium_articles_posts/requirements.txt index e9f023c9e0..76de43c3ed 100644 --- a/data/datasets/medium_articles_posts/requirements.txt +++ b/data/datasets/medium_articles_posts/requirements.txt @@ -1 +1 @@ -datasets==2.9.0 \ No newline at end of file +datasets==2.9.0 diff --git a/data/datasets/research_papers_dataset/package-lock.json b/data/datasets/research_papers_dataset/package-lock.json deleted file mode 100644 index f370609afd..0000000000 --- a/data/datasets/research_papers_dataset/package-lock.json +++ /dev/null @@ -1,26 +0,0 @@ -{ - "husky": { - "hooks": { - "pre-commit": "lint-staged" - } - }, - "lint-staged": { - "*.{js,jsx,ts,tsx,json,css,scss,md}": [ - "prettier --write", - "git add" - ] - } -} -{ - "husky": { - "hooks": { - "pre-commit": "lint-staged" - } - }, - "lint-staged": { - "*.{js,jsx,ts,tsx,json,css,scss,md}": [ - "prettier --write", - "git add" - ] - } -} diff --git a/data/datasets/research_papers_dataset/requirements.txt b/data/datasets/research_papers_dataset/requirements.txt index e9f023c9e0..76de43c3ed 100644 --- 
a/data/datasets/research_papers_dataset/requirements.txt +++ b/data/datasets/research_papers_dataset/requirements.txt @@ -1 +1 @@ -datasets==2.9.0 \ No newline at end of file +datasets==2.9.0 diff --git a/data/datasets/sentiments-dataset-381-classes/README.md b/data/datasets/sentiments-dataset-381-classes/README.md index 63d712f2af..23a5526354 100644 --- a/data/datasets/sentiments-dataset-381-classes/README.md +++ b/data/datasets/sentiments-dataset-381-classes/README.md @@ -1,37 +1,45 @@ --- dataset_info: features: - - name: text - dtype: string - - name: sentiment - dtype: string + - name: text + dtype: string + - name: sentiment + dtype: string splits: - - name: train - num_bytes: 104602 - num_examples: 1061 + - name: train + num_bytes: 104602 + num_examples: 1061 download_size: 48213 dataset_size: 104602 license: apache-2.0 task_categories: -- text-classification + - text-classification language: -- en + - en pretty_name: sentiments-dataset-381-classes size_categories: -- 1K Date: Sun, 23 Jul 2023 09:16:11 +0300 Subject: [PATCH 8/8] update dataset --- data/datasets/medium_articles_posts/requirements.txt | 1 + data/datasets/research_papers_dataset/requirements.txt | 1 + data/datasets/sentiments-dataset-381-classes/requirements.txt | 1 + 3 files changed, 3 insertions(+) diff --git a/data/datasets/medium_articles_posts/requirements.txt b/data/datasets/medium_articles_posts/requirements.txt index 76de43c3ed..7883858ca7 100644 --- a/data/datasets/medium_articles_posts/requirements.txt +++ b/data/datasets/medium_articles_posts/requirements.txt @@ -1 +1,2 @@ datasets==2.9.0 + diff --git a/data/datasets/research_papers_dataset/requirements.txt b/data/datasets/research_papers_dataset/requirements.txt index 76de43c3ed..7883858ca7 100644 --- a/data/datasets/research_papers_dataset/requirements.txt +++ b/data/datasets/research_papers_dataset/requirements.txt @@ -1 +1,2 @@ datasets==2.9.0 + diff --git a/data/datasets/sentiments-dataset-381-classes/requirements.txt 
b/data/datasets/sentiments-dataset-381-classes/requirements.txt index 76de43c3ed..7883858ca7 100644 --- a/data/datasets/sentiments-dataset-381-classes/requirements.txt +++ b/data/datasets/sentiments-dataset-381-classes/requirements.txt @@ -1 +1,2 @@ datasets==2.9.0 +