Corpus Builder is a web application that converts unclean text into clean, segmented text. It collapses the input into a single line, breaks it into segments after each number, and then removes the numbers. The tool is useful for preparing text data for analysis or presentation.
- Single Line Conversion: Removes line breaks to convert text into a single line.
- Automatic Segmentation: Adds line breaks after each occurrence of a number.
- Number Removal: Strips numbers from the text, leaving behind clean, segmented content.
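The three features above can be read as a small text pipeline. The following is a minimal sketch of that pipeline in plain JavaScript, not the project's actual code: the function name `cleanText`, the regular expressions, and the whitespace tidying are all assumptions made for illustration.

```javascript
// Hypothetical helper: the name and regexes are illustrative assumptions,
// not taken from the Corpus Builder source.
function cleanText(input) {
  // 1. Single line conversion: replace line breaks (and the whitespace
  //    around them) with a single space.
  const singleLine = input.replace(/\s*[\r\n]+\s*/g, " ").trim();

  // 2. Automatic segmentation: insert a line break after each run of digits.
  const segmented = singleLine.replace(/(\d+)/g, "$1\n");

  // 3. Number removal: drop the digits, tidy whitespace, skip empty lines.
  return segmented
    .split("\n")
    .map((line) => line.replace(/\d+/g, "").replace(/\s+/g, " ").trim())
    .filter((line) => line.length > 0)
    .join("\n");
}

const sample = "First sentence 1 second\nsentence 2 third 3";
console.log(cleanText(sample));
// First sentence
// second sentence
// third
```

Treating every run of digits as a segment boundary is the simplest reading of the feature list. Text in which numbers are part of the content, such as dates or quantities, would be split and stripped as well under this assumption.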
You can try the live version of the application here.
- React: JavaScript library for building user interfaces.
- Tailwind CSS: Utility-first CSS framework for styling.
- Netlify: Platform for deploying web applications.
To run this project locally, follow these steps:
- Clone the repository:
  git clone https://github.com/ankitklakra/corpus-builder.git
- Navigate to the project directory:
  cd corpus-builder
- Install the dependencies:
  npm install
- Start the development server:
  npm start
- Open your browser and go to http://localhost:3000 to view the application.
- Enter the unclean text into the provided text area.
- Click the "Submit" button to process the text.
- The cleaned and segmented text will appear in the output text area.
To contribute to this project, fork the repository and submit a pull request. All contributions are welcome!
This project is licensed under the MIT License.