Clone the repository
git clone https://github.com/Kshitij-Nishant/Text-Summarization
conda create -n summary python=3.8 -y
conda activate summary
pip install -r requirements.txt
python template.py
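The template.py step presumably scaffolds the project layout. Below is a minimal sketch of such a script. The package name textSummarizer, the file list and the scaffold helper are hypothetical and not taken from the repository. Typing uses typing.List so that it still runs on the Python 3.8 environment created above.

```python
import logging
from pathlib import Path
from typing import List

logging.basicConfig(level=logging.INFO, format="[%(asctime)s]: %(message)s")

# Hypothetical package name and file list; the real template.py may differ.
PROJECT_NAME = "textSummarizer"

LIST_OF_FILES = [
    f"src/{PROJECT_NAME}/__init__.py",
    f"src/{PROJECT_NAME}/components/__init__.py",
    f"src/{PROJECT_NAME}/config/__init__.py",
    f"src/{PROJECT_NAME}/config/configuration.py",
    f"src/{PROJECT_NAME}/entity/__init__.py",
    f"src/{PROJECT_NAME}/pipeline/__init__.py",
    "config/config.yaml",
    "params.yaml",
    "main.py",
    "app.py",
    "requirements.txt",
]


def scaffold(root: Path, files: List[str]) -> List[Path]:
    """Create each listed file (and its parent folders) under root.

    Existing files are never overwritten, so re-running is safe.
    Returns only the paths that were newly created.
    """
    created: List[Path] = []
    for rel in files:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        if path.exists():
            logging.info("Skipping existing file: %s", path)
            continue
        path.touch()
        logging.info("Created empty file: %s", path)
        created.append(path)
    return created
```

Calling scaffold(Path.cwd(), LIST_OF_FILES) from the repository root would create the skeleton; a second call creates nothing.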
- Update config.yaml
- Update params.yaml
- Update entity
- Update the configuration manager in src config
- Update the components
- Update the pipeline
- Update main.py
Check the complete flow of code execution:
python main.py
- After each stage, use the commands below to push to GitHub:
git add .
git commit -m "<Put Caption here on the updates>"
git push origin main
- Add the prediction pipeline and update app.py
- Check the complete flow of code execution in FastAPI:
# Finally, run the following command
python app.py
Now, open your localhost at the app's port. Push to Git afterwards.
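A prediction pipeline is usually a thin wrapper that app.py calls once per request. The sketch below is a hypothetical shape for it: the PredictionPipeline name, the summarizer callable interface and the max_length default are assumptions (max_length borrows the Hugging Face generate() parameter name). first_words is a stand-in used only so the sketch runs without the trained model; it is not the project's summarizer.

```python
from typing import Any, Callable, Dict, Optional

# Any callable taking the text plus generation keyword arguments and
# returning a summary string (hypothetical interface).
Summarizer = Callable[..., str]

# Hypothetical default generation settings.
DEFAULT_GEN_KWARGS: Dict[str, Any] = {"max_length": 128}


class PredictionPipeline:
    """Thin wrapper that app.py could call for each request."""

    def __init__(self, summarizer: Summarizer, gen_kwargs: Optional[Dict[str, Any]] = None):
        self.summarizer = summarizer
        # Per-instance overrides, e.g. a larger max_length for longer summaries.
        self.gen_kwargs = {**DEFAULT_GEN_KWARGS, **(gen_kwargs or {})}

    def predict(self, text: str) -> str:
        text = text.strip()
        if not text:
            raise ValueError("input text is empty")
        return self.summarizer(text, **self.gen_kwargs)


def first_words(text: str, max_length: int = 128, **_: Any) -> str:
    """Stand-in summarizer for local testing only (NOT the project's model):
    it simply keeps the first max_length words."""
    return " ".join(text.split()[:max_length])


if __name__ == "__main__":
    pipeline = PredictionPipeline(first_words, {"max_length": 12})
    print(pipeline.predict("National Aluminium Company Limited (NALCO) is a Schedule 'A' "
                           "Navratna CPSE established on 7th January, 1981."))
```

In app.py, a FastAPI route would typically build the pipeline once at startup and call predict() inside the request handler, so the model is not reloaded for every request.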
Author: Kshitij Nishant
Data Scientist Practitioner
Email: kshitijnishant09@gmail.com
#With specific access
1. EC2 access: EC2 is a virtual machine
2. ECR: Elastic Container Registry, used to store your Docker image in AWS
#Description: About the deployment
1. Build a Docker image of the source code
2. Push your Docker image to ECR
3. Launch your EC2 instance
4. Pull your image from ECR in EC2
5. Launch your Docker image in EC2
#Policy:
1. AmazonEC2ContainerRegistryFullAccess
2. AmazonEC2FullAccess
- Save the URI: 381492009295.dkr.ecr.ap-south-1.amazonaws.com/textsum
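The deployment steps listed under #Description can be written out as concrete commands. The helper below is a hypothetical dry-run sketch: it only builds the AWS CLI and Docker command strings from the registry URI saved above and does not execute anything. The latest tag and port 8080 are assumptions; the port must match whatever app.py actually listens on.

```python
from typing import Dict, List


def deployment_commands(registry: str, repository: str, region: str,
                        tag: str = "latest", port: int = 8080) -> Dict[str, List[str]]:
    """Return the shell commands for each deployment step (dry run only)."""
    image = f"{registry}/{repository}:{tag}"
    # Authenticate Docker against ECR using a short-lived password from the AWS CLI.
    login = (f"aws ecr get-login-password --region {region} | "
             f"docker login --username AWS --password-stdin {registry}")
    return {
        # Steps 1-2: build locally (or in CI) and push to ECR.
        "local": [
            f"docker build -t {repository}:{tag} .",
            login,
            f"docker tag {repository}:{tag} {image}",
            f"docker push {image}",
        ],
        # Steps 4-5: on the EC2 instance, pull the image and run it detached.
        "ec2": [
            login,
            f"docker pull {image}",
            f"docker run -d -p {port}:{port} {image}",
        ],
    }


if __name__ == "__main__":
    cmds = deployment_commands("381492009295.dkr.ecr.ap-south-1.amazonaws.com",
                               "textsum", "ap-south-1")
    for where, steps in cmds.items():
        print(f"# {where}")
        for step in steps:
            print(step)
```

Generating the strings instead of running them keeps the sketch safe to try; the same commands could be pasted into a shell or into CI workflow steps.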
#optional
sudo apt-get update -y
sudo apt-get upgrade
#required
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker ubuntu
newgrp docker
In the GitHub repository: Settings > Actions > Runners > New self-hosted runner > choose OS > then run the commands one by one
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_REGION=ap-south-1
AWS_ECR_LOGIN_URI=381492009295.dkr.ecr.ap-south-1.amazonaws.com (demo value)
ECR_REPOSITORY_NAME=simple-app
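Assuming the GitHub Actions workflow exposes the values above to its steps as environment variables, a deploy script could check them up front and fail with a clear message instead of partway through a push. The load_deploy_settings function and the derived IMAGE_URI key are hypothetical names used only for this sketch; secret values are never printed.

```python
import os
from typing import Dict, Mapping, Optional

# The variable names listed above.
REQUIRED = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION",
    "AWS_ECR_LOGIN_URI",
    "ECR_REPOSITORY_NAME",
)


def load_deploy_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Read the deployment settings, rejecting missing or blank values."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name, "").strip()]
    if missing:
        # Report only the names, never the secret values themselves.
        raise ValueError(f"missing deployment settings: {', '.join(missing)}")
    settings = {name: env[name].strip() for name in REQUIRED}
    # Full image reference built from the registry URI and repository name.
    settings["IMAGE_URI"] = (
        f"{settings['AWS_ECR_LOGIN_URI']}/{settings['ECR_REPOSITORY_NAME']}:latest"
    )
    return settings
```

Passing a plain dict as env makes the check easy to test locally without touching real credentials.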
Input Text:
"National Aluminium Company Limited (NALCO) is a Schedule ‘A’ Navratna CPSE established on 7th January, 1981 having its registered office at Bhubaneswar. It is one of the largest integrated Bauxite-Alumina-Aluminium- Power Complex in the Country. At present, Government of India holds 51.28% of paid up equity capital. The Company has been operating its captive Panchpatmali Bauxite Mines for the pit head Alumina refinery at Damanjodi, in the District of Koraput in Odisha and Aluminium Smelter & Captive Power Plant at Angul. As a part of green initiative, NALCO has installed 198 MW Wind Power Plants at various locations in India and 850 kWp roof top Solar Power Plants at its premises to join hands for carbon neutrality. From the days of first commercial operation since 1987 the Company has continuously earned profits for last 36 years. NALCO is one of the leading foreign exchange earning CPSEs of the Country."
Output Text:
"National Aluminium Company Limited (NALCO) is a Schedule ‘A’ Navratna CPSE established on 7th January, 1981 .It is one of the largest integrated Bauxite-Alumina-Aluminium- Power Complex in the Country .Government of India holds 51.28% of paid up equity capital ."
For mobile, I gave a longer paragraph describing the show "Bridgerton", and this is the summary the model returned.
Input Text:
"Bridgerton is an American historical romance television series created by Chris Van Dusen for Netflix. Based on the book series by Julia Quinn, it is Shondaland's first scripted show for Netflix. The series is set during the early 1800s in an alternative London Regency era, in which George III established racial equality and granted many people of African descent aristocratic titles due to the African heritage of his wife, Queen Charlotte. The viewer is taken to observe the highly competitive social season; where young marriageable nobility and gentry are introduced into society.
The first season debuted on December 25, 2020. The second season premiered on March 25, 2022. Part one of the third season premiered on May 16, 2024, with part two following on June 13, 2024.[1] The series was renewed for a fourth season in April 2021.[2][3] In May 2023, Queen Charlotte: A Bridgerton Story, a spin-off series focused on Queen Charlotte, was released.
Bridgerton was positively received for its direction, actors' performances, production and set design, winning two Primetime Creative Arts Emmy Awards, a Make-Up Artists And Hair Stylists Guild Awards, and nominations at the Primetime Emmy Awards, Screen Actors Guild Awards, Satellite Awards and NAACP Image Awards. The music score by Kris Bowers earned a Grammy Award nomination for Best Score Soundtrack for Visual Media."
Output Text:
"Bridgerton is an American historical romance television series created by Chris Van Dusen for Netflix .Based on the book series by Julia Quinn, it is Shondaland's first scripted show for Netflix ."
From the examples above, we can see that each paragraph has been summarized into a few lines that retain the key information. This demonstrates the model's ability to pick out the key points and give the user useful insight about the subject in fewer lines, in under a minute.
(PS: The summary can also be made longer, with more words.)
- Filtering long, angry, or vulgar comments: it would be more computationally efficient for a filter model to take this model's summary as its input instead of the full comment.
- Summarizing comments on a product: users buying a product can read a summary of all the comments left by previous buyers, rather than going through each comment, to decide whether the product is worth it.
- Creating headlines for an article: in a world filled with information, headlines serve as the first point of contact. They grab the reader's attention, entice them to read further, and provide a quick summary of the article or news piece, giving readers an idea of what to expect.



