Commit
initial repository publish checkpoint
KSMubasshir committed Jan 30, 2023
1 parent c373818, commit 7d9cf4d
Showing 38 changed files with 1,748 additions and 1,701 deletions.
@@ -1,14 +1,57 @@
# bd-newspaper-crawlers

[](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/LICENSE.md)
[](https://github.com/KSMubasshir/bd-newspaper-crawlers)
[](https://github.com/KSMubasshir/bd-newspaper-crawlers/stargazers)

A collection of Bangla Newspaper and Blog crawlers. Can be used to mine Bangla text data for Natural Language Processing tasks.
## List of Crawlers
| Site Name | Site Type | Language | Crawler |
|---------------------------------------------------------|-----------|----------|-----------------------------------------------------------------------------------------------------------|
| [Bangladesh Pratidin](https://www.bd-pratidin.com/) | News | Bangla | [bdpratidin.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/bdpratidin.py) |
| [Anandabazar](https://www.anandabazar.com/) | News | Bangla | [anandabazar.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/anandabazar.py) |
| [24 Live News](https://www.bangla.24livenewspaper.com/) | News | Bangla | [24livenews.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/24livenews.py) |
| [Amra Bondhu](https://www.amrabondhu.com/) | Blog | Bangla | [amrabondhu.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/amrabondhu.py) |
| [Bangla Blog](http://banglablog.in/) | Blog | Bangla | [banglablog.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/banglablog.py) |
| [Bangla News 24](https://www.banglanews24.com/) | News | Bangla | [banglanews24.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/banglanews24.py) |
| [Biggani.org](https://biggani.org/) | Blog | Bangla | [biggani.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/biggani.py) |
| [Biggan Blog](https://bigganblog.org/) | Blog | Bangla | [bigganblog.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/bigganblog.py) |
| [Biggan Projukti](http://www.bigganprojukti.com/) | Blog | Bangla | [bigganprojukti.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/bigganprojukti.py) |
| Sl | Site Name | Site Type | Language | Crawler |
|-----|-------------------------------------------------------------------|--------------|----------|---------------------------------------------------------------------------------------------------------------------|
| 1 | [Prothom Alo - Bangla](https://www.prothomalo.com/) | News | Bangla | [prothomalo_bn.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/prothomalo_bn.py) |
| 2 | [Prothom Alo - English](https://en.prothomalo.com/) | News | English | [prothomalo_en.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/prothomalo_en.py) |
| 3 | [Bangladesh Pratidin](https://www.bd-pratidin.com/) | News | Bangla | [bdpratidin.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/bdpratidin.py) |
| 4 | [Kalerkantho](https://www.kalerkantho.com/online) | News | Bangla | [kalerkantho.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/kalerkantho.py) |
| 5 | [Daily Inqilab](https://www.dailyinqilab.com/) | News | Bangla | [inqilab.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/inqilab.py) |
| 6 | [Samakal](https://samakal.com/) | News | Bangla | [samakal.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/samakal.py) |
| 7 | [Jugantor](https://www.jugantor.com/) | News | Bangla | [jugantor.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/jugantor.py) |
| 8 | [Ittefaq - Bangla](https://www.ittefaq.com.bd/) | News | Bangla | [ittefaq_bn.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/ittefaq_bn.py) |
| 9 | [Ittefaq - English](https://en.ittefaq.com.bd/) | News | English | [ittefaq_en.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/ittefaq_en.py) |
| 10 | [The Daily Star - Bangla](https://bangla.thedailystar.net/) | News | Bangla | [daily_star.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/daily_star.py) |
| 11 | [Anandabazar](https://www.anandabazar.com/) | News | Bangla | [anandabazar.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/anandabazar.py) |
| 12 | [Zee News - Bangla](https://zeenews.india.com/bengali/) | News | Bangla | [crawler_zeenews.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/crawler_zeenews.py) |
| 13 | [Voice of America - Bangla](https://www.voabangla.com/) | News | Bangla | [crawler_voabangla.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/crawler_voabangla.py) |
| 14 | [Hindustan Times - Bangla](https://bangla.hindustantimes.com/) | News | Bangla | [hindustantimes.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/hindustantimes.py) |
| 15 | [The Business Standard - Bangla](https://www.tbsnews.net/bangla/) | News | Bangla | [crawler_tbs.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/crawler_tbs.py) |
| 16 | [Dhaka Tribune](https://bangla.dhakatribune.com/) | News | Bangla | [dhakatribune.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/dhakatribune.py) |
| 17 | [NTV](https://www.ntvbd.com/) | News | Bangla | [ntvbd.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/ntvbd.py) |
| 18 | [Indian Express - Bangla](https://bengali.indianexpress.com/) | News | Bangla | [indianexpress.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/indianexpress.py) |
| 19 | [Ei Samay](https://eisamay.com/us) | News | Bangla | [eisamay.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/eisamay.py) |
| 20 | [Amader Shomoy](https://www.dainikamadershomoy.com/) | News | Bangla | [dainikamadershomoy.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/dainikamadershomoy.py) |
| 21 | [Daily Bangladesh](https://www.daily-bangladesh.com/) | News | Bangla | [daily_bangladesh.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/daily_bangladesh.py) |
| 22 | [Sangbad Pratidin](https://www.sangbadpratidin.in/) | News | Bangla | [sangbadpratidin.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/sangbadpratidin.py) |
| 23 | [24 Live News](https://www.bangla.24livenewspaper.com/) | News | Bangla | [24livenews.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/24livenews.py) |
| 24 | [Amra Bondhu](https://www.amrabondhu.com/) | Blog | Bangla | [amrabondhu.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/amrabondhu.py) |
| 25 | [Bangla Blog](http://banglablog.in/) | Blog | Bangla | [banglablog.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/banglablog.py) |
| 26 | [Bangla News 24](https://www.banglanews24.com/) | News | Bangla | [banglanews24.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/banglanews24.py) |
| 27 | [Biggani.org](https://biggani.org/) | Blog | Bangla | [biggani.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/biggani.py) |
| 28 | [Biggan Blog](https://bigganblog.org/) | Blog | Bangla | [bigganblog.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/bigganblog.py) |
| 29 | [Biggan Projukti](http://www.bigganprojukti.com/) | Blog | Bangla | [bigganprojukti.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/bigganprojukti.py) |
| 30 | [Bigyan](https://bigyan.org.in/) | Blog | Bangla | [bigyan.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/bigyan.py) |
| 31 | [Cadet College Blog](https://cadetcollegeblog.com/) | Blog | Bangla | [cadetcollegeblog.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/cadetcollegeblog.py) |
| 32 | [cpbook by Subeen](http://cpbook.subeen.com/) | Blog | Bangla | [cpsubeen.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/cpsubeen.py) |
| 33 | [Porjotonlipi](https://porjotonlipi.com/) | Blog | Bangla | [crawler_porjotonlipi.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/crawler_porjotonlipi.py) |
| 34 | [Tagore Web](https://www.tagoreweb.in/) | Blog | Bangla | [crawler_tagoreweb.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/crawler_tagoreweb.py) |
| 35 | [Dakghar](https://www.dakghar24.com/) | News | Bangla | [dakghar.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/dakghar.py) |
| 36 | [Dmp News](https://dmpnews.org/) | News | Bangla | [dmpnews.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/dmpnews.py) |
| 37 | [hindime](https://hindime.net/) | Blog | Hindi | [hindime.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/hindime.py) |
| 38 | [Jagran](https://www.jagran.com/) | News | Hindi | [jagran.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/jagran.py) |
| 39 | [Nirbik](https://www.nirbik.com/) | Blog | Bangla | [nirbik.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/nirbik.py) |
| 40 | [Onnodristy](https://onnodristy.com/) | News | Bangla | [onnodristy.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/onnodristy.py) |
| 41 | [Department of Agricultural Extension](http://dae.portal.gov.bd/) | Govt. Portal | Bangla | [portalgov.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/portalgov.py) |
| 42 | [Sastha Bangla](http://www.sasthabangla.com/) | Blog | Bangla | [sasthabangla.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/sasthabangla.py) |
| 43 | [Shopnobaz](https://shopnobaz.net/) | Blog | Bangla | [shopnobaz.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/shopnobaz.py) |
| 44 | [Songramer Notebook](https://songramernotebook.com/) | Blog | Bangla | [songramernotebook.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/songramernotebook.py) |
| 45 | [Subeen](http://subeen.com/) | Blog | Bangla | [subeen.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/subeen.py) |
| 46 | [Tech Tunes](https://www.techtunes.io/) | Blog | Bangla | [techtunes.py](https://github.com/KSMubasshir/bd-newspaper-crawlers/blob/master/techtunes.py) | |
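
The README describes the crawlers as a way to mine Bangla text for NLP tasks, but this diff does not show how any of them extracts text. The following is only a minimal sketch of one possible post-processing step, not code from the repository: it pulls paragraph text out of an already-downloaded HTML page and keeps the paragraphs that are mostly Bengali script. The names `ParagraphExtractor`, `bangla_ratio`, `extract_bangla_paragraphs` and the `min_ratio` threshold are all hypothetical; the only external fact relied on is that the Bengali Unicode block spans U+0980 to U+09FF.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the text of each <p> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_paragraph = False
        self._buffer = []

    def _flush(self):
        # Store the buffered paragraph text, if any, and reset the buffer.
        text = "".join(self._buffer).strip()
        if text:
            self.paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()  # an unclosed <p> ends when the next one starts
            self._in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_paragraph:
            self._buffer.append(data)


def bangla_ratio(text):
    """Fraction of non-whitespace characters in the Bengali block."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    bangla = sum(1 for ch in chars if "\u0980" <= ch <= "\u09ff")
    return bangla / len(chars)


def extract_bangla_paragraphs(html, min_ratio=0.5):
    """Return <p> texts whose Bengali-character share is >= min_ratio."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return [p for p in parser.paragraphs if bangla_ratio(p) >= min_ratio]
```

Fetching the page is deliberately left out, since each site in the table above has its own layout and access rules. A threshold of 0.5 is an arbitrary starting point; pages that mix Bangla with English names or numerals may need a lower value.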