ToC.md

File metadata and controls

54 lines (39 loc) · 1.64 KB
  1. viz.ipynb: Characters list, Data Pre-processing, Basic Visualisation, Book-indices - ends, Dictionary of books

  2. src\components.py: List of Characters, NLTK Text objects, Book-to-Text mappings (dictionary: unprocessed), books_to_chaps: dictionary of each book mapped to the list of chapters in it
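A minimal sketch of the structures that src\components.py appears to expose, based only on the notes above; the variable names and sample values here are assumptions, not the actual code.

```python
import nltk  # requires the "punkt" tokenizer data: nltk.download("punkt")

# List of characters tracked across the books (illustrative values)
characters = ["Character A", "Character B"]

# Book -> unprocessed raw text (dictionary: unprocessed)
raw_books = {"Book 1": "raw text of book 1 ..."}

# Book -> NLTK Text object, for concordance / dispersion style work
book_texts = {name: nltk.Text(nltk.word_tokenize(text))
              for name, text in raw_books.items()}

# books_to_chaps: each book mapped to the list of chapters in it
books_to_chaps = {"Book 1": ["Chapter 1", "Chapter 2"]}
```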

proc_data -> redundant (except it doesn't have 's removed and is all .lower())

Data with Text ('s removal is done locally, at the usage point): c_b_t.csv, final_data.csv, data.csv
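A minimal sketch of the local clean-up described above (stripping the possessive 's and lower-casing at the point of use); the file and column names are assumptions.

```python
import pandas as pd

df = pd.read_csv("final_data.csv")  # or c_b_t.csv / data.csv

def clean(text: str) -> str:
    # drop possessive 's, then lower-case the whole string
    return text.replace("'s", "").lower()

df["text"] = df["text"].astype(str).map(clean)
```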

==> cpo.csv -> clusts.csv (removed 's)
==> freq_occs_by_book.csv -> static (i.e., blocks of words) co-occurrences, book-wise
==> freq_occs.csv -> static co-occurrences, chapter-wise
==> cooccs_final -> sensible co-occurrences (i.e., from word dist and context), calculated canto-wise, by book
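A rough sketch of the "static" (fixed block of words) co-occurrence counting behind freq_occs_by_book.csv and freq_occs.csv; the window size and character list are assumptions.

```python
from collections import Counter
from itertools import combinations

def block_cooccurrences(tokens, characters, window=50):
    """Count character pairs that appear together inside fixed-size blocks of tokens."""
    counts = Counter()
    for start in range(0, len(tokens), window):
        block = set(tokens[start:start + window])
        present = [c for c in characters if c in block]
        for pair in combinations(sorted(present), 2):
            counts[pair] += 1
    return counts

# book-wise: run once per book's tokens; chapter-wise: run once per chapter's tokens
```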

======> Sentiments_books and cooccc_books can be merged; same for chaps.
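A minimal sketch of that merge, assuming both tables share a "book" key column; the file and column names are assumptions.

```python
import pandas as pd

sentiments = pd.read_csv("sentiments_books.csv")
cooccs = pd.read_csv("cooccs_books.csv")

# outer merge so books present in only one table are kept
merged = sentiments.merge(cooccs, on="book", how="outer")
# repeat for the chapter-wise files, keyed on ("book", "chapter")
```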

=====================================================================================================

DELIVERABLES

C - Chapter-wise metric
B - Book-wise metric

  1. Dendrograms (C)
  2. Networks (B)
  3. Theme (+ve/-ve) (B)
  4. Degree centrality graph over books | (add chars to the ch list to analyse more) | labels over the graph (see the sketch after this list)
  5. DC per char per book | get from orig dict - deg
  6. Summarization - per chapter against the dendrogram, per book against the k-core

Too slow: 4) K-means cluster scatter (C) | needs mod -> on-hover label | or include in the network
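A hedged sketch for deliverables 4-5: degree centrality per character per book, built with networkx from co-occurrence pair counts like those above. The shape of cooccs_by_book is an assumption.

```python
import networkx as nx

def degree_centrality_by_book(cooccs_by_book):
    """cooccs_by_book: {book: {(char_a, char_b): count}} -> {book: {char: centrality}}."""
    result = {}
    for book, pair_counts in cooccs_by_book.items():
        G = nx.Graph()
        for (a, b), weight in pair_counts.items():
            G.add_edge(a, b, weight=weight)
        result[book] = nx.degree_centrality(G)
    return result

# Plot result[book][char] across books to get the degree-centrality-over-books graph.
```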

Streamlit, Gradio, LangChain, Pinecone, FAISS, Git

  • EDA: word2vec, t-SNE, NER, NLP
  • Sentiment analysis, network analysis (centrality metrics), statistical analysis, visualization, Streamlit/Gradio deployment
  • LLM, embeddings, vector DB, question answering, similarity, similarity metrics, model/param tuning (see the sketch after this list)
  • Clustering
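An illustrative sketch of the embedding + vector-DB similarity idea listed above, using sentence-transformers with a FAISS index; the model name, chunk source, and query are assumptions, not the project's actual setup.

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
chunks = ["chapter text chunk 1", "chapter text chunk 2"]  # e.g. text split per chapter/canto

# Embed the chunks; normalised vectors let inner product act as cosine similarity
emb = model.encode(chunks, normalize_embeddings=True).astype("float32")
index = faiss.IndexFlatIP(emb.shape[1])
index.add(emb)

query = model.encode(["example question about a canto"], normalize_embeddings=True).astype("float32")
scores, ids = index.search(query, 2)  # top-2 most similar chunks
print([chunks[i] for i in ids[0]])
```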