Entity network #54

Closed

caiocmello wants to merge 3 commits into main from entity_network

Collaborator

caiocmello commented Apr 16, 2025

Hi @Ferdaous-af,

Could you please review this notebook? Please refer to the reviewer's guidelines below.

Notebook file: https://github.com/impresso/impresso-datalab-notebooks/blob/main/explore-vis/entity_network.ipynb

Reviewer's guidelines:

Is the code consistent (e.g. use of variables, formatting)?
Is the explanation of the code correct? Is there imprecise information? Could we expand it in some aspect you consider important to understand the code?
Are there any references to external resources that could enrich this notebook?
Is the information (text) contained in the NB enough to perform the proposed task? Is there something missing?
Do the objectives under 'what you will learn' match the content? Do we provide everything we promise in the objectives?

caiocmello added 2 commits

April 16, 2025 16:35


          layout-review

31f66ff


          joined code in query

f694946

caiocmello requested a review from Ferdaous-af

April 16, 2025 14:47

review-notebook-app bot commented Apr 16, 2025

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

caiocmello marked this pull request as ready for review

April 16, 2025 14:47

This was referenced Apr 16, 2025

NB#07 Exploring Entity Co-occurrence Networks #38

Open

NB#09 Visualising Place Entities on Maps #37

Open

Ferdaous-af reviewed

explore-vis/entity_network.ipynb

		@@ -132,17 +132,8 @@
		},

Ferdaous-af Apr 17, 2025 (edited)

We define people's connections based on whether they occur in the same content item

Small rewording suggestion for clarity:

We define people's connections based on whether their names are mentioned in the same content item.

Please note that when a newspaper does not have segmentation (OLR)

Perhaps add here what OLR is and a link to the FAQ section on it: https://impresso-project.ch/app/faq#What-OCR

Suggestion:

Please note that when a newspaper lacks segmentation (OLR – Optical Layout Recognition), content items for this title correspond to entire pages.

To unveil the reasons why they occur together, further analysis using different methods is necessary.

Suggested clarification:

Understanding the reason behind a co-occurrence typically requires further contextual or qualitative analysis.
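
To make the definition of a connection concrete, here is a minimal sketch of counting co-occurrences, assuming each content item is reduced to a list of person names. The data, the `content_items` mapping and the `cooccurrence_counts` helper are hypothetical and are not taken from the notebook.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: content item id -> person names mentioned in it.
content_items = {
    "item-1": ["Person A", "Person B", "Person C"],
    "item-2": ["Person A", "Person B"],
    "item-3": ["Person C"],
}


def cooccurrence_counts(items):
    """Count, for each unordered pair of names, the content items mentioning both."""
    counts = Counter()
    for names in items.values():
        # Deduplicate and sort so each pair is counted once per item
        # and always appears in the same order.
        for pair in combinations(sorted(set(names)), 2):
            counts[pair] += 1
    return counts


edges = cooccurrence_counts(content_items)
for (source, target), count in sorted(edges.items()):
    print(source, "-", target, count)
```

A pair's count only records that both names appear in the same content item. When a title lacks segmentation and a content item is an entire page, the same count is a much weaker signal.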

explore-vis/entity_network.ipynb

		@@ -132,17 +132,8 @@
		},

Ferdaous-af Apr 17, 2025 (edited)

I think it would be clearer to harmonize the structure across the list (suggestion below). Also, it could be useful to clarify that “persons” refer to named entities extracted from content items, and to specify what formats are meant by “different formats” when exporting the network graph.

Suggested rewording:

`What will you learn?`

By completing this notebook, you will learn how to:

- Retrieve a list of named persons mentioned in content items for a given query;
- Transform this list of entities into a dataframe suitable for generating co-occurrence network graphs;
- Create and display an interactive network graph to visualise connections between persons mentioned together in Impresso content;
- Export the resulting dataframes as CSV files to support reproducibility;
- Save the network graph in different formats (png, svg, gexf, and json) for further analysis.

explore-vis/entity_network.ipynb

		@@ -132,17 +132,8 @@
		},

Ferdaous-af Apr 17, 2025 (edited)

If it's okay to make this part a little longer, it would be helpful to briefly describe what each link offers, and perhaps also to add links to the documentation for the Impresso API and Python library, NetworkX and ipysigma. It would also be good to swap the order of the two suggested resources, as "Exploring and Analyzing etc." assumes familiarity with "From Hermeneutics to data etc."

Suggestion:

`Useful resources`

If you’d like to go deeper into network analysis or its use in historical research, the following resources are recommended:

- From Hermeneutics to Data to Networks: Data Extraction and Network Visualization of Historical Sources: A conceptual and practical guide to extracting structured data from historical sources and creating meaningful network visualizations.
- Exploring and Analyzing Network Data with Python: An introduction to working with the NetworkX package and drawing conclusions from network metrics when working with humanities data.

Additional references:

- Impresso Public API documentation

- Impresso Python library

- NetworkX documentation

- ipysigma documentation

Also, a suggestion of other resources:

Introduction to Social Network Analysis: YouTube tutorials by Martin Grandjean reviewing the main concepts of social network analysis and highlighting the challenges that arise when analyzing relational historical objects.

Demystifying Networks, Parts I & II by Scott B. Weingart: an older but still interesting resource with a simple introduction to networks, including concept definitions and key vocabulary.

The Six Degrees of Francis Bacon Project: a DH project that reconstructs the social network of early modern intellectual life in Britain and includes publications and methodology.

Historical Network Research Community: A hub for scholars working at the intersection of history and network analysis. Offers conference proceedings, reading lists, and tutorials.

explore-vis/entity_network.ipynb

		@@ -132,17 +132,8 @@
		},

Ferdaous-af Apr 17, 2025 (edited)

The sentence “all person entities mentioned in all articles that talk about the Prague Spring” is a bit misleading: it could be read as exhaustive, whereas, if I'm not mistaken, the query in the code returns the top 100 most frequently mentioned person entities for that query, not all possible results.

Suggested rephrasing:

First, we retrieve the top 100 most frequently mentioned person entities in all articles that talk about the Prague Spring, using the search facets method from the Impresso Python library.

Also, a brief explanation of what the "search facets method" is would be helpful.

explore-vis/entity_network.ipynb

		@@ -132,17 +132,8 @@
		},

Ferdaous-af Apr 17, 2025 (edited)

I think an explanation of this query's output would help avoid confusion about the results.

The actual output of the query reads "Contains 100 items (0 - 100) of 2355 total items". The word "item" here can be confused with "content item" as explained above, but in this case the results are named entities, not content items.

Suggestion:

The result is a list of the 100 most frequently mentioned person entities, where each entry includes:

a unique identifier (value),
the number of times the person is mentioned (count),
and the display name (label).

Note: these 100 entries are the most frequent out of a total of 2,355 persons mentioned in all matched content items.
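
To illustrate the shape of such entries, here is a minimal sketch assuming each entry is a mapping with the `value`, `count` and `label` fields listed above. The identifiers, counts and the `top_entities` helper are made up for illustration; this is not the Impresso library's API or its real output.

```python
# Hypothetical facet entries, shaped as described in the suggestion:
# "value" is the entity identifier, "count" the number of mentions,
# "label" the display name. All values are invented for illustration.
facet_entries = [
    {"value": "id-person-a", "count": 120, "label": "Person A"},
    {"value": "id-person-b", "count": 45, "label": "Person B"},
    {"value": "id-person-c", "count": 300, "label": "Person C"},
]


def top_entities(entries, limit):
    """Return the `limit` most frequently mentioned entities, most frequent first."""
    ranked = sorted(entries, key=lambda entry: entry["count"], reverse=True)
    return ranked[:limit]


top_two = top_entities(facet_entries, limit=2)
labels = [entry["label"] for entry in top_two]
print(labels)
```

Here the "items" being counted and ranked are entities, which is the distinction the note above asks the notebook to spell out.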

explore-vis/entity_network.ipynb

		@@ -132,17 +132,8 @@
		},

Ferdaous-af Apr 17, 2025 (edited)

If running in Colab - activate custom widgets to allow ipysigma to render the graph.

It would be helpful to add more detail here on how to activate custom widgets; I'm not sure how this is done in Colab. In any case, the code rendered correctly.

explore-vis/entity_network.ipynb

		@@ -132,17 +132,8 @@
		},

Ferdaous-af Apr 17, 2025 (edited)

The output will prompt you to choose 'what should represent the size of the nodes' in your graph. Select it before you continue.

I suggest rephrasing this to make "what should represent the size of the nodes" clearer, by stating that the options are centrality measures.

Suggestion:

The output will prompt you to choose from a dropdown list 'what should represent the size of the nodes', i.e. which centrality measure should determine the size of the nodes in your graph. Select it before you continue. These measures help reveal the structural importance of each node within the network.
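
As an illustration of what a centrality measure computes, here is a minimal hand-rolled sketch of degree centrality, assuming an undirected graph given as a list of edges. The thread does not list which measures the dropdown offers, so degree centrality is only an assumed example. The `degree_centrality` helper is hypothetical and stands in for whatever the notebook actually uses.

```python
def degree_centrality(edges):
    """Degree of each node divided by (n - 1), where n is the number of nodes."""
    neighbours = {}
    for source, target in edges:
        neighbours.setdefault(source, set()).add(target)
        neighbours.setdefault(target, set()).add(source)
    n = len(neighbours)
    if n <= 1:
        # A graph with a single node has no possible connections.
        return {node: 0.0 for node in neighbours}
    return {node: len(adjacent) / (n - 1) for node, adjacent in neighbours.items()}


# Hypothetical co-occurrence edges between four persons.
edges = [("Person A", "Person B"), ("Person A", "Person C"), ("Person C", "Person D")]
centrality = degree_centrality(edges)
for node, score in sorted(centrality.items()):
    print(node, round(score, 3))
```

A node connected to more of the other nodes gets a higher score and, if this measure were selected, would be drawn larger.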

explore-vis/entity_network.ipynb

		@@ -132,17 +132,8 @@
		},

Ferdaous-af Apr 17, 2025 (edited)

Refresh the next cell after changing the value above.

This needs more clarity, perhaps something along the lines of: "If you want to change the centrality measure above, re-run the next cell to update the visualisation."

explore-vis/entity_network.ipynb

		@@ -132,17 +132,8 @@
		},

Ferdaous-af Apr 17, 2025 (edited)

Line #24.    # Displaying the graph with a size mapped on degree and
# a color mapped on a categorical attribute of the nodes
Sigma(g, node_size=node_size, edge_size='count', clickable_edges=True, )

I believe node size depends on the user’s selection in the dropdown, not only on degree. Also, there is no color mapping defined in the code.

Suggestion: change the comment to "node size based on the selected centrality measure", add a comment on the edge size such as "edge thickness based on co-occurrence count", and either drop the mention of color mapping or add a line of code that maps color to a node attribute.

explore-vis/entity_network.ipynb

		@@ -132,17 +132,8 @@
		},

Ferdaous-af Apr 17, 2025 (edited)

the Visualising Place Entities on Maps notebook

the Named Entity Recognition with impresso-pipelines notebook

The links here don't redirect to the mentioned notebooks.

I believe the correct links are:

Visualising Place Entities on Maps: https://github.com/impresso/impresso-datalab-notebooks/blob/main/explore-vis/place-entities_map.ipynb
Named Entity Recognition with impresso-pipelines: https://github.com/impresso/impresso-datalab-notebooks/blob/main/annotate/NE-processing_ImpressoAPI.ipynb

which demonstrates how to visualise in a map mentions to places in the Impresso corpus.

Suggestion for a smoother rephrasing of this sentence:

which shows how to visualise mentions of places from the Impresso corpus in a map.


          updates ferdaous comments

7eb747c

caiocmello linked an issue

that may be closed by this pull request

NB#07 Exploring Entity Co-occurrence Networks #38

Open

caiocmello closed this

caiocmello deleted the entity_network branch

April 28, 2025 12:56
