Health-AI Ethics Atlas - Requirements #29

iamrmrishan · 2024-02-11T19:09:33Z

iamrmrishan
Feb 11, 2024

Is there any specific reason to mention only D3 and Leaflet? I think we could also use deck.gl. The only downside is its heavy memory demand on clients' machines when rendering a map, which results in less mobile browser compatibility. Open to discussion.

manascb1344 · 2024-02-15T14:17:10Z

manascb1344
Feb 15, 2024

Hi,
Has anyone who is interested in this project received any tasks?

3 replies

selenbw Feb 23, 2024

The project has not started yet; once it starts, we will share the next steps.

selenbw Feb 23, 2024

Please read carefully: #28

kumar-piyush12 Nov 11, 2024

Proposed contribution: 3D-printed food products (especially meat), whose models are generated via AI prompts, are set to reduce animal cruelty, particularly in the meat sector. I want to showcase the rising trends (startups, manufacturing trends) on an interactive global map.
Please let me know your remarks.
Email: f20210815@pilani.bits-pilani.ac.in

heymitali · 2024-02-22T10:16:16Z

heymitali
Feb 22, 2024

Hi, I am Mitali. I am very much interested in this project. Kindly guide me for the next steps.

1 reply

selenbw Feb 23, 2024

The project has not started yet; once it starts, we will share the next steps.

selenbw · 2024-02-23T19:29:32Z

selenbw
Feb 23, 2024

Is there any specific reason to mention only D3 and Leaflet? I think we could also use deck.gl. The only downside is its heavy memory demand on clients' machines when rendering a map, which results in less mobile browser compatibility. Open to discussion.

There is no specific reason to use only D3 and Leaflet. The plan is to decide based on the final group discussion, and we are open to suggestions.

0 replies

selenbw · 2024-02-25T07:25:33Z

selenbw
Feb 25, 2024

Let's discuss: Web Scraping Task

Your first task is to develop a web scraping script to compile a list of AI ethics guidelines and policies specifically related to healthcare or medicine. Focus on the following:

Identify and target reputable websites and online databases that host or reference such guidelines (e.g., WHO, IEEE, medical journals, government health departments).
Your script should extract key information including the name of the guideline/policy, the issuing organization, the publication date, and a brief summary or abstract.
Ensure your script is efficient and can handle different website layouts and structures.
The goal is to create a comprehensive and accurate dataset through automated web scraping, demonstrating your proficiency in coding, data extraction, and handling web data.
Please discuss here, ask questions and provide solutions.
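As a starting point for discussion, here is a minimal sketch of how the four requested fields could be stored in one common shape, whichever site they come from. The class name `GuidelineRecord`, its field names, and the helper `records_to_csv` are illustrative assumptions, not an agreed schema.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class GuidelineRecord:
    """One scraped guideline or policy (field names are illustrative)."""
    name: str
    organization: str
    publication_date: str  # kept as text, since sources format dates differently
    summary: str


def records_to_csv(records):
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=[f.name for f in fields(GuidelineRecord)]
    )
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder values only, not real data
    example = GuidelineRecord(
        "Example guideline", "Example Org", "2021-06-28", "Short summary."
    )
    print(records_to_csv([example]))
```

Keeping every scraper's output in the same record type should make it easier to merge and de-duplicate results later.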

7 replies

KYash03 Feb 27, 2024

Creating a universal web scraping script wouldn't work because, as mentioned, each website has its unique structure and layout. I suggest we experiment with a combination of Beautiful Soup and spaCy/NLTK (NER) to extract the required information.
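To illustrate the point about differing layouts, here is a rough sketch in which only a small per-site rule table changes while the extraction logic stays shared. It uses the standard library's `html.parser` as a stand-in for Beautiful Soup so it runs without extra installs. The site names, tags, and class names in `SITE_RULES` are hypothetical, and the NER step is not shown.

```python
from html.parser import HTMLParser

# Hypothetical per-site rules: field -> (tag, CSS class) where that field lives
SITE_RULES = {
    "site-a.example": {"title": ("h1", "doc-title"), "date": ("span", "pub-date")},
    "site-b.example": {"title": ("h2", "headline"), "date": ("time", "published")},
}


class FieldExtractor(HTMLParser):
    """Collect the text of the first element matching each rule.

    Simplification: nested elements with the same tag name are not handled.
    """

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.fields = {}
        self._current = None  # field currently being captured
        self._tag = None      # tag whose end closes the capture

    def handle_starttag(self, tag, attrs):
        if self._current is not None:
            return
        classes = (dict(attrs).get("class") or "").split()
        for field, (want_tag, want_class) in self.rules.items():
            if tag == want_tag and want_class in classes and field not in self.fields:
                self._current, self._tag = field, tag
                self.fields[field] = ""
                break

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data

    def handle_endtag(self, tag):
        if self._current is not None and tag == self._tag:
            self.fields[self._current] = self.fields[self._current].strip()
            self._current = None


def extract(site, html):
    """Apply the rules registered for `site` to an HTML string."""
    parser = FieldExtractor(SITE_RULES[site])
    parser.feed(html)
    return parser.fields


if __name__ == "__main__":
    page = '<h2 class="headline">Sample policy</h2><time class="published">2022</time>'
    print(extract("site-b.example", page))
```

Supporting a new website would then mean adding one entry to the rule table, with NER reserved for fields that have no reliable selector.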

Apurv428 Feb 28, 2024

Hi @selenbw, do we have any target websites for this, or are we planning to search the entire web?

selenbw Mar 6, 2024

Publish or Perish and scraping PubMed are definitely a good start.
Is there any way to compare their results given the same keywords?
The websites we should include are below (open to other suggestions)
AI Ethics Guidelines Global Inventory
“Linking Artificial Intelligence Principles” (LAIP) guidelines
PubMed
Embase.com
Web of Science
https://www.aiethicist.org/
https://www.coe.int/en/web/artificial-intelligence/ethical-frameworks
Bonus:
Some Examples to get inspired (hoping better dosing though :)):
https://playground.airespucrs.org/worldwide-ai-ethics
https://www.airespucrs.org/en/worldwide-ai-ethics

iamrmrishan Mar 6, 2024
Author

PoPCites.csv
I have uploaded the CSV here. We can get the titles and authors and compare results.

Publish or Perish runs a Google Scholar search (which also includes results from the sites mentioned above). If we are looking for specific sites, we should consider a script. Publish or Perish also has options to get separate results from PubMed and Web of Science only.
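One possible way to compare two exports for the same keywords is to match on normalized titles. The sketch below assumes each export is a CSV with a `Title` column; that column name and the helper names are assumptions to adjust to the actual files.

```python
import csv
import io
import re


def normalize_title(title):
    """Lowercase and collapse punctuation/whitespace so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def titles_from_csv(csv_text, column="Title"):
    """Return the set of normalized titles found in one CSV export."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {normalize_title(row[column]) for row in reader if row.get(column)}


def compare_sources(titles_a, titles_b):
    """Split titles into those found by both sources and by only one of them."""
    return {
        "both": titles_a & titles_b,
        "only_a": titles_a - titles_b,
        "only_b": titles_b - titles_a,
    }


if __name__ == "__main__":
    # Tiny made-up inputs, just to show the shape of the result
    a = titles_from_csv("Title,Year\nAI Ethics in Health Care,2020\n")
    b = titles_from_csv("Title\nAI ethics in health-care\nAnother Paper\n")
    print(compare_sources(a, b))
```

Exact title matching will miss retitled or translated versions, so a DOI column, where available, would be a more reliable key.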

201677I0318 Mar 8, 2024

I agree with @KYash03 that we can't write just one script to crawl all websites.
But we can write different scripts for specific websites and put them behind a GUI, so that we end up with something like Publish or Perish.
Seen this way, it goes beyond this GSoC project and could become something interesting in its own right. There is a lot of work to do; I have done the IEEE part, and anyone interested is welcome to contribute.
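A rough sketch of how separate per-site scripts could sit behind one entry point that a GUI could call. The registry, the decorator, and the placeholder scraper bodies are all hypothetical; real scrapers would return parsed records instead of strings.

```python
# Maps a source name to the function that scrapes it
SCRAPERS = {}


def register(source_name):
    """Decorator that adds a scraper function to the registry."""
    def decorator(func):
        SCRAPERS[source_name] = func
        return func
    return decorator


@register("ieee")
def scrape_ieee(keyword):
    # Placeholder body: a real implementation would drive the IEEE scraper
    return [f"ieee result for {keyword}"]


@register("pubmed")
def scrape_pubmed(keyword):
    # Placeholder body: a real implementation would query PubMed
    return [f"pubmed result for {keyword}"]


def run(source_name, keyword):
    """Look up the scraper for a source and run it."""
    try:
        scraper = SCRAPERS[source_name]
    except KeyError:
        raise ValueError(f"No scraper registered for {source_name!r}") from None
    return scraper(keyword)


if __name__ == "__main__":
    for name in sorted(SCRAPERS):
        print(name, run(name, "ai ethics"))
```

A GUI could populate its source dropdown from `sorted(SCRAPERS)`, so adding a new site would not require touching the interface code.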

By the way, Publish or Perish has a bug: its Google Scholar interface is obsolete, and a failure message appears when it is used.

manascb1344 · 2024-02-28T18:54:52Z

manascb1344
Feb 28, 2024

Hello @selenbw,

I'd like to discuss how we can ensure that our web scraping approach prioritizes trustworthy sources for AI ethics guidelines and policies in healthcare or medicine. Are there specific websites or databases known for hosting authoritative content in this area? Additionally, should we concentrate on scraping particular websites or gather data from a wide range of sources to build a comprehensive dataset?

I've been exploring web scraping techniques for our task, aiming to collect AI ethics guidelines and policies related to healthcare and medicine. I experimented with BeautifulSoup and spaCy on platforms like PubMed, UNESCO, and IEEE. However, I encountered an obstacle - certain websites have measures in place to deter web scrapers.

Despite this challenge, I made progress, particularly with PubMed. The script I've been refining retrieves HTML content from PubMed search results, utilizes BeautifulSoup for parsing, and extracts specific details such as guideline names, organizations, publication dates, and summaries.
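As an alternative to parsing PubMed's HTML, NCBI offers the E-utilities web API, whose `esearch` endpoint returns matching PubMed IDs as JSON. The sketch below only builds the request URL and parses a response body; the function names are illustrative, and the actual fetch is left out so the example runs offline.

```python
import json
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, retmax=20):
    """Build an esearch URL that asks PubMed for up to `retmax` matching IDs."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_esearch(payload):
    """Return (total hit count, list of PubMed IDs) from an esearch JSON body."""
    result = json.loads(payload)["esearchresult"]
    return int(result["count"]), result["idlist"]


if __name__ == "__main__":
    print(build_esearch_url("AI ethics healthcare guideline", retmax=5))
    # Hand-written sample body in the shape esearch returns, not a real response
    sample = '{"esearchresult": {"count": "2", "idlist": ["111", "222"]}}'
    print(parse_esearch(sample))
```

The URL can be fetched with `urllib.request.urlopen`, and the returned IDs passed to the `efetch` endpoint for titles and abstracts. NCBI publishes usage guidelines for E-utilities that are worth checking before running large queries.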

To enhance the script's capabilities, I integrated spaCy for text processing tasks.

However, there are still areas where we can enhance the script. Do you have any suggestions on how we can navigate around websites that block web scrapers?

4 replies

loupdaniel Feb 29, 2024

Hello @manascb1344,

Could you please share which platforms restrict web scraping?

manascb1344 Feb 29, 2024

Based on my scripts, it looks like both the IEEE and UNESCO websites have implemented measures to prevent web scraping. I'm currently exploring alternative methods to extract the desired data directly from the websites. Any suggestions on overcoming these restrictions would be greatly appreciated!
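Before working around a block, it may help to check what a site's robots.txt actually permits. Here is a small sketch using the standard library's `urllib.robotparser`; the user agent name `atlas-bot` and the sample rules are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, user_agent, url):
    """Return True if the robots.txt text permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Made-up rules for illustration; a real check would download the
    # site's own /robots.txt first
    rules = "User-agent: *\nDisallow: /search/\n"
    print(allowed(rules, "atlas-bot", "https://example.org/search/results"))
    print(allowed(rules, "atlas-bot", "https://example.org/docs/guideline"))
```

If a path is disallowed, an official API or a bulk export from the publisher is usually a safer route than trying to bypass the restriction.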

selenbw Mar 6, 2024

The websites we should include are below (open to other suggestions)
AI Ethics Guidelines Global Inventory
“Linking Artificial Intelligence Principles” (LAIP) guidelines
PubMed
Embase.com
Web of Science
https://www.aiethicist.org/
https://www.coe.int/en/web/artificial-intelligence/ethical-frameworks
IEEE and UNESCO are also great ideas, but I wonder about the restrictions you mentioned.
Bonus:
Some Examples to get inspired (hoping better dosing though :)):
https://playground.airespucrs.org/worldwide-ai-ethics
https://www.airespucrs.org/en/worldwide-ai-ethics

201677I0318 Mar 8, 2024

Hello @manascb1344
I recently ran into the same problem while working on a scraper for another project. Simulating browser behavior to get past anti-scraping measures often works, and I wrote a script along those lines that worked well on IEEE.

Here is my code; I hope it is helpful to you.

import csv
import math
import time
from urllib.parse import quote

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Set Chrome options
chrome_options = Options()
chrome_options.add_argument("--headless")  # Use headless mode
chrome_options.add_argument("--disable-gpu")  # Disable GPU acceleration
# Send a regular desktop browser user agent (no quotes inside the value)
chrome_options.add_argument("--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36")

# Create Chrome WebDriver
driver = webdriver.Chrome(options=chrome_options)

# Set URL
url = "https://ieeexplore.ieee.org/search/searchresult.jsp?newsearch=true&queryText="
HomePage = "https://ieeexplore.ieee.org"
RESULTS_PER_PAGE = 25  # Number of results shown on each search page

# Create CSV file to save results
def create_csv(name):
    with open(name+'.csv', 'w', newline='', encoding='utf-8-sig') as f:
        writer = csv.writer(f)
        writer.writerow(['Cites', 'Title', 'Author Name', 'Publish Year', 'Publisher', 'Abstract', 'URL'])

# Open a web page and return its parsed HTML
def open_url(url):
    driver.get(url)
    time.sleep(2)  # Give the JavaScript-rendered content time to load
    wait = WebDriverWait(driver, 10)
    wait.until(EC.presence_of_element_located((By.TAG_NAME, 'body')))
    return BeautifulSoup(driver.page_source, 'html.parser')

# Search for a keyword and save every result to <keyword>.csv
def search(keyword):
    search_url = url + quote(keyword)  # URL-encode spaces and special characters
    soup = open_url(search_url)
    # Note: the _ngcontent-*/_nghost-* attributes used below are generated by the
    # site's front-end build and may change whenever the site is updated
    count = soup.find("h1", class_="Dashboard-header col-12").find_all("span", class_="strong")[1].text.replace(",", "")
    count = math.ceil(int(count) / RESULTS_PER_PAGE)  # Total number of result pages
    print("Found " + str(count) + " pages of results")
    create_csv(keyword)
    for i in range(count):
        print("Processing page " + str(i+1))
        urltemp = search_url + "&pageNumber=" + str(i+1)
        soup = open_url(urltemp)
        items = soup.find_all('xpl-results-item', {'_ngcontent-sst-c145': "", '_nghost-sst-c144': ""})
        if items == []:
            # Reload the same page once and fall back to a different selector
            print("No results found, reopening the url")
            soup = open_url(urltemp)
            items = soup.find_all("div", {"_ngcontent-sst-c144": ""}, class_="hide-mobile")
            if items == []:
                print("No results found, skipping this page")
                continue
        for item in items:
            title = item.find('h3', {'_ngcontent-sst-c144': "", 'class': "text-md-md-lh"}).text
            author = item.find('p', {'_ngcontent-sst-c128': "", 'class': "author text-base-md-lh"})
            if author is None:
                author = ""
            else:
                author = author.get_text(strip=True)
            publishInfo = item.find("div", class_="publisher-info-container")
            publishYear = publishInfo.find("span").text.split(": ")[1]
            publisher = publishInfo.find("span", class_="title").find_next_sibling('span').text
            abstractUrl = item.find("a", {'_ngcontent-sst-c144': "", 'class': "fw-bold", 'xplanchortagroutinghandler': "", 'xplhighlight': ""})
            if abstractUrl is None:
                abstractUrl = item.find("a", {'ngbtooltip': "HTML format", 'tooltipclass': "document-toolbar-tooltip"})
            if abstractUrl is None:
                print("No detail link found, skipping this item")
                continue
            abstractUrl = HomePage + abstractUrl['href']
            print("Getting the detail page of " + abstractUrl)
            detail = open_url(abstractUrl)
            cites = detail.find('div', string="Cites in")
            if cites is None:
                cites = 0
            else:
                cites = cites.find_previous_sibling('div').text
            abstract = detail.find("div", class_="abstract-desktop-div hide-mobile text-base-md-lh")
            if abstract is not None:
                abstract = abstract.find('div', {'_ngcontent-puc-c169': "", 'xplmathjax': ""})
            abstract = "" if abstract is None else abstract.text
            with open(keyword+'.csv', 'a', newline='', encoding='utf-8-sig') as f:
                writer = csv.writer(f)
                writer.writerow([cites, title, author, publishYear, publisher, abstract, abstractUrl])
    # Close the browser
    driver.quit()

# Get input from user
def get_input():
    keyword = input("Please input the keyword: ")
    search(keyword)
    print("Finished searching")
    print("The result has been saved in "+keyword+".csv")
    print("The browser has been closed")

if __name__ == "__main__":
    get_input()

AI ethics guidelines.csv: this is the results file.

AtharvSabde · 2024-03-08T06:27:46Z

AtharvSabde
Mar 8, 2024

Hello everyone, I am Atharv Sabde, and I am very excited to be part of this project.
@selenbw, what next steps should I take to get selected for this project? Please guide me. Thank you.

0 replies

Shreeman5 · 2024-03-19T22:19:19Z

Shreeman5
Mar 19, 2024

Hello All,
My name is Shreeman Gautam and I am a PhD student doing visualization research at the University of Utah.
@selenbw, I looked at the project and read the contributor guidelines, and I think this is a good fit for me since I have a background in CSS, HTML and JavaScript (D3). I have one question: If the project goes well and we finish all the tasks, will we write a conference paper about it?
Thank you!
Shreeman

0 replies

KarthikDani · 2024-03-24T12:10:07Z

KarthikDani
Mar 24, 2024

Hello @selenbw! I have ethically scraped public data from one of the toughest platforms, LinkedIn.

With two crawlers, I was able to fetch nearly 200,000 company profiles, each with 16 different company details, all in two steps. You can find my repo for this here: Linkedin Company Directory Scraping System

I am new to GSoC and would like to know if I still have time to submit a proposal for this project. I LOVE ETHICS!

0 replies

Khyati9505 · 2024-03-27T13:18:55Z

Khyati9505
Mar 27, 2024

Hello everyone,

My name is Khyati Bhat, and I'm currently a first-year student majoring in Biological Engineering at the Indian Institute of Technology, Madras (IITM). I am a beginner in open source; however, I have been working on web development, focusing on front-end development using HTML, CSS, and JavaScript, along with JavaScript libraries like React. Additionally, I have worked on machine learning projects using Python libraries such as NumPy, Pandas, OpenCV, Matplotlib, and Seaborn, as well as Python frameworks such as TensorFlow and Keras.
After reviewing many of the project ideas, I find myself very intrigued by the Health-AI Ethics Atlas project, as it has the potential to become a tool that helps solve a significant problem the medical field is facing right now, and it is a project I have a personal stake in.
@selenbw, I'm working on the proposal right now. Would it be possible to send it to you by email for feedback?

Email: Khyati9505@gmail.com

0 replies

kruti107 · 2024-03-31T19:08:16Z

kruti107
Mar 31, 2024

Hello,
I am Kruti Pandya, currently in the last semester of my MTech in AI. I would be happy to be part of the "Health-AI Ethics Atlas" project, @selenbw.
email: kruti.pmtai22@sot.pdpu.ac.in

0 replies

selenbw · 2024-06-04T06:37:45Z

selenbw
Jun 4, 2024

This project wasn't selected for GSoC, but my team will continue working on it. We're aiming for a conference or journal paper and plan to finish in 3 months. It will require consistent commitment. If you're seriously interested in joining the team, please contact me at selen.bozkurt@emory.edu.

1 reply

Zawa-ll Jul 26, 2024

Hello! I am interested in contributing to the Health-AI Ethics Atlas project. I just sent you an email, looking forward to your response for further discussion.
