This Jupyter Notebook contains the data crawled from ICLR 2020 OpenReview webpages and their visualizations. The list of submissions (sorted by average rating) can be found here.
Prerequisites:
- Python 3.6
- selenium
- pyvirtualdisplay (run on a headless device)
- numpy
- h5py
- matplotlib
- seaborn
- pandas
- imageio
- wordcloud
The accepted papers have an average rating of 6.2431, versus 3.4246 for the rejected ones. The distribution is plotted as follows.
The distribution of reviewer ratings centers around 4 (mean: 4.1837).
The cumulative sum of reviewer ratings.
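As a rough sketch of how these two plots could be produced — `all_ratings`, a flat list of every individual reviewer rating, is an assumed variable name; the notebook collects these values while crawling:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set()  # seaborn styling for the plots

# `all_ratings` is an assumed flat list of every individual reviewer rating
counts, edges, _ = plt.hist(all_ratings, bins=10)
plt.xlabel('Rating')
plt.ylabel('Number of reviews')
plt.show()

# Cumulative sum of the same histogram
centers = (edges[:-1] + edges[1:]) / 2
plt.plot(centers, np.cumsum(counts))
plt.xlabel('Rating')
plt.ylabel('Cumulative number of reviews')
plt.show()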
You can compute how many papers are beaten by yours with the following snippet:
# See how many papers are beaten by yours
import numpy as np

def PR(rating_mean, your_rating):
    # Percentage of submissions rated strictly below yours
    pr = np.sum(your_rating > np.array(rating_mean)) / len(rating_mean) * 100
    # Percentage of submissions with exactly the same average rating
    same_rating = np.sum(your_rating == np.array(rating_mean)) / len(rating_mean) * 100
    return pr, same_rating

my_rating = (6 + 6 + 6) / 3.  # your average rating here
pr, same_rating = PR(rating_mean, my_rating)
print('Your paper ({:.2f}) is among the top {:.2f}% of submissions based on the ratings.\n'
      'There are {:.2f}% with the same rating.'.format(
          my_rating, 100 - pr, same_rating))
For reference, the acceptance statistics of recent ICLR conferences:

| Year | Acceptance rate | Orals | Spotlights | Posters |
|------|-----------------|-------|------------|---------|
| ICLR 2017 | 39.1% (198/507) | 15 | - | 183 |
| ICLR 2018 | 32.0% (314/981) | 23 | - | 291 |
| ICLR 2019 | 31.4% (500/1591) | 24 | - | 476 |
| ICLR 2020 | 26.5% (687/2594) | 48 | 108 | 529 |
[Output]
Your paper (6.00) is among the top 21.79% of submissions based on the ratings.
There are 8.24% with the same rating.
The word clouds formed by the keywords of submissions highlight the hot topics, including deep learning, reinforcement learning, representation learning, generative models, graph neural networks, etc.
This figure is plotted with the Python word cloud generator:
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# `keywords` is a flat list of the keyword strings from all submissions
wordcloud = WordCloud(max_font_size=64, max_words=160,
                      width=1280, height=640,
                      background_color="black").generate(' '.join(keywords))
plt.figure(figsize=(16, 8))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()
The top 50 most common keywords and their frequencies.
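A minimal sketch of how such counts can be computed with `collections.Counter`, reusing the `keywords` list from the word cloud above:
from collections import Counter

# Normalize and count the keywords gathered from all submissions
keyword_counts = Counter(k.lower().strip() for k in keywords)
for keyword, count in keyword_counts.most_common(50):
    print('{:40s} {:d}'.format(keyword, count))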
The average reviewer ratings and the frequency of keywords suggest that, to maximize your chance of getting a higher rating, you should use keywords such as compositionality, deep learning theory, or gradient descent.
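A hedged sketch of how such a comparison could be made; `paper_keywords` (one keyword list per submission, aligned with `rating_mean`) is an assumption about how the crawled data is stored:
from collections import defaultdict

# `paper_keywords` is an assumed list of keyword lists, one per submission,
# aligned with `rating_mean` (the per-paper average ratings used above)
keyword_ratings = defaultdict(list)
for kws, rating in zip(paper_keywords, rating_mean):
    for k in kws:
        keyword_ratings[k.lower().strip()].append(rating)

# Average rating per keyword, ignoring rare keywords
avg_rating = {k: sum(v) / len(v)
              for k, v in keyword_ratings.items() if len(v) >= 10}
for k, r in sorted(avg_rating.items(), key=lambda x: -x[1])[:10]:
    print('{:30s} {:.2f}'.format(k, r))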
The average review length is 407.91 words. The histogram is as follows.
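A sketch of how the lengths could be computed, assuming `reviews` is a list of the raw review texts collected by the crawler:
import matplotlib.pyplot as plt

# `reviews` is an assumed list of raw review texts
review_lengths = [len(r.split()) for r in reviews]
print('Average review length: {:.2f} words'.format(
    sum(review_lengths) / len(review_lengths)))

plt.hist(review_lengths, bins=50)
plt.xlabel('Review length (words)')
plt.ylabel('Number of reviews')
plt.show()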
All individual ratings:
The average rating for each paper:
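The averages can be computed directly from the per-paper rating lists; a one-line sketch, with `paper_ratings` (one list of ratings per submission) as an assumed variable:
import numpy as np

# `paper_ratings` is an assumed list of per-paper rating lists;
# averaging each one yields the `rating_mean` used above
rating_mean = [np.mean(r) for r in paper_ratings]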
The authors with more than 5 submissions.
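A sketch of how these authors could be found, assuming `paper_authors` holds one author list per submission:
from collections import Counter

# `paper_authors` is an assumed list of author lists, one per submission
author_counts = Counter(a for authors in paper_authors for a in authors)
for author, count in author_counts.most_common():
    if count > 5:
        print('{:30s} {:d}'.format(author, count))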
See How to install Selenium and ChromeDriver on Ubuntu.
To crawl data from dynamic websites such as OpenReview, a headless browser can be created as follows:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
executable_path = '/Users/waltersun/Desktop/chromedriver' # path to your executable browser
options = Options()
options.add_argument("--headless")
browser = webdriver.Chrome(options=options, executable_path=executable_path)
Then, we can fetch the content of a webpage:
browser.get(url)
To know what content to crawl, we need to inspect the webpage layout. I chose to get the content by:
key = browser.find_elements_by_class_name("note_content_field")
value = browser.find_elements_by_class_name("note_content_value")
The data includes the abstract, keywords, TL;DR, and comments.
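The two element lists can then be paired into a dictionary; a minimal sketch, assuming the keys and values line up one-to-one on the page:
# Pair field names with their values; assumes the two lists returned
# above line up one-to-one on the page
content = {k.text.strip(): v.text.strip() for k, v in zip(key, value)}
print(content)  # exact field labels depend on the page layout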
The following content is largely borrowed from a nice post written by Christopher Su.
- Install Google Chrome for Debian/Ubuntu
sudo apt-get install libxss1 libappindicator1 libindicator7
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo dpkg -i google-chrome*.deb
sudo apt-get install -f
- Install xvfb to run Chrome on a headless device
sudo apt-get install xvfb
- Install ChromeDriver for 64-bit Linux
sudo apt-get install unzip # If you don't have unzip package
wget -N http://chromedriver.storage.googleapis.com/2.26/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
chmod +x chromedriver
sudo mv -f chromedriver /usr/local/share/chromedriver
sudo ln -s /usr/local/share/chromedriver /usr/local/bin/chromedriver
sudo ln -s /usr/local/share/chromedriver /usr/bin/chromedriver
If your system is 32-bit, please find the ChromeDriver releases here and modify the above download command.
- Install Python dependencies (Selenium and pyvirtualdisplay)
pip install pyvirtualdisplay selenium
- Test your setup in Python
from pyvirtualdisplay import Display
from selenium import webdriver
display = Display(visible=0, size=(1024, 1024))
display.start()
browser = webdriver.Chrome()
browser.get('http://shaohua0116.github.io/')
print(browser.title)
print(browser.find_element_by_class_name('bio').text)
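When you are done, the browser and the virtual display can be shut down:
browser.quit()   # close the headless browser
display.stop()   # tear down the virtual display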
Collected at 12/23/2019 03:59:42 PM
Number of submissions: 2594 (withdrawn/desk reject submissions: 383)