AI-literature paper extraction (Xuli Tang) #1
From: Yan, Xiaoran I can certainly run it with the new list. However, can you first check the format of the results and confirm this is what you want? Thank you! On 2/18/19 2:03 PM, Tang, Xuli wrote:
|
Hi Prof Xiaoran The WoS file is in CSV format, not TSV. Please adjust your delimiter to "," and see if it works. Xiaoran On 2/22/19 8:27 PM, Tang, Xuli wrote:
|
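The delimiter fix suggested above can be sketched in Python. This is a minimal illustration, not the script used in this thread: the sample rows and the column names (`title`, `journal`) are hypothetical, and only the standard-library `csv` module is assumed.

```python
import csv
import io

def read_rows(text, delimiter=","):
    """Parse delimited text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))

# Hypothetical two-column sample; the real WoS export has different fields.
sample = "title,journal\nA sample paper,Computer Speech & Language\n"

as_csv = read_rows(sample, delimiter=",")   # correct: comma-separated
as_tsv = read_rows(sample, delimiter="\t")  # wrong delimiter for this file

print(as_csv[0]["journal"])    # the journal column is parsed on its own
print(list(as_tsv[0].keys()))  # the whole header collapses into one column
```

A header that collapses into a single column is usually the quickest sign that the delimiter does not match the file.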
All missing journals are now matched, with the following spelling correction: COMPUTER SPEECH AND LANGUAGE -> Computer Speech & Language. Almost all missing conferences are also matched, with the following correction: |
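One way to catch spelling variants like the one above is to normalize names before comparing them. The sketch below is an assumption about how such matching could be done, not the procedure actually used for this dataset; `normalize_name` is a hypothetical helper.

```python
import re

def normalize_name(name):
    """Lowercase, treat '&' and 'and' alike, and collapse punctuation and whitespace."""
    name = name.lower().replace("&", " and ")
    name = re.sub(r"[^a-z0-9]+", " ", name)
    return " ".join(name.split())

requested = "COMPUTER SPEECH AND LANGUAGE"   # spelling in the requested list
in_database = "Computer Speech & Language"   # spelling that matched

print(normalize_name(requested) == normalize_name(in_database))
```

Normalization of this kind only covers case, punctuation, and the "&"/"and" variant; abbreviations or renamed venues would still need manual correction.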
Hi Prof Xiaoran, I have received your data. You did a great job, and we appreciate all your help. Thank you very much. Have a wonderful night! Xuli 2/27/2019 |
Hi Prof. Xiaoran Yan, Happy to know you will join us! I think we will start soon; I will inform you when we get started. The GitHub repo is a very good resource. Is it possible to fuse/link it to our AI papers? Have a wonderful night! 2019/4/3 From: Yan, Xiaoran Certainly. Let me know when you are ready to discuss your plan. I can join your group meeting or have an individual discussion if you prefer. By the way, in terms of topic diversity, MAG has a field of study tag inferred from the full text for each paper. They also recently added a paper resources table that contains the GitHub repos associated with each paper. Not sure how useful they will be, but they are worth considering. Thanks! Xiaoran |
Hi Xuli, I did a quick search and found that only a small proportion of the core Paper Set has code (3889/477604) or data (842/477604) listed. Another 2555 papers have their project web page listed. I am not sure if the data is good enough, but you can download it from [link missing]. Let me know if you still want such data for the Extend Set and the Patent Set. Xiaoran |
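For scale, the counts quoted above can be turned into coverage percentages. This is only arithmetic on the numbers in the message; the function name `coverage_percent` is hypothetical.

```python
CORE_PAPERS = 477604  # size of the core Paper Set, as quoted in the message

# Papers with each kind of resource listed, as quoted in the message.
counts = {"code": 3889, "data": 842, "project page": 2555}

def coverage_percent(count, total=CORE_PAPERS):
    """Share of the core Paper Set, in percent."""
    return 100.0 * count / total

for label, count in counts.items():
    print(f"{label}: {coverage_percent(count):.2f}%")
```

Each category covers less than 1% of the core Paper Set.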
Hi Prof. Xiaoran,
(1) What is the source of the patents? Where did you crawl them from? (2) I found that the patents you sent me earlier are papers, which confused me. Why don't they have patent numbers?
|
Hi Xuli Tang, The patent data is from MAG, which in turn comes from Lens.org. Upon further inspection on lens.org, it seems the listed documents do have valid patents associated with them. For example: The previous MAG data did not contain patent numbers. The recent update on 07/30/2019 included this new information. Let me know if you are interested in an updated dataset with the new information. You should have received an official response for your CADRE fellow application. Although your proposal was not selected, you are still eligible to receive continued technical/data support from our team. Please use this GitHub channel for follow-up communications. Thanks! Xiaoran |
From: Yan, Xiaoran
Sent: Sunday, February 17, 2019 9:53 PM
To: Tang, Xuli xulitang@iu.edu
Cc: Ma, He mahe@iu.edu; Hutchinson, Matthew Alexander maahutch@iu.edu; Pentchev, Valentin vpentche@iu.edu; Patricia L Mabry (pmabry@iu.edu) pmabry@iu.edu; Ding, Ying dingying@indiana.edu
Subject: RE: Could you please help me with AI data sets?
Hi Xuli,
Here is my first take on your requested dataset. The data is from MAG and consists of three TSV files. You can download them with the following links.
AIpapersAll.tsv (0.8GB): contains all the papers in your lists of journals and conferences
https://iunimag.blob.core.windows.net/mag-2019-01-25/AIpapersAll.tsv?st=2019-02-18T02%3A27%3A56Z&se=2019-02-26T02%3A27%3A00Z&sp=rl&sv=2017-07-29&sr=b&sig=0wSNGWAM9MVG7zWUtSm1QpPMv4N%2BuCePkuAqZNU4NB0%3D
AIpapersOthers.tsv (7GB): contains all citing and cited documents (includes all papers and patents in MAG) that are external to the listed journals and conferences
https://iunimag.blob.core.windows.net/mag-2019-01-25/AIpapersOthers.tsv?st=2019-02-18T02%3A30%3A38Z&se=2019-02-26T02%3A30%3A00Z&sp=rl&sv=2017-07-29&sr=b&sig=espQuP%2BSdRQvhv9CzKPfoMSKJHbvY%2FrbuhOobpFEPA0%3D
AIpapersCitationsAll.tsv (7GB): contains all citations with available citing context
https://iunimag.blob.core.windows.net/mag-2019-01-25/AIpapersCitationsAll.tsv?st=2019-02-18T02%3A32%3A30Z&se=2019-02-19T02%3A32%3A30Z&sp=rl&sv=2017-07-29&sr=b&sig=Sf9EoRTjbv2vDfWKu1sahHVkE27W%2BZM4%2FArMK1v%2B9Zk%3D
The links will be valid for a week. Please download the files and get back to me if you find any problems. From my experience, it takes a few updates to finalize the dataset as your research progresses. I will be producing a WoS dataset for the journals later this week.
Thank you!
Xiaoran
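The download links above are Azure Blob Storage shared access signature (SAS) URLs, in which the `st` and `se` query parameters carry the start and expiry times. A small sketch for checking a link's validity window before starting a multi-gigabyte download follows; `sas_window` is a hypothetical helper, and only the Python standard library is assumed.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse, parse_qs

def sas_window(url):
    """Return (start, expiry) as UTC datetimes parsed from a SAS URL's st/se parameters."""
    params = parse_qs(urlparse(url).query)  # percent-decoding is handled here
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    start = datetime.strptime(params["st"][0], fmt).replace(tzinfo=timezone.utc)
    expiry = datetime.strptime(params["se"][0], fmt).replace(tzinfo=timezone.utc)
    return start, expiry

# Timestamps of the AIpapersAll.tsv link; the signature is omitted here
# because it is not needed to read the validity window.
url = ("https://iunimag.blob.core.windows.net/mag-2019-01-25/AIpapersAll.tsv"
       "?st=2019-02-18T02%3A27%3A56Z&se=2019-02-26T02%3A27%3A00Z&sp=rl")

start, expiry = sas_window(url)
print("valid from", start.isoformat(), "to", expiry.isoformat())
print("window:", expiry - start)
```

Running the same check on each link is a cheap way to confirm that all of them stay valid for the period stated in the message.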
From: Tang, Xuli
Sent: Wednesday, February 6, 2019 1:06 PM
To: Yan, Xiaoran yan30@iu.edu
Subject: Could you please help me with AI data sets?
Hi Xiaoran Yan,
Could you please help us select data from your database?
We want the following data:
Paper Set {All papers from the list of journals, all papers from the list of conferences}
Patent Set {All patents that have cited or referenced papers in the Paper Set}
Extend Set {All papers that have cited or referenced papers in the Paper Set}
We want all fields related to papers or patents.
The journal list and the conference list are attached.
Thank you!
Xuli Tang
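The Patent Set and Extend Set requested above can be expressed as set operations over a citation edge list. The sketch below is one reading of the request, not the query that was actually run against MAG; the document IDs, the edge tuples, and the `doc_type` mapping are all hypothetical.

```python
def build_sets(paper_set, citations, doc_type):
    """Derive (patent_set, extend_set) from the Paper Set.

    citations: iterable of (citing_id, cited_id) pairs.
    doc_type:  mapping of document id to "paper" or "patent".
    """
    paper_set = set(paper_set)
    neighbors = set()
    for citing, cited in citations:
        if cited in paper_set:
            neighbors.add(citing)  # document that cites a Paper Set paper
        if citing in paper_set:
            neighbors.add(cited)   # document referenced by a Paper Set paper
    neighbors -= paper_set         # keep only documents outside the Paper Set
    patent_set = {d for d in neighbors if doc_type.get(d) == "patent"}
    extend_set = {d for d in neighbors if doc_type.get(d) == "paper"}
    return patent_set, extend_set

# Toy data: p1 and p2 form the Paper Set; p8 -> p6 is unrelated to it.
citations = [("p9", "p1"), ("p1", "p7"), ("pat3", "p2"), ("p8", "p6")]
doc_type = {"p1": "paper", "p2": "paper", "p6": "paper", "p7": "paper",
            "p8": "paper", "p9": "paper", "pat3": "patent"}

patents, extended = build_sets({"p1", "p2"}, citations, doc_type)
print(sorted(patents), sorted(extended))
```

Excluding Paper Set members from the neighbors is a design choice made here so that the three sets do not overlap; the original request does not say whether overlap is wanted.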