
Commit 76e246b ("240808")
ssocean committed Aug 8, 2024 (1 parent: e475012)
Showing 3 changed files with 111 additions and 5 deletions.
56 changes: 51 additions & 5 deletions README.md
@@ -1,17 +1,63 @@
# From Words to Worth: Newborn Article Impact Prediction with LLM

<p align="center">
<img src="img/model.png" alt="icon" width="25%">
</p>

<h1 align="center">
LLM Impact Predictor
</h1>

### [Early Access Version]
###### This [paper](https://arxiv.org/abs/2408.03934?context=cs.CL) is currently under peer review, and the code may change frequently. Our maintenance bandwidth is currently limited, so if you encounter any issues while reproducing our results, please open an issue in the repository or email us at oceanytech@gmail.com.

## Introduction

This repository contains the official implementation for the paper **"From Words to Worth: Newborn Article Impact Prediction with LLM"**. The tool applies parameter-efficient fine-tuning (PEFT) to LLMs so they can predict the future impact of newly published academic articles from just their titles and abstracts.
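
For readers new to the setup, the sketch below shows one way such PEFT could be wired up with the `peft` package pinned in `requirements.txt`: a LoRA adapter on a Llama sequence-classification head with a single regression output. This is a hedged sketch, not the repository's training script; the checkpoint name and LoRA hyperparameters are illustrative assumptions.

```
# Hedged sketch (not the repository's training script): attach a LoRA adapter
# to a Llama sequence-classification head with one regression output.
# The checkpoint name and LoRA hyperparameters are illustrative assumptions.
import torch
from transformers import LlamaForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

base_model = "meta-llama/Llama-2-7b-hf"  # assumption: any Llama-family checkpoint
model = LlamaForSequenceClassification.from_pretrained(
    base_model,
    num_labels=1,                 # a single scalar impact score
    problem_type="regression",
    torch_dtype=torch.bfloat16,
)

lora_cfg = LoraConfig(
    task_type=TaskType.SEQ_CLS,                # classification/regression head task
    r=16, lora_alpha=32, lora_dropout=0.05,    # assumption: typical LoRA settings
    target_modules=["q_proj", "v_proj"],       # assumption: attention projections only
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the adapters (and score head) are trainable
```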

## Installation
At this early-access stage, installation takes a few extra manual steps.

First, clone the repository, then run the following commands in a console:
```
cd ScImpactPredict
pip install -r requirements.txt
```
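
Some of the pinned packages (notably `flash-attn` and `deepspeed`) build against your local CUDA toolchain, so a quick sanity check after installation can save time. The snippet below is an optional check of ours, not part of the repository's tooling.

```
# Optional sanity check (not part of the repository): confirm the pinned
# libraries import correctly and that a CUDA device is visible.
import torch, transformers, peft
print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers", transformers.__version__)  # requirements.txt pins 4.41.1
print("peft", peft.__version__)                  # requirements.txt pins 0.11.1
```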
Second, manually patch the corresponding `xxxForSequenceClassification` class (e.g. `LlamaForSequenceClassification`) in your installed `transformers` package, adding the marked lines shown below.
```
class LlamaForSequenceClassification(LlamaPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.num_labels = config.num_labels
        self.model = LlamaModel(config)
        self.score = nn.Linear(config.hidden_size, self.num_labels, bias=False)
        self.post_init()
        # Added: a selectable loss function and a sigmoid to bound regression outputs.
        self.loss_func = 'mse'
        self.sigmoid = nn.Sigmoid()
    ...

    def forward(...):
        ...
        hidden_states = transformer_outputs[0]
        logits = self.score(hidden_states)
        # Added: squash logits unless BCEWithLogitsLoss will apply the sigmoid itself.
        if not self.loss_func == 'bce':
            logits = self.sigmoid(logits)
        if input_ids is not None:
            batch_size = input_ids.shape[0]
        ...
        # Added: choose the regression loss according to self.loss_func.
        if self.config.problem_type == "regression":
            if self.loss_func == 'bce':
                loss_fct = BCEWithLogitsLoss()
            elif self.loss_func == 'mse':
                loss_fct = MSELoss()
            elif self.loss_func == 'l1':
                loss_fct = L1Loss()
            elif self.loss_func == 'smoothl1':
                loss_fct = nn.SmoothL1Loss()
```
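
With this patch in place, regression logits are squashed by a sigmoid before the loss is computed, and the loss function is selected through the `loss_func` attribute. The snippet below is a minimal sketch of exercising the patched class; the checkpoint name, prompt format, and target scaling are assumptions, and the repository's own training scripts may differ.

```
# Hedged sketch: exercising the patched LlamaForSequenceClassification.
# Assumes the modifications above were applied to the installed transformers
# package; checkpoint name, prompt format, and target scaling are assumptions.
import torch
from transformers import AutoTokenizer, LlamaForSequenceClassification

name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(name)
model = LlamaForSequenceClassification.from_pretrained(
    name, num_labels=1, problem_type="regression"
)
model.loss_func = "mse"  # one of 'bce', 'mse', 'l1', 'smoothl1' per the patch

batch = tokenizer("Title: ...\nAbstract: ...", return_tensors="pt")
labels = torch.tensor([0.7])  # assumption: impact target normalized to [0, 1]

out = model(**batch, labels=labels)
print(out.logits)  # already sigmoid-squashed because loss_func != 'bce'
print(out.loss)    # MSELoss between the squashed prediction and the target
```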
Binary file added img/model.png
60 changes: 60 additions & 0 deletions requirements.txt
@@ -0,0 +1,60 @@
absl-py
accelerate==0.30.1
annotated-types==0.7.0
bitsandbytes==0.42.0
certifi==2024.2.2
charset-normalizer==3.3.2
deepspeed==0.14.2
einops==0.8.0
filelock==3.14.0
flash-attn==2.6.1
fsspec==2024.5.0
grpcio
hjson==3.1.0
huggingface-hub==0.23.1
idna==3.7
importlib_metadata==7.1.0
Jinja2==3.1.4
joblib==1.4.2
MarkupSafe==2.1.5
mpmath==1.3.0
networkx==3.2.1
ninja==1.11.1.1
numpy==1.26.4
packaging==24.0
pandas==2.2.2
peft==0.11.1
pillow==10.3.0
protobuf==3.20.3
psutil==5.9.8
py-cpuinfo==9.0.0
pydantic==2.7.1
pydantic_core==2.18.2
pynvml==11.5.0
python-dateutil==2.9.0.post0
pytz==2024.1
PyYAML==6.0.1
regex==2024.5.15
requests==2.32.2
safetensors==0.4.3
scikit-learn==1.5.1
scipy==1.13.1
six
sympy==1.12
tensorboard==2.17.0
tensorboard-data-server==0.7.2
threadpoolctl==3.5.0
tiktoken==0.7.0
tokenizers==0.19.1
torch==2.3.0
torchaudio==2.3.0
torchvision==0.18.0
tqdm==4.66.4
transformers==4.41.1
transformers-stream-generator==0.0.4
triton==2.3.0
typing_extensions==4.12.0
tzdata==2024.1
urllib3==2.2.1
Werkzeug
zipp==3.18.2
