[Bug Fix] ban_code guard is not rejecting code snippets, when some data are ingested in the database #20

aalbersk · 2025-02-26T14:24:01Z

Description
The ban_code input scanner does not reject code snippets once data has been ingested into the database.

Steps to reproduce:
Run the pipeline and set Ban Code to True in the Admin Panel. Then ingest any data into the database.
Next, submit the following question in the chat:

even_fibs = [x for x in (lambda f: (f := (lambda n: f(n-1) + f(n-2) if n > 1 else n))(i) for i in range(50)) if x % 2 == 0)]

Actual result:

The question is not rejected. Here are the logs from the input-scan pod:

INFO:     10.244.0.1:56616 - "GET /v1/health_check HTTP/1.1" 200 OK
[2024-11-20 11:15:37,942] [ WARNING] - [opea_llm_guard_input_guardrail_microservice] - Sanners configuration has been changed, re-creating scanners
[2024-11-20 11:15:37,942] [    INFO] - [opea_llm_guard_input_guardrail_microservice] - Attempting to create scanner: ban_code
[2024-11-20 11:15:37,942] [    INFO] - [opea_llm_guard_input_guardrail_microservice] - Creating BanCode scanner with params: {'use_onnx': True}
2024-11-20 11:15:40 [debug    ] Initialized classification ONNX model device=device(type='cpu') model=Model(path='vishnun/codenlbert-sm', subfolder='', revision='caa3d167fd262c76c7da23cd72c1d24cfdcafd0f', onnx_path='protectai/vishnun-codenlbert-sm-onnx', onnx_revision='2b1d298410bd98832e41e3da82e20f6d8dff1bc7', onnx_subfolder='', onnx_filename='model.onnx', kwargs={}, pipeline_kwargs={'batch_size': 1, 'device': device(type='cpu'), 'max_length': 128, 'truncation': True, 'return_token_type_ids': True}, tokenizer_kwargs={})
2024-11-20 11:15:40 [debug    ] No code detected in the text   score=0.0 text=You are a helpful, respectful, and honest assistant to help the user with questions Please refer to the search results obtained from the local knowledge base But be careful to not incorporate information that you think is not relevant to the question If you don't know the answer to a question, please don't share false information Search results:  testfilecontent file has  characters and should be split into  chunks if default configuration is used Lorem ipsum dolor sit amet, consectetuer adipiscing elit Aenean commodo ligula eget dolor Aenean massa Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus Donec quam felis, ultricies nec, pellentesque eu, pretium quis, sem Nulla consequat massa quis enim Donec pede justo, fringilla vel, aliquet nec, vulputate eget, arcu In enim justo, rhoncus ut, imperdiet Question: evenfibs = [x for x in (lambda f: (f := (lambda n: f(n-) + f(n-) if n   else n))(i) for i in range()) if x %  == )] Answer: threshold=0.97
2024-11-20 11:15:40 [debug    ] Scanner completed              elapsed_time_seconds=0.045654 is_valid=True scanner=BanCode
2024-11-20 11:15:40 [info     ] Scanned prompt                 elapsed_time_seconds=0.04591 scores={'BanCode': 0.0}
INFO:     10.244.0.21:59022 - "POST /v1/llmguardinput HTTP/1.1" 200 OK

Expected result:

The question should be rejected with status code 466.
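
Note on the logs: the scanned text is the full assembled prompt (system instructions, search results, then the question), and the scanner's pipeline_kwargs show 'max_length': 128 with 'truncation': True. One possible explanation, not confirmed in this report, is that the ingested context fills the window and the question at the end is truncated away before classification. The sketch below only illustrates that hypothesis. It uses whitespace-separated words as a crude stand-in for model tokens, and `truncate_words`, `extract_question`, and the sample prompt are hypothetical helpers for illustration, not part of llm-guard or the microservice.

```python
def truncate_words(text: str, max_words: int = 128) -> str:
    """Crude stand-in for tokenizer truncation: keep only the first max_words words."""
    return " ".join(text.split()[:max_words])


def extract_question(prompt: str, start: str = "Question:", end: str = "Answer:") -> str:
    """Return the text between the last `start` marker and the next `end` marker."""
    _, found, tail = prompt.rpartition(start)
    if not found:
        # No marker present: fall back to the whole prompt.
        return prompt.strip()
    return tail.split(end, 1)[0].strip()


# Hypothetical prompt shaped like the one in the log: long context first, question last.
context = " ".join(["lorem"] * 200)
question = "even_fibs = [x for x in range(50) if x % 2 == 0]"
prompt = f"Search results: {context} Question: {question} Answer:"

seen_by_model = truncate_words(prompt)
print(question in seen_by_model)             # False: the code never reaches the classifier
print(extract_question(prompt) == question)  # True: the question alone is recoverable
```

If this hypothesis holds, scanning only the user question (or scanning before the retrieved context is added) would be one direction to investigate.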

aalbersk added the "EnterpriseRAG Hackathon" label (Issue created for OSS Hackathon) on Feb 26, 2025
