`CEDARScript` (Concise Examination, Development, And Refactoring Script) is a SQL-like language designed to concisely:
- Express code manipulations and refactorings (if you know what you want to change in your code);
  - The CEDARScript runtime can edit any file in the code base according to the commands it reads
- Perform code analysis to quickly get to know a large code base without having to read all contents of all files.
  - The CEDARScript runtime searches through the whole code base and only returns the desired results
- SQL-like syntax for intuitive code querying and manipulation;
- Shows improved results in refactoring benchmarks when compared to standard diff formats
- Reduced token usage via semantic-level code transformations, not character-by-character matching;
- Scalable to larger codebases with minimal token usage;
- Project-wide refactorings can be performed with a single, concise command
- Avoids wasted time and tokens on failed search/replace operations caused by misplaced spaces, indentations or typos;
- High-level abstractions for complex refactoring operations via refactoring languages (currently supports Rope syntax);
- Relative indentation for easily maintaining proper code structure;
- Allows fetching or modifying targeted parts of code;
- Locations in code: Doesn't use line numbers. Instead, offers more resilient alternatives, like:
  - Line markers. Ex: `LINE "if name == 'some name':"`
  - Identifier markers (`VARIABLE`, `FUNCTION`, `CLASS`). Ex: `FUNCTION 'my_function'`
- Language-agnostic design for versatile code analysis
- Code analysis operations return results in XML format for easier parsing and processing by LLM (Large Language Model) systems.
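
To make marker-based locations and relative indentation more concrete, here is a minimal Python sketch. It is an illustration only, not how the CEDARScript runtime is implemented: `find_line_marker` and `insert_after_marker` are hypothetical helpers, and the 4-space indent width is an assumption.

```python
def find_line_marker(lines, marker):
    """Return the index of the first line whose stripped text equals the marker."""
    wanted = marker.strip()
    for index, line in enumerate(lines):
        if line.strip() == wanted:
            return index
    raise ValueError(f"marker not found: {marker!r}")


def indentation_of(line):
    """Count leading spaces (tabs are not handled in this sketch)."""
    return len(line) - len(line.lstrip(" "))


def insert_after_marker(lines, marker, new_lines, relative_indent=0, indent_width=4):
    """Insert new_lines after the marker line, indented relative to that line.

    relative_indent=0 means "same level as the marker line",
    relative_indent=1 means "one level deeper", and so on.
    """
    index = find_line_marker(lines, marker)
    base = indentation_of(lines[index]) + relative_indent * indent_width
    if base < 0:
        raise ValueError("relative indentation goes past column 0")
    indented = [" " * base + text for text in new_lines]
    return lines[: index + 1] + indented + lines[index + 1:]


source = [
    "def greet(name):",
    "    if name == 'some name':",
    "        return 'hello'",
    "    return 'hi'",
]

# The edit is addressed by line content, so it still applies if lines are
# added or removed elsewhere in the file.
result = insert_after_marker(
    source, "if name == 'some name':", ["print('matched')"], relative_indent=1
)
print("\n".join(result))
```

Because the caller never states an absolute column or line number, the same edit keeps working when the surrounding code is re-indented or shifted.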
- CEDARScript AST Parser (Python)
- CEDARScript Editor
- CEDARScript Prompt Engineering
  - Provides prompts that teach `CEDARScript` to LLMs
  - Also includes real conversations held via Aider in which an LLM uses this language to propose code modifications
- CEDARScript Integrations
  - Provides `CEDARScript` edit format for Aider
`CEDARScript` can be used as a way to standardize and improve how AI coding assistants interact with codebases, learn about your code, and communicate their code modification intentions while keeping token usage low.
This efficiency allows for more complex operations within token limits.
It provides a concise way to express complex code modification and analysis operations, making it easier for AI-assisted development tools to understand and perform these tasks.
One can use `CEDARScript` to concisely and unambiguously represent code modifications at a higher level than a standard diff format can.
IDEs can store the local history of files in CEDARScript format, and this can also be used for searches.
- Code review systems for automated, in-depth code assessments
- Automated code documentation and explanation tools
- ...
Quick example:
UPDATE FUNCTION "my_func"
FROM FILE "functional.py"
MOVE WHOLE
INSERT BEFORE LINE "def get_config(self):"
RELATIVE INDENTATION 0;
There are many more examples to look at...
The onboarding capability is designed to help developers, AI assistants, and other tools quickly gain a comprehensive understanding of a project's structure, conventions, and context.
- Convention Discovery: CEDARScript can automatically extract coding conventions from designated files like `CONVENTIONS.md`: `SELECT CONVENTIONS FROM ONBOARDING;`
- Context Retrieval: Quickly access project context from files like `.context.md` or `.contextdocs.md`: `SELECT CONTEXT FROM ONBOARDING;`
- Comprehensive Project Overview: Gather all essential project information in one query: `SELECT * FROM ONBOARDING;`
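
As a rough illustration of what the three onboarding queries above could resolve to, the following Python sketch collects the named files from a project root and returns them as XML. The function name `select_from_onboarding`, the grouping of files into two kinds, and the XML element names are all assumptions made for this example, not the runtime's actual behavior.

```python
from pathlib import Path
import xml.etree.ElementTree as ET

# File names come from the text above; grouping them like this is an assumption.
ONBOARDING_FILES = {
    "CONVENTIONS": ["CONVENTIONS.md"],
    "CONTEXT": [".context.md", ".contextdocs.md"],
}


def select_from_onboarding(root, what="*"):
    """Collect onboarding files found directly under `root` as an XML string.

    `what` mirrors the SELECT target: "CONVENTIONS", "CONTEXT", or "*" for both.
    Files that do not exist are silently skipped.
    """
    kinds = list(ONBOARDING_FILES) if what == "*" else [what]
    result = ET.Element("onboarding")  # hypothetical element name
    for kind in kinds:
        for name in ONBOARDING_FILES[kind]:
            path = Path(root) / name
            if path.is_file():
                node = ET.SubElement(result, kind.lower(), file=name)
                node.text = path.read_text(encoding="utf-8")
    return ET.tostring(result, encoding="unicode")
```

Calling `select_from_onboarding(".", "CONVENTIONS")` would correspond to `SELECT CONVENTIONS FROM ONBOARDING;`, and the default `"*"` to `SELECT * FROM ONBOARDING;`.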
Ideas to explore:
- Automatic generation of project structure visualizations
- Integration with version control history for context-aware onboarding
- Customizable onboarding queries for specific project needs
- Tree-Sitter query language integration, which could open up many possibilities;
- Create a browser extension that allows web-chat interfaces of Large Language Models to tackle larger file changes;
- Select a model to fine-tune so that it natively understands `CEDARScript`;
- Provide language extensions that will improve how LLMs interact with other resource types;
- Explore using it as an LLM-Tool Interface;
Tree-Sitter query language integration could open up many possibilities, like:
QUERY LANGUAGE 'tree-sitter'
FROM PROJECT
PATTERN '''
(function_definition
  name: (identifier) @func_name
  parameters: (parameters) @params
  body: (block
    (return_statement) @return_stmt))
'''
WITH ANALYSIS
  COUNT @func_name AS "Total Functions"
  AVERAGE (LENGTH @params) AS "Avg Parameters"
  PERCENTAGE (IS_PRESENT @return_stmt) AS "Functions with Return";
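
To show what the three aggregates in the analysis query above would measure, here is a small Python sketch that computes them for one source string. It uses the standard-library `ast` module as a stand-in for Tree-sitter, and it counts every function while treating the return statement as optional, which is an assumption about the query's intended semantics. `analyze_functions` and `SAMPLE` are hypothetical names.

```python
import ast

SAMPLE = '''
def add(a, b):
    return a + b

def log(message):
    print(message)
'''


def analyze_functions(source):
    """Compute function count, average parameter count, and share with a return."""
    functions = [
        node
        for node in ast.walk(ast.parse(source))
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    total = len(functions)
    if total == 0:
        return {"Total Functions": 0, "Avg Parameters": 0.0, "Functions with Return": 0.0}

    # Named parameters only; *args and **kwargs are not counted in this sketch.
    param_counts = [
        len(f.args.posonlyargs) + len(f.args.args) + len(f.args.kwonlyargs)
        for f in functions
    ]
    # Like the pattern, only look at statements directly in the function body,
    # not at returns nested inside if/for/while blocks.
    with_return = sum(
        1 for f in functions if any(isinstance(stmt, ast.Return) for stmt in f.body)
    )
    return {
        "Total Functions": total,
        "Avg Parameters": sum(param_counts) / total,
        "Functions with Return": 100.0 * with_return / total,
    }


print(analyze_functions(SAMPLE))
```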
Find all classes and their methods in Python files, then insert a print statement after each method definition:
QUERY LANGUAGE 'tree-sitter'
FROM PROJECT
PATTERN '''
(class_definition
  name: (identifier) @class_name
  body: (block
    (function_definition
      name: (identifier) @method_name)))
'''
WITH ACTIONS
  INSERT AFTER @method_name
    CONTENT '''
    @0: print("Method called:", @method_name)
    ''';
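
The effect of the query above on a single Python file can be approximated with the following sketch. It again uses the standard-library `ast` module instead of Tree-sitter, and it interprets "after each method definition" as "before the first statement of the method body", which is an assumption. `add_method_tracing` and `SAMPLE` are hypothetical names.

```python
import ast

SAMPLE = '''
class Greeter:
    def hello(self):
        return "hello"

    def bye(self):
        return "bye"
'''


def add_method_tracing(source):
    """Insert a print call at the top of every method defined directly in a class.

    Limitations of this sketch: a docstring would be pushed below the print,
    and one-line methods such as `def f(self): pass` are not handled.
    """
    lines = source.splitlines()
    insertions = []  # (index of the line to insert before, text to insert)
    for cls in ast.walk(ast.parse(source)):
        if not isinstance(cls, ast.ClassDef):
            continue
        for item in cls.body:
            if isinstance(item, ast.FunctionDef):
                first = item.body[0]
                # Reuse the body's own indentation, so the new line sits at
                # relative indentation 0 with respect to the method body.
                indent = " " * first.col_offset
                text = f'{indent}print("Method called:", "{item.name}")'
                insertions.append((first.lineno - 1, text))
    # Apply from the bottom up so earlier line indices stay valid.
    for index, text in sorted(insertions, reverse=True):
        lines.insert(index, text)
    return "\n".join(lines)


print(add_method_tracing(SAMPLE))
```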
Cross-language refactoring: replace all calls to "deprecated_function" across Python, JavaScript, and TypeScript files.
QUERY LANGUAGE 'tree-sitter'
FROM PROJECT
LANGUAGES ["python", "javascript", "typescript"]
PATTERN '''
(call_expression
  function: (identifier) @func_name
  (#eq? @func_name "deprecated_function"))
'''
WITH ACTIONS
  REPLACE @func_name
    WITH CONTENT "new_function";
We can define project-specific linting rules using Tree-sitter queries:
QUERY LANGUAGE 'tree-sitter'
FROM PROJECT
PATTERN '''
(import_statement
  (dotted_name) @import_name
  (#match? @import_name "^(os|sys)$"))
'''
WITH LINT
  SEVERITY "WARNING"
  MESSAGE "Direct import of system modules discouraged. Use custom wrappers instead.";
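
For comparison, the lint rule above could be approximated for Python sources with the standard-library `ast` module, as in the sketch below. This is a stand-in for the proposed Tree-sitter integration, not its implementation; `lint_imports` and the `file:line: SEVERITY: message` output format are assumptions.

```python
import ast
import re

# Same regular expression as the #match? predicate in the query above.
DISCOURAGED = re.compile(r"^(os|sys)$")
SEVERITY = "WARNING"
MESSAGE = "Direct import of system modules discouraged. Use custom wrappers instead."


def lint_imports(source, filename="<string>"):
    """Return one warning per plain `import os` / `import sys` statement.

    Like the pattern, this only inspects `import X` statements;
    `from os import path` and `import os.path` are not flagged.
    """
    warnings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if DISCOURAGED.match(alias.name):
                    warnings.append(f"{filename}:{node.lineno}: {SEVERITY}: {MESSAGE}")
    return warnings


for warning in lint_imports("import os\nimport json\nimport sys\nimport os.path\n"):
    print(warning)
```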
As Large Language Models (LLMs) become increasingly accessible through web-based chat interfaces, there's a growing need to enhance their ability to handle larger codebases and complex file changes. We propose developing a browser extension that leverages CEDARScript to bridge this gap.
- Seamless Integration: The extension would integrate with popular LLM web interfaces (e.g., ChatGPT, Claude, Gemini) by leveraging llm-context.py, allowing users to work with larger files and codebases directly within these platforms.
- CEDARScript Translation: The changes proposed by the LLM would be concisely expressed as `CEDARScript` commands, enabling more efficient token usage.
- Local File System Access: The extension could securely access the user's local file system, allowing for direct manipulation of code files based on `CEDARScript` instructions generated by the LLM.
- Diff Visualization: Changes proposed by the LLM would be presented as standard diffs or as `CEDARScript` code, allowing users to review and approve modifications before applying them to their codebase.
- Context Preservation: The extension would maintain context across chat sessions, enabling long-running refactoring tasks that span multiple interactions.
This browser extension would expand the capabilities of web-based LLM interfaces, allowing developers to leverage these powerful AI tools for more substantial code modification and analysis tasks. By using CEDARScript as an intermediary language, the extension would ensure efficient and accurate communication between the user, the LLM, and the local codebase.
Fine-tuning a model so that it natively understands CEDARScript could enhance the efficiency and effectiveness of AI-assisted code analysis and transformation.
- Improved Accuracy: A fine-tuned model will have a deeper understanding of CEDARScript syntax and semantics, leading to more accurate code analysis and generation.
- Efficiency: Native understanding of CEDARScript will reduce the need for extensive prompting.
- Consistency: A model trained specifically on CEDARScript will produce more consistent and idiomatic output, adhering closely to the language's conventions and best practices.
- Extended Capabilities: Fine-tuning could enable the model to perform more complex CEDARScript operations and understand nuanced aspects of the language that general-purpose models might miss.
- Model Selection: We will evaluate various state-of-the-art language models to determine the most suitable base model for fine-tuning. Factors such as model size, pre-training data, and architectural features will be considered.
- Dataset Creation: A comprehensive dataset of CEDARScript examples, covering a wide range of use cases and complexities, will be created. This dataset will include both CEDARScript commands and their corresponding natural language descriptions or intentions.
- Fine-tuning Process: The selected model will undergo fine-tuning using the created dataset. We'll experiment with different fine-tuning techniques, depending on the resources available and the desired outcome.
- Evaluation: The fine-tuned model will be rigorously tested on a held-out test set to assess its performance in understanding and generating CEDARScript. Metrics such as accuracy, fluency, and task completion will be used.
- Iterative Improvement: Based on the evaluation results, we'll iteratively refine the fine-tuning process, potentially adjusting the dataset, fine-tuning parameters, or even the base model selection.
As Large Language Models continue to evolve and find applications in various real-world scenarios, there's a growing need for standardized ways for LLMs to interact with external tools and APIs. We envision `CEDARScript` as a potential solution to this challenge.
- Standardized Tool Interaction: `CEDARScript` could serve as an intermediary language between LLMs and various tools, providing a consistent, SQL-like syntax for expressing tool usage intentions.
- Tool-Agnostic Commands: By defining a set of generic commands that map to common tool functionalities, `CEDARScript` could simplify the process of integrating new tools and APIs.
- Complex Tool Pipelines: The language's SQL-like structure could allow for easy chaining of multiple tool operations, enabling more complex workflows.
- Abstraction of API Complexity: CEDARScript could hide the underlying complexity of diverse tool APIs behind a simpler, unified interface.
This approach could potentially enhance LLMs' ability to leverage external tools and capabilities, making it easier to deploy them in diverse real-world applications. Future work could explore the feasibility and implementation of this concept, aiming to create a more seamless integration between LLMs and the tools they use to interact with the world.
- .QL - Object-oriented query language that enables querying Java source code using SQL-like syntax;
- JQL (Java Query Language) - Allows querying Java source code with SQL. It's designed for Java code analysis and linting;
- Joern - While primarily focused on C/C++, Joern is an open-source code analysis platform that uses a custom graph database to store code property graphs. It allows querying code using a Scala-based domain-specific language;
- Codebase Context Suite - A comprehensive tool for managing codebase context, generating prompts, and enhancing development workflows;
- CONVENTIONS.md
- Cedar Policy Language ('CEDARScript' is not a policy language. 'Cedar' and 'CEDARScript' are totally unrelated.)