Project Update - Big News! #45
daveshap announced in Announcements · 3 comments
-
This is great news
-
3 months on, no updates. Is this still happening? Are you doing something behind closed doors? Making any progress? Or why are you not sharing the results?
-
@mindplay-dk There's work going on; we just haven't been tracking it in this repo -- not behind closed doors, just in different public repos. Paper -> CoT pipeline: https://github.com/thehunmonkgroup/raspberry-paper-to-cot-pipeline
-
Executive Summary
On Wednesday, 2024-09-18, the Raspberry team had a great meeting with folks from LAION (https://laion.ai/) to discuss potential collaboration. We identified two overarching areas of alignment between our projects.
Top Level Topics
Quick recap
The team discussed various topics including reinforcement learning, different types of intelligence, and the potential of combining code interpreters, math interpreters, and external grounding for AI models. They explored using multiple models to generate and refine data, with a focus on developing a multi-agent framework for scalable data generation and verification. The meeting also covered team introductions, ongoing projects, and potential approaches to improve AI capabilities using open-source models and automated data synthesis pipelines.
Topics Discussed
Reinforcement Learning and AI Models: The team explored the potential of combining code interpreters, math interpreters, and external grounding for AI models. They discussed how these elements could be integrated to create a more comprehensive and consistent model of reality.
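As a concrete, deliberately simplified illustration of the grounding idea, the sketch below checks arithmetic claims inside a reasoning step against a small safe evaluator. The claim format and the regex are assumptions made for illustration; nothing this specific was settled in the meeting.

```python
import ast
import operator
import re

# Illustrative assumption: reasoning steps state arithmetic claims as
# "<expression> = <value>", e.g. "12 * 7 = 84". Real CoT extraction
# would need a more robust parser.
CLAIM_RE = re.compile(r"([\d\s\.\+\-\*/\(\)]+)=\s*([\d\.]+)")

_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def _eval(node):
    """Evaluate a restricted arithmetic AST (no names, no calls)."""
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.operand))
    raise ValueError("unsupported expression")

def check_arithmetic(step: str, tol: float = 1e-9):
    """Return (claim, verified) pairs for arithmetic claims in a step."""
    results = []
    for expr, claimed in CLAIM_RE.findall(step):
        try:
            actual = _eval(ast.parse(expr.strip(), mode="eval"))
        except (ValueError, SyntaxError):
            continue
        results.append((f"{expr.strip()} = {claimed}",
                        abs(actual - float(claimed)) < tol))
    return results

print(check_arithmetic("First 12 * 7 = 84, and 84 - 4 = 80"))
# -> [('12 * 7 = 84', True), ('84 - 4 = 80', True)]
```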
Multi-Agent Framework: The team proposed developing a multi-agent framework for scalable data generation and verification. This approach would involve using multiple models to generate and refine data, with minimal human review required.
Open-Source Data Synthesis Pipeline: A key project discussed was the development of an automated data synthesis pipeline using open-source resources. This pipeline would aim to generate high-quality datasets more efficiently and cost-effectively than traditional methods.
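A minimal sketch of what such a generate-then-verify loop could look like, covering both the multi-agent framework and the pipeline above. The agents are toy stand-ins so the skeleton runs on its own; in the actual pipeline each would be an LLM call with a role-specific prompt, and the names, rubric dimensions, and threshold are all illustrative.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Sample:
    question: str
    chain_of_thought: str
    scores: dict = field(default_factory=dict)

def generator_agent(topic: str, n: int) -> list:
    """Draft n candidate question/CoT pairs for a topic (stubbed)."""
    return [Sample(f"{topic} (variant {i})", f"step 1 ... step {i + 2}")
            for i in range(n)]

def critic_agent(sample: Sample) -> dict:
    """Score a candidate on each rubric dimension (stubbed)."""
    return {dim: random.uniform(0, 1)
            for dim in ("salience", "soundness", "relevance")}

def synthesize(topics, per_topic=4, threshold=0.5):
    """Generate, score, and split candidates: accepted samples go into
    the dataset; only rejected ones would reach a human reviewer."""
    accepted, flagged = [], []
    for topic in topics:
        for sample in generator_agent(topic, per_topic):
            sample.scores = critic_agent(sample)
            (accepted if min(sample.scores.values()) >= threshold
             else flagged).append(sample)
    return accepted, flagged

accepted, flagged = synthesize(["CoT from paper X"])
print(len(accepted), "accepted,", len(flagged), "flagged for review")
```

Routing only the rejects to humans is what makes the "minimal human review" goal plausible: reviewer effort scales with the failure rate rather than with dataset size.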
Model Fine-Tuning Strategy: The team considered using an open-source model like Llama 3.1 as a base for fine-tuning. The strategy involved using copies of the model to generate diverse trajectories, evaluate them, and iteratively refine the dataset.
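Under the same caveat, here is one way the trajectory loop could be skeletonized: copies of the model are simulated by decoding the same prompt at several temperatures, a placeholder verifier scores each trajectory, and the top-scoring fraction becomes the next round's fine-tuning data.

```python
import random

def sample_trajectory(prompt: str, temperature: float, rng) -> str:
    """Stand-in for one model copy decoding the prompt; higher
    temperature yields more varied (and riskier) trajectories."""
    n_steps = 2 + int(temperature * rng.random() * 4)
    return f"{prompt}: " + " -> ".join(f"step{i}" for i in range(1, n_steps + 1))

def evaluate(trajectory: str, rng) -> float:
    """Placeholder score; the real verifier would grade the reasoning."""
    return rng.random()

def refine(prompts, temperatures=(0.3, 0.7, 1.0), rounds=3, keep_frac=0.5):
    """Each round: sample diverse trajectories, score them, and keep the
    best-scoring fraction as the next round's fine-tuning data."""
    rng = random.Random(0)
    dataset = []
    for _ in range(rounds):
        candidates = []
        for prompt in prompts:
            for temp in temperatures:
                traj = sample_trajectory(prompt, temp, rng)
                candidates.append((evaluate(traj, rng), prompt, traj))
        candidates.sort(reverse=True)
        dataset = candidates[: max(1, int(len(candidates) * keep_frac))]
        # Here the real loop would fine-tune the base model on `dataset`
        # before sampling the next round.
    return dataset

print(refine(["show the sum of two even numbers is even"])[0])
```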
Data Validation Techniques: Methods for validating synthesized chains of thought were discussed. These included using rubrics to evaluate factors such as salience, logical soundness, and relevance. The team also considered implementing RAG (Retrieval-Augmented Generation) as part of the verification step in the data synthesis pipeline.
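A crude sketch of that verification step follows: a rubric over the dimensions named above, plus a word-overlap retriever standing in for the RAG component. A real pipeline would retrieve with dense embeddings over the source paper; the rubric wording here is an illustrative guess.

```python
RUBRIC = {
    "salience": "Does the chain of thought address the core of the question?",
    "logical_soundness": "Does each step follow from the previous ones?",
    "relevance": "Is every step needed to reach the answer?",
}

def retrieve(claim: str, paragraphs: list, k: int = 1) -> list:
    """Toy retriever: rank source paragraphs by word overlap with the
    claim. A real pipeline would use dense embeddings here."""
    words = set(claim.lower().split())
    return sorted(paragraphs,
                  key=lambda p: len(words & set(p.lower().split())),
                  reverse=True)[:k]

def ground(cot_steps: list, paragraphs: list) -> dict:
    """Attach the best-matching source passage to each reasoning step;
    steps with no meaningful support are the ones to flag for review."""
    return {step: retrieve(step, paragraphs) for step in cot_steps}

print(ground(["attention mixes token embeddings"],
             ["Attention maps token embeddings to weighted mixtures.",
              "We train on 8 GPUs for 3 days."]))
```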
Semantic Clustering Challenges: The team debated the effectiveness of using multi-dimensional sentence embeddings to categorize fine-tuning data. Concerns were raised that sentence embeddings can degrade toward bag-of-words behavior, making it difficult to cluster ideas that are semantically similar but worded differently.
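The concern is easy to demonstrate in the limit case, plain word-count vectors, where sentences that share words but differ in meaning become indistinguishable:

```python
from collections import Counter
from math import sqrt

def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity of word-count (bag-of-words) vectors."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = sqrt(sum(v * v for v in ca.values()))
    norm *= sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

s1 = "the model verified the proof"
s2 = "the proof verified the model"  # same words, opposite meaning
print(bow_cosine(s1, s2))  # 1.0 -- bag-of-words sees them as identical
```

Any two sentences with identical word counts score 1.0 regardless of meaning; an embedding model that drifts toward this behavior will cluster them together, which is exactly the failure mode the team was worried about.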
Specialized Model Development: There was a proposal to develop specialized models or prompts for different stages of the data synthesis process. This approach would allow for more targeted and efficient data generation and verification.
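One plausible shape for this, assuming a single base model specialized purely through per-stage prompts and decoding settings; every stage name, prompt, and temperature below is invented for illustration.

```python
STAGES = {
    "extract": {"system": "List candidate questions raised by the paper.",
                "temperature": 0.9},   # diversity matters here
    "reason":  {"system": "Answer step by step, citing the paper.",
                "temperature": 0.7},
    "verify":  {"system": "Grade the reasoning against the rubric.",
                "temperature": 0.0},   # determinism matters here
}

def run_stage(stage: str, payload: str) -> str:
    """Placeholder dispatch; replace the return with a real LLM call."""
    cfg = STAGES[stage]
    return f"[{stage} @ T={cfg['temperature']}] {payload}"

for stage in ("extract", "reason", "verify"):
    print(run_stage(stage, "paper-123"))
```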
Compute Resource Exploration: The team planned to investigate potential compute resources, specifically mentioning European supercomputers, for large-scale data generation and self-verification processes.