
Assertions are very inconsistent #87

Open
Agustin-Perezz opened this issue Feb 20, 2025 · 5 comments

Comments

@Agustin-Perezz

I'm testing Detox Copilot, but it behaves inconsistently even when the task is very simple.

Here is my handler:

import OpenAI from 'openai';
import type { ChatCompletionMessageParam } from 'openai/resources/chat/completions';

class OpenAIPromptHandler {
	private readonly openai = new OpenAI({
		apiKey: process.env.EXPO_PUBLIC_OPENAI_API_KEY,
	});

	// GPT-3.5-turbo has a ~4K-token context window; note this limit is measured in characters, not tokens
	private readonly MAX_PROMPT_LENGTH = 4000;
	private readonly MAX_TOKENS = 256; // Reasonable limit for response tokens

	private readonly SYSTEM_PROMPT = `You are a Detox E2E test assistant for React Native. You MUST ONLY generate Detox commands.

CORRECT Detox patterns to use:
expect(element(by.text("Welcome"))).toBeVisible()
expect(element(by.id("button"))).toExist()
expect(element(by.id("input"))).toHaveText("text")
await element(by.text("Submit")).tap()

INCORRECT patterns (DO NOT USE):
❌ onView(withText("text"))              // This is Espresso
❌ cy.get("[data-test=button]")          // This is Cypress
❌ await page.locator("text").click()    // This is Playwright
❌ import statements or setup code
❌ comments or explanations

Rules:
1. Return ONLY the Detox command
2. No imports, no comments, no setup
3. No code blocks or markdown
4. Keep exact text/labels from the prompt

Example inputs and outputs:
Input: "Verify that the Welcome message is visible"
Output: expect(element(by.text("Welcome"))).toBeVisible()

Input: "Check if Submit button exists"
Output: expect(element(by.text("Submit"))).toExist()

Input: 'Verify that the "Hello!" message is displayed'
Output: expect(element(by.text("Hello!"))).toBeVisible()`;

	async runPrompt(prompt: string): Promise<string> {
		const truncatedPrompt =
			prompt.length > this.MAX_PROMPT_LENGTH
				? prompt.substring(0, this.MAX_PROMPT_LENGTH) + '...(truncated)'
				: prompt;

		const messages: ChatCompletionMessageParam[] = [
			{
				role: 'system',
				content: this.SYSTEM_PROMPT,
			},
			{ role: 'user', content: truncatedPrompt },
		];

		try {
			const response = await this.openai.chat.completions.create({
				model: 'gpt-3.5-turbo',
				messages: messages,
				max_tokens: this.MAX_TOKENS,
				temperature: 0.1, 
			});

			return (response.choices[0].message.content ?? '').trim();
		} catch (error: any) {
			console.error('OpenAI API Error:', error);
			throw new Error(
				`Failed to generate test commands: ${error?.message || 'Unknown error'}`,
			);
		}
	}

	isSnapshotImageSupported() {
		return true;
	}
}

export default OpenAIPromptHandler;

The test:

import { copilot, expect } from 'detox';

import OpenAIPromptHandler from './OpenAIPromptHandler';

describe('Home Screen', () => {
	beforeAll(async () => {
		await device.launchApp();
		const promptHandler = new OpenAIPromptHandler();
		copilot.init(promptHandler);
	});

	beforeEach(async () => {
		await device.reloadReactNative();
	});

	it('Should render home view', async () => {
		await copilot.perform('Verify that the "Welcome!" message is displayed');
	});
});

And the result:
(screenshot attached)

@Agustin-Perezz
Author

If I leave the handler implementation as shown in the docs, it throws a too-many-tokens error for the request.
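One likely cause of the token error: `MAX_PROMPT_LENGTH` in the handler above is a character count, while the API limit is measured in tokens. A rough character-based token budget can be sketched like this (a heuristic only, assuming ~4 characters per token for English text; a real tokenizer such as tiktoken would give exact counts):

```typescript
// Rough token budgeting for chat prompts. The OpenAI limit is counted in
// tokens, not characters; ~4 characters per token is a common English-text
// heuristic (an assumption here -- a tokenizer like tiktoken is exact).
const APPROX_CHARS_PER_TOKEN = 4;

function truncateToTokenBudget(prompt: string, maxTokens: number): string {
	const maxChars = maxTokens * APPROX_CHARS_PER_TOKEN;
	return prompt.length <= maxChars
		? prompt
		: prompt.substring(0, maxChars) + '...(truncated)';
}
```

With this, truncating to the model's input budget (context window minus `max_tokens` reserved for the response) keeps the request under the limit regardless of how verbose the prompt gets.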

@asafkorem
Collaborator

Thanks for the report @Agustin-Perezz, did you try other LLMs like Sonnet?

@asafkorem
Collaborator

Also, Detox Copilot uses an older version of Pilot; we'll upgrade its version soon, which might improve your tests.

@asafkorem
Collaborator

@Agustin-Perezz try removing the instructions from the system prompt; Detox Pilot already gives the LLM the necessary context for the supported APIs. It's not clear why it writes Espresso code. It also looks like your issue is with the LLM you're using.
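Following that suggestion, a slimmed-down handler that forwards Pilot's prompt without any custom system prompt might look like the sketch below. The method names (`runPrompt`, `isSnapshotImageSupported`) follow the handler posted earlier in this thread; the `ChatClient` interface is an assumption describing only the one call used, so the real `OpenAI` client satisfies it and a stub can stand in for tests. The model name is a placeholder, not a recommendation:

```typescript
// Minimal shape of the chat client this handler needs. The real `OpenAI`
// client from the `openai` package has this structure; a stub works too.
interface ChatClient {
	chat: {
		completions: {
			create(args: {
				model: string;
				messages: { role: 'user'; content: string }[];
				max_tokens?: number;
			}): Promise<{ choices: { message: { content: string | null } }[] }>;
		};
	};
}

class MinimalPromptHandler {
	constructor(private readonly client: ChatClient) {}

	// Pilot already supplies the Detox API context inside `prompt`,
	// so no extra system prompt is added here.
	async runPrompt(prompt: string): Promise<string> {
		const response = await this.client.chat.completions.create({
			model: 'gpt-4o', // placeholder; any chat-capable model works
			messages: [{ role: 'user', content: prompt }],
		});
		return (response.choices[0].message.content ?? '').trim();
	}

	isSnapshotImageSupported() {
		return false; // no image support in this sketch
	}
}
```

Injecting the client through the constructor keeps the handler trivially testable without an API key, while `new MinimalPromptHandler(new OpenAI({ apiKey }))` wires up the real thing.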

@Agustin-Perezz
Author

Agustin-Perezz commented Mar 5, 2025

Thanks, I will try another LLM
