Freelance agent evaluation engineer

Stainach
Mindrift
Posted online since: 6 May
Description

Please submit your CV in English and indicate your level of English proficiency.

Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems. Participation is project-based, not permanent employment.

What This Opportunity Involves

We're building a dataset to evaluate AI coding agents: how well a model handles real-world developer tasks. You'll create challenging tasks and evaluation criteria within realistic simulated environments:

  • Build virtual companies following a high-level plan - codebase, infrastructure, and context (conversations, documentation, tickets) that form a realistic environment with development history
  • Assemble and calibrate tasks from intermediate states of the virtual company: craft the prompt, define evaluation criteria, and ensure the task is solvable and the evaluation is fair
  • Design tasks set in isolated environments - emulations of a developer's workstation: a Linux machine with development tools (terminal, CLI), MCP servers (repository, task tracker, messenger, documentation, etc.), and a real web application codebase
  • Write tests that accept all correct solutions and reject incorrect ones - neither too strict (breaking on valid approaches) nor too lenient (passing bad ones)
  • Iterate with an AI agent on tests - verifying they catch real problems, don't miss bad solutions, and don't break on good ones
  • Review code written by agents, analyze why an agent failed or succeeded, and design edge cases and adversarial scenarios
  • Iterate based on feedback from expert QA reviewers who score your work on quality criteria

What This Is NOT

  • Not data labeling
  • Not prompt engineering
  • Not writing code from scratch - the agent writes most of the code; you guide and evaluate

A significant part of the work is done together with AI - it's very hard to create tasks that challenge frontier models without using frontier models.

What We Look For

This opportunity is a good fit f...

© 2026 Jobijoba - All rights reserved
