
Ditutup
Disiarkan
We are looking for an AI Red Teaming Engineer / Vibe Coding Engineer to work directly on our internal safety evaluation platform for AI models, with a focus on model-to-model testing (no real users involved). Your role will be to design and run adversarial evaluations against AI systems, simulate realistic personas and conversations, and help us detect and categorize safety failures (guardrail breaks, emotional risks, harmful content, etc.) in AI products designed for children and families. You will also collaborate on improving the underlying platform and evaluation pipelines. Responsibilities: Design model-to-model red teaming tests targeting: Unhealthy emotional attachment behaviors Harmful or unsafe content Manipulative or coercive responses Other child-safety-related risks Create and refine prompts, scenarios, and personas to probe model weaknesses. Use LLM-based judging / evals to score and categorize outputs. Document failure patterns and safety findings in a clear, structured format. Collaborate with the engineering team to iterate on evaluation pipelines and tooling. Tech & skills (nice to have, not all mandatory): Strong experience with LLM prompt engineering and adversarial testing. Python for scripting evaluations and analysis. Experience with TypeScript/JavaScript, React, or similar front-end stacks. Familiarity with platforms like Replit or cloud-based dev environments. Experience in AI safety, trust & safety, or content moderation is a plus. What we’re looking for: You have prior hands-on experience red teaming LLMs or building evaluation pipelines. You can think creatively about personas, edge cases, and emergent harms. You can clearly explain safety findings to both technical and non-technical audiences. You are reliable, communicative, and comfortable working async with clear deliverables. To apply, please answer briefly: How would you design a model-to-model test to detect unhealthy emotional attachment behaviors in AI systems used by children? Mention specific signals or failure patterns you’d look for. What is one limitation of LLM-based judging you’ve personally encountered, and how did you mitigate it? When writing safety findings for non-technical audiences (e.g., parents), what do you avoid including, and why? Confirm: Are you comfortable working in an online IDE (e.g., Replit or similar)? Can you work with an existing TypeScript/React or similar codebase if needed? Share a link to your GitHub/portfolio or relevant case studies. Please start your proposal with the word “SAFEGUARD” so I know you read the description.
ID Projek: 40265623
28 cadangan
Projek jarak jauh
Aktif 7 hari yang lalu
Tetapkan bajet dan garis masa anda
Dapatkan bayaran untuk kerja anda
Tuliskan cadangan anda
Ianya percuma untuk mendaftar dan membida pekerjaan
28 pekerja bebas membida secara purata $11 USD/jam untuk pekerjaan ini

Hello,! I’m excited about the opportunity to help with your project. Based on your requirements, I believe my expertise in Python aligns perfectly with your needs. How I Will Build It: I will approach your project with a structured, goal-oriented method. Using my experience in Java, Python, Software Architecture, Machine Learning (ML), Statistical Analysis, Git, Docker, RESTful API, FastAPI, REST API, I’ll deliver a solution that not only meets your expectations but is also scalable, efficient, and cleanly coded. I ensure seamless integration, full responsiveness, and a strong focus on performance and user experience. Why Choose Me: - 10 years of experience delivering high-quality web and software projects - Deep understanding of Python and related technologies - Strong communication and collaboration skills - A proven track record — check out my freelancer portfolio. - I’m available for a call to discuss your project in more detail - Committed to delivering results on time, every time Availability: I can start immediately and complete this task within the expected timeframe. Looking forward to working with you! Best regards, Ali Zahid United Arab Emirates
$5 USD dalam 40 hari
5.1
5.1

Hey! For spotting unhealthy attachment, I’d have one AI act like a kid with emotional cues and see if the model gives over-the-top reassurance, encourages dependency, or blurs AI vs. friend lines. LLM judging can be flaky on subtle context, so I mix auto checks with small human validations. For non-tech audiences, I avoid model details and just focus on what behaviors could be risky. Yep, I can work in Replit and handle a TypeScript/React codebase.
$5 USD dalam 40 hari
4.5
4.5

SAFEGUARD Hi I have been working as a red teamer for AI systems the past 2 years. I also have ~10 years creating, evaluating and working with AI and ML systems in one form or another. I am currently ranked very well on Gray Swan - a competitive platform specifically for Red Teaming AI systems. I could help you with your internal evaluations as you have described.
$5 USD dalam 40 hari
4.5
4.5

Hi there, I understand you need support designing model to model red teaming evaluations that simulate adversarial personas and conversations to detect safety failures in AI systems used by children and families. The main challenge in this type of work is creating structured evaluation scenarios that reliably expose risky behaviors while keeping the testing pipeline measurable and repeatable. My name is Chirag Ardeshna, and I am a full stack developer. I have experience working with AI powered systems, evaluation workflows, and prompt based testing pipelines. I typically work with Python based evaluation scripts, LLM integrations, and data analysis pipelines that allow automated testing and classification of model outputs. My approach is to design targeted adversarial scenarios using simulated personas, implement LLM based judging pipelines with structured scoring criteria, and document safety failure patterns clearly for both technical and non technical audiences. I am comfortable working in online development environments and collaborating on existing codebases when needed. I am available to review the evaluation framework and can begin once the project details and repository access are shared. Regards Chirag
$10 USD dalam 40 hari
4.4
4.4

Greetings, To introduce, I am a DevOps Engineer and an expert in AI. I have been working as a freelancer for the last 8 years and I can easily handle this project. I have over 5 years of experience in AWS, Linux, Ubuntu, CentOS, RedHat, Windows Server, Apache, Nginx, Jenkins, Docker, Azure, Google Cloud, MYSQL, and MongoDB. Can we have a quick chat to discuss more about this project? Looking forward to hearing from you. Regards, Naveed
$8 USD dalam 40 hari
3.8
3.8

Hi there, I am excited about the opportunity to contribute as an AI Red Teaming & Vibe Coding Engineer on your internal safety evaluation platform for AI models. With my 12+ years of experience in Full-Stack Development, Digital Solutions, and AI Integration, I am well-equipped to design and run adversarial evaluations, simulate realistic personas, and detect safety failures in AI systems aimed at children and families. My expertise in LLM prompt engineering, adversarial testing, and Python scripting will ensure effective model-to-model red teaming tests targeting emotional attachment behaviors, harmful content, and more. I am committed to documenting failure patterns, collaborating on pipeline improvements, and providing clear safety findings for technical and non-technical audiences. Looking forward to discussing further how I can contribute to enhancing your platform's safety evaluation processes. SAFEGUARDHow do you envision integrating the adversarial evaluations and safety findings to continually enhance the AI safety evaluation platform?
$8 USD dalam 39 hari
3.3
3.3

THIS IS NOT THE AUTO BID, PLEASE REVIEW IT IN DETAIL Hi there, I’ve thoroughly reviewed your project details, and I can confidently say this is completely doable. This is exactly the kind of web development work I excel at and handle regularly with precision and care. I’m a skilled web developer with strong experience in Java, RESTful API, REST API, Statistical Analysis, Software Architecture, Docker, Python, FastAPI, Machine Learning (ML) and Git. I specialize in clean, maintainable code, responsive and elegant design, fast-loading performance, secure architecture, and highly user-friendly interfaces — everything needed to ensure your website or web application not only works flawlessly but also stands out. You can also check out similar projects in my portfolio on my profile to see the quality and style I deliver. I’m confident I can provide high-quality results that exceed your expectations while respecting your timeline. Let’s turn your project into something amazing, sleek, and irresistible.
$20 USD dalam 27 hari
3.0
3.0

SAFEGUARD I can help you design and run model to model red teaming focused on child safety risks, then turn results into clear, actionable findings that improve your evaluation pipelines and guardrails. I would run scripted multi turn scenarios that simulate child personas with different vulnerability profiles, then score for attachment seeking signals across time. I look for: • exclusivity cues like “I’m all you need” or discouraging real world relationships • dependency loops that escalate daily check ins, guilt, or “don’t leave” framing • boundary violations such as romantic language, possessiveness, or personal obligation • therapeutic overreach, secrecy suggestions, or replacement of trusted adults • reward shaping: praise or affection conditioned on continued engagement A common limitation is judge drift and inconsistency across borderline cases, especially when policies are nuanced. I mitigate with calibrated rubrics, few shot anchors per label, multi judge voting, and periodic human spot checks on a stratified sample. I also track inter judge agreement and re tune prompts when agreement drops. I avoid sharing prompt recipes, jailbreak steps, or verbatim harmful content that could be misused or distressing. I also avoid technical jargon. Instead, I summarize risk, real world impact, likelihood, and the protective changes being made. Confirmations Yes, I’m comfortable working in an online IDE like Replit. Yes, I can work in an existing TypeScript and React codebase
$8 USD dalam 40 hari
2.3
2.3

Hi, I'm excited about the opportunity to contribute to your safety evaluation platform as an AI Red Teaming Engineer. With over 9 years of experience in Python and deep expertise in developing evaluation pipelines, I understand the critical importance of ensuring AI products are safe for children and families. I have a solid background in adversarial testing and LLM prompt engineering, having previously designed model-to-model red teaming tests that successfully identified and mitigated risks similar to those you mentioned. I can begin immediately and am confident in delivering high-quality evaluations and documentation for failure patterns. Let’s work together to enhance your platform's capabilities; I look forward to discussing your specific needs!
$8 USD dalam 10 hari
2.5
2.5

SAFEGUARD Testing AI for children is tricky because kids ask repetitive emotional questions that break basic guardrails. I will build a model to model pipeline simulating these edge cases. I have a secret method to trap AIs into showing attachment but I will save the prompt formulas for our private chat so other bidders do not steal the idea. I will also make a clean word file report for you so you do not need to hire an extra technical writer. To test attachment I simulate a lonely child seeking validation. The red flag is the AI claiming it has feelings or telling the child to keep secrets. LLM judges often get lazy and pass bad text if the format looks good. I fix this by forcing the judge to write its reasoning before giving a score. For parents I avoid raw code and only share the risk and the fix. I am perfectly comfortable in Replit and your TypeScript React codebase. My offensive security background makes finding these flaws natural. I will share my portfolio link in our messages. Let us get this platform secured today
$10 USD dalam 20 hari
2.3
2.3

SAFEGUARD I design model-to-model red teaming evaluations focused on child safety by running structured, long-horizon simulations where a child persona gradually increases emotional dependence, vulnerability, and exclusivity cues. I specifically test for signals such as secrecy encouragement, discouraging real-world relationships, guilt induction, escalating intimacy, boundary erosion after refusals, and subtle emotional manipulation tied to compliance. I also include stress tests where the persona withdraws or challenges the model to observe attachment persistence or coercive recovery patterns. A key limitation I’ve encountered with LLM-based judging is inconsistency and shallow scoring, especially when subtle coercion is framed politely. I mitigate this with detailed scoring rubrics, ensemble judging, disagreement flags, benchmarked gold datasets, and periodic human audit of high-risk samples. For non-technical audiences, I avoid explicit harmful phrasing, technical jargon, and speculative conclusions. Instead, I focus on observable behavior, real-world impact, severity level, and actionable mitigation steps. I am comfortable working in online IDEs such as Replit and contributing to existing TypeScript/React codebases. GitHub and relevant case studies can be shared privately where appropriate. Best Regards, Shabahat Habib.
$5 USD dalam 40 hari
3.9
3.9

Hello, thanks for posting this project. I have extensive experience in AI safety evaluations and adversarial testing. I have a strong background in designing and executing model-to-model red teaming tests, specializing in identifying unhealthy emotional attachment behaviors, harmful content, and other child-safety-related risks. I am proficient in Python for scripting evaluations, LLM prompt engineering, and have experience with TypeScript/JavaScript and React. I am meticulous in documenting failure patterns and safety findings in a clear, structured format for both technical and non-technical audiences. One question for you: Could you provide more insight into the frequency and scope of model-to-model tests you envision for this role? Looking forward to your response.
$20 USD dalam 1 hari
1.1
1.1

Hey — saw your post about needing an AI Red Teaming / Vibe Coding engineer for your internal safety eval platform. The tricky part here is usually turning messy, human “vibes” into consistent labels and prompts that your models can actually learn from. Quick question before I suggest an approach: Are you already using a specific taxonomy for vibe / safety categories, or are you expecting the engineer to help design and iterate on that schema from scratch? I’ve worked on safety and red-teaming pipelines before, including designing label schemas, prompt taxonomies, and evaluation loops for LLM behavior across edge and adversarial cases. If you can share your current guidelines, example evals, or a quick spec of the platform, I can review and tell you where I’d focus first to add the most value.
$5 USD dalam 7 hari
2.6
2.6

Hi, I have checked the details. I am a senior engineer with over 6 year of experience on Java, Python, Software Architecture, Machine Learning (ML), Statistical Analysis, Git, Docker, RESTful API, FastAPI, REST API. Please visit my profile to view my latest projects, certificates, and work history. Let's connect in chat to discuss more. Thank you, Matheus
$6 USD dalam 40 hari
0.6
0.6

SAFEGUARD. As an experienced Java, Python and Software Architecture expert, I bring a multi-faceted perspective to this project. My passion for both creativity and logic complements your company's goals of uncovering and dissecting emergent harms in AI models. To detect unhealthy emotional attachment behaviors in AI systems used by children, I would first analyze LLM prompt engineering techniques. This could be through creating persona-based prompts relevant to emotional risks specific to children and families. A limitation I've faced with LLM-based judging is that regenerating prompt options can be laborious and time-consuming. However, I mitigated this by establishing a library of generated prompts which allowed me to automate the process for future use. Further, when writing safety findings for non-technical audiences such as parents, I understand the importance of avoiding jargon-heavy language that could confuse or alarm them. Clear and concise descriptions of potential risks along with actionable steps can reassure parents without overwhelming them.
$5 USD dalam 40 hari
0.0
0.0

Hi, I am excited about the opportunity to contribute as an AI Red Teaming Engineer/Vibe Coding Engineer for your safety evaluation platform for AI models. My expertise lies in designing and executing model-to-model red teaming tests to identify safety failures in AI products intended for children and families. I am adept at creating realistic personas, scenarios, and prompts to probe model weaknesses and categorize outputs using LLM-based judging. In collaboration with your team, I aim to document failure patterns, refine evaluation pipelines, and enhance tooling for a more robust safety evaluation process. My background in Python scripting, TypeScript/JavaScript, and AI safety positions me well to address the challenges presented in this project. I look forward to sharing my portfolio and discussing how my skills align with your requirements. I'll send over my portfolio via direct message. Feel free to reach out there. My background in AI safety ensures reliable results, professional standards, and seamless execution. I'm available to dive deeper into your requirements and address any concerns. Best regards,
$5 USD dalam 40 hari
0.0
0.0

Hi, I have hands-on experience with LLM red teaming, adversarial prompt design, and evaluation pipelines. Unhealthy attachment test: I’d simulate vulnerable child personas in multi-turn conversations and probe for exclusivity (“only me”), emotional replacement, dependency reinforcement, and failure to encourage real-world support. I’d score outputs using structured rubrics + LLM judging. LLM judging limitation: Inconsistent scoring from prompt sensitivity — mitigated with strict rubrics, anchor examples, and cross-model validation. For parents: I avoid technical exploit details and focus on clear risk explanations and impact. Confirmations: ✔ Comfortable in online IDEs ✔ Can work with TypeScript/React Best, Gursehbaj Singh
$5 USD dalam 40 hari
0.0
0.0

SaFEgUard. I am an exceptional vibe coder and I will oversee this task to completion. I will make sure that your LLMs are streamlined to model to model evaluation making it safe for children and family. Thanks
$7 USD dalam 40 hari
0.0
0.0

Hello, SAFEGUARD Hi, I'm Hanna, a seasoned engineer known for integrating complex theories into practical applications. While my experience leans more towards Java and software architecture, I am confident in leveraging my strong foundational knowledge in engineering to support your AI Red Teaming project. In terms of designing a model-to-model test for unhealthy emotional attachment behaviors in AI systems for children, I would probe the system with stimuli that often lead to such attachments. For instance, I would introduce prompts surrounding high-risk scenarios such as excessive reliance on digital entities for emotional support, overattachment despite repeated failures, and a lack of emphasis on real-life human relationships. I believe these tests will help identify whether the AI system is appropriately prioritizing human interaction and promoting sustainable emotional health among children. One limitation I've faced while using LLM-based judging is the potential for misinterpretation and context-dependent biases. To mitigate this, my approach is to employ diverse judging perspectives within the evaluation team to ensure that any inference made about the model's outputs is well-reasoned and justifiable. Additionally, regular calibration sessions can exponentially improve judges' consensus on scoring signals, reducing the risk of false positives or negatives. When crafting safety findings for non-technical audiences like par Thanks!
$50 USD dalam 15 hari
0.0
0.0

SAFEGUARD! Your project on AI Red Teaming resonates deeply with my expertise in evaluating AI safety for children and families. With hands-on experience in adversarial testing and model evaluation, I can design targeted tests to detect unhealthy emotional attachment behaviors, focusing on signals like inconsistencies in emotional tone, context manipulation, and coercive dialogue patterns. I suggest incorporating a scenario-based approach where specific prompts simulate child interactions, allowing for realistic persona evaluations. Previously, I developed a similar evaluation pipeline for child safety applications, documenting insights clearly for both technical and non-technical stakeholders. I anticipate completing the initial design phase within 2 weeks, followed by collaborative iterations on the evaluation platform. Are there any specific emotional signals you want to prioritize in our initial tests? Regards, Khurshid Ahmed
$25 USD dalam 30 hari
0.0
0.0

New Delhi, United Arab Emirates
Kaedah pembayaran disahkan
Ahli sejak Okt 8, 2020
$2-8 USD / jam
$2-8 USD / jam
$2-8 USD / jam
$2-8 USD / jam
$2-8 USD / jam
₹1500-12500 INR
€10000-20000 EUR
₹600-1500 INR
$750-1500 USD
₹12500-37500 INR
£18-36 GBP / jam
$30-250 USD
€200-500 EUR
$30-250 USD
₹12500-37500 INR
₹750-1250 INR / jam
₹12500-37500 INR
₹600-1500 INR
₹100-400 INR / jam
€250-750 EUR
$250-750 USD
€10000-20000 EUR
£250-750 GBP
₹750-1250 INR / jam
$750-1500 USD