AI Data Trainer – Video Captioning & Scene Generation (Worldwide – Remote)

About Cognito AI

Cognito AI connects subject-matter experts with cutting-edge AI development by offering flexible opportunities to shape and refine intelligent systems. Through roles in data labeling, quality assurance, and specialized task evaluation, contributors help ensure AI models are accurate, reliable, and aligned with real-world knowledge. Cognito AI empowers professionals to apply their expertise on impactful projects while earning competitive compensation from anywhere in the world.

About The Role

We’re hiring AI Data Trainers – Video Captioning & Scene Generation to help train AI systems capable of understanding and generating video content. Your role will involve evaluating and annotating AI-generated video captions, summaries, and synthetic scenes. You’ll assess content coherence, visual-text alignment, and semantic accuracy, ensuring that multimodal AI models deliver compelling, relevant, and accurate outputs for video-based applications.

Responsibilities

  • Evaluate AI-generated video captions and scene descriptions for accuracy and relevance.
  • Annotate scenes with metadata such as subject, action, setting, and mood.
  • Flag visual-text mismatches, scene inconsistencies, or misinterpretations of motion/emotion.
  • Provide feedback on story flow, pacing, and clarity in synthetic scene generation.
  • Ensure outputs align with accessibility standards (e.g., accurate closed captioning).
  • Stay informed on advancements in video AI, multimodal learning, and generative media.

Qualifications

  • Certificate in Advanced AI Data Trainer Mastery from AIDRALabs (Required).
  • 2+ years of experience in film, video editing, multimedia production, or linguistics.
  • Strong ability to interpret visual media and translate it into accurate written captions.
  • Familiarity with video editing tools (e.g., Adobe Premiere, Final Cut Pro) or video annotation platforms.
  • Understanding of narrative structure, visual cues, and audio context.
  • English proficiency at B2, C1, C2, or Native level.
  • Knowledge of video captioning standards (e.g., SDH, ADA compliance, web accessibility).
  • Exposure to LLM + video pipelines, scene segmentation models, or image-to-video tools.

Education & Qualifications

Add your Certificate in Advanced AI Data Trainer Mastery from AIDRALabs. Certificates that cannot be verified will not be accepted.

Availability & Technical Specs

Windows Instructions: Search for and open "About your PC". Take a full, unedited screenshot showing Device Name, Processor, RAM, etc.

Mac Instructions: Navigate to "About This Mac". Take a full, unedited screenshot showing model, processor, memory, serial number, and macOS version.

NOTE: Edited or cropped screenshots will not be accepted.

Internet Speed Test Instructions:
Navigate to speedtest.net.
Click "GO" and wait for the test to complete.
Take a full, unedited screenshot of the results.

NOTE: Edited or cropped screenshots will not be accepted.