Easily build a production-ready Viral Clip Finder in days, not months

Create powerful media processing pipelines with just one API call. Our open-source platform makes it simple to build and scale your workflows.

What is MediaToad?

MediaToad is an open-source, enterprise-grade media processing platform. It exposes its processing capabilities through a unified API and emphasizes developer experience, scalability, and maintainability.

npm install @mediatoad/sdk

.transcription()

Transcribe your audio and video files with high accuracy and speed.

const mt = new MediaToad("API_KEY");

const result = await mt.createJob({
  jobId: 'transcription-job-mov_bbb', // Unique identifier for the job
  assets: [
    {
      name: 'mov_bbb.mp4', // Name of the media file
      url: 'https://www.w3schools.com/html/mov_bbb.mp4', // URL to fetch the media file
    },
  ],
  tasks: [
    {
      id: 'transcription-task-xzYGa6N1F9nyGEubqvXH9', // Unique identifier for the transcription task
      operation: 'transcription', // Task type: "transcription"
      asset: 'mov_bbb.mp4', // asset picked from "assets" to transcribe
      provider: 'assemblyai', // Third-party transcription provider
      language: 'hi', // Language code of the audio ('hi' is Hindi)
      notify: {
        url: 'https://webhook.site/<task-notification-url>', // Webhook url to receive task status updates
      },
      apiKey: process.env.ASSEMBLYAI_API_KEY, // API key for the transcription provider (should be securely stored)
    },
  ],
  storage: {
    bucket: 'my-bucket', // Name of the storage bucket where results will be saved
    base: 'transcripts/', // Folder path in the bucket for storing transcripts
    storageType: 's3', // Storage type: 's3' for Amazon S3 or 'az' for Azure Blob Storage
  },
  notify: {
    url: 'https://webhook.site/<job-notification-url>', // Webhook to receive job-level status updates
  },
});

File or URL

Input audio/video file or URL

DONE

S3 Upload

Upload to S3 bucket

DONE

Upload Notification

File ready for processing

DONE

Transcription

Using AssemblyAI

RUNNING

Result and Notification

Transcription result

PENDING

Key Features

Multi-Provider Support
AWS, Azure, AssemblyAI, Deepgram, Google and more
Fallback System
Automatic provider fallback for high reliability
Parallel Processing
Run multiple tasks simultaneously
Webhook Notifications
Real-time updates on job progress
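
The fallback behaviour listed above can be pictured with a small sketch. This is only an illustration of the general technique, not MediaToad's internal code: withFallback and the provider objects are hypothetical names introduced here, and each provider is assumed to expose an async run function.

```javascript
// Hypothetical sketch of provider fallback; not MediaToad's implementation.
// Each provider is an object { name, run }, where run(input) returns a Promise.
async function withFallback(providers, input) {
  const errors = [];
  for (const provider of providers) {
    try {
      // The first provider that succeeds wins.
      const result = await provider.run(input);
      return { provider: provider.name, result };
    } catch (err) {
      // Record the failure and try the next provider in the list.
      errors.push(`${provider.name}: ${err.message}`);
    }
  }
  throw new Error(`All providers failed (${errors.join('; ')})`);
}
```

Providers are tried in list order, so the preferred provider goes first and the error message reports why each earlier one was skipped.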

View Documentation

.segmentation()

Automatically detect and segment video content into meaningful scenes.

const mt = new Mediatoad('API_KEY')
const data = await mt.transcribe('url', { provider: 'deepgram' })
const { speakers, transcription } = data

Video Input

Input video file

DONE

Segmentation

Scene detection

RUNNING

Output

Segmented clips

PENDING

Key Features

Smart Scene Detection
AI-powered scene boundary detection
Custom Duration
Configurable minimum segment length
Keyframe Extraction
Automatic thumbnail generation
Metadata Export
Detailed segment information
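
One way to read "configurable minimum segment length" is that scene cuts producing too-short segments are ignored. The sketch below shows that idea under stated assumptions: segmentsWithMinLength is a hypothetical helper, cut times are in seconds and sorted ascending, and MediaToad's actual segmentation logic may differ.

```javascript
// Hypothetical sketch: enforce a minimum segment length by skipping cuts
// that would close a segment shorter than minLength.
// `boundaries` are scene-cut times in seconds, sorted ascending.
function segmentsWithMinLength(boundaries, duration, minLength) {
  const segments = [];
  let start = 0;
  for (const cut of boundaries) {
    // Accept a cut only if the segment it closes is long enough.
    if (cut - start >= minLength) {
      segments.push({ start, end: cut });
      start = cut;
    }
  }
  // Close the final segment; a too-short tail is folded into the previous one.
  if (duration - start >= minLength || segments.length === 0) {
    segments.push({ start, end: duration });
  } else {
    segments[segments.length - 1].end = duration;
  }
  return segments;
}
```

For example, cuts at 2, 2.5, 6, and 9.5 seconds in a 10-second video with a 2-second minimum yield three segments: 0 to 2, 2 to 6, and 6 to 10.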

.findViralClips()

Automatically identify potentially viral moments in your video content.

Video Input

Input video file

DONE

Analysis

AI Processing

RUNNING

Output

Viral moments

PENDING

Key Features

AI Analysis
ML-powered moment detection
Virality Score
Engagement potential prediction
Auto-Clip Generation
Ready-to-share viral moments
Platform Optimization
Format for different social platforms

.translation()

Translate content across multiple languages with high accuracy.

Content Input

Audio/Text input

DONE

Translation

Multi-language

RUNNING

Output

Translated content

PENDING

Key Features

100+ Languages
Support for major global languages
Context-Aware
Maintains context and nuance in translations
Batch Translation
Process multiple files simultaneously
Quality Metrics
Confidence scores for translations

.clip()

Extract and create clips from your video content with frame-perfect precision.

Video Input

Input video file

DONE

Clip Processing

Precise extraction

RUNNING

Export & Encoding

Format conversion

PENDING

Output

Final clip

PENDING

Key Features

Frame-Perfect Cutting
Precise timecode-based extraction
Batch Clipping
Create multiple clips in one job
Format Conversion
Export to various formats and resolutions
Preview Generation
Automated thumbnail creation
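
Timecode-based extraction ultimately maps a timecode to an exact frame. As a rough illustration, the sketch below converts a non-drop-frame HH:MM:SS:FF timecode to a frame index. The function names are hypothetical, an integer frame rate is assumed, and drop-frame timecode is not handled.

```javascript
// Convert a non-drop-frame "HH:MM:SS:FF" timecode to an absolute frame index.
// Assumes an integer frame rate (for example 24, 25, or 30).
function timecodeToFrame(timecode, fps) {
  if (!/^\d{2}:\d{2}:\d{2}:\d{2}$/.test(timecode)) {
    throw new Error(`Invalid timecode: ${timecode}`);
  }
  const [hours, minutes, seconds, frames] = timecode.split(':').map(Number);
  if (frames >= fps) {
    throw new Error(`Frame field ${frames} exceeds frame rate ${fps}`);
  }
  return ((hours * 60 + minutes) * 60 + seconds) * fps + frames;
}

// Convert a frame index back to seconds, e.g. to compute a seek offset.
function frameToSeconds(frame, fps) {
  return frame / fps;
}
```

Working in whole frames and converting to seconds only at the last step avoids accumulating rounding error when several clips are cut from the same source.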

.dub()

Automatically generate voiceovers for your videos in multiple languages.

Video Input

Original video

DONE

Transcription

Extract original speech

DONE

Translation

Convert to target language

RUNNING

Voice Synthesis

Generate voiceover

PENDING

Audio Mixing

Combine with original audio

PENDING

Output

Dubbed video

PENDING

Key Features

Multi-Language Support
Support for 40+ languages and dialects
Multiple Voice Options
Various voice profiles per language
Audio Synchronization
Automatic lip-sync adjustment
Background Preservation
Maintains original music and effects

.textToSpeech()

Convert text to natural-sounding speech with customizable voices.

Text Input

Text content

DONE

Voice Selection

Voice profile

DONE

Speech Synthesis

Audio generation

RUNNING

Output

Audio file

PENDING

Key Features

Natural-Sounding Voices
High-quality, lifelike speech synthesis
Multi-Language Support
Generate speech in 30+ languages
Voice Customization
Adjust pitch, speed, and emphasis
Multiple Audio Formats
Export as MP3, WAV, or OGG

.generateVideo()

Create professional videos programmatically from templates and content.

Template & Content

Input configuration

DONE

Asset Preparation

Media processing

DONE

Scene Composition

Visual assembly

RUNNING

Rendering

Video generation

PENDING

Output

Final video

PENDING

Key Features

Template Library
Pre-designed professional templates
Dynamic Content
Automatically populate with your data
Custom Branding
Add logos, fonts, and color schemes
Batch Generation
Create multiple videos at scale

.faceDetection()

Identify and track objects, faces, and people in your video content with high accuracy.

Video Input

Input video file

DONE

Frame Extraction

Prepare keyframes

DONE

Object Detection

AI processing

RUNNING

Tracking

Object tracking

PENDING

Output

Detection results

PENDING

Key Features

Multi-Object Detection
Identify faces, people, objects and custom classes
Temporal Tracking
Track objects across video frames
Custom Models
Support for specialized detection models
Detailed Metadata
Bounding boxes, timestamps, and confidence scores

Powerful Media Processing Features

MediaToad provides a comprehensive suite of tools for handling all your media processing needs.

Workflow Orchestration

Complex multi-step operations
State tracking and resumption
Scalable processing architecture

Media Processing Engine

Built on FFmpeg with an intuitive API layer

Video/audio encoding and transformations
Scene detection and segmentation
Thumbnail generation and metadata extraction

Cloud-Agnostic Storage

Works with AWS S3, Azure Blob Storage, MinIO, and more

No vendor lock-in
Optimized for large-file handling
Seamless media uploads/downloads

AI & Automation

Integrated AI-driven video analysis and processing

Speech-to-text transcription
Object/person detection
Content moderation

Job Tracking & Notifications

Real-time monitoring and integration capabilities

Webhook notifications
Server-Sent Events (SSE)
Automatic retries and failure handling
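
Automatic retries are commonly implemented with exponential backoff. The sketch below shows that general pattern only. retryWithBackoff and its options are hypothetical names, and nothing here describes the retry policy MediaToad actually uses.

```javascript
// Hypothetical sketch of retry with exponential backoff; not MediaToad's internal code.
// `task` is an async function; it receives the zero-based attempt number.
async function retryWithBackoff(task, { retries = 3, baseDelayMs = 200 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      if (attempt === retries) break; // out of attempts
      // Wait baseDelayMs, then 2x, then 4x, ... before the next attempt.
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

With the defaults, a task is attempted up to four times with waits of 200 ms, 400 ms, and 800 ms between attempts, and the last error is rethrown if every attempt fails.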

Developer-Friendly API

Simple JSON-based job definitions

Intuitive job configuration
Comprehensive documentation
Flexible integration options
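
Because jobs are plain JSON, they can be checked before submission. The sketch below is a hypothetical client-side check: the field names (jobId, assets, tasks, asset) follow the createJob example earlier on this page, but validateJob and the rules it applies are assumptions, not documented MediaToad behaviour.

```javascript
// Hypothetical pre-submission check for a job definition.
// Field names follow the createJob example; the rules are assumptions.
function validateJob(job) {
  const problems = [];
  if (!job.jobId) problems.push('jobId is required');
  const assets = Array.isArray(job.assets) ? job.assets : [];
  const tasks = Array.isArray(job.tasks) ? job.tasks : [];
  const assetNames = new Set(assets.map((a) => a.name));
  const seenTaskIds = new Set();
  for (const task of tasks) {
    // Task ids should be unique within a job.
    if (seenTaskIds.has(task.id)) problems.push(`duplicate task id: ${task.id}`);
    seenTaskIds.add(task.id);
    // Every task should point at an asset declared in "assets".
    if (!assetNames.has(task.asset)) {
      problems.push(`task ${task.id} references unknown asset: ${task.asset}`);
    }
  }
  return problems;
}
```

An empty array means the definition passed these checks; otherwise each string describes one problem to fix before calling the API.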

Global Infrastructure

Deploy and scale your media processing workflows across the globe with ease

Roadmap & Changelog

Upcoming Features

Text-to-Speech (TTS)

in-progress

AI-driven voice synthesis from transcripts

Expected: Q2 2025

Programmatic Video Generation

planned

Automated creation of media content from templates

Expected: Q2 2025

Advanced Face/Object Detection

planned

Enhanced video analytics powered by AI

Expected: Q2 2025

Real-time Processing

planned

Support for live streaming and real-time analysis

Expected: Q2 2025

Recent Updates

v1.2.0

Mar 15, 2024

Added support for WebM format
Improved transcoding speed by 40%
New REST API endpoints

v1.1.0

Mar 1, 2024

Introduced batch processing
Fixed memory leaks in long operations
Updated documentation

How MediaToad Works

A simple, powerful workflow for all your media processing needs.

Submit via API

Define your media operations with a simple JSON request

Temporal Orchestration

Workflows are managed with retries, state tracking, and scalable processing

Media Processing

FFmpeg and AI services handle encoding, transcription, and analysis

Storage & Delivery

Processed files are stored in your specified cloud backend

Notifications

Receive webhook/SSE notifications when jobs complete

Analytics & Insights

Track performance metrics and monitor workflow efficiency
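
For the SSE side of the notifications step, a client has to split the event stream into individual events. The sketch below parses text in the standard Server-Sent Events wire format. parseSseChunk is a hypothetical helper, and the event name and JSON payload in the usage note are assumptions, since this page does not specify what MediaToad emits.

```javascript
// Parse text in the Server-Sent Events wire format into a list of events.
// Events are separated by a blank line; each line is "field: value".
function parseSseChunk(text) {
  const events = [];
  for (const block of text.split(/\r?\n\r?\n/)) {
    let eventName = 'message'; // default event type in the SSE format
    const dataLines = [];
    for (const line of block.split(/\r?\n/)) {
      if (line === '' || line.startsWith(':')) continue; // skip blanks and comments
      const idx = line.indexOf(':');
      const field = idx === -1 ? line : line.slice(0, idx);
      let value = idx === -1 ? '' : line.slice(idx + 1);
      if (value.startsWith(' ')) value = value.slice(1); // strip one leading space
      if (field === 'event') eventName = value;
      if (field === 'data') dataLines.push(value);
    }
    // Only blocks that carry data are dispatched as events.
    if (dataLines.length > 0) {
      events.push({ event: eventName, data: dataLines.join('\n') });
    }
  }
  return events;
}
```

Given a stream containing a hypothetical "job.status" event whose data line is a JSON object, the result is one entry with event set to "job.status" and data holding the JSON string, ready for JSON.parse. In a browser, the built-in EventSource API does this parsing automatically.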

View API Documentation

Who Benefits from MediaToad?

MediaToad serves a wide range of users with diverse media processing needs.

For LLM Providers

Streamline your media processing pipeline for production

LLM providers can leverage MediaToad to efficiently process and analyze large volumes of media content, enabling better training data preparation and content moderation.

Scale media processing for training data
Automate content moderation workflows
Integrate with existing AI systems

For Developers

Enhance developer experience with simplified media handling

Developers can focus on building great applications without worrying about the complexities of media processing, storage, and workflow management.

Intuitive API for complex media operations
Reliable, fault-tolerant processing
Flexible integration with existing systems

For Content Platforms

Scale your media infrastructure reliably

Content platforms can handle growing media libraries with a scalable, reliable solution that adapts to changing requirements and traffic patterns.

Process user-generated content at scale
Automate thumbnail and preview generation
Implement content analysis and moderation

For Enterprises

Avoid vendor lock-in with a flexible solution

Enterprises can maintain control over their media processing infrastructure while avoiding the limitations and costs of proprietary solutions.

Cloud-agnostic architecture
Customizable to specific business needs
Integrate with existing enterprise systems

Ready to Transform Your Media Processing?

Get started with MediaToad today and experience the power of open-source media processing at scale.