SKILL FILE

Local Video Clip Engine with AI

Build a local Opus Clip replacement — Whisper for transcription, FFmpeg for rendering, ASS karaoke captions, zero subscriptions.

$0/mo vs $20-40/mo for Opus Clip

~3s per 60s clip (VideoToolbox)

500 lines of TypeScript total

CROSS-DEPARTMENT FLOW

How video content flows across your company

One recording generates clips, quotes, and assets for every department — automatically

Video Recorded: podcast, interview, webinar

1 Whisper Transcription: word-level timestamps, runs locally, $0

2 Clip-Worthy Segments Identified: heuristic scorer ranks segments by engagement signals, runs locally, $0

3 FFmpeg Renders Clips: hardware-accelerated crop, encode, and format, ~3s per 60s clip

4 ASS Captions Overlaid: word-by-word karaoke animation burned into each clip

→ Social clips for Reels & Shorts
→ YouTube Shorts publishing
→ Carousel content from quotes
→ Blog post pull-quotes

→ Demo highlight reels
→ Testimonial clips for outreach
→ Case study video snippets
→ Proposal video attachments

→ Engagement-optimized clip formats
→ A/B test thumbnails
→ Platform performance tracking
→ Content velocity metrics

→ Clip links on contact records
→ Meeting highlight reels
→ Video engagement events logged

Content Outputs

Instagram Reels from marketing

YouTube Shorts from marketing

LinkedIn Video Posts from marketing

Podcast Audiograms from sales

Quote Carousels from marketing

Everything Tracked

Content pieces tracked

Clip performance logged

Source video linked

Replaces Opus Clip

$20/mo → $0/mo

$240/yr saved

REPLACES

Cancel your Opus Clip subscription

CANCEL THIS

Opus Clip

$20/mo

× Subscription fees
× Data locked in their dashboard
× Per-seat pricing
× Export limits

BUILD THIS

SoloStack + Claude Code

$0/mo

✓ Pay-per-use, no subscription
✓ Your data in your repo
✓ Zero vendor lock-in
✓ Unlimited exports

Save $240/year

WHAT YOU GET

What this skill file teaches Claude

Drop one markdown file into your repo. Claude Code learns how to run this entire workflow.

Word-by-Word Karaoke Captions

ASS subtitle \kf tags progressively fill each word with brand yellow (#FEBB02) as it's spoken — the exact same visual effect as Opus Clip, with full control over fonts, colors, and positioning.
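The karaoke effect described above can be sketched in a few lines. This is a minimal illustration, not the skill file's actual code: the `Word` shape and the function names `hexToAssColor`, `assTime`, and `karaokeLine` are hypothetical. It assumes word timestamps are in seconds relative to the start of the clip. Two details of the ASS format matter here: colours are written in blue-green-red order, and `\kf` durations are in centiseconds.

```typescript
// Hypothetical word shape: start/end in seconds, relative to the clip start.
interface Word {
  text: string;
  start: number;
  end: number;
}

// ASS colours are written &HAABBGGRR (alpha, blue, green, red), so a web
// hex colour such as #FEBB02 must be reordered from RRGGBB to BBGGRR.
function hexToAssColor(hex: string): string {
  const m = /^#?([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(hex);
  if (!m) throw new Error(`Invalid hex colour: ${hex}`);
  const [, r, g, b] = m;
  return `&H00${b}${g}${r}`.toUpperCase();
}

// Format seconds as an ASS timestamp, H:MM:SS.cc (centisecond precision).
function assTime(seconds: number): string {
  const cs = Math.round(seconds * 100);
  const pad = (n: number): string => (n < 10 ? "0" : "") + n;
  const h = Math.floor(cs / 360000);
  const m = Math.floor((cs % 360000) / 6000);
  const s = Math.floor((cs % 6000) / 100);
  return `${h}:${pad(m)}:${pad(s)}.${pad(cs % 100)}`;
}

// Build one Dialogue event. Each word gets a \kf tag. Durations run from a
// word's start to the next word's start, so pauses between words are
// absorbed and the fill stays in sync with the audio.
function karaokeLine(words: Word[], style = "Default"): string {
  if (words.length === 0) throw new Error("No words to caption");
  const body = words
    .map((w, i) => {
      const until = i + 1 < words.length ? words[i + 1].start : w.end;
      const cs = Math.max(1, Math.round((until - w.start) * 100));
      return `{\\kf${cs}}${w.text}`;
    })
    .join(" ");
  const start = assTime(words[0].start);
  const end = assTime(words[words.length - 1].end);
  return `Dialogue: 0,${start},${end},${style},,0,0,0,,${body}`;
}
```

With `\kf`, the renderer sweeps each word from the style's secondary colour to its primary colour, so the brand yellow would go in the style's primary colour slot and white in the secondary slot.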

Automatic Clip Selection

Heuristic scorer slides a window across the transcript and ranks segments by engagement signals: word density, sentence boundaries, questions, keyword triggers like 'here's the thing' and 'the key is'.
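A sliding-window scorer along these lines might look like the sketch below. The four signals come from the description above. The weights, the `Word` and `Candidate` shapes, and the function names are illustrative assumptions, and the trigger list contains only the two phrases quoted in the text.

```typescript
// Hypothetical shapes; timestamps are in seconds from the start of the video.
interface Word {
  text: string;
  start: number;
  end: number;
}

interface Candidate {
  startIdx: number;
  endIdx: number;
  start: number;
  end: number;
  score: number;
}

// Only the two trigger phrases quoted in the text; a real list would be longer.
const TRIGGERS = ["here's the thing", "the key is"];

// Score one window of words. All weights are illustrative assumptions.
function scoreWindow(words: Word[]): number {
  const last = words[words.length - 1];
  const duration = last.end - words[0].start;
  if (duration <= 0) return 0;
  const text = words.map((w) => w.text).join(" ").toLowerCase();
  const density = words.length / duration; // words per second
  const questions = words.filter((w) => w.text.endsWith("?")).length;
  const triggers = TRIGGERS.filter((t) => text.indexOf(t) !== -1).length;
  const endsOnSentence = /[.!?]$/.test(last.text) ? 1 : 0;
  return density * 10 + questions * 5 + triggers * 15 + endsOnSentence * 10;
}

// Slide a window of at most windowSec seconds across the transcript,
// advancing stepWords words at a time, and return the topN candidates.
function rankWindows(
  words: Word[],
  windowSec: number,
  stepWords: number,
  topN: number
): Candidate[] {
  if (stepWords < 1) throw new Error("stepWords must be at least 1");
  const out: Candidate[] = [];
  for (let i = 0; i < words.length; i += stepWords) {
    let j = i;
    while (j + 1 < words.length && words[j + 1].end - words[i].start <= windowSec) {
      j++;
    }
    if (j === i) continue; // a single word is not a usable clip
    out.push({
      startIdx: i,
      endIdx: j,
      start: words[i].start,
      end: words[j].end,
      score: scoreWindow(words.slice(i, j + 1)),
    });
  }
  return out.sort((a, b) => b.score - a.score).slice(0, topN);
}
```

This sketch does not remove overlapping windows or enforce a minimum clip length. A real implementation would likely need both before presenting candidates for review.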

Hardware-Accelerated Rendering

FFmpeg's VideoToolbox encoder uses Apple Silicon hardware to render clips 10-20x faster than software encoding. A 60-second clip renders in ~3 seconds at 8Mbps quality.
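As a rough sketch of what the render step could pass to FFmpeg, the function below builds an argument list for one clip. The `RenderJob` shape and `buildRenderArgs` name are hypothetical. `h264_videotoolbox`, `libx264`, and the `ass` filter are real FFmpeg components, but the software-encoder settings (`-preset medium -crf 20`) are assumptions, as is the premise that the caption file path contains no characters that need escaping inside a filter string.

```typescript
// Hypothetical job description for a single clip.
interface RenderJob {
  input: string;
  output: string;
  start: number; // seconds into the source video
  duration: number; // clip length in seconds
  assFile: string; // caption file, timed relative to the clip start
  hardware: boolean; // true to use VideoToolbox, false for libx264
}

// Build the FFmpeg argument list for one clip.
function buildRenderArgs(job: RenderJob): string[] {
  const encoder = job.hardware
    ? ["-c:v", "h264_videotoolbox", "-b:v", "8M"] // 8 Mbps, as quoted in the text
    : ["-c:v", "libx264", "-preset", "medium", "-crf", "20"]; // assumed settings
  return [
    "-y",
    // Seeking before -i resets timestamps to zero, which is why the
    // captions must be timed relative to the clip rather than the source.
    "-ss", String(job.start),
    "-t", String(job.duration),
    "-i", job.input,
    "-vf", `ass=${job.assFile}`, // burn in captions (needs a libass-enabled build)
    ...encoder,
    "-c:a", "aac",
    "-movflags", "+faststart",
    job.output,
  ];
}
```

The resulting array can be handed to a process spawner such as Node's `child_process.spawn("ffmpeg", args)`. Passing arguments as an array avoids shell quoting problems with file names that contain spaces.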

Flexible Output Formats

Crop to vertical (9:16 for Reels/TikTok), square (1:1 for Instagram feed), or keep original aspect ratio. One command flag controls the output format.
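The crop itself reduces to a little arithmetic. The sketch below computes an FFmpeg `crop` and `scale` filter string for a landscape source. The function name, the `offsetX` parameter, and the 1080x1920 vertical output size are assumptions (the 1080x1080 square size matches the demo output on this page). It keeps the full source height and trims the width around the center.

```typescript
type Aspect = "vertical" | "square" | "original";

// Compute an FFmpeg filter string that center-crops a landscape source.
// offsetX shifts the crop box right (positive) or left (negative) in pixels,
// for footage whose subject is off-center. Returns null for "original".
function cropFilter(
  srcW: number,
  srcH: number,
  aspect: Aspect,
  offsetX = 0
): string | null {
  if (aspect === "original") return null;
  const ratio = aspect === "vertical" ? 9 / 16 : 1;
  // Keep the full height and round the width down to an even number,
  // which H.264 encoders generally require.
  const w = Math.floor((srcH * ratio) / 2) * 2;
  if (w > srcW) throw new Error("Source is narrower than the target aspect");
  // Center the crop box, apply the manual offset, then clamp to the frame.
  const centered = Math.round((srcW - w) / 2 + offsetX);
  const x = Math.min(Math.max(centered, 0), srcW - w);
  const out = aspect === "vertical" ? "1080:1920" : "1080:1080";
  return `crop=${w}:${srcH}:${x}:0,scale=${out}`;
}
```

For a 1920x1080 source, the vertical crop box is 606 pixels wide, so roughly two thirds of the frame is discarded. That is why center-crop suits talking-head footage better than wide shots.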

Fully Local & Private

Nothing leaves your machine. No cloud uploads, no API keys for transcription, no render queues. Your raw podcast footage stays on your disk.

Human-Editable Subtitles

Captions are standard ASS subtitle files — open them in any text editor to fix typos, adjust timing, or change styling before the final render.

Apify Actor: N/A — fully local pipeline · $0 (runs entirely on your machine)

IN ACTION

Build it with plain English

Tell Claude Code what to do. It handles the rest.

claude — solostack/

you: |

Processing podcast.mp4 (45:12, 1920x1080, 30fps)...

✓ Transcribed 45:12 with Whisper medium (8,247 words)
✓ Scored 156 candidate windows
✓ Selected top 5 clips (38s, 62s, 45s, 71s, 54s)
✓ Generated karaoke captions for 5 clips
✓ Rendered 5 vertical clips (9:16) via VideoToolbox

Output:
  clips/clip-001.mp4 (38s) — score: 87
  clips/clip-002.mp4 (62s) — score: 84
  clips/clip-003.mp4 (45s) — score: 81
  clips/clip-004.mp4 (71s) — score: 78
  clips/clip-005.mp4 (54s) — score: 74

Total render time: 14s

you: |

Transcribing interview.mp4 with Whisper medium...

✓ 32:18 transcribed (5,891 words, 94% avg confidence)

Top 8 clip candidates:
  #1 [12:04-13:12] score:91 — "Here's what most people get wrong about..."
  #2 [05:22-06:30] score:88 — "The key insight we discovered was..."
  #3 [24:15-25:18] score:85 — "If I had to start over tomorrow..."
  #4 [08:41-09:52] score:82 — "Let me tell you exactly what happened..."
  #5 [18:33-19:28] score:79 — "The biggest mistake I see is..."
  ...

Use: npx tsx clip.ts render interview.mp4 --clips 1,3,5

you: |

Rendering clips 1, 3, 5 in square format (1:1)...

✓ clip-001.mp4 — 68s, 1080x1080, captions burned ✓ (2.8s render)
✓ clip-003.mp4 — 63s, 1080x1080, captions burned ✓ (2.6s render)
✓ clip-005.mp4 — 55s, 1080x1080, captions burned ✓ (2.3s render)

3 clips rendered in 7.7s total
Output: clips/

USE CASES

What you can build with this

Podcast clip repurposing

Turn a 60-minute podcast episode into 5-10 vertical clips with animated captions, ready for TikTok, Reels, and YouTube Shorts. Rendering takes under 2 minutes; transcription is the slow step, at roughly 1x realtime on M-series Macs.

Interview highlight reels

Score and extract the most quotable moments from recorded interviews. The heuristic scorer catches questions, keyword triggers, and natural pause boundaries.

Course content snippets

Chop online course recordings into bite-sized clips for social media promotion. Each clip gets word-by-word captions that make it watchable on mute.

Webinar repurposing

Extract the best segments from hour-long webinars for LinkedIn, Twitter, and email campaigns. Square format for feed posts, vertical for Stories/Reels.

IMPORTANT

Things to know

Whisper's medium model downloads ~1.5GB on first run. After that, it's cached locally and runs offline.

Transcription runs at ~1x realtime on M-series Macs — a 60-minute podcast takes ~60 minutes to transcribe. Use the 'small' model for faster (but less accurate) results.

Clip selection uses heuristics, not AI. It finds likely-good segments, but you should review the candidates and pick your favorites. As a rough estimate, the selection is about 80% as good as AI-powered tools.

Center-crop works for most talking-head content. If your video has important action at the edges, you may need to adjust the crop offset manually.

COMPLETE SKILL FILE

Get the full skill file

Everything above is 80% of the skill file. Download the complete version with full implementation details, agent prompts, and ready-to-run scripts.

FAQ

Common questions

How does this compare to Opus Clip?

Opus Clip uses AI to select clips and generates word-by-word captions. This tool does both — clip selection via heuristic scoring (word density, questions, keyword triggers) and captions via ASS karaoke subtitles (\kf tags). The visual caption effect is identical. The main difference: Opus Clip's AI selection may catch some clips the heuristics miss, but you review clips manually anyway. For $0/mo vs $20-40/mo, the trade-off is worth it.

Do I need an API key or cloud account?

No. Everything runs locally. Whisper runs on-device (no OpenAI API key needed — it's the open-source model, not the API). FFmpeg is a local binary. Nothing is uploaded anywhere.
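To illustrate how local transcription output could feed the rest of the pipeline, here is a small sketch that flattens Whisper's JSON into a word list. It assumes the open-source `openai-whisper` command line tool, run with something like `whisper podcast.mp4 --model medium --word_timestamps True --output_format json`. The skill file may use a different Whisper build with a different output shape. The interface and function names are hypothetical.

```typescript
// Assumed shape of the JSON written by the open-source whisper CLI when
// word timestamps are enabled. Only the fields used here are declared.
interface WhisperWord {
  word: string; // usually carries a leading space
  start: number;
  end: number;
  probability?: number;
}

interface WhisperSegment {
  words?: WhisperWord[];
}

interface WhisperJson {
  segments: WhisperSegment[];
}

// Hypothetical word shape used by later pipeline stages.
interface Word {
  text: string;
  start: number;
  end: number;
}

// Flatten segment-level output into one ordered list of words, trimming
// whitespace and dropping empty tokens.
function flattenWords(json: WhisperJson): Word[] {
  const out: Word[] = [];
  for (const seg of json.segments) {
    for (const w of seg.words ?? []) {
      const text = w.word.trim();
      if (text.length > 0) {
        out.push({ text, start: w.start, end: w.end });
      }
    }
  }
  return out;
}
```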

What hardware do I need?

Any Mac with Apple Silicon (M1/M2/M3/M4) runs this well. Whisper's medium model uses ~2GB RAM. VideoToolbox is Apple's hardware video framework and ships with macOS, so there is nothing extra to install beyond an FFmpeg build that includes it. On Intel Macs or Linux, it works too, just slower (software encoding via libx264 instead of VideoToolbox).

Can I customize the caption style?

Yes — the captions are standard ASS subtitle files. You can change the font, size, color, outline thickness, position, and animation timing by editing the ASS style definition. Default: white text with brand yellow (#FEBB02) karaoke fill, 4px black outline, bold, bottom-center.
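For orientation, the sketch below generates an ASS header whose style matches the defaults listed above: white text, brand yellow fill, 4px black outline, bold, bottom-center. The font name, font size, margins, and 1080x1920 script resolution are assumptions chosen for illustration, and `assHeader` is a hypothetical helper. In ASS karaoke, text starts in `SecondaryColour` and fills to `PrimaryColour`, so yellow goes in the primary slot.

```typescript
// Convert a web hex colour (#RRGGBB) to ASS notation (&H00BBGGRR).
function toAss(hex: string): string {
  const m = /^#?([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(hex);
  if (!m) throw new Error(`Invalid hex colour: ${hex}`);
  return `&H00${m[3]}${m[2]}${m[1]}`.toUpperCase();
}

// Build the header of an ASS file. Font, size, margins, and resolution
// defaults are illustrative assumptions.
function assHeader(
  font = "Arial",
  size = 72,
  playResX = 1080,
  playResY = 1920
): string {
  const fill = toAss("#FEBB02"); // karaoke fill colour (PrimaryColour)
  const base = toAss("#FFFFFF"); // colour before the fill (SecondaryColour)
  const black = toAss("#000000"); // outline and shadow colour
  const style = [
    "Default", font, size, fill, base, black, black,
    -1, 0, 0, 0, // Bold (-1 means on), Italic, Underline, StrikeOut
    100, 100, 0, 0, // ScaleX, ScaleY, Spacing, Angle
    1, 4, 0, // BorderStyle (outline), Outline width 4px, Shadow
    2, // Alignment 2 is bottom-center
    40, 40, 120, // MarginL, MarginR, MarginV
    1, // Encoding
  ].join(",");
  return [
    "[Script Info]",
    "ScriptType: v4.00+",
    `PlayResX: ${playResX}`,
    `PlayResY: ${playResY}`,
    "",
    "[V4+ Styles]",
    "Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding",
    `Style: ${style}`,
    "",
    "[Events]",
    "Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text",
  ].join("\n");
}
```

Changing the caption look then means editing a single `Style:` line, either in the generator or directly in the `.ass` file before rendering.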

What video formats are supported?

Any format FFmpeg can read — MP4, MOV, MKV, AVI, WebM, and more. Output is always MP4 with H.264 video and AAC audio, optimized for social media playback.

Can I process multiple videos in batch?

Yes. The CLI accepts multiple input files or a glob pattern. Each video is processed sequentially (transcribe → select → caption → render) to avoid memory issues with Whisper.
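Sequential batch processing can be sketched as a plain loop that awaits each video before starting the next, so only one Whisper run is in memory at a time. `processBatch` and `BatchResult` are hypothetical names. The per-video work (transcribe, select, caption, render) is passed in as a function, and continuing past a failed video is an assumption about the desired behavior.

```typescript
// Outcome of processing one video in the batch.
interface BatchResult {
  video: string;
  ok: boolean;
  error?: string;
}

// Process videos strictly one after another. processOne stands in for the
// full transcribe, select, caption, and render sequence for a single file.
async function processBatch(
  videos: string[],
  processOne: (video: string) => Promise<void>
): Promise<BatchResult[]> {
  const results: BatchResult[] = [];
  for (const video of videos) {
    try {
      await processOne(video); // the next video starts only after this resolves
      results.push({ video, ok: true });
    } catch (err) {
      // Record the failure and carry on with the remaining videos.
      const message = err instanceof Error ? err.message : String(err);
      results.push({ video, ok: false, error: message });
    }
  }
  return results;
}
```

Using `Promise.all` here would start every transcription at once, which is the memory pressure the sequential design avoids.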

RELATED SKILLS

Keep building your stack

AI Meeting Intelligence

Ebook & Lead Magnet Generation

LinkedIn Carousel Design

Ready to automate?

SoloStack gives you every skill pre-installed — scraping, marketing, sales, CRM, and more. One repo. Every department.

Book a Call →
