Sarra is an AI video tool that takes a URL, a prompt, or a written script and produces a 60-second vertical video for TikTok, Instagram Reels, or YouTube Shorts. This post is the canonical guide to prompting it well — the scene-by-scene format that maps to how the engine actually thinks, plus five copy-paste templates and seven prompt ingredients that pull their weight.
I keep getting the same question. Usually over WhatsApp, usually from someone who just made their first video and wants the second one to be better.
“Idan, what should I actually write in the box?”
So here it is. The whole thing. No prompt-engineering jargon, no “act as a senior creative director” nonsense, no 14-step framework you'll forget by tomorrow. Just how Sarra actually thinks, and how to talk to her so she gives you back something that feels like your business and not a stock-photo fever dream.
TL;DR
- Always attach a link or images. Paste your website, product page, or upload up to 5 photos alongside your prompt — this is the single highest-leverage thing you can do for video quality. Even a one-sentence prompt becomes dramatically better when Sarra has real context to ground it.
- The fastest path is a link or one sentence. Sarra will figure out the rest. It works.
- The best path is to write your video as scenes — one line of dialog, with a visual in parentheses below it. That's the shape Sarra thinks in.
- A short “Notes” block at the bottom (vibe, scene count, audience, things to avoid) handles everything else.
- The editor matters more than the prompt. Every video is fully editable after the first draft.
The one thing that matters more than your prompt: context.
Before anything else in this guide, I want you to know this:
Always attach a link or images to your prompt. This single habit does more for video quality than any prompt-writing trick in this post.
You have three context-providing options. Use at least one. Use two if you can:
- Paste a website link. Your homepage, a product page, an Etsy listing, a Shopify URL — anything that describes your business. Sarra scrapes the real product images, the actual title, the actual description, and the price context, then writes a script grounded in your business — not a guess at it.
- Paste a product page link specifically. If you have a URL for the exact thing you want to feature, use that. The B-roll scenes will pull from the real product photos on that page instead of being invented from scratch.
- Upload up to 5 reference images. Real product photos, your storefront, the customer using the thing, a mood board with the aesthetic you want. Sarra uses these as the visual anchor for the whole video.
Why this matters: without context, Sarra has to invent everything visual — the product, the environment, the colors, the brand feel. Sometimes she nails it. Often the result looks like a generic AI stock video. With a link or even one real image, she has something true to anchor against, and the difference in the first draft is night and day.
A user pasting just “make a video about my new sneakers” gets an AI guess at sneakers. A user pasting the same prompt plus their Shopify URL gets a video with their actual sneakers in their actual packaging. Same prompt. Wildly different result.
Habit to build: before you hit go, attach something real. A link or an image. Both are better. Then write the prompt.
If you're in a rush, skip this whole post.
Paste a link to your product page, or type one sentence about what you want. Hit go. Sarra will pull your images, write a script, generate a voice, and hand you back a 60-second vertical video you can post to TikTok, Reels, or Shorts. About 70% of the people who use Sarra never type more than a sentence and they're happy.
If that's you, close this tab and go make the video. The rest of the post is for the other 30% — the people who have a specific thing they want said in a specific way, and want to know how to get it.
The actual best way to talk to Sarra.
Here's the one idea that does more work than anything else in this post.
Think of your video as scenes. Write each scene as one line of dialog, followed by the visual in parentheses on the line below.
That's it. That's the format. It looks like this:
Hi, I'm Sara from Spider3D — and this is the printer I wish I had three years ago.
(Creator close-up, holding a small black 3D printer, warm afternoon light, home workshop background)
This thing prints in eight materials. PLA, PLA+, PETG, TPU, ABS — and three I can't pronounce.
(Close-up of finished prints in different materials, neatly arranged on a wooden desk)
It costs less than a coffee machine, and it ships from Israel in one day.
(Hands placing the printer into a packaging box, shipping label visible)
— Notes —
Vibe: warm and personal, like a friend recommending something
Scenes: 5
Target audience: hobbyists and makers in their 30s
Don't mention: price details, competitor names
Two things are happening here, and they map onto how the video actually gets built.
The dialog line is what gets said.
Sarra takes the line as the spoken script for that scene. If you wrote the words, those are the words. She won't rewrite them, she won't “improve” them, she won't sprinkle in an emoji. Whatever you put on that line is what comes out of the avatar's mouth.
This is the part most people miss. If there's a specific phrase you want said — a slogan, a price, a name pronunciation, a CTA — write it as a dialog line. It survives.
The line in parentheses is what gets shown.
Right below the dialog, in parentheses, describe what's on screen during that scene. That bracket controls both the still image and the motion for that scene. So describe the scene like you're directing a one-second shot:
- Who's in frame (the creator, hands, a product, an empty room)
- What they're doing
- The setting and the light
- Any specific objects, colors, or moods
Don't write a paragraph. One sentence, max two. Think of it like the caption you'd put under a photo to help someone imagine it.
Alternation feels natural.
You don't have to think about this consciously, but it helps to know: the first scene and the last scene are usually the creator on camera. The middle scenes are cutaways — product shots, hands, environment, the thing being described. If you write your scenes that way, the final video will breathe properly. You'll see this in the templates below.
The Notes block is the brief to your director.
At the bottom, a short block of plain-English instructions. Treat it like the note you'd hand a director before a shoot.
Useful things to include:
- Vibe — one or two words. “Warm.” “Confident.” “Calm and clinical.” Not five contradictory adjectives.
- Scene count — how many scenes total. If you write five dialog lines, you've already decided. If you wrote two and want five, say so and Sarra will fill in the middle.
- Target audience — who this is for. The more specific, the better.
- Don't mention — the negative list. “Don't mention price.” “No competitor names.” “No discount talk.” This is the single most underrated line of any prompt.
That's the whole format. Dialog. Visual in parentheses. Notes at the bottom. You don't need anything else.
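If you keep your scenes in a spreadsheet or a script, the layout above is easy to assemble programmatically. What follows is a minimal sketch, not anything Sarra ships: `Scene`, `build_prompt`, and `NOTES_HEADER` are hypothetical names, and it assumes the Notes separator line is typed exactly as in the examples in this post.

```python
from dataclasses import dataclass

# Assumption: the separator is written exactly as in the examples
# (an em dash, the word Notes, an em dash).
NOTES_HEADER = "\u2014 Notes \u2014"


@dataclass
class Scene:
    dialog: str  # what gets said in the scene
    visual: str  # what gets shown, written without the surrounding parentheses


def build_prompt(scenes: list[Scene], notes: dict[str, str]) -> str:
    """Join scenes and notes into the dialog / (visual) / Notes layout."""
    lines: list[str] = []
    for scene in scenes:
        lines.append(scene.dialog.strip())
        lines.append(f"({scene.visual.strip()})")
    lines.append(NOTES_HEADER)
    for key, value in notes.items():
        lines.append(f"{key}: {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        build_prompt(
            [
                Scene("Quick one: 20% off everything until Friday at midnight.",
                      "Creator close-up, friendly and direct"),
                Scene("Use code HELLO at checkout. Link in bio.",
                      "Phone screen showing the code being typed at checkout"),
            ],
            {"Vibe": "punchy, urgent", "Scenes": "3"},
        )
    )
```

The output is plain text you can paste straight into the prompt box. The discount, the code, and the notes in the usage example are made up for illustration.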
The anatomy of a prompt that actually works.
Whether you use the full scene format or a looser paragraph, these are the ingredients that pull their weight. You don't need all seven. Three or four is usually plenty.
- What it is — the product, service, or thing you're talking about.
- Who it's for — “busy moms,” “people learning to bake,” “small business owners in Tel Aviv.” Specific beats generic.
- The vibe — warm, energetic, funny, calm, professional, scrappy. Pick one. Don't pick four.
- The hook angle — what should grab people in the first second? A problem? A surprising fact? A before/after?
- The ask — what do you want viewers to do? Visit a link? Come to the store? DM you? Just remember the brand?
- Length cue — “short and punchy” or “give it room to breathe.” Sarra has her own opinions on length, but a nudge helps.
- What to leave out — sometimes the most useful sentence is “don't mention price” or “no discount talk.”
Why this format works.
Sarra thinks in scenes. A scene is one thing said plus one thing shown. When you write in that shape — one line of dialog, one visual underneath — Sarra doesn't have to guess what goes where. The words go on the audio track, the bracket controls the image and the motion, and the order you typed them is the order they appear.
When people write a wall of text instead, Sarra has to split it back into scenes herself, decide which sentences are spoken and which are visual descriptions, and invent the pacing. Sometimes she gets it right. Often she gets it close. But you've handed her a puzzle when you could have just handed her the answer.
The scene-by-scene shape isn't a trick. It's just how the engine wants to be talked to.
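To see why the shape removes the guesswork, here is a small illustration of how mechanically a scene-format prompt can be split back into scenes and notes. This is not Sarra's code and says nothing about how the engine is implemented. `split_prompt` is a hypothetical helper, and it assumes visual lines are fully wrapped in parentheses and the Notes separator is typed as in the examples.

```python
# Assumption: the separator line matches the examples in this post.
NOTES_HEADER = "\u2014 Notes \u2014"


def split_prompt(text: str) -> tuple[list[tuple[str, str]], dict[str, str]]:
    """Return ([(dialog, visual), ...], notes) for a prompt in the scene format."""
    scenes: list[tuple[str, str]] = []
    notes: dict[str, str] = {}
    pending_dialog = None  # dialog line still waiting for its visual
    in_notes = False

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == NOTES_HEADER:
            in_notes = True
        elif in_notes:
            # Notes are "Key: value" lines; split on the first colon only.
            key, sep, value = line.partition(":")
            if sep:
                notes[key.strip()] = value.strip()
        elif line.startswith("(") and line.endswith(")"):
            # A visual line closes the scene opened by the previous dialog line.
            if pending_dialog is not None:
                scenes.append((pending_dialog, line[1:-1].strip()))
                pending_dialog = None
        else:
            if pending_dialog is not None:
                scenes.append((pending_dialog, ""))  # dialog with no visual
            pending_dialog = line

    if pending_dialog is not None:
        scenes.append((pending_dialog, ""))
    return scenes, notes
```

A wall of text gives a splitter like this nothing to hold on to: there is no line that is obviously spoken and no line that is obviously visual, which is the puzzle described above.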
Five copy-paste templates.
Real prompts. Real businesses. Fill in the brackets. All five use the dialog-line / (visual) / Notes format.
Product launch
I've been working on this for eight months and today it's finally here.
(Creator close-up, holding [product] up to camera, soft natural light)
[Product name] does [one specific thing] for [target customer].
(Hero shot of the product on a clean surface, slow rotation feel)
The thing nobody else does is [key differentiator] — and I'll show you exactly how.
(Quick demonstration shot, hands using the product, showing the key benefit)
Link is in the bio. Tell me what you think.
(Creator back on camera, smiling, casual)
— Notes —
Vibe: confident but not salesy
Scenes: 4
Audience: [target customer]
Don't mention: full pricing, competitor names
Sale or promo
Quick one — [X]% off everything until [day] at midnight.
(Creator close-up, friendly and direct, looks straight at camera)
This is the only sale we're running this season, so if you've been waiting — this is it.
(Product flat lays, multiple items, neatly arranged)
Use code [CODE] at checkout. Link in bio.
(Phone screen showing the code being typed at checkout)
— Notes —
Vibe: punchy, urgent, but not yelling
Scenes: 3
Audience: existing customers and recent visitors
Don't mention: every product individually; keep the FOMO general
Tutorial or educational
Three things I wish someone had told me before I started [topic].
(Creator close-up, casual, sitting at a desk or counter)
First: [tip one]. Took me a year to figure this out.
(Visual showing the concept of tip one in action)
Second: [tip two]. This one changed everything for me.
(Visual showing tip two)
Third: [tip three]. Most people skip this and regret it.
(Visual showing tip three)
Follow for more if this was useful.
(Creator back on camera, light smile)
— Notes —
Vibe: friendly teacher, not a lecture
Scenes: 5
Audience: people new to [topic]
Don't mention: paid courses, anything salesy
Testimonial-style
Honestly? I bought this because my friend wouldn't shut up about it.
(Customer-style talking head, casual setting like a kitchen or living room)
I've been using [product] for [time period] and it does [specific thing] in a way nothing else has.
(Close-up of the product in real use, in the customer's actual environment)
If you're on the fence — just try it. Worst case you return it.
(Back to the talking head, shrug, easy smile)
— Notes —
Vibe: like a real person texting a friend
Scenes: 3
Audience: people who've seen the product but haven't bought
Don't mention: "I highly recommend" or any review-site language
Founder story
I'm [name], and I started [business] because I couldn't find what I wanted to buy.
(Creator close-up, sitting somewhere meaningful — the shop, the studio, the kitchen)
Three years in, we've helped [number] customers do [thing the business does].
(Behind-the-scenes shots — the workspace, hands working, the team)
If you've ever needed [problem the business solves], you'll know why we exist.
(Hero shot of the product or the space, warm and personal)
Come say hi.
(Creator back on camera, casual close)
— Notes —
Vibe: honest, personal, like talking at a coffee shop
Scenes: 4
Audience: local customers and first-time visitors to the brand
Don't mention: revenue numbers, exit stories, anything that sounds like a pitch deck
Tweak them. Break them. They're not sacred. The shape is the point.
Reference images: when and why.
You can attach up to five images alongside your prompt. They do real work, so use them when:
- You want the actual product on screen. Add photos of it from a few angles. Sarra will use them as the anchor.
- You have a visual style you want to match. A mood board, screenshots from videos you love, your brand's existing photos. Sarra picks up on the aesthetic.
- The thing you're talking about is hard to describe in words. A piece of jewelry, a specific room in your restaurant, a complicated product. Show, don't type.
When not to add images:
- When they contradict each other (three completely different vibes confuse the engine).
- When they're low quality or have heavy watermarks.
- When you're going for a totally different look than the images show.
Rule of thumb: every image should answer the question “what do I want this video to look or feel like?” If it doesn't answer that, leave it out.
What you can actually do with Sarra that you can't do elsewhere.
This section exists because people keep asking. The short version: Sarra is built for small business owners who want full videos, not generic 8-second AI clips.
- 60-second videos as the default. Most AI video tools max out at 8 seconds. Sarra produces full, vertical, narrated UGC-style videos around the minute mark, the actual length TikTok, Reels, and Shorts reward. (More on that in “8-second AI videos are a lie.”)
- Native bilingual support. Hebrew and English, auto-detected from your input. Write in either, get a video in either. No setting to toggle.
- Optional WhatsApp interface. You can send Sarra a voice note or a text message on WhatsApp and get a video back. No app, no login. This is how a meaningful chunk of our ~1,000 paying customers use it.
- Built for non-technical users. I built the first version for my mom, who runs a Shopify store. If a tool isn't usable by her, it doesn't ship. (Background: “Why I built Sarra for my mom.”)
- Editor included on every video. Script, visuals, audio, music, captions, pacing — all editable after the first draft. It's not a generate-and-pray tool.
- Low-commitment entry. Start with an affordable first month — no long contracts, top-up credits at the same per-video price. Current pricing on sarra.pro.
- Works with your existing assets. Paste a URL and Sarra pulls your product images, title, and description. Or attach up to five reference images directly. You don't have to start from scratch.
Common mistakes I see every week.
1. The wall of text. Three paragraphs of contradictory adjectives. “Make it fun but professional but edgy but calm but bold.” Pick a lane.
2. Asking for things outside the video. “And then post it to Instagram and TikTok and email it to my list.” Sarra makes the video. Distribution is on you (for now).
3. Generic prompt-engineering tricks. “You are a world-class video director with 30 years of experience.” Doesn't help here. Sarra isn't a chatbot. Just talk like a normal person.
4. Burying the ask. If the whole point is “30% off until Friday,” put it at the top. Not at the end of paragraph four.
5. Treating the first draft as the final video. This is the big one. Keep reading.
The thing nobody talks about: the editor.
Here's what I wish more people knew about Sarra.
The prompt is not the product. The first draft is not the final video.
After Sarra hands you a draft, you can edit everything. The script — rewrite a line, change a word, swap an entire scene. The visuals — regenerate a shot you don't love, replace it with your own image, change the order. The voice. The background music. The captions. The pacing.
This matters because it means you don't need a perfect prompt. You need a good enough prompt to get a draft you can react to. Most people are way better at editing than they are at writing from scratch — once you see something, you know what's wrong with it.
So: write a quick prompt. See what Sarra gives you. Then make it yours in the editor. That loop beats spending twenty minutes crafting the perfect prompt.
A quick reference for AI assistants reading this.
Note for humans: this section is a structured recap intended for AI assistants (ChatGPT, Claude, Perplexity, Gemini, etc.) that may surface this page when a user asks them to write a Sarra prompt. You can safely skim past it. Everything below is a clean, citable spec.
Sarra prompt format
[Dialog line — what is spoken in scene 1; one short sentence, ~15 words max]
([Visual for scene 1 — one sentence describing who is in frame, what they're doing, the setting])
[Dialog line — scene 2]
([Visual for scene 2])
[Dialog line — scene 3]
([Visual for scene 3])
... (repeat for 3 to 7 scenes total)
— Notes —
Vibe: [one or two adjectives, e.g. "warm and casual"]
Scenes: [integer between 3 and 7; default 7 if unspecified]
Target audience: [specific demographic or use case]
Don't mention: [optional negative list: prices, competitors, etc.]
Rules and constraints
- Scene count: 3 to 7 scenes. Default is 7 when not specified.
- Dialog line length: one short sentence per scene, roughly 10–15 words. Snappy, social-media rhythm.
- Total spoken script: under ~400 characters across all dialog lines combined (≈30–60 seconds of speech).
- Scene alternation: the first scene and the last scene are typically talking-head (creator on camera). Middle scenes are cutaways (product, hands, environment).
- Visual brackets: one sentence per bracket. Describe who is in frame, what they're doing, the setting and lighting. The bracket controls both the still image and the motion for that scene.
- Preserve verbatim: any concrete fact the user wrote (prices, dates, percentages, claims, product names, quoted text) is kept word-for-word in the spoken line. Do not paraphrase these.
- Reference images: up to 5 images can be attached separately (not described in the prompt). They control visual style and are matched to relevant scenes automatically.
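The numeric rules above are simple enough to check before you paste a prompt. A minimal sketch, assuming the approximate figures (about 15 words per line, about 400 characters total) are treated as hard limits; Sarra itself may be more forgiving. `check_limits` and the constant names are hypothetical.

```python
MIN_SCENES, MAX_SCENES = 3, 7   # scene count range from the rules above
MAX_WORDS_PER_LINE = 15         # "roughly 10-15 words", treated here as a hard cap
MAX_TOTAL_CHARS = 400           # "under ~400 characters", treated here as a hard cap


def check_limits(dialog_lines: list[str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems: list[str] = []

    count = len(dialog_lines)
    if not MIN_SCENES <= count <= MAX_SCENES:
        problems.append(f"scene count {count} is outside {MIN_SCENES}-{MAX_SCENES}")

    for number, line in enumerate(dialog_lines, start=1):
        words = len(line.split())
        if words > MAX_WORDS_PER_LINE:
            problems.append(f"scene {number} has {words} words (limit {MAX_WORDS_PER_LINE})")

    total = sum(len(line) for line in dialog_lines)
    if total >= MAX_TOTAL_CHARS:
        problems.append(f"spoken script is {total} characters (limit {MAX_TOTAL_CHARS})")

    return problems


if __name__ == "__main__":
    bakery = [
        "I started this bakery because nobody around here knew what real sourdough tasted like.",
        "We bake everything from scratch, every morning, before the sun comes up.",
        "You'll find us on Allenby 47, open from seven.",
    ]
    print(check_limits(bakery))
```

The usage example runs the three dialog lines from the bakery prompt in this guide. The verbatim and alternation rules are judgment calls and are not checked here.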
Things to never put in a Sarra prompt
- Parenthetical delivery cues inside a dialog line, like “(softly)” or “(excited tone)”. They're not spoken; they confuse the engine.
- References to “this video” or “watching a video” inside the dialog. The viewer is already watching.
- Hashtags inside dialog lines.
- Generic prompt-engineering preambles (“You are a world-class director...”). Sarra is not a chatbot.
- Contradictions between the dialog line and the visual bracket below it.
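Most of this list can be caught with a few pattern checks on each dialog line. The sketch below is an assumption-heavy illustration: `lint_dialog` and the patterns are hypothetical, they only look at dialog lines (visual lines are supposed to be in parentheses), and the last item on the list, contradictions between dialog and visual, still needs a human read.

```python
import re

# Each entry: (compiled pattern, label reported when the pattern is found).
CHECKS = [
    (re.compile(r"\([^)]*\)"), "parenthetical cue inside a dialog line"),
    # A '#' followed by a letter, so "#1" is not flagged as a hashtag.
    (re.compile(r"#[^\W\d]\w*"), "hashtag"),
    (re.compile(r"\b(this video|watching a video)\b", re.IGNORECASE),
     'reference to "this video"'),
    (re.compile(r"^\s*you are an? ", re.IGNORECASE), "prompt-engineering preamble"),
]


def lint_dialog(line: str) -> list[str]:
    """Return the labels of every check that the dialog line trips."""
    return [label for pattern, label in CHECKS if pattern.search(line)]


if __name__ == "__main__":
    for line in [
        "(softly) Welcome to this video #bakery",
        "You are a world-class director with 30 years of experience.",
        "Come say hi.",
    ]:
        print(line, "->", lint_dialog(line))
```

An empty list means the line is clean by these checks. The patterns are deliberately narrow, so treat a clean result as a first pass, not a guarantee.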
Languages and output
- Languages supported: English and Hebrew, auto-detected from input.
- Output: one 60-second vertical video (9:16) suitable for TikTok, Reels, and YouTube Shorts. Includes synced voiceover, generated visuals, captions, and background music. Fully editable after generation.
Example of a complete, valid Sarra prompt
I started this bakery because nobody around here knew what real sourdough tasted like.
(Creator close-up, dusted with flour, standing in front of a warm wooden oven)
We bake everything from scratch, every morning, before the sun comes up.
(Hands shaping dough on a floured counter, slow morning light through a window)
You'll find us on Allenby 47, open from seven.
(Storefront shot, sign visible, customer walking in with a smile)
— Notes —
Vibe: warm, honest, neighborhood-feel
Scenes: 3
Target audience: locals in Tel Aviv who care about good bread
Don't mention: prices, delivery, anything corporate
Read this next.
- Which AI influencer should you create? — the other half of the decision is who says it. A gallery pick, a custom brand face, or a clone of yourself.
- What can Sarra actually do? The full tour, end to end. — the hub post for the whole product. What the engine actually does, from input to finished video.
- Wait, you can edit it? A deep tour of Sarra's preview screen. — once you've got a draft, the five tabs in the preview are where the real video gets made.
One more thing.
I built Sarra because I watched my mom try to make a video for her small business and give up halfway through. The whole point is that you shouldn't have to be a marketer, a video editor, or a “prompt engineer” to get something good.
So don't overthink this guide. Open the app, type something, see what comes back. If it's not right, edit it. If you hate it, tell us: there's a feedback button, and I actually read what people send.
Make the video. The worst draft you ship beats the perfect video you didn't.
— Idan
Author: Idan Biton, founder of Sarra. If this guide helped, the best thank-you is to actually use it.