An open index of curated prompts for image & video generation models.
### 3. Configure API Keys

Open http://localhost:5000 in your browser and enter your API keys in the **API Configuration** panel:

- **FAL API Key** - Get yours at [fal.ai/dashboard](https://fal.ai/dashboard)
- **Voice ID** - Your MiniMax voice ID (default provided)

Your keys are stored in your browser's localStorage and sent to the server when needed.

### 4. Generate Videos

1. Configure your character (animal type, style, or custom prompt)
2. Generate the main character image
3. Edit script segments as needed (text, visual prompts, timing)
4. Click "Generate All" to run the full pipeline

## Configuration

All settings can be configured directly in the web interface:

### Character Settings

- **Animal Type** - The character animal (capybara, penguin, fox, etc.)
- **Style** - Visual style (3D animation, Ghibli, cartoon, etc.)
- **Custom Prompt** - Override the auto-generated character prompt

### Script Segments

Each segment includes:

- **Script Text** - The spoken narration
- **Visual Prompt** - Description for image/video generation
- **Timing** - Start and end times in seconds
- **Screenshot URLs** - Optional URLs for automatic screenshot composites

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/config/keys` | GET/POST | Get or set API keys |
| `/api/segments` | GET | Get all script segments |
| `/api/segments` | POST | Add a new segment |
| `/api/segments/<id>` | PUT | Update a segment |
| `/api/segments/<id>` | DELETE | Delete a segment |
| `/api/segments/reset` | POST | Reset to default template |
| `/api/generate/character` | POST | Generate main character |
| `/api/generate/full` | POST | Run full pipeline |
| `/api/status` | GET | Get current generation status |

## Output Structure
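The endpoints in the API Endpoints table can also be driven from a script instead of the web interface. The following is a minimal sketch using only the Python standard library. It assumes that the server accepts and returns JSON and that an empty JSON object is an acceptable POST body; neither is stated in the table, so check the server code before relying on it. The helper names `build_request` and `call` are illustrative, not part of the project.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # address from the setup step


def build_request(path, method="GET", payload=None):
    """Build (but do not send) a request for one of the listed endpoints.

    Assumption: request bodies are JSON. The endpoint table does not say so.
    """
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def call(path, method="GET", payload=None, timeout=30):
    """Send the request and decode the response, assuming a JSON body."""
    request = build_request(path, method, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example session (requires the server to be running locally):
#   call("/api/segments")                     # list script segments
#   call("/api/generate/full", "POST", {})    # start the full pipeline
#   call("/api/status")                       # check generation status
```

Keeping request construction separate from sending makes it possible to inspect the method, URL, and body without a running server.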
Key flags:

- Repeatable `-i/--input` reference images
- `-o/--output` output file path
- `-m/--model` model alias or raw model ID
- `--image-size 512|1K|2K|4K`
- `--aspect-ratio` with model-aware validation
- `--ground-web`
- `--ground-image` for Gemini 3.1
- `--thinking-level minimal|high` for Gemini 3.1
- `--include-thoughts`
- `--thoughts-dir`
- `--history-in` / `--history-out` for scriptable multi-turn workflows

### Scripted Multi-Turn Editing
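A multi-turn session can be scripted by passing the same history file to `--history-out` on every turn and to `--history-in` on every turn after the first. The sketch below only builds the argument lists. The subcommand name `edit`, the prompts, and the choice to omit `--history-in` on the first turn are assumptions made for illustration; only the `-o`, `--history-in`, and `--history-out` flags come from the flag list above.

```python
import shlex


def build_turn(subcommand, prompt, output, history_file, first_turn=False):
    """Return the argv list for one turn of a multi-turn session."""
    argv = ["nanobanana", subcommand, prompt, "-o", output]
    if not first_turn:
        # Follow-up turns read the conversation saved by the previous turn.
        argv += ["--history-in", history_file]
    # Every turn writes the updated conversation back to the same file.
    argv += ["--history-out", history_file]
    return argv


# "edit" is a placeholder subcommand and both prompts are made up.
HISTORY = "photosynthesis-history.json"
turns = [
    build_turn("edit", "Draw a labeled diagram of photosynthesis",
               "photosynthesis.png", HISTORY, first_turn=True),
    build_turn("edit", "Translate the labels to Spanish",
               "photosynthesis-es.png", HISTORY),
]

for argv in turns:
    # To execute a turn, pass argv to subprocess.run(argv, check=True).
    print(shlex.join(argv))
```

Building the commands as lists rather than shell strings avoids quoting problems when a prompt contains spaces or quotes.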
Key flags:

- `-o/--output` output file path
- `--resize` resize to `WxH` or `%`
- `--fit` resize behavior
- `--crop` crop rectangle `left,top,width,height`
- `--rotate` angle
- `--flip` vertical flip
- `--flop` horizontal mirror

Examples:
Key flags:

- `-o/--output` output PNG path
- `--color` `white|black|#RRGGBB`
- `--tolerance` color matching tolerance
- `--overwrite` overwrite original file

Example:
\
  -o photosynthesis-es.png \
  --history-in photosynthesis-history.json \
  --history-out photosynthesis-history.json
```

History files preserve the conversation contents needed for follow-up turns, including thought signatures returned by the API.

### `icon`

Usage:

```bash
nanobanana icon [prompt] -o OUTPUT
```

Key flags:

- `-o/--output` output directory or naming pattern
- `--sizes` comma-separated icon sizes
- `--style` `modern|flat|minimal|detailed`
- `--background` `transparent|white|black|#RRGGBB`

Examples:

```bash
nanobanana icon
  -o ./icons/ --sizes 16,32,64,128
```

### `pattern`

Usage:

```bash
nanobanana pattern [prompt] -o OUTPUT
```

Key flags:

- `-o/--output` output file path
- `--size` tile size as `WxH`
- `--style` `geometric|organic|abstract|floral|tech`
- `--type` `seamless|texture|wallpaper`

Examples:

```bash
nanobanana pattern
First-person perspective, user interacts with holographic workspace in midair, camera captures floating interfaces responding to gestures, soft beeps and electronic chimes fill the air, cool ambient lighting, immersive futuristic cinematic realism style
@Image1 as first frame, @Video1 for camera movement, @Video2 for character motion, @Audio1 for background music
Upload Flux image and selfie, add yourself to the background.
Change the facial expression to happy and make the weather sunny.
Generate better image of mom's 60th birthday party cake for bakery.
Transform the scene as per description.
Add hair to the bald man in the image.
pnpm db:push      # Push schema changes (development)
pnpm db:generate  # Generate migrations
pnpm db:migrate   # Run migrations
pnpm db:studio    # Open Drizzle Studio GUI
pnpm db:reset     # Drop and recreate all tables
A cat walking in the rain, cinematic lighting.
Product demo of a smartwatch with rotating camera view.
Nature scene with flying birds and ocean waves sound.
beautiful stunning ultra-detailed 4K 8K masterpiece trending on artstation cinematic lighting professional photography premium quality
Create a pitch-deck slide titled "Q3 Revenue Performance" that looks like a real Series A board-meeting slide. Layout (16:9): title top-left, 36pt Inter dark gray. Two-column body: left 60% chart, right 40% three KPI cards. Chart: vertical bars, Q1–Q3 2026, y-axis $0–$8M, three bars at $3.2M, $4.8M, $6.5M, muted blue palette. KPI cards: "+34% YoY", "189 new accounts", "$42K ACV". White background, Inter typography, tight 8px grid, no clip art, no gradients, no stock photography.
Notice what is **not** in there: no praise words, no "ultra detailed", no "8K". Every word does specific, instructive work.

The skill teaches Claude (or any agent) the canonical structure:
The skill is just text plus one script. No build, no restart, no registration.

### Customize for your team

Fork it. The parts designed to be swapped out:

- **`references/use-cases.md`**: add your in-house styles, brand color palettes, and frequently used asset templates (your own pitch-deck template, brand hex codes, infographic formats)
- **`scripts/gpt_image.py`**: add a `--watermark` flag, custom output naming, S3 upload, Slack notifications, whatever you need
- **`DEFAULT_BASE`**: point it at your private gateway instead of the default
- **The system prompt your agent receives**: if your agent does not auto-discover skills, wire `SKILL.md` into its bootstrap

PRs are welcome for general-purpose changes. Keep team-specific ones in your fork.

### License

MIT. See `LICENSE`. Use it however you like.

### Acknowledgements

Built for [Claude Code](https://claude.com/claude-code), but the pattern (markdown skill + zero-dependency CLI + visual self-verification) works for any agent that can read files and call scripts.

Inspiration: the official OpenAI GPT Image 2 Cookbook, and the Anthropic superpowers/writing-skills pattern for agent-discoverable skills.

---

## English

> Drop this into Claude Code or Codex. Your agent will go from "here's a curl command, good luck" to producing pitch-deck slides, Chinese posters, pixel-art tilesets, photoreal product shots, and surgical photo edits: first try, every try.

GPT Image 2 dropped in April 2026. It's a generational leap: long instructional prompts no longer lose detail, text rendering is finally correct (Chinese / Japanese / Korean too), custom resolutions go up to 3840px, and the edit endpoint does precise local edits via a `change ONLY X / preserve Y exactly` pattern.

But here's the catch: agents trained before April 2026 don't know any of this. They write prompts the old way (*"4K, ultra detailed, masterpiece, trending on artstation"*), which on GPT Image 2 is at best ignored and often actively harmful. They paste curl commands instead of saving files. They generate, then ask the user "does this look right?" instead of looking themselves.

This skill fixes all of that.

### What you get
4K, ultra detailed, masterpiece, trending on artstation
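Filler terms like the ones in the prompt above, which the README describes as at best ignored on GPT Image 2, can be flagged mechanically before a prompt is sent. The following is a minimal sketch; the `FILLER_TERMS` list and the `find_filler` helper are hypothetical and not part of the skill. A match is only a hint that a word may not be doing specific work.

```python
import re

# Hypothetical starter list; extend it with terms from your own prompts.
FILLER_TERMS = [
    "4k",
    "8k",
    "ultra detailed",
    "ultra-detailed",
    "masterpiece",
    "trending on artstation",
]


def find_filler(prompt):
    """Return the filler terms present in the prompt, in FILLER_TERMS order."""
    lowered = prompt.lower()
    found = []
    for term in FILLER_TERMS:
        # Lookarounds keep "4k" from matching inside a longer token.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, lowered):
            found.append(term)
    return found


print(find_filler("4K, ultra detailed, masterpiece, trending on artstation"))
print(find_filler("White background, Inter typography, tight 8px grid"))
```

The first call reports four terms and the second reports none, since every word in the second prompt names a concrete property of the image.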
Every panorama is one step in a walking route.

- The camera stays at human eye height.
In a cozy house, there is a girl in the center of the camera. The girl has fair skin and is very beautiful, but she dresses plainly. She is wearing plain clothes, with a lot of lake blue paint on her left hand and a little gray paint on her right hand. Then she first raised her right hand with a little gray paint, and the word "gray" appeared in the picture. Then she put down her right hand and raised her left hand with more lake blue paint, and the word "lake blue" appeared in the picture. Then she put down her left hand. Rubbing the paint between the palms of the two hands, the paint in the hand turned silver-blue, and the words "silver-blue" appeared in the picture. Then, she closed her hands, palms facing each other, blocked the camera, and then removed it. The girl changed the scene, wearing exquisite and beautiful Hanfu. With exquisite and beautiful makeup, the girl becomes more beautiful. At this time, the technique of photo photography is used to present a texture similar to film photography. The main color tone of the picture is silver-blue, exuding an atmosphere of prosperity, elegance, and a hint of charm. The main image has the Dardin effect, wearing bustling silver jewelry on the head, with a beautiful sense of transparency, showing the beauty of disaster for the country and the people, and also carrying a decadent and lazy temperament, with a few strands of broken hair floating in the air. The main body faces the picture, holding a silver and white folding fan. The girl first uses the folding fan to cover the lower half of her face, revealing only a pair of captivating eyes. Then she moved the folding fan down, revealing her entire face with a stunning face. The picture combines textured lighting with natural light, creating a strong contrast of light and shadow with strong gray sidelight and Rembrandt light