Navigate the fragmented landscape of Hebrew and Yiddish ML datasets and models. Covers ivrit.ai (22K+ hours of Hebrew audio, whisper-large-v3 ASR variants, Yiddish models), Dicta (DictaLM 3.0 LLM family, DictaBERT variants, HeQ reading comprehension), the Israeli National NLP Program / NNLP-IL (HebrewSentiment, HebNLI), AlephBERT, and Knesset Plenums. Helps researchers and ML engineers pick the right dataset for a task by use case, license (commercial vs research), Hebrew register coverage, and model-dataset pairing. Use when choosing training data for a Hebrew NLP or ASR project, verifying license compatibility for a commercial product, finding a baseline model for a Hebrew downstream task, or exploring Yiddish ML resources. Do NOT use for Arabic NLP, general HuggingFace dataset discovery, or Hebrew OCR dataset selection (use hebrew-ocr-forms).
Trust score 87/100 (Trusted) · 10+ installs · 3 GitHub contributors · MIT license
The Israeli ML community punches above its weight, but the datasets and models are scattered. ivrit.ai publishes world-class Hebrew speech corpora on one HuggingFace org, Dicta publishes Hebrew LLMs and BERT variants on another, the Israeli National NLP Program maintains benchmarks under HebArabNlpProject. Licenses vary from fully commercial-friendly to research-only. A researcher trying to pick the right combination for fine-tuning a Hebrew sentiment classifier on customer support chat for a commercial product has to hunt across five orgs and read every dataset card.
npx skills-il add skills-il/developer-tools@v1.0.3-hebrew-ml-datasets-navigator --skill hebrew-ml-datasets-navigator -a claude-codeI want to train a sentiment classifier on Hebrew customer support chat for a commercial SaaS product. Which dataset should I use, which starting model, and what does the license say about attribution?
I am building a Hebrew podcast transcription product. What does ivrit.ai offer, which ASR model should I use in production with low latency, and how do I handle multiple speakers?
I need a Hebrew LLM that runs on consumer hardware (16GB VRAM max) for a Hebrew product. What does Dicta offer, what are the size differences, and what are the upstream licenses?
I am researching Yiddish and looking for datasets and models for speech recognition and text processing. What is available in 2026 and what are the licenses?
Added HEBREW-MMLU, CulturaX, FineWeb-2, ParaShoot, HeSum, academic resources. Stripped 27 em dashes.
Apr 25, 2026
Content fix: find_dataset.py catalog now matches the markdown reference. Added translation datasets (NeuLabs-TedTalks, OPUS-100, MADLAD-400) and additional Dicta text corpora.
Apr 13, 2026
Build and manage shipping integrations with Israeli carriers, including Israel Post, Cheetah, HFD, and Mahir Li, plus locker pickup services (BOX2GO, Shlager, Done). Use when user asks about "shipping Israel", "Cheetah delivery", "meshloach", "shipping label", "HFD", "locker pickup Israel", or setting up carrier integrations for an e-commerce store. Covers carrier selection, Israeli address formatting, label generation, cross-carrier tracking system setup, and customer delivery notifications. Do NOT use for looking up a specific package tracking status (direct users to mypost.israelpost.co.il or hfd.co.il). Do NOT use for international shipping outside Israel or customs/import.
Manage media assets through Cloudinary's REST API -- upload, transform, optimize, and deliver images and videos. Use when user asks about image upload, media optimization, image transformations, responsive images, video management, CDN delivery, or mentions Cloudinary specifically. Covers Upload API, Admin API, URL-based transformations, AI-powered effects (gen_remove, gen_replace, background removal), and delivery optimization. Israeli-founded (2012) with R&D in Petach Tikva; global HQ in Santa Clara, California. Do NOT use for non-Cloudinary media hosting or local image processing without cloud upload.
Best practices for programmatic video creation in React with Remotion, including full Hebrew RTL support. Covers animations, compositions, sequencing, transitions, TikTok-style captions with word highlighting, AI voiceover, 3D, charts, Hebrew Google Fonts, and bidirectional text animations. Use when working with Remotion code or creating Hebrew social media videos and marketing content. Do NOT use for non-Remotion video editing or general React development.
Want to build your own skill? Try the Skill Creator · Submit a Skill