Process and extract data from scanned Israeli government forms using OCR. Supports Tabu (land registry), Tax Authority forms, Bituach Leumi documents, and other official Israeli paperwork. Use when user asks to OCR Hebrew documents, extract data from Israeli forms, "lesarek tofes", parse Tabu extract, read scanned tax form, or process Israeli government documents. Includes Hebrew OCR configuration, field extraction patterns, and RTL text handling. Do NOT use for handwritten Hebrew recognition (requires specialized models) or non-Israeli form processing.
Trust score 93/100 (Verified) · 96+ installs · 2 GitHub contributors · MIT license
Israeli government forms still arrive in large volumes as scans and unstructured PDFs, with handwriting, stamps, and dense Hebrew text. Standard OCR tools struggle with Hebrew due to script direction, similar-looking characters, and low-quality scans. The result is hours of manual data entry and frequent errors.
npx skills-il add skills-il/localization@v1.1.0-hebrew-ocr-forms --skill hebrew-ocr-forms -a claude-codeHow do I scan an income tax form 106 and extract its data: salary, withheld tax, and deductions, into a JSON structure?
How do I extract data from a scanned Tabu (land registry) document? I need to retrieve property owners, block and parcel numbers, liens, and registrations.
How do I validate that a scanned government document (such as an ID card or business license) is authentic and that the extracted data is accurate?
Added OCR engine comparison (Tesseract vs Google Cloud Vision, AWS Textract, Azure Read, Claude Vision) and Reference Links. Added enriched content.
Apr 14, 2026
Generate professional Hebrew documents including PDF, DOCX, and PPTX with full RTL support and proper Hebrew typography. Use when user asks to create Hebrew PDF, generate Israeli business documents, "lehafik heshbonit", "litstor hozeh", build Hebrew Word document, create Hebrew PowerPoint, or produce Israeli templates such as Heshbonit Mas (tax invoice), Hozeh (contract), Hatza'at Mechir (proposal), or Protokol (meeting minutes). Covers reportlab, WeasyPrint, python-docx, and pptxgenjs with bidi paragraph support. Do NOT use for OCR or reading existing documents (use hebrew-ocr-forms instead).
Guide developers in using Hebrew NLP models and tools including DictaLM, DictaBERT, AlephBERT, and ivrit.ai. Use when user asks about Hebrew text processing, Hebrew NLP, "ivrit", Hebrew tokenization, Hebrew NER, Hebrew sentiment analysis, Hebrew speech-to-text, or needs to process Hebrew language text programmatically. Covers model selection, preprocessing, and Hebrew-specific NLP challenges. Do NOT use for Arabic NLP (different tools) or general English NLP tasks.
Generate SRT subtitles from video or audio files with Hebrew and English transcription support. Use when you need to create captions, transcripts, or hardcoded subtitles for social media and WhatsApp. Supports automatic language detection, translation between Hebrew and English, and burning subtitles directly into video files.
Want to build your own skill? Try the Skill Creator · Submit a Skill