pythontr.com
What is Tesseract? It is a library that performs OCR (optical character recognition). It is used to read the text that appears in images.
# Installation
aptitude install autogen autoconf automake libtool

- leptonica
http://www.leptonica.com/download.html

wget http://www.leptonica.com/source/leptonica-1.69.tar.gz
tar zxf leptonica-1.69.tar.gz
cd leptonica-1.69
./configure
make
su
make install
ldconfig
exit

- tesseract
http://code.google.com/p/tesseract-ocr/downloads/list

wget http://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-3.01.tar.gz
tar zxf tesseract-3.01.tar.gz
cd tesseract-3.01
./autogen.sh
./configure
make
su
make install
ldconfig
exit

- Language files
su
cd /usr/local/share/tessdata/
wget http://code.google.com/p/tesseract-ocr/downloads/detail?name=tur.traineddata.gz
wget http://code.google.com/p/tesseract-ocr/downloads/detail?name=eng.traineddata.gz
gunzip tur.traineddata.gz
gunzip eng.traineddata.gz

# Usage
# (tesseract appends the extension itself, so this writes output.txt)
tesseract resim.jpg output -l tur

# References
training: http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3

# -----------------------------------------------------------------------------
# HOCR NOTES
# -----------------------------------------------------------------------------

# Installation
aptitude install exactimage

# Usage
# (the image passed to hocr2pdf with -i should be the same one that was OCRed)
tesseract resim.jpg output -l tur hocr
hocr2pdf -i resim.jpg -o output.pdf < output.html
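The two usage commands above can also be driven from Python. The following is only a minimal sketch, assuming tesseract has been installed as described and is on the PATH. The function names `build_tesseract_command` and `run_ocr` are hypothetical helpers made up for this illustration; they are not part of Tesseract.

```python
import shutil
import subprocess


def build_tesseract_command(image, output_base, lang="tur", hocr=False):
    """Build the argument list for the tesseract calls shown in the notes.

    output_base is given without an extension, because tesseract
    appends the extension to the output name itself.
    """
    command = ["tesseract", image, output_base, "-l", lang]
    if hocr:
        # "hocr" is the config name used in the HOCR notes.
        command.append("hocr")
    return command


def run_ocr(image, output_base, lang="tur", hocr=False):
    """Run tesseract and raise if it is missing or exits with an error."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract was not found on PATH")
    command = build_tesseract_command(image, output_base, lang, hocr)
    # check=True turns a non-zero exit status into CalledProcessError.
    subprocess.run(command, check=True)
```

With this sketch, `run_ocr("resim.jpg", "output")` would issue the same command as the Usage section, and `run_ocr("resim.jpg", "output", hocr=True)` would match the HOCR variant. Keeping the command construction in its own function makes it possible to inspect or log the exact call without running tesseract.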