[tess-two] Making a Simple OCR Android App using Tesseract


예쁜꽃이피었으면 2015. 3. 12. 16:45

http://gaut.am/making-an-ocr-android-app-using-tesseract/#more-1219

This post shows how to build an Android application that extracts text from an image captured by your phone's camera. We'll be using Tess Two, a fork of Tesseract Android Tools by Robert Theis. It is based on the Tesseract OCR engine (mainly maintained by Google) and the Leptonica image processing library.

Recognizing text using your Android phone. Not exactly the end result of this blog post, but what you could achieve.

Note: These instructions are for Android SDK r19 and Android NDK r7c, at least for the time being (written at this tree). On 64-bit Ubuntu, you may need to install the ia32-libs 32-bit compatibility library. You will also need the proper entries in your PATH variable (see the Troubleshooting section below).

Download the source or clone this git repository. The project contains tools for compiling the Tesseract, Leptonica, and JPEG libraries for use on Android, along with an Eclipse Android library project that provides a Java API for the natively compiled Tesseract and Leptonica APIs. You don't need the eyes-two code.
Build the project using these commands (here, tess-two is the directory inside the tess-two checkout, the one at the same level as tess-two-test):
```
cd <project-directory>/tess-two
ndk-build
android update project --path .
ant release
```
Now import the project as a library in Eclipse. File → Import → Existing Projects into workspace → tess-two directory. Right click the project, Android Tools → Fix Project Properties. Right click → Properties → Android → Check Is Library.
Configure your project to use the tess-two project as a library project: Right click your project name → Properties → Android → Library → Add, and choose tess-two. You’re now ready to OCR any image using the library.

First, we need to get the picture itself. For that, I found some simple code to capture the image here. Once we have the bitmap, performing the OCR is relatively easy. Be sure to correct the rotation and the image type first, with something like:

```
// Needs: android.graphics.Bitmap, android.graphics.Matrix, android.media.ExifInterface
// _path = path to the image to be OCRed
// bitmap = the Bitmap already decoded from _path
ExifInterface exif = new ExifInterface(_path); // may throw IOException
int exifOrientation = exif.getAttributeInt(
        ExifInterface.TAG_ORIENTATION,
        ExifInterface.ORIENTATION_NORMAL);

int rotate = 0;

switch (exifOrientation) {
case ExifInterface.ORIENTATION_ROTATE_90:
    rotate = 90;
    break;
case ExifInterface.ORIENTATION_ROTATE_180:
    rotate = 180;
    break;
case ExifInterface.ORIENTATION_ROTATE_270:
    rotate = 270;
    break;
}

if (rotate != 0) {
    int w = bitmap.getWidth();
    int h = bitmap.getHeight();

    // Set up the rotation matrix
    Matrix mtx = new Matrix();
    mtx.preRotate(rotate);

    // Rotate the bitmap
    bitmap = Bitmap.createBitmap(bitmap, 0, 0, w, h, mtx, false);
}
// Convert to ARGB_8888, required by tess
bitmap = bitmap.copy(Bitmap.Config.ARGB_8888, true);
```
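
The orientation-to-angle mapping in the snippet above can also be pulled out into a small helper that is testable without an Android device. This is only a sketch: the class name ExifRotation and the method toDegrees are hypothetical and are not part of the original code or of tess-two. The numeric values are the standard EXIF orientation codes, which match the corresponding ExifInterface constants.

```java
// Hypothetical helper, not part of the original post or of tess-two.
final class ExifRotation {
    // Standard EXIF orientation codes (same values as the
    // android.media.ExifInterface.ORIENTATION_* constants).
    static final int ORIENTATION_NORMAL = 1;
    static final int ORIENTATION_ROTATE_180 = 3;
    static final int ORIENTATION_ROTATE_90 = 6;
    static final int ORIENTATION_ROTATE_270 = 8;

    private ExifRotation() {}

    /** Returns the rotation in degrees to apply so the image is upright. */
    static int toDegrees(int exifOrientation) {
        switch (exifOrientation) {
            case ORIENTATION_ROTATE_90:
                return 90;
            case ORIENTATION_ROTATE_180:
                return 180;
            case ORIENTATION_ROTATE_270:
                return 270;
            default:
                // Normal, undefined, and mirrored orientations are left
                // unrotated, as in the switch statement above.
                return 0;
        }
    }
}
```

With such a helper, the switch in the snippet would reduce to something like `int rotate = ExifRotation.toDegrees(exifOrientation);`.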

Now that the image is in the bitmap, we can use TessBaseAPI to run the OCR:

```
// Needs: com.googlecode.tesseract.android.TessBaseAPI
TessBaseAPI baseApi = new TessBaseAPI();
// DATA_PATH = path to the directory that contains the "tessdata" folder
// lang = language for which the data file exists, usually "eng"
baseApi.init(DATA_PATH, lang);
// E.g. baseApi.init("/mnt/sdcard/tesseract/", "eng");
// with the data file at /mnt/sdcard/tesseract/tessdata/eng.traineddata
baseApi.setImage(bitmap);
String recognizedText = baseApi.getUTF8Text();
baseApi.end();
```
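
A frequent cause of init failures is a data path that does not contain the expected tessdata folder. The following sketch checks the layout before calling init. It assumes the conventional layout of a tessdata directory directly under the data path with one `<lang>.traineddata` file per language; the class and method names (TessDataCheck, trainedDataFile, isReady) are hypothetical and not part of tess-two.

```java
import java.io.File;

// Hypothetical helper, not part of tess-two.
final class TessDataCheck {
    private TessDataCheck() {}

    /** Returns the file expected at <dataPath>/tessdata/<lang>.traineddata. */
    static File trainedDataFile(String dataPath, String lang) {
        return new File(new File(dataPath, "tessdata"), lang + ".traineddata");
    }

    /** Returns true if the language data file exists under the data path. */
    static boolean isReady(String dataPath, String lang) {
        return trainedDataFile(dataPath, lang).isFile();
    }
}
```

For example, calling `TessDataCheck.isReady(DATA_PATH, lang)` before `baseApi.init(DATA_PATH, lang)` lets the app show a clear message when the language file has not been installed yet.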

(You can download the language files from here and put them in a directory on your device, either manually or from code.)

Now that you have the OCRed text in the variable recognizedText, you can do pretty much anything with it: translate it, search for it, anything! P.S. You can support additional languages by adding a preference and then downloading the required language data file from here. You could even put the files in the assets folder and copy them to the SD card on first start.
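
The copy step could look roughly like the sketch below. It is written against plain java.io so it can run anywhere. On Android the input stream would typically come from something like `context.getAssets().open("tessdata/eng.traineddata")`, where the asset location is an assumption about how you packaged the file. The class name LanguageDataInstaller and the method installIfMissing are hypothetical.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not part of tess-two or the sample app.
final class LanguageDataInstaller {
    private LanguageDataInstaller() {}

    /**
     * Copies the stream to <dataDir>/tessdata/<lang>.traineddata unless that
     * file already exists. Returns true if a copy was made. The caller
     * remains responsible for closing the source stream.
     */
    static boolean installIfMissing(InputStream source, File dataDir, String lang)
            throws IOException {
        File tessdata = new File(dataDir, "tessdata");
        File target = new File(tessdata, lang + ".traineddata");
        if (target.isFile()) {
            return false; // already installed, nothing to do
        }
        if (!tessdata.isDirectory() && !tessdata.mkdirs()) {
            throw new IOException("Could not create " + tessdata);
        }
        OutputStream out = new FileOutputStream(target);
        try {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = source.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } finally {
            out.close();
        }
        return true;
    }
}
```

Checking for the file first keeps the app from rewriting several megabytes of language data on every launch.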

To make things easier to follow, I have uploaded a simple OCR application that uses Tess Two to GitHub, called Simple Android OCR (for beginners). If you want a full-fledged application with a selectable region while capturing the image, text translation, preferences, etc., check out Robert Theis' Android OCR application (for intermediate+)!

Updated: 7 October 2012

References

Using Tesseract Tools for Android to Create a Basic OCR App by Robert Theis
Simple Android Photo Capture by MakeMachine
tess-two README

Troubleshooting

About updating PATH - You need to update your PATH variable for the commands to work; otherwise you will see a "command not found" error. For the Android SDK, add the location of the SDK's tools and platform-tools directories to your PATH environment variable. For the Android NDK, use the same process to add the android-ndk directory to the PATH variable.
Maven-ising – Check this post by James Elsey. He also mentions that he got it working on Windows without any problems.
On Windows: “xcopy is not recognized.” Solution: Move xcopy.exe file from Windows\System32 folder to android-sdk\tools folder (or add %SystemRoot%\system32 to the PATH variable).
On Windows: “The project either has no target set or the target is invalid. Please provide a --target to the ‘android.bat update’ command.” Solution: Run the command
```
android update project --path D:\Softwares\Studies\Android\OCR\Code_Project\tess-two-master\tess-two --target android-19
```
You can also try searching this page (Ctrl+F) for your problem; someone may have already encountered it and posted a solution in the comments.
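
If you are not sure whether a "command not found" error comes from the PATH variable, a small check like the following can help. It is a sketch only: the class PathCheck and the method findOnPath are hypothetical, and the lookup simply tests for a file with the exact given name in each PATH directory. On Windows you would pass the full file name (for example the .cmd or .bat variant), since this sketch does not try extensions.

```java
import java.io.File;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper, not part of the Android SDK or NDK.
final class PathCheck {
    private PathCheck() {}

    /**
     * Returns the first file named `command` found in the directories listed
     * in `pathValue`, or null if there is none or pathValue is null.
     */
    static File findOnPath(String command, String pathValue) {
        if (pathValue == null) {
            return null;
        }
        // Split on the platform separator (":" on Unix, ";" on Windows).
        for (String dir : pathValue.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue;
            }
            File candidate = new File(dir, command);
            if (candidate.isFile()) {
                return candidate;
            }
        }
        return null;
    }
}
```

For example, `PathCheck.findOnPath("ndk-build", System.getenv("PATH"))` returning null suggests the android-ndk directory has not been added to PATH yet.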

Translations

Japanese by datsuns
French by Mathieu

Projects Made By Users

People have made many projects using this tutorial. Some of them are:

DatumDroid by Aviral, Devashish and me
MachineRetina by Salman Gadit

If you have made one too, do tell us in the comments below!

http://stackoverflow.com/questions/19533273/best-ocr-optical-character-recognition-example-in-android

Like you, I ran into many problems implementing OCR in Android, but after a lot of Googling I found a solution, and it surely is the best example of OCR.

Let me explain it step by step.

First, download the source code from https://github.com/rmtheis/tess-two.

Import all three projects. After importing, you will get an error. To fix it, create a res folder in the tess-two project: right-click tess-two → New → Folder and name it "res".

After doing this in all three projects, the error should be gone.

Now download the source code from https://github.com/rmtheis/android-ocr; this is where you will find the best example.

You just need to import it into your workspace, but first you have to download the Android NDK from this site:

http://developer.android.com/tools/sdk/ndk/index.html

I have a 32-bit Windows 7 PC, so I downloaded this file: http://dl.google.com/android/ndk/android-ndk-r9-windows-x86.zip

Now extract it. Suppose you extract it to E:\Software\android-ndk-r9; that is the path to add to the environment variable.

Right-click My Computer → Properties → Advanced system settings → Advanced → Environment Variables, find PATH in the second (lower) box, and add the NDK path to it.

Now open cmd and go to D:\Android Workspace\tess-two.

If the NDK environment variable is set up correctly, type ndk-build and press Enter. You should not get any errors, and all the files should compile successfully.

Now download the other source code as well from https://github.com/rmtheis/tess-two, extract it, import it, and name it OCRTest. On my PC it is in D:\Android Workspace\OCRTest.

Import tess-two into it and run OCRTest; you will get the best example of OCR.

License: Attribution, NonCommercial
