Iceni's PDF data extraction plug-in, Gemini, briefly.
Gemini is a useful text extraction plug-in for Adobe Acrobat, whose PDF is the de facto standard file format in the industry.
I will not explain how to install Gemini because it is straightforward.
The PDF used for this test was created on a Macintosh using Adobe PageMaker 6.5J. It contains both Russian and Japanese text in each column.
You can download the PDF, sfpdfm.pdf.
The PDF file sfpdfm.pdf looks like this (the bottom portion is omitted).
Select the type of output you would like to get.
One of the following can be selected for text output.
1. In Acrobat 4.0, choose File->Environment?->Gemini.
2. The output file format selection box then appears.
3. Select output file format for text and for graphics.
Various options are available: File per page, Preserve line breaks, etc.
4. In Acrobat 4.0, choose Plug-in->Gemini Export.
5. Then specify two things:
Which portion is to be extracted.
Where to put the extracted data file.
6. The extracted text data will look like this.
The upper portion of the text is garbled. This is the portion written in Russian.
7. Do not panic. Select a suitable font and you will be able to read it.
The picture shows how to choose the right font set. This example uses "YooEdit", a freeware editor for the Mac.
This time, I will choose the "Pryamoi" font.
Now, you can read Russian.
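Why a font change fixes the garbling can be sketched in a few lines of Python. This is only an illustration under assumptions: the article does not say which encoding the extracted Russian bytes use, so the sketch assumes Mac Cyrillic, and the sample word is made up rather than taken from sfpdfm.pdf. The bytes are the same in both cases; only the table used to interpret them differs, which is roughly what selecting a Cyrillic font does in the editor.

```python
# Illustrative only: the sample word and both codec choices are assumptions,
# not details taken from the extracted file.
russian = "Привет"

# Bytes as a Mac application might store Cyrillic text (assumed Mac Cyrillic).
raw = russian.encode("mac_cyrillic")

# Reading the same bytes with a Western (Mac Roman) table yields garbage,
# similar to what the editor shows before a Cyrillic font is chosen.
garbled = raw.decode("mac_roman")

# Reading them with the Cyrillic table recovers the original text.
readable = raw.decode("mac_cyrillic")

print(garbled)
print(readable)
```

Nothing in the file is converted or damaged by the wrong interpretation, which is why simply switching the font is enough.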
Of course, Japanese text is extracted properly as shown below.