[To Japanese version][To How to make your Mac read Russian (in Japanese)]

Text extraction from bilingual PDF


This page briefly explains how to use Gemini, Iceni's PDF data extraction plug-in.
Gemini is a useful text extraction plug-in for Adobe Acrobat, whose PDF format is the de facto industry standard.


I will not explain how to install Gemini, because it is straightforward.


The PDF used for this test was created on a Macintosh with Adobe PageMaker 6.5J. It contains both Russian and Japanese text in each column.

The sample PDF, sfpdfm.pdf, is available for download.


The PDF file sfpdfm.pdf looks like this (the bottom portion is omitted).

Bilingual PDF file view

Setting up Gemini


Select the type of output you want.

One of the following can be selected for text output.


How to configure Gemini

1. In Acrobat 4.0, choose File->Preferences->Gemini.

Gemini Setting 1

2. The output file format selection dialog appears.

Gemini setting 2

3. Select the output file format for text and for graphics.

Various options are available, such as File per page and Preserve line breaks.

Gemini Setting 2


Gemini Setting 2

4. In Acrobat 4.0, choose Plug-in->Gemini Export.

Choose Gemini in Acrobat 4.0

5. Next, you specify two things, after which the extraction is completed.

Select the portion to be extracted.

Select a portion to be extracted

Specify where to save the extracted data file.

Stipulate where to put extracted file

6. The extracted text looks like this.

The upper portion of the text, which is the part written in Russian, is garbled.

Garbled fonts

7. Don't panic. Select a Cyrillic font of your choice and you will be able to read it.

The picture shows how to choose the right font set, using YooEdit, a freeware text editor for the Mac, as an example.

Selection of edit functions preference


Here, I choose the "Pryamoi" font.

Choose Pryamoi font


Now, you can read Russian.

Cyrillic text view
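Why switching the font fixes the garbled text can be illustrated with a small Python sketch. It rests on an assumption not stated on this page: that the Russian part of the extracted file is stored as Mac OS Cyrillic bytes (Python codec `mac_cyrillic`). The same bytes shown through a non-Cyrillic mapping (here `mac_roman`, chosen only because it maps every byte value) come out as unrelated Latin characters and symbols, while the matching Cyrillic mapping restores the original text. The function name `show_mismatch` is hypothetical.

```python
# Hypothetical sketch, ASSUMING the Russian text is stored as Mac OS Cyrillic
# bytes; the page does not say which encoding Gemini actually writes.

def show_mismatch(text: str) -> tuple[str, str]:
    """Encode Russian text as Mac Cyrillic, then read the bytes back two ways."""
    raw = text.encode("mac_cyrillic")      # bytes as they might sit in the file
    garbled = raw.decode("mac_roman")      # non-Cyrillic mapping: mojibake
    readable = raw.decode("mac_cyrillic")  # matching Cyrillic mapping: original text
    return garbled, readable


if __name__ == "__main__":
    garbled, readable = show_mismatch("Привет")
    print("wrong font/encoding:", garbled)
    print("Cyrillic font/encoding:", readable)
```

On classic Mac OS the bytes in a plain text file did not change when you picked another font; only the glyphs assigned to them did, which is what decoding the same bytes with a different codec imitates here.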



Of course, Japanese text is extracted properly as shown below.

Japanese text view
