ahmed
Administrator Team Member
Convert Text Files From Arabic ASCII (ANSI) to Unicode or UTF-8 (or UTF8)
Posted on: 09-05-08 11:25 PM

How to Convert Text Encoding

Most files on our download pages are saved as ANSI text (often loosely called ASCII) on Windows, using the charset WINDOWS-1256.

I have received many emails about converting these Arabic text files to the portable, international UTF-8 (Unicode) encoding.

I have answered some topics on this forum with general notes on how to do the conversion, with some code in VB or Perl and some references to links on the web.

The easiest tool I found which is also free for converting from any format to another is a tool called:

iconv

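*NIX systems (Linux and UNIX) come with a command-line program called iconv that converts text from one encoding to another. The -c option tells the program to discard characters that cannot be converted, -f specifies the encoding to convert from, and -t the encoding to convert to. The -f value must match how the file was actually saved: with -f ASCII, every byte above 127 (which includes all Arabic letters in WINDOWS-1256) is invalid, and -c drops it silently.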

iconv -c -f ASCII -t UTF-8 [FILE_NAME] > [NEW_FILE]
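To see why the source encoding matters, here is a minimal sketch in Python (used only for illustration; the commands in this post are plain iconv). It assumes that Python's errors="ignore" behaves roughly like iconv's -c, and that cp1256 is Python's name for WINDOWS-1256. The sample word and the convert helper are made up for the example.

```python
# Sample bytes: an Arabic word plus " ok", encoded in WINDOWS-1256 (cp1256 in Python).
raw = "سلام ok".encode("cp1256")


def convert(data: bytes, source_encoding: str) -> bytes:
    """Decode from source_encoding, dropping undecodable bytes, then encode as UTF-8."""
    return data.decode(source_encoding, errors="ignore").encode("utf-8")


# Declared as ASCII: the Arabic bytes are all above 127, so they are discarded.
as_ascii = convert(raw, "ascii")

# Declared as WINDOWS-1256: the Arabic letters survive the conversion.
as_cp1256 = convert(raw, "cp1256")

print(as_ascii)
print(as_cp1256.decode("utf-8"))
```

In other words, the wrong "from" encoding together with a "discard" option does not fail loudly; it just loses the Arabic text.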

You can download iconv for Windows from the GnuWin32 project (download the libiconv package, which includes the iconv.exe application; the link is given below). You will also need to download the libintl3.dll library from GnuWin32 and drop the DLL in the bin folder.

To repair a broken UTF-8 file, you can specify the same encoding in both the from and to parameters; together with -c this drops any invalid byte sequences:

iconv -c -f UTF-8 -t UTF-8 [FILE_NAME] > [NEW_FILE]
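The same "repair" idea can be sketched in Python, again assuming errors="ignore" is a reasonable stand-in for -c. The function name and the sample bytes are hypothetical. Note that this only removes bytes that are not valid UTF-8; it cannot restore text that was converted with the wrong encoding earlier.

```python
def strip_invalid_utf8(data: bytes) -> bytes:
    """Round-trip through UTF-8, silently dropping byte sequences that are not valid UTF-8."""
    return data.decode("utf-8", errors="ignore").encode("utf-8")


# A valid UTF-8 Arabic word, followed by a stray 0xFF byte (never valid in UTF-8).
broken = "سلام".encode("utf-8") + b"\xff" + b" end"
repaired = strip_invalid_utf8(broken)

print(repaired.decode("utf-8"))
```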

So download it from here:

http://gnuwin32.sourceforge.net/packages/libiconv.htm

Homepage

http://www.gnu.org/software/libiconv

Download

If you download the Setup program of the package, any requirements for running applications, such as dynamic link libraries (DLL's) from the dependencies as listed below under Requirements, are already included. If you download the package as Zip files, then you must download and install the dependencies zip file yourself. Developer files (header files and libraries) from other packages are however not included; so if you wish to develop your own applications, you must separately install the required packages.

Description Download Size Last change Md5sum
• Complete package, except sources   Setup   969371   14 October 2004   e0217c09792beec74578516d9fff55ce
• Sources   Setup   1756913   14 October 2004   d1118316cde62735b98234885957c947
 
• Binaries   Zip   828380   14 October 2004   2ded584cdcfc87e1e3db257dd5b44651
• Dependencies   Zip   50981   14 October 2004   51488157b1202e06e840b6d745a7be32
• Developer files   Zip   731496   14 October 2004   6d3c9b30fd449541954207d65da9193c
• Documentation   Zip   31016   14 October 2004   d43a77a9834092659eb462e37aaa9b7a
• Sources   Zip   4638975   14 October 2004   e9014f9057c5cd4cafe06f996aad9f54
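The Md5sum column above can be used to check that a download arrived intact. A minimal sketch using Python's standard hashlib module follows; the helper names are my own, and the file name in the comment is only a placeholder, not the real name of the setup program.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path: str, expected: str) -> bool:
    """Compare a file's MD5 with a published value, ignoring case and surrounding spaces."""
    return md5_of_file(path) == expected.strip().lower()


# Example call (placeholder file name, checksum taken from the table above):
# matches_published_md5("downloaded-setup.exe", "e0217c09792beec74578516d9fff55ce")
```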

You can also download the files from the GnuWin32 files page.

For Windows I'd recommend you download the complete binary setup program:

Complete package, except sources   Setup

Then after you install it, from the DOS prompt you will be able to run the iconv.exe program:

C:\>iconv -l

The above command prints all supported encodings.

Here are some examples that I use to convert the Arabic files saved in charset WINDOWS-1256 to UTF-8 format:
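If you are calling a conversion from your own program rather than from the prompt, a similar check can be made there. The sketch below asks Python's codec registry whether an encoding name is known; this is Python's own list, not the list that iconv -l prints, and is_supported is a hypothetical helper name.

```python
import codecs


def is_supported(name: str) -> bool:
    """Return True if Python knows a codec with this name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


# cp1256 is Python's name for WINDOWS-1256.
for name in ("cp1256", "utf-8", "no-such-encoding"):
    print(name, is_supported(name))
```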

iconv -f WINDOWS-1256 -t UTF-8 Hadith-Maliks-Muwatta.txt > Hadith-Maliks-Muwatta.Unicode.txt
iconv -f WINDOWS-1256 -t UTF-8 Hadith-Musnad-Ahmad-ibn-Hanbal.txt > Hadith-Musnad-Ahmad-ibn-Hanbal.Unicode.txt
iconv -f WINDOWS-1256 -t UTF-8 Hadith-Sahih-Bukhari.txt > Hadith-Sahih-Bukhari.Unicode.txt
iconv -f WINDOWS-1256 -t UTF-8 Hadith-Sahih-Muslim.txt > Hadith-Sahih-Muslim.Unicode.txt
iconv -f WINDOWS-1256 -t UTF-8 Hadith-Sunan-Abu-Dawud.txt > Hadith-Sunan-Abu-Dawud.Unicode.txt
iconv -f WINDOWS-1256 -t UTF-8 Hadith-Sunan-al-Darami.txt > Hadith-Sunan-al-Darami.Unicode.txt
iconv -f WINDOWS-1256 -t UTF-8 Hadith-Sunan-al-Nasai.txt > Hadith-Sunan-al-Nasai.Unicode.txt
iconv -f WINDOWS-1256 -t UTF-8 Hadith-Sunan-al-Tirmidhi.txt > Hadith-Sunan-al-Tirmidhi.Unicode.txt
iconv -f WINDOWS-1256 -t UTF-8 Hadith-Sunan-Ibn-Maja.txt > Hadith-Sunan-Ibn-Maja.Unicode.txt
iconv -f WINDOWS-1256 -t UTF-8 QuranArabicNoTashkil.txt > QuranArabicNoTashkil.Unicode.txt
iconv -f WINDOWS-1256 -t UTF-8 QuranArabicTashkil.txt > QuranArabicTashkil.Unicode.txt
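Instead of typing one command per file, the same naming pattern (X.txt to X.Unicode.txt) can be applied to a whole folder. This is only a sketch in Python, not the iconv tool itself; convert_folder is a hypothetical helper, and it assumes every .txt file in the folder really is WINDOWS-1256 (cp1256 in Python).

```python
from pathlib import Path


def convert_folder(folder: str, source_encoding: str = "cp1256") -> list:
    """Write a UTF-8 copy named X.Unicode.txt next to every X.txt in folder.

    Returns the names of the files that were written.
    """
    written = []
    for src in sorted(Path(folder).glob("*.txt")):
        if src.name.endswith(".Unicode.txt"):
            continue  # skip outputs from an earlier run
        dst = src.with_name(src.stem + ".Unicode.txt")
        # Work on bytes so that line endings are copied unchanged.
        text = src.read_bytes().decode(source_encoding)
        dst.write_bytes(text.encode("utf-8"))
        written.append(dst.name)
    return written
```

The original files are left untouched, so a wrong guess about the source encoding can be fixed by simply running the conversion again with a different source_encoding.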

Of course, libiconv is also a library that you can call from your own programs if you want.

 


-----------------------
Ahmed ELsheshtawy
IslamWare CEO
www.islamware.com
-----------------------

ahmed
iconv PHP function
Reply #: 1 Posted on: 09-09-08 03:12 PM
There is also a built-in PHP function called iconv that converts text from one encoding to another:

http://www.php.net/manual/en/function.iconv.php

string iconv ( string $in_charset , string $out_charset , string $str )

<?php
echo iconv("ISO-8859-1", "UTF-8", "This is a test.");
?>

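Code: PHP
<?php
$fileName = "ArabicQuran.txt";

// Read in the contents
$data = file_get_contents($fileName);
if ($data === false) {
    exit("Cannot read " . $fileName . "\n");
}

// Just display on the screen the file being modified
echo "Converting " . $fileName . "...\n";

// Convert the contents; iconv() returns false on failure
$converted = iconv("windows-1256", "UTF-8", $data);
if ($converted === false) {
    exit("Conversion failed, file left unchanged\n");
}

// Write back out to the same file (run this once only: a second run
// would convert the already-converted text again)
file_put_contents($fileName, $converted);
?>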
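Because the script writes back to the same file, running it twice would convert text that is already UTF-8. One possible guard, sketched here in Python rather than PHP, is to convert only when the bytes are not already valid UTF-8. The helper name to_utf8_once is hypothetical, and the check is a heuristic: it assumes that real WINDOWS-1256 Arabic text is practically never valid UTF-8 by accident.

```python
def to_utf8_once(data: bytes, source_encoding: str = "cp1256") -> bytes:
    """Convert to UTF-8 unless the bytes already decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so treat it as the legacy encoding and convert.
        return data.decode(source_encoding).encode("utf-8")
    # Already valid UTF-8 (pure ASCII text also ends up here): leave unchanged.
    return data


legacy = "سلام".encode("cp1256")
once = to_utf8_once(legacy)
twice = to_utf8_once(once)
print(once == twice)
```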
 

