How to Convert Text Encoding
Most files on our download pages are saved as ANSI text (often loosely called ASCII) on Windows systems, using the WINDOWS-1256 charset.
I have received many emails about converting these Arabic text files to the portable, international UTF-8 (Unicode) format.
I have answered several topics on this forum with general notes on how to do the conversion, some code in VB or Perl, and some references to links on the web.
The easiest free tool I have found for converting from one encoding to another is called:
iconv
*NIX systems (Linux and UNIX) come with a command-line program called iconv that converts text from one encoding to another. The -c option tells the program to leave out characters that cannot be converted instead of stopping with an error. The -f option specifies the encoding to convert from, and -t the encoding to convert to.
iconv -c -f ASCII -t UTF-8 [FILE_NAME] > [NEW_FILE]
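For readers who would rather script the conversion than install iconv, the same idea can be sketched with Python's standard library. This is only an illustration and not part of iconv: the function name convert_encoding and its parameters are made up for this example, and errors="ignore" is used as a rough counterpart to -c.

```python
def convert_encoding(src_path, dst_path, from_enc, to_enc, skip_invalid=False):
    """Re-encode a file, roughly like: iconv [-c] -f FROM -t TO SRC > DST.

    Hypothetical helper for illustration; the names are not from iconv.
    """
    # "ignore" silently drops anything that cannot be converted (similar in
    # spirit to -c); "strict" raises an error instead.
    errors = "ignore" if skip_invalid else "strict"

    # Work on raw bytes so that line endings are left untouched.
    with open(src_path, "rb") as src:
        raw = src.read()

    text = raw.decode(from_enc, errors=errors)   # bytes in FROM -> text
    data = text.encode(to_enc, errors=errors)    # text -> bytes in TO

    with open(dst_path, "wb") as dst:
        dst.write(data)
    return len(data)  # number of bytes written
```

A call such as convert_encoding("in.txt", "out.txt", "cp1256", "utf-8") would then do the same job as the command above. Python calls the WINDOWS-1256 charset "cp1256" and also accepts "windows-1256" as an alias.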
You can download iconv for Windows from the GnuWin32 project (links below). Download the libiconv package, which includes the iconv.exe application. You will also need to download the libintl3.dll library from GnuWin32 and drop the DLL in the bin folder.
To repair a broken UTF-8 file, you can also run iconv with the same encoding in the from and to parameters, so that -c drops the invalid byte sequences:
iconv -c -f UTF-8 -t UTF-8 [FILE_NAME] > [NEW_FILE]
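The same clean-up can be sketched in Python, assuming it is acceptable to lose the damaged bytes. The function name strip_invalid_utf8 is hypothetical, and Python's decoder may not treat every edge case exactly as iconv -c does, so treat this as an approximation rather than an equivalent.

```python
def strip_invalid_utf8(data: bytes):
    """Drop byte sequences that are not valid UTF-8 and keep the rest.

    Returns (clean_bytes, dropped_count). Hypothetical helper for
    illustration only; the repair is lossy by design.
    """
    # Decoding with errors="ignore" skips invalid bytes; encoding the result
    # again yields well-formed UTF-8.
    clean = data.decode("utf-8", errors="ignore").encode("utf-8")
    # Valid sequences round-trip unchanged, so the length difference is the
    # number of bytes that were thrown away.
    return clean, len(data) - len(clean)
```

Checking the dropped count before overwriting anything is a cheap way to see how badly a file was damaged. A large number usually means the file was never UTF-8 in the first place and should be converted from its real encoding instead.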
So download it from here:
http://gnuwin32.sourceforge.net/packages/libiconv.htm
http://www.gnu.org/software/libiconv
If you download the Setup program of the package, any requirements for running applications, such as dynamic link libraries (DLL's) from the dependencies as listed below under Requirements, are already included. If you download the package as Zip files, then you must download and install the dependencies zip file yourself. Developer files (header files and libraries) from other packages are however not included; so if you wish to develop your own applications, you must separately install the required packages.
| Description | Download | Size | Last change | Md5sum |
|---|---|---|---|---|
| Complete package, except sources | Setup | 969371 | 14 October 2004 | e0217c09792beec74578516d9fff55ce |
| Sources | Setup | 1756913 | 14 October 2004 | d1118316cde62735b98234885957c947 |
| Binaries | Zip | 828380 | 14 October 2004 | 2ded584cdcfc87e1e3db257dd5b44651 |
| Dependencies | Zip | 50981 | 14 October 2004 | 51488157b1202e06e840b6d745a7be32 |
| Developer files | Zip | 731496 | 14 October 2004 | 6d3c9b30fd449541954207d65da9193c |
| Documentation | Zip | 31016 | 14 October 2004 | d43a77a9834092659eb462e37aaa9b7a |
| Sources | Zip | 4638975 | 14 October 2004 | e9014f9057c5cd4cafe06f996aad9f54 |
You can also download the files from the GnuWin32 files page.
For Windows I'd recommend you download the complete binary setup program:
Complete package, except sources (Setup)
After you install it, you will be able to run the iconv.exe program from the DOS prompt:
C:\>iconv -l
The above command prints all supported encodings.
Here are some examples that I use to convert the ANSI Arabic files saved in charset WINDOWS-1256 to UTF-8:
iconv -f WINDOWS-1256 -t UTF-8 Hadith-Maliks-Muwatta.txt>Hadith-Maliks-Muwatta.Unicode.txt
iconv -f WINDOWS-1256 -t UTF-8 Hadith-Musnad-Ahmad-ibn-Hanbal.txt > Hadith-Musnad-Ahmad-ibn-Hanbal.Unicode.txt
iconv -f WINDOWS-1256 -t UTF-8 Hadith-Sahih-Bukhari.txt>Hadith-Sahih-Bukhari.Unicode.txt
iconv -f WINDOWS-1256 -t UTF-8 Hadith-Sahih-Muslim.txt>Hadith-Sahih-Muslim.Unicode.txt
iconv -f WINDOWS-1256 -t UTF-8 Hadith-Sunan-Abu-Dawud.txt>Hadith-Sunan-Abu-Dawud.Unicode.txt
iconv -f WINDOWS-1256 -t UTF-8 Hadith-Sunan-al-Darami.txt>Hadith-Sunan-al-Darami.Unicode.txt
iconv -f WINDOWS-1256 -t UTF-8 Hadith-Sunan-al-Nasai.txt>Hadith-Sunan-al-Nasai.Unicode.txt
iconv -f WINDOWS-1256 -t UTF-8 Hadith-Sunan-al-Tirmidhi.txt>Hadith-Sunan-al-Tirmidhi.Unicode.txt
iconv -f WINDOWS-1256 -t UTF-8 Hadith-Sunan-Ibn-Maja.txt>Hadith-Sunan-Ibn-Maja.Unicode.txt
iconv -f WINDOWS-1256 -t UTF-8 QuranArabicNoTashkil.txt>QuranArabicNoTashkil.Unicode.txt
iconv -f WINDOWS-1256 -t UTF-8 QuranArabicTashkil.txt>QuranArabicTashkil.Unicode.txt
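Typing one command per file gets tedious, so the list above could also be handled in a loop. The following is a minimal Python sketch under a few assumptions: all source files sit in one folder, end in .txt, and really are WINDOWS-1256. The function name convert_folder and the suffix parameter are invented for this example.

```python
from pathlib import Path


def convert_folder(folder, from_enc="cp1256", to_enc="utf-8", suffix=".Unicode"):
    """Convert every NAME.txt in folder and write NAME.Unicode.txt beside it.

    Hypothetical helper for illustration; "cp1256" is Python's name for the
    WINDOWS-1256 charset.
    """
    written = []
    # sorted() builds the full list first, so files created inside the loop
    # are not picked up again during the same run.
    for src in sorted(Path(folder).glob("*.txt")):
        if src.stem.endswith(suffix):
            continue  # skip output left over from an earlier run
        dst = src.with_name(src.stem + suffix + src.suffix)
        text = src.read_bytes().decode(from_enc)
        dst.write_bytes(text.encode(to_enc))
        written.append(dst.name)
    return written
```

Calling convert_folder(".") in the folder that holds the downloaded files would produce the same .Unicode.txt names as the commands above, and the original files are left untouched.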
Of course, libiconv is also a library that you can interface with your own programs if you want.