How to change the encoding of text files in Linux systems

How many times have you wanted to change the encoding of a text file on a Linux system? Or how many times have you tried to watch a movie, only to find its .srt subtitles displayed as unreadable characters that you needed to turn back into readable text?

If you work with text files on Linux, you have probably run into this more than once.

It happens because your applications read the text files with an encoding different from the one the files were saved in. The solution is simple: convert the files to the encoding your applications expect.

In this mini post, I’ll show you how to change the encoding of your text files using the iconv command.

  • Changing a File’s Encoding Using the iconv Command

To use the iconv command, you need to know the current encoding of the text file you want to convert. Use the following syntax to convert the encoding of a file:

$ iconv -f [encoding] -t [encoding] [filename] > [output_filename]

Option            Description
-f, --from-code   Convert characters from encoding
-t, --to-code     Convert characters to encoding
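If you are unsure which encoding a file currently uses, you can let the tools make an educated guess before converting. The sketch below is only an illustration: the scratch directory, the two sample files and the helper name is_valid_utf8 are made up for this example. The file utility is used only if it happens to be installed, and it guesses heuristically, so treat its answer as a hint rather than a guarantee.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)

# Hypothetical samples: "café" in UTF-8 (é = bytes 303 251 in octal)
# and in ISO-8859-1 (é = the single byte 351 in octal).
printf 'caf\303\251\n' > "$workdir/utf8.txt"
printf 'caf\351\n' > "$workdir/latin1.txt"

# If the file utility is available, ask it for its best guess.
if command -v file > /dev/null 2>&1; then
    file --mime-encoding "$workdir/utf8.txt" "$workdir/latin1.txt"
fi

# iconv can also act as a validator: converting UTF-8 to UTF-8
# fails when the input is not valid UTF-8.
is_valid_utf8() {
    iconv -f utf-8 -t utf-8 "$1" > /dev/null 2>&1
}

for f in "$workdir/utf8.txt" "$workdir/latin1.txt"; do
    if is_valid_utf8 "$f"; then
        echo "$f: valid UTF-8"
    else
        echo "$f: not valid UTF-8"
    fi
done
```

A file that fails the UTF-8 check is usually in some legacy single-byte encoding, but the check cannot tell you which one. You still need to know where the file came from, or try a likely candidate and look at the result.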

Example 1: Convert a file’s encoding from iso-8859-1 to UTF-8 and save it to New_storks.srt

$ iconv -f iso-8859-1 -t utf-8 storks.srt > New_storks.srt

New_storks.srt is now UTF-8 encoded.
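To see what this conversion does to the bytes, here is a small sketch that builds a tiny ISO-8859-1 file by hand and converts it. The names latin1.srt and utf8.srt and the scratch directory are invented for the demonstration and are not files from this post.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)

# "é" is the single byte 0xE9 (octal 351) in ISO-8859-1.
printf 'caf\351\n' > "$workdir/latin1.srt"

# Same command shape as Example 1, writing to a different file.
iconv -f iso-8859-1 -t utf-8 "$workdir/latin1.srt" > "$workdir/utf8.srt"

# "é" takes two bytes in UTF-8, so the converted file is one byte longer.
before=$(wc -c < "$workdir/latin1.srt")
after=$(wc -c < "$workdir/utf8.srt")
echo "bytes before: $before, bytes after: $after"
```

Always redirect the output to a different file name. With a command such as iconv -f iso-8859-1 -t utf-8 storks.srt > storks.srt, the shell empties storks.srt before iconv gets a chance to read it.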

Example 2: Convert a file’s encoding from cp1256 to UTF-8 and save it to output.txt

$ iconv -f cp1256 -t utf-8  input.txt > output.txt

output.txt is now UTF-8 encoded.
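When a whole folder of files shares the same legacy encoding, a shell loop saves retyping the command. This is a rough sketch that assumes every .srt file in the directory really is cp1256. The directory layout and the file names part1.srt and part2.srt are hypothetical.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)

# Two hypothetical subtitle files containing the Arabic letter beh,
# which is the single byte 0xC8 (octal 310) in cp1256.
printf '\310\n' > "$workdir/part1.srt"
printf '\310\310\n' > "$workdir/part2.srt"

# Write the converted copies into a separate directory
# so the originals stay untouched.
mkdir "$workdir/utf8"
for src in "$workdir"/*.srt; do
    dest="$workdir/utf8/$(basename "$src")"
    # Stop at the first failure instead of silently continuing.
    iconv -f cp1256 -t utf-8 "$src" > "$dest" || { echo "failed: $src" >&2; break; }
done

ls "$workdir/utf8"
```

Keeping the converted files in their own directory makes it easy to compare them with the originals before deleting anything.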

Example 3: Convert a file’s encoding from ASCII to UTF-8 and save it to output.txt

$ iconv -f ascii -t utf-8 input.txt > output.txt

output.txt is now UTF-8 encoded.
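Because ASCII is a subset of UTF-8, a pure ASCII file is already valid UTF-8, so this conversion should leave the bytes unchanged. A quick way to confirm that, using a throwaway file in a scratch directory (both made up for this sketch):

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
printf 'plain ASCII text\n' > "$workdir/input.txt"

iconv -f ascii -t utf-8 "$workdir/input.txt" > "$workdir/output.txt"

# cmp -s is silent and only reports through its exit status.
if cmp -s "$workdir/input.txt" "$workdir/output.txt"; then
    result="identical"
else
    result="different"
fi
echo "input and output are $result"
```

In practice this conversion is mostly useful as a check: if the input contains any non-ASCII byte, iconv -f ascii stops with an error, which tells you the file was not plain ASCII after all.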

Example 4: Convert a file’s encoding from UTF-8 to ASCII

Hints:

1. UTF-8 can contain characters that cannot be represented in ASCII. When iconv meets one, it stops with the error message "illegal input sequence at position X" unless you tell it to drop such characters with the -c option.
2. When you use iconv with the -c option, you may lose some characters from your text file.

$ iconv -c -f utf-8 -t ascii input.txt > output.txt

Option   Description
-c       Omit invalid characters from output
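The difference between running with and without -c is easiest to see on a one-word file. The sketch below uses a made-up input.txt in a scratch directory, and the variable names are only for illustration. It assumes the iconv that ships with glibc; other implementations may behave differently.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)

# "café" in UTF-8; the "é" (bytes 303 251 in octal) has no ASCII equivalent.
printf 'caf\303\251\n' > "$workdir/input.txt"

# Without -c the conversion stops at the first unconvertible character.
if iconv -f utf-8 -t ascii "$workdir/input.txt" > "$workdir/strict.txt" 2> /dev/null; then
    strict_status="succeeded"
else
    strict_status="failed"
fi
echo "without -c: $strict_status"

# With -c the "é" is dropped. iconv may still return a non-zero exit
# status to signal that something was omitted, so it is ignored here.
iconv -c -f utf-8 -t ascii "$workdir/input.txt" > "$workdir/output.txt" || true

stripped=$(cat "$workdir/output.txt")
echo "with -c: $stripped"
```

Since -c removes characters without telling you which ones, it is worth comparing the output with the input before throwing the original away.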

Finally, to list all the coded character sets that iconv knows, run it with the -l option:

$ iconv -l

Option       Description
-l, --list   List known coded character sets

Here’s the output of the above command:

The following list contain all the coded character sets known. This does
not necessarily mean that all combinations of these names can be used for
the FROM and TO command line parameters. One coded character set can be
listed with several different names (aliases).

437, 500, 500V1, 850, 851, 852, 855, 856, 857, 860, 861, 862, 863, 864, 865,
 866, 866NAV, 869, 874, 904, 1026, 1046, 1047, 8859_1, 8859_2, 8859_3, 8859_4,
 8859_5, 8859_6, 8859_7, 8859_8, 8859_9, 10646-1:1993, 10646-1:1993/UCS4,
 ANSI_X3.4-1968, ANSI_X3.4-1986, ANSI_X3.4, ANSI_X3.110-1983, ANSI_X3.110,
 ARABIC, ARABIC7, ARMSCII-8, ASCII, ASMO-708, ASMO_449, BALTIC, BIG-5,
 BIG-FIVE, BIG5-HKSCS, BIG5, BIG5HKSCS, BIGFIVE, BRF, BS_4730, CA, CN-BIG5,
 CN-GB, CN, CP-AR, CP-GR, CP-HU, CP037, CP038, CP273, CP274, CP275, CP278,
 CP280, CP281, CP282, CP284, CP285, CP290, CP297, CP367, CP420, CP423, CP424,
 CP437, CP500, CP737, CP770, CP771, CP772, CP773, CP774, CP775, CP803, CP813,
 CP819, CP850, CP851, CP852, CP855, CP856, CP857, CP860, CP861, CP862, CP863,
 CP864, CP865, CP866, CP866NAV, CP868, CP869, CP870, CP871, CP874, CP875,
 CP880, CP891, CP901, CP902, CP903, CP904, CP905, CP912, CP915, CP916, CP918,
 CP920, CP921, CP922, CP930, CP932, CP933, CP935, CP936, CP937, CP939, CP949,
 CP950, CP1004, CP1008, CP1025, CP1026, CP1046, CP1047, CP1070, CP1079,
........
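The full list is long, so it is usually easier to search it than to read it. Here is a small sketch of that idea. The helper name has_charset is made up for this example, and the exact layout of the list differs between iconv implementations, which is why the helper first splits it into one name per line.

```shell
# Show every known name that mentions UTF-8 (case-insensitive).
iconv -l | grep -i 'utf-8'

# Hypothetical helper: succeeds if the given name appears in the list.
has_charset() {
    # Split on spaces and commas, drop the trailing "//" that some
    # iconv versions append, then look for an exact, case-insensitive match.
    iconv -l | tr ' ,' '\n\n' | sed 's|//$||' | grep -Fix "$1" > /dev/null
}

if has_charset utf-8; then
    echo "utf-8 is supported"
fi
```

Checking a name this way before a conversion can save you from a typo in the -f or -t argument.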

I hope you found this article useful.
