encoding - How to determine the fileencoding of a file in Linux correctly

- February 15, 2013

This question already has an answer here:

How can I detect the encoding/codepage of a text file? (18 answers)

When I have a file created in vim on Linux with :set fileencoding=utf-8 and the file contains diacritics (e.g. German umlauts), calling file myfile.txt results in myfile.txt: UTF-8 Unicode text. If the file contains no diacritics, the same check results in myfile.txt: ASCII text.

Why is that? And how can I determine safely that a whole bunch of files is encoded correctly using the UTF-8 file encoding?

Edit:

ASCII is 7-bit and is a subset of UTF-8. I want to know whether my source files are encoded in UTF-8, so that they can hold diacritics sometime in the future. IMO this is not obvious, and I would like to find a way to determine it safely.

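There is no generic and reliable way to find the encoding a text file uses. Furthermore, quite a few encodings are supersets of 7-bit ASCII (UTF-8, ISO 8859-*, ...). A file that contains only ASCII characters is byte-for-byte identical in all of them, so file can only report it as ASCII text.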
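To check a whole bunch of files, one option is to test whether each file's bytes decode as strict UTF-8. The following is only a minimal sketch of that idea: the function name classify is made up for illustration, and a successful decode only shows that the bytes are valid UTF-8, not that the author intended UTF-8.

```python
def classify(data: bytes) -> str:
    """Return 'ascii', 'utf-8' or 'not utf-8' for a byte string.

    Hypothetical helper for illustration, not part of any tool.
    """
    try:
        # Strict decoding raises UnicodeDecodeError on invalid sequences.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not utf-8"
    # Pure 7-bit content is valid UTF-8 as well, so report it separately.
    if all(byte < 0x80 for byte in data):
        return "ascii"
    return "utf-8"


print(classify(b"plain text\n"))                # ascii
print(classify("Grüße\n".encode("utf-8")))      # utf-8
print(classify("Grüße\n".encode("latin-1")))    # not utf-8
```

For real files, the bytes can be read with open(path, "rb").read() and passed to the function. Anything reported as ascii is already valid UTF-8 and can hold diacritics later, as long as the editor saves it as UTF-8.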

In the case of UTF-8, one trick is to add an (otherwise unnecessary) BOM (byte order mark) at the beginning of the file. In that case, file displays:

some.txt: UTF-8 Unicode (with BOM) text

I think the vim option is: :set bomb

Unfortunately, while editors understand the BOM, bash does not. So don't add it to shell scripts!
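To make the BOM trick concrete, here is a small sketch of what the marker looks like at the byte level. The helper names has_utf8_bom and add_utf8_bom are made up for illustration; codecs.BOM_UTF8 and the utf-8-sig codec come from the Python standard library.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the data starts with the UTF-8 BOM (bytes EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def add_utf8_bom(data: bytes) -> bytes:
    """Prepend the UTF-8 BOM unless it is already there."""
    return data if has_utf8_bom(data) else codecs.BOM_UTF8 + data


marked = add_utf8_bom(b"abc\n")
print(marked)                        # b'\xef\xbb\xbfabc\n'
print(has_utf8_bom(marked))          # True
# The utf-8-sig codec drops a leading BOM while decoding.
print(repr(marked.decode("utf-8-sig")))   # 'abc\n'

# A script with a BOM no longer starts with the two bytes "#!".
script = add_utf8_bom(b"#!/bin/bash\necho hi\n")
print(script.startswith(b"#!"))      # False
```

The last line hints at why shell scripts are a bad place for the marker: the shebang is no longer at the very start of the file.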
