encoding - How to determine the file encoding of a file in Linux correctly
When I have a file created in vim on Linux with :set fileencoding=utf-8 and the file contains diacritics (e.g. German umlauts), calling file myfile.txt results in myfile.txt: UTF-8 Unicode text. If the file contains no diacritics, the same check results in myfile.txt: ASCII text.
Why is that? And how can I safely determine that a whole bunch of files is correctly encoded as UTF-8?
Edit:
I know that 7-bit ASCII is a subset of UTF-8. What I want to know is whether my source files are encoded in UTF-8, so that they can hold diacritics sometime in the future. IMO this is not obvious, and I would like to find a way to determine it safely.
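There is no generic and reliable way to find out which encoding a text file uses. Furthermore, quite a few encodings are supersets of 7-bit ASCII (UTF-8, ISO 8859-*, ...), so a file that contains only ASCII characters is byte-for-byte identical in all of them. That is why file reports such a file as ASCII text.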
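To make that concrete, here is a minimal Python sketch (not part of the original answer). The helper names is_valid_utf8, is_pure_ascii and non_utf8_files are made up for illustration, and so are the sample strings. It only checks whether bytes decode as strict UTF-8; it cannot prove which encoding the author intended.

```python
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def is_pure_ascii(data: bytes) -> bool:
    """Return True if every byte is below 0x80 (7-bit ASCII)."""
    return all(byte < 0x80 for byte in data)


def non_utf8_files(paths):
    """Return the paths whose contents are not valid UTF-8 (hypothetical helper)."""
    return [p for p in paths if not is_valid_utf8(Path(p).read_bytes())]


# Sample data, invented for this illustration.
plain = "Hello".encode("utf-8")                # ASCII only: same bytes in ASCII, UTF-8, ISO 8859-1
umlaut_utf8 = "Grüße".encode("utf-8")          # ü and ß become multi-byte sequences
umlaut_latin1 = "Grüße".encode("iso-8859-1")   # ü and ß become single bytes >= 0x80

print(is_pure_ascii(plain), is_valid_utf8(plain))
print(is_pure_ascii(umlaut_utf8), is_valid_utf8(umlaut_utf8))
print(is_pure_ascii(umlaut_latin1), is_valid_utf8(umlaut_latin1))
```

An ASCII-only file passes the UTF-8 check, so for the "whole bunch of files" case it may be enough to confirm that non_utf8_files returns an empty list: every file is then either plain ASCII or already valid UTF-8, and diacritics can be added later as long as the editor saves them as UTF-8.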
In the case of UTF-8, one trick is to add an (otherwise unnecessary) BOM (byte order mark) at the beginning of the file. In that case file displays:
some.txt: UTF-8 Unicode (with BOM) text
I think the vim option for this is :set bomb
Unfortunately, while editors understand the BOM, bash does not. Don't add one to shell scripts!
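For illustration, a small Python sketch of what the BOM looks like at the byte level. The helpers has_utf8_bom and strip_utf8_bom and the sample script are hypothetical; codecs.BOM_UTF8 and the utf-8-sig codec come from the Python standard library. It also hints at a likely reason shell scripts break: with a BOM in front, the file no longer starts with the #! bytes.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM, if one is present."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data


# Sample script, invented for this illustration.
script = "#!/bin/sh\necho hi\n"
with_bom = script.encode("utf-8-sig")   # utf-8-sig prepends the BOM when encoding
without_bom = script.encode("utf-8")    # plain UTF-8, no BOM

print(has_utf8_bom(with_bom), has_utf8_bom(without_bom))
print(with_bom.startswith(b"#!"), without_bom.startswith(b"#!"))
```

Decoding with utf-8-sig drops the BOM again if it is present, so readers that have to accept both variants can use that codec instead of plain utf-8.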