encoding - How to determine the file encoding of a file in Linux correctly
When I have a file created in vim on Linux with :set fileencoding=utf-8 and the file contains diacritics (e.g. German umlauts), calling file myfile.txt results in myfile.txt: UTF-8 Unicode text. If the file contains no diacritics, the determination of the file encoding results in myfile.txt: ASCII text.
Why is that? And how can I determine safely that a whole bunch of files is encoded correctly using the UTF-8 file encoding?
Edit:
7-bit ASCII is a subset of UTF-8. I want to know whether my source files are encoded in UTF-8, so that they can hold diacritics sometime in the future. IMO this is not obvious, and I would like to find a way to determine it safely.
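There is no generic and reliable way to find the encoding a text file uses. Furthermore, quite a few encodings are supersets of 7-bit ASCII (UTF-8, ISO 8859-*, ...), so a file that contains only ASCII characters is byte-for-byte identical in all of them. That is why file reports such a file as ASCII text.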
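What can be checked is whether a file's bytes are valid UTF-8 at all. The following is a minimal Python sketch of that idea, not what the file command does internally. The function names classify_bytes and classify_file are made up for this illustration. Note that a successful decode only shows the bytes are compatible with UTF-8, not that the author intended UTF-8.

```python
from pathlib import Path


def classify_bytes(data: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'not utf-8' for a byte string.

    Hypothetical helper for illustration: it only tests whether the
    bytes decode as UTF-8, it does not guess any other encoding.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not utf-8"
    # Pure 7-bit content is byte-identical in ASCII and UTF-8,
    # so there is nothing to distinguish the two.
    if all(byte < 0x80 for byte in data):
        return "ascii"
    return "utf-8"


def classify_file(path: str) -> str:
    """Classify the raw bytes of the file at the given path."""
    return classify_bytes(Path(path).read_bytes())
```

Running classify_file over a bunch of source files and flagging every "not utf-8" result is probably the closest thing to a safe check: files reported as "ascii" are already valid UTF-8 and will stay so as long as the editor writes UTF-8 when diacritics are added later.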
In the case of UTF-8, one trick is to add an (otherwise unnecessary) BOM (byte order mark) at the beginning of the file. In that case file displays:
some.txt: UTF-8 Unicode (with BOM) text
I think the vim option for this is :set bomb
Unfortunately, while editors understand the BOM, bash does not. Don't add it to shell scripts!
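To find files that carry a BOM (for example shell scripts that picked one up by accident), it is enough to look at the first three bytes. This is a small Python sketch under that assumption. has_utf8_bom and strip_utf8_bom are names invented for the example, while codecs.BOM_UTF8 is the standard library constant for the bytes EF BB BF.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the byte string starts with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM, if one is present."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data
```

A script that starts with a BOM no longer has #! as its first two bytes, which is one reason the warning about shell scripts matters.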