Text formats are ways of storing textual information in files, that is, keeping data as sequences of characters like those in a printed book or an official document. Many text formats exist today: some follow published standards, while others are loosely defined and used by only one or two applications.
The main distinction among text files is between “plain” and “rich” text.
- Plain text formats contain only characters. They differ mainly in character encoding (ASCII, UTF-8, Windows-1250 and others) and in the character sequence that marks the end of a line (a “carriage return”, a “line feed” or both).
- Rich text formats contain layout data along with the text. They can be divided into “logical” (such as HTML or XML) and “physical” layout types, and also into “character” and “binary” formats; a “character” format stores its layout data as tags written in ordinary characters.
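The two properties that distinguish plain text formats, encoding and line ending, can be illustrated with a short Python sketch. The helper names `to_bytes` and `detect_line_ending` are hypothetical and exist only for this illustration; `cp1250` is the name Python uses for the Windows-1250 codec.

```python
# Line terminators mentioned above: carriage return, line feed, or both.
LINE_ENDINGS = {"CR": "\r", "LF": "\n", "CRLF": "\r\n"}


def to_bytes(lines, encoding="utf-8", line_ending="LF"):
    """Terminate each line with the chosen ending and encode the result."""
    terminator = LINE_ENDINGS[line_ending]
    text = "".join(line + terminator for line in lines)
    return text.encode(encoding)


def detect_line_ending(data):
    """Guess the line-ending convention of raw bytes (first match wins)."""
    if b"\r\n" in data:
        return "CRLF"
    if b"\r" in data:
        return "CR"
    if b"\n" in data:
        return "LF"
    return "none"


if __name__ == "__main__":
    # The same character is stored as different bytes in different encodings.
    print(to_bytes(["é"], encoding="utf-8"))   # two bytes for "é"
    print(to_bytes(["é"], encoding="cp1250"))  # one byte for "é"
    print(detect_line_ending(to_bytes(["a", "b"], line_ending="CRLF")))
```

ASCII cannot represent "é" at all, so `to_bytes(["é"], encoding="ascii")` raises `UnicodeEncodeError`. This is why a plain text file is only readable when the reader knows, or correctly guesses, its encoding.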
Among the many kinds of data files, text files can often be identified by their extension. The most popular extensions are described below.
- TXT is a typical “plain” text file.
- HTM (HTML) is a text file with HTML tags. It looks like “plain” text in an editor but becomes “rich” when a browser renders the final document. It is the standard format for web pages.
- RTF is a common, widely convertible rich text format with support for captions, font types and sizes, page layout, etc.
- DOC and DOCX are “rich” formats for office documents, supported primarily by Microsoft Word and other Word-compatible editors. DOC is a binary format, while DOCX is a ZIP archive of XML files.
- ODT is an open alternative to DOC, based on the OpenDocument standard and used by OpenOffice.org and related suites such as LibreOffice. It is a rich text document stored as a ZIP archive of XML files, so it appears binary when opened in a plain text editor.
- PDF is a “physical” layout format for electronic documents, with a standardized structure and set of properties. PDF files can be viewed on screen or printed as physical pages.
- XML is a tag-based format similar to HTML. It is a general-purpose markup for structured data and is commonly used for configuration files and software documentation.
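Identifying a file by its extension, as described above, can be sketched in a few lines of Python. The category labels and the function names `kind_from_extension` and `kind_from_signature` are illustrative assumptions, not part of any standard. Because an extension can be wrong or missing, the sketch also checks a few well-known leading byte signatures; this second step goes beyond the list above and is included only as a possible refinement.

```python
from pathlib import Path

# Illustrative mapping from extension to the categories used in this article.
EXTENSION_KINDS = {
    ".txt": "plain text",
    ".htm": "rich text, character markup",
    ".html": "rich text, character markup",
    ".xml": "rich text, character markup",
    ".rtf": "rich text, character markup",
    ".doc": "rich text, word processor",
    ".docx": "rich text, word processor",
    ".odt": "rich text, word processor",
    ".pdf": "rich text, physical layout",
}

# Leading bytes commonly found at the start of some of these files.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"{\\rtf", "RTF"),
    (b"PK\x03\x04", "ZIP container (for example DOCX or ODT)"),
    (b"\xd0\xcf\x11\xe0", "OLE container (for example legacy DOC)"),
]


def kind_from_extension(filename):
    """Classify a file name by its last extension, ignoring case."""
    return EXTENSION_KINDS.get(Path(filename).suffix.lower(), "unknown")


def kind_from_signature(data):
    """Classify raw bytes by their leading signature, if recognized."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


if __name__ == "__main__":
    print(kind_from_extension("Report.DOCX"))
    print(kind_from_signature(b"%PDF-1.7\n"))
```

A plain TXT file has no signature, so `kind_from_signature` returns `"unknown"` for it; in that case the extension, or an attempt to decode the bytes as text, is the only hint available.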