A common technique for implementing file types is to include the type as part of the file name (see Fig. 4).
The name is split into two parts: a name and an extension, usually separated by a period character.
In this way, the user and the OS can tell from the name alone what the type of a file is.
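The name/extension split described above can be sketched in a few lines of Python. This is only an illustration: the helper name `split_name` and the sample file names are assumptions, not part of any OS interface. It uses the standard `os.path.splitext`, which splits at the last period.

```python
import os.path

def split_name(filename):
    """Split a file name into (name, extension), without the period."""
    root, ext = os.path.splitext(filename)
    # splitext keeps the leading period on the extension; drop it.
    return root, ext[1:]

if __name__ == "__main__":
    for sample in ("report.doc", "archive.tar.gz", "Makefile"):
        print(sample, "->", split_name(sample))
```

Note that only the last period counts here, so a name with several periods yields a single extension, and a name with no period yields an empty one.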
Figure 4: Common file types.
The system uses the extension to indicate the type of the file and the kinds of operations that can be performed on it. For instance, only a file with one of a small set of designated extensions can be executed.
Application programs also use extensions to indicate the file types in which they are interested. For example, an assembler expects its source files to have a particular extension, and the Microsoft Word word processor expects its files to end with its own extension.
Because these extensions are not supported by the OS, they can be considered "hints" to the applications that operate on them.
Many file systems support names as long as 255 characters. Some file systems distinguish between uppercase and lowercase letters (they are case sensitive), whereas others do not (they are case insensitive).
Windows 95 and Windows 98 both use the MS-DOS file system, and thus inherit many of its properties, such as how file names are constructed. In addition, Windows NT and Windows 2000 support the MS-DOS file system and thus also inherit its properties. However, the latter two systems also have a native file system (NTFS) that has different properties (such as file names in Unicode).
Consider Mac OS X. In this system, each file has a type, such as TEXT (for a text file) or APPL (for an application).
Each file also has a creator attribute containing the name of the program that created it.
This attribute is set by the OS when the file is created, so its use is enforced and supported by the system.
The UNIX system uses a crude magic number stored at the beginning of some files to indicate roughly the type of the file: executable program, batch file (or shell script), PostScript file, and so on.
Not all files have magic numbers, so system features cannot be based solely on this information.
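As a rough illustration of the magic-number idea, the sketch below checks the first bytes of a file against a tiny table of well-known prefixes. The table and the function names (`guess_type`, `guess_file_type`) are assumptions made for this example; real tools such as `file` recognize far more signatures.

```python
# A deliberately small, illustrative table of leading byte sequences.
MAGIC_PREFIXES = [
    (b"\x7fELF", "ELF executable or object file"),
    (b"#!", "script (interpreter named on the first line)"),
    (b"%!PS", "PostScript document"),
]

def guess_type(first_bytes):
    """Return a description for known leading bytes, or None if unknown."""
    for prefix, description in MAGIC_PREFIXES:
        if first_bytes.startswith(prefix):
            return description
    return None  # many files carry no magic number at all

def guess_file_type(path):
    """Read the first few bytes of the file at `path` and classify them."""
    with open(path, "rb") as f:
        return guess_type(f.read(8))
```

The `None` result matters: because many files carry no magic number, a caller cannot rely on this check alone.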
UNIX does not record the name of the creating program, either. UNIX does allow file-name-extension hints, but these extensions are neither enforced nor depended on by the OS; they are interpreted by tools and are meant mostly to help users determine the type of contents of the file.
Extensions can be used or ignored by a given application, but that is up to the application's programmer.
In contrast, Windows is aware of the extensions and assigns meaning to them; that is, they are interpreted by the OS. Users (or processes) can register extensions with the operating system.
UNIX also has character and block special files (device files).
Character special files are related to input/output and are used to model serial I/O devices such as terminals, printers, and networks.
Block special files are used to model disks.
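On a UNIX-like system, a program can tell these kinds of files apart from the mode bits returned by `stat`. The following is a minimal sketch using Python's standard `os` and `stat` modules; the function names `classify_mode` and `classify_path` are made up for this example.

```python
import os
import stat

def classify_mode(mode):
    """Map the st_mode value from os.stat() to a coarse file kind."""
    if stat.S_ISCHR(mode):
        return "character special"
    if stat.S_ISBLK(mode):
        return "block special"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        return "regular"
    return "other"

def classify_path(path):
    """Classify an existing path by calling os.stat() on it."""
    return classify_mode(os.stat(path).st_mode)
```

On a typical Linux system, calling `classify_path` on a terminal device under `/dev` would be expected to report a character special file, and on a disk device a block special file, but the exact device names vary between systems.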
Other files are binary files, which just means that they are not ASCII files. Usually, they have some internal structure known to programs that use them (see Fig. 5).
Figure 5: (a) An executable file. (b) An archive.
Every OS must recognize at least one file type: its own executable file. A simple executable binary file taken from a version of UNIX is shown in Fig. 5a.
Although technically the file is just a sequence of bytes, the operating system will only execute a file if it has the proper format.
It has five sections: header, text, data, relocation bits, and symbol table.
The header starts with a so-called magic number, identifying the file as an executable file (to prevent the accidental execution of a file not in this format).
Then come the sizes of the various pieces of the file, the address at which execution starts, and some flag bits.
Beyond this header, executable files are typically divided into subsections (the text and data of the program itself).
Try the following commands:
readelf -S exe_file
objdump -h exe_file
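To see how a header of this kind (magic number, sizes and offsets, entry address, flag bits) is laid out in practice, the sketch below decodes the fixed-size header of a 64-bit little-endian ELF file, the format that `readelf` inspects. ELF is a more modern format than the five-section layout described above, so this is an analogy rather than a parser for that older format. The function name `parse_elf64_header` and the keys of the returned dictionary are choices made for this example.

```python
import struct

# 64-bit ELF file header: 16 identification bytes followed by fixed fields.
# "<" selects little-endian byte order with no padding (64 bytes in total).
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")

def parse_elf64_header(data):
    """Decode the ELF header found at the start of `data` (bytes)."""
    if len(data) < ELF64_HEADER.size or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file (bad magic number)")
    # Byte 4 is the class (2 = 64-bit); byte 5 is the data encoding
    # (1 = little-endian). This sketch handles only that combination.
    if data[4] != 2 or data[5] != 1:
        raise ValueError("only 64-bit little-endian ELF is handled here")
    (ident, e_type, machine, version, entry, phoff, shoff, flags,
     ehsize, phentsize, phnum, shentsize, shnum, shstrndx) = \
        ELF64_HEADER.unpack_from(data)
    return {
        "entry_address": entry,          # where execution starts
        "flags": flags,                  # processor-specific flag bits
        "program_header_count": phnum,
        "section_header_count": shnum,
        "section_table_offset": shoff,
    }
```

Reading the first 64 bytes of an executable in binary mode and passing them to `parse_elf64_header` should give values that agree with the output of `readelf -h` on the same file.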
A second example of a binary file is an archive, also from UNIX (see Fig. 5b).
It consists of a collection of library procedures (modules) compiled but not linked.
Each one is prefaced by a header telling its name, creation date, owner, protection code, and size.
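The per-module headers can be illustrated with the common UNIX `ar` archive format, in which each member is preceded by a 60-byte text header. This is a sketch under the assumption that the archive uses that common layout; the archive in Fig. 5b may differ in detail. In this layout the "owner" appears as numeric user and group IDs, the date is a modification timestamp, and the protection code is an octal mode. The names `list_members` and `_num` are made up for this example.

```python
AR_MAGIC = b"!<arch>\n"   # global signature at the start of an ar archive
HEADER_SIZE = 60          # each member header is 60 bytes of ASCII text

def _num(field, base=10):
    """Parse a space-padded numeric header field; blank fields count as 0."""
    text = field.decode("ascii").strip()
    return int(text, base) if text else 0

def list_members(data):
    """Return one dict of header fields per member of the archive `data`."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    members = []
    offset = len(AR_MAGIC)
    while offset + HEADER_SIZE <= len(data):
        header = data[offset:offset + HEADER_SIZE]
        if header[58:60] != b"`\n":          # terminator of every header
            raise ValueError("corrupt member header")
        size = _num(header[48:58])
        members.append({
            "name": header[0:16].decode("ascii").rstrip(),
            "mtime": _num(header[16:28]),    # seconds since the epoch
            "uid": _num(header[28:34]),
            "gid": _num(header[34:40]),
            "mode": _num(header[40:48], 8),  # protection bits, in octal
            "size": size,
        })
        # Member data is padded to an even length before the next header.
        offset += HEADER_SIZE + size + (size % 2)
    return members
```

Passing the bytes of a static library (a `.a` file) to `list_members` should list roughly the same members as `ar t`, although long member names are stored indirectly in some variants and would appear here in their encoded form.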