What Exactly Are Computer Files?

Data stored in memory disappears when the computer is turned off. To keep data for the long term, it must be saved on a storage device such as a hard disk, CD, or USB flash drive. The concept of a "file" was introduced to make such data easier to manage and retrieve.

An article, a video, or an executable program can each be saved as a file and given a file name. The operating system manages the data on a disk in units of files.

If thousands of files were all kept together without any classification, they would be very inconvenient to use. For this reason, the tree-shaped directory mechanism was introduced (a directory is also called a folder), so that files can be placed in different folders. Folders can also contain other folders, which makes files easier to manage and use, as Windows Explorer shows.

Files are commonly divided by function into categories such as text files, video files, audio files, image files, and executable files. From the perspective of data storage, however, all files are essentially the same: they are sequences of bytes and, ultimately, strings of 0 and 1 bits. Different files appear in different forms (some as text, some as video, and so on) mainly because the creator of the file and its interpreter (the software that uses the file) have agreed on a file format.

A "format" is a convention about what each part of a file's content represents. For example, a plain text file (also simply called a text file, usually with a ".txt" extension) is a file that can be opened in the Windows "Notepad" program and read as a meaningful piece of text. In the simplest case, the format of a text file can be described in one sentence: each byte in the file is the ASCII code of a character, either a visible character or a whitespace character such as a space or line break.
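
The relationship between text and bytes can be checked directly. The following is a minimal Python sketch, assuming a short sample string made up for illustration; it shows the bytes a plain ASCII text file would hold and that decoding them gives the text back.

```python
# A short sample text (chosen for illustration), ending with a line break.
text = "Hi!\n"

# These are the bytes a plain ASCII text file with this content would contain.
data = text.encode("ascii")

# Each byte is the ASCII code of one character, including the line break.
codes = list(data)
print(codes)  # [72, 105, 33, 10]

# Interpreting the same bytes as ASCII recovers the original text.
print(data.decode("ascii") == text)  # True
```

Note that the line break is stored as a code (10) just like the visible characters.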

Files other than plain text files, such as images, videos, and executable files, are generally referred to as "binary files". If you open a binary file with the "Notepad" program, you will see garbled characters.

The terms "text file" and "binary file" are only conventional classifications from the computer user's perspective, not computer science classifications. From a computer science point of view, all files are made up of binary bits and are therefore binary files; text files and other binary files simply use different formats.

In fact, as long as a format is specified and you do not mind wasting space, you can use text files to represent images, sounds, videos, and even executable programs. For example, if the convention is that the characters '1', '2', ..., '7' represent the seven notes of the scale, then a text file consisting of these characters can be played as a piece of music by software that follows that convention.
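
As a rough illustration of such a convention, the Python sketch below interprets a text "score". The mapping of '1' to '7' onto the note names C to B, the function name parse_melody, and the sample score are all assumptions made for this example; real music software would also need rhythm, octaves, and actual sound output.

```python
# Hypothetical convention: '1'..'7' stand for the seven notes of the C major scale.
NOTE_NAMES = {"1": "C", "2": "D", "3": "E", "4": "F", "5": "G", "6": "A", "7": "B"}


def parse_melody(text):
    """Turn a text score into a list of note names, ignoring whitespace."""
    notes = []
    for ch in text:
        if ch.isspace():
            continue
        if ch not in NOTE_NAMES:
            raise ValueError(f"unexpected character {ch!r} in score")
        notes.append(NOTE_NAMES[ch])
    return notes


# A sample score written with nothing but the agreed characters.
melody = parse_melody("1155665 4433221")
print(melody)
```

The text file itself contains only digit characters; it becomes music only because the reading program applies the agreed meaning to each one.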

Let's look at another example, using a text file to represent an image. An image is actually a matrix of points called pixels, each of which can have its own color. Some images are 256-color, and some are 32-bit true color (that is, the color of each pixel is represented by a 32-bit integer).

Take a 256-color image as an example. The 256 numbers from 0 to 255 can stand for the 256 colors, so each pixel can be represented by one number. If we further agree that the two numbers at the beginning of the file give the width and height of the image (in pixels), then the following text file represents a 256-color image that is 6 pixels wide and 4 pixels high:

6 4
24 0 38 129 4 154
12 73 227 40 0 0
12 173 127 20 0 0
21 73 87 230 1 0

The format of this "text image" file can be described as follows: the two numbers on the first line give the number of pixels in the horizontal and vertical directions; each subsequent line represents one row of pixels in the image; and each number in a row gives the color of one pixel. Image processing software that understands this format can render the above text file as an image. A video is a sequence of images shown in quick succession (for example, 24 images per second), so text files can represent video as well.
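
A reader for this "text image" format might look like the following Python sketch. The function name parse_text_image and the validation checks are assumptions added for illustration; the sample data is the 6 by 4 image shown above.

```python
# The sample "text image" from the article: width and height, then one row per line.
TEXT_IMAGE = """\
6 4
24 0 38 129 4 154
12 73 227 40 0 0
12 173 127 20 0 0
21 73 87 230 1 0
"""


def parse_text_image(text):
    """Parse the text format into (width, height, rows of color numbers)."""
    lines = text.strip().splitlines()
    width, height = (int(v) for v in lines[0].split())
    rows = [[int(v) for v in line.split()] for line in lines[1:]]
    # Check that the pixel data matches the size declared on the first line.
    if len(rows) != height or any(len(row) != width for row in rows):
        raise ValueError("pixel data does not match the declared size")
    # A 256-color image only allows color numbers from 0 to 255.
    if any(not 0 <= p <= 255 for row in rows for p in row):
        raise ValueError("color values must be in the range 0..255")
    return width, height, rows


width, height, rows = parse_text_image(TEXT_IMAGE)
print(width, height)  # 6 4
print(rows[0])        # [24, 0, 38, 129, 4, 154]
```

Turning each number into an actual color on screen would additionally require a palette, which this format leaves to the rendering software.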

This way of representing images as text files is very inefficient and wastes a lot of space. The whitespace between numbers is wasted, and a pixel often takes 2 or even 3 characters, even though a single byte is enough to hold any number from 0 to 255. A more space-saving format for 256-color images can therefore be agreed upon, described as follows: bytes 0 and 1 of the file hold an integer n, the width of the image (a 2-byte value ranges from 0 to 65,535, so the image can be at most 65,535 pixels wide), and bytes 2 and 3 hold the height of the image. After that, every n bytes represent one row of pixels, with each byte giving the color of one pixel.
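
The compact format can be sketched in Python with the standard struct module. The description does not say in which order the two bytes of the width and height are stored, so big-endian order (">HH") is an assumption here, as are the function names encode_image and decode_image.

```python
import struct


def encode_image(width, height, rows):
    """Pack an image as: 2-byte width, 2-byte height, then one byte per pixel."""
    # ">HH" means two unsigned 16-bit integers, big-endian (assumed byte order).
    header = struct.pack(">HH", width, height)
    # bytes(row) raises ValueError if any color is outside 0..255.
    body = b"".join(bytes(row) for row in rows)
    return header + body


def decode_image(data):
    """Inverse of encode_image: returns (width, height, rows)."""
    width, height = struct.unpack(">HH", data[:4])
    body = data[4:]
    if len(body) != width * height:
        raise ValueError("pixel data does not match the declared size")
    # Every `width` bytes form one row of pixels.
    rows = [list(body[i * width:(i + 1) * width]) for i in range(height)]
    return width, height, rows


rows = [
    [24, 0, 38, 129, 4, 154],
    [12, 73, 227, 40, 0, 0],
    [12, 173, 127, 20, 0, 0],
    [21, 73, 87, 230, 1, 0],
]
blob = encode_image(6, 4, rows)
print(len(blob))                         # 28
print(decode_image(blob) == (6, 4, rows))  # True
```

The 6 by 4 sample image takes 4 header bytes plus 24 pixel bytes, 28 bytes in total.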

Storing 256-color images in this format saves a lot of space compared with the text format above. If you open such a file in the "Notepad" program, you will see garbled characters. An image file of this kind is what is called a "binary file".
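
The space saving can be measured on the 6 by 4 sample image. This Python sketch builds both representations in memory and compares their sizes; it assumes single-character line breaks in the text form and big-endian byte order for the 2-byte width and height, neither of which the format description fixes.

```python
rows = [
    [24, 0, 38, 129, 4, 154],
    [12, 73, 227, 40, 0, 0],
    [12, 173, 127, 20, 0, 0],
    [21, 73, 87, 230, 1, 0],
]
width, height = 6, 4

# Text form: decimal digits separated by spaces, one row per line.
text_form = f"{width} {height}\n" + "".join(
    " ".join(str(p) for p in row) + "\n" for row in rows
)
text_bytes = text_form.encode("ascii")

# Binary form: 2-byte width, 2-byte height (big-endian assumed), 1 byte per pixel.
binary_bytes = (
    width.to_bytes(2, "big")
    + height.to_bytes(2, "big")
    + b"".join(bytes(row) for row in rows)
)

print(len(text_bytes), len(binary_bytes))  # 74 28

# Many of the binary bytes are not codes of printable characters,
# which is why a text editor shows them as garbled output.
print(repr(binary_bytes[:10]))
```

Even on this tiny image the binary form is well under half the size, and the gap grows for larger images whose color numbers mostly need two or three digits.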

The formats of real image, audio, and video files are complex, and some are compressed. However, as long as the software that produces a file and the software that interprets it (such as image viewers and audio and video players) follow the same format convention, the user can see the content of the file in the interpreting software.