File Formats
July 14th, 2008
What is a file format
A file format is the way your software application's data is laid out in a file on disk when you save.
For example, if you create a document using a spreadsheet application, then when you save it the file format must have ways to store how the data is organised in rows and columns, where you have put colours and so on, so that when you later re-open the saved document in the application it looks exactly as it did when you created it. File formats are ways of encoding the information in a document so that it can be stored in files on disk.
Why are they needed
As stated above, they are needed so that you can save your document to disk. A disk file is really just one long string of bytes, like a very long sentence. So, a file format specifies how the information in a document is converted to a string of bytes and back again.
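To make the "string of bytes" idea concrete, here is a minimal sketch in Python of a toy format for a small spreadsheet-like grid. Everything in it is made up for illustration: the `GRID` signature, the header layout and the function names `encode_grid` and `decode_grid` do not correspond to any real spreadsheet format.

```python
import struct

MAGIC = b"GRID"  # hypothetical 4-byte signature identifying our toy format


def encode_grid(rows):
    """Turn a list of equal-length rows of strings into one byte string."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    # Header: signature, then row and column counts as little-endian 16-bit ints.
    out = [MAGIC, struct.pack("<HH", n_rows, n_cols)]
    for row in rows:
        for cell in row:
            data = cell.encode("utf-8")
            # Each cell: a 16-bit length prefix followed by the UTF-8 text.
            out.append(struct.pack("<H", len(data)))
            out.append(data)
    return b"".join(out)


def decode_grid(blob):
    """Rebuild the rows from bytes produced by encode_grid."""
    if blob[:4] != MAGIC:
        raise ValueError("not a GRID file")
    n_rows, n_cols = struct.unpack_from("<HH", blob, 4)
    pos = 8  # first byte after the 8-byte header
    rows = []
    for _ in range(n_rows):
        row = []
        for _ in range(n_cols):
            (length,) = struct.unpack_from("<H", blob, pos)
            pos += 2
            row.append(blob[pos:pos + length].decode("utf-8"))
            pos += length
        rows.append(row)
    return rows
```

Calling `decode_grid(encode_grid(sheet))` on a grid such as `[["Item", "Price"], ["Tea", "3"]]` gives back the same rows. The written description of that header and cell layout is, in effect, the format's specification.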
There is another reason for file formats: they allow documents created in one application to be opened in another, as long as both applications understand a common file format. For example, you can create a drawing in a drawing application, save it, and then open it in your word processor.
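A rough sketch of that exchange, again in Python. It assumes the two programs have agreed on a small JSON layout as their common format; the layout, the `version` field and the function names are hypothetical and only stand in for a real shared format.

```python
import json
import os
import tempfile


def drawing_app_save(shapes, path):
    """The 'drawing application': writes its shapes in the shared format."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"version": 1, "shapes": shapes}, f)


def word_processor_import(path):
    """The 'word processor': knows nothing about the drawing app, only the format."""
    with open(path, encoding="utf-8") as f:
        doc = json.load(f)
    if doc.get("version") != 1:
        raise ValueError("unsupported format version")
    return [f"{s['kind']} at ({s['x']}, {s['y']})" for s in doc["shapes"]]


def demo():
    """Save from one 'application' and load in the other via a temporary file."""
    shapes = [
        {"kind": "circle", "x": 10, "y": 20},
        {"kind": "square", "x": 0, "y": 5},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "drawing.json")
        drawing_app_save(shapes, path)
        return word_processor_import(path)
```

The two functions share no code. The only thing they have in common is the agreed layout of the file, which is the whole point of a common format.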
OK, what's the controversy
Well, my goal is just to make you aware of the idea of file formats, not to go into any controversy. However, you should know that one exists. So, here it is.
There are two types of formats: open and closed. Open formats are those that have a written specification which is available to everybody. Anybody can write an application to read or write a file in that format. Closed formats are those whose specifications are not public. Some formats fall somewhere in between: their specifications are not public, but are available from the creators for a fee and some legal formalities.
But, why should I care
As mentioned above, file formats are vital for data exchange between applications. For example, your digital camera stores an image in TIFF format, but you can later convert it to JPG to put it on your blog. The camera need not be programmed to produce every possible image format, because it produces the image in an open format. Anyone can write an application to read the camera's captured image in TIFF and convert it to a different format. Open formats also help with future-proofing. Since anybody can read your camera's image files, there is a good chance that new image editing software will support them even after your camera vendor stops supporting your camera.
File formats are used extensively in EDA (Electronic Design Automation), which is where I became aware of them. EDA vendors produce software to create electronic designs. Electronic design work happens in steps: first comes schematic capture, followed by simulation, then synthesis, and then placement and routing. The software for each of these steps can come from a different vendor, so the tools need to talk to each other. However, when a tool is being written, its developer does not know exactly which tools will come before and after it. So, the industry has agreed to use standard file formats: each software tool is built with the ability to import a standard file format and to export its data in that format. Problem solved.
Postscript: Two of the popular EDA file formats are LEF and DEF, created by Cadence Design Systems and later open sourced. I worked on Cadence's original LEF and DEF reader and writer, which was also the first project I worked on as a software professional.