Lecture 8

Chapter 9 - Advanced File Input and Output

Synopsis
In Lecture 3 and Lab 2, we covered basic input and output (I/O) functions. In this lecture, we will discuss lower-level file I/O from Chapter 9: Advanced File Input and Output. We will learn how to read in a variety of file formats, and extend functions such as fprintf for writing files.


Daily Quiz

Quiz 14


Advanced File I/O

A Word About File Extensions

Before we get started, let's get a quick refresher on file extensions. Formally, a file extension is an identifier at the end of a filename that indicates some characteristic of the file content. For example, .jpeg indicates that the file is an image compressed with the lossy JPEG algorithm. We've also seen extensions for comma-separated value files (.csv) and plain text files (.txt). Although file extensions give you an idea about the contents of the file, they may not always be present or accurate. For example, many tab-separated value files are incorrectly given the .csv extension. It's always a good idea to take a look at your file in a text editor before trying to read the data into MATLAB.

High-Level I/O

We were introduced to two high-level I/O functions in Lecture 3: load and save. These functions work well as long as your input and output data can be stored in a matrix. More specifically, the data must all be of the same type and have regular dimensions. These functions are quite useful, but they fail when we encounter more complex files with multiple data types. For these files, we need lower-level functions. To use these lower-level functions, we need to follow a few steps:

  1. Open the file
  2. Read/Write/Append file
  3. Close the file

Open/Close Files

Before we can read, write, or append, we need to open the file of interest. To do this, we will use the fopen command. Along with the filename, we provide fopen with the type of permissions needed for that file. The permissions can be 'r' for reading, 'w' for writing, or 'a' for appending.

% open file
fid = fopen('filename', 'permission');

The fopen function returns a file identifier that we typically assign to the variable fid. This file identifier will be used in the function calls to read and write to the target file. If fopen fails, it will return the value -1. By checking for the value -1, we can perform some action if the file fails to open.

Once a file has been opened and we have performed our desired task, we need to close the file. In MATLAB, we can close a file with the fclose function. This function takes the file identifier and returns 0 if the file closed successfully or -1 if not. Again, this provides a convenient way for us to check whether the function was successful.

% close file
cr = fclose(fid);
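The open, check, and close steps described above can be sketched as follows. This is a minimal illustration rather than a prescribed pattern: the file name data.txt and the variable name status are placeholders chosen for the example.

```matlab
% Hypothetical file name; replace it with a file on your machine.
filename = 'data.txt';

% Try to open the file for reading.
fid = fopen(filename, 'r');

if fid == -1
    % fopen failed, so report the problem and skip the read step.
    fprintf('Could not open %s\n', filename);
else
    % ... read from the file here ...

    % Close the file and check the value returned by fclose.
    status = fclose(fid);
    if status == 0
        disp('File closed successfully');
    else
        disp('File failed to close');
    end
end
```

Checking fid before reading avoids passing an invalid identifier to the reading functions.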

Reading

MATLAB has a spectrum of file reading functions that range from high-level to low-level. Below is a list of non-specialized functions and their actions.

load     - reads matrix formatted data into a matrix 
fscanf   - reads formatted data into a matrix using conversion formats (e.g., %d, %f, and %s)
textscan - reads text data into a cell array using conversion formats (e.g., %d, %f, and %s)
fgets    - reads one line of a file (including newline)
fgetl    - reads one line of a file (excluding newline)

The functions fscanf and textscan are similar in their action, but store their results in different data structures. Since fscanf stores the file data in a matrix, the data must all be of the same type, and we usually specify the matrix dimensions (without them, fscanf returns a single column vector). If our data consists of doubles and characters, the characters will be stored as their numeric character codes. To recover the characters, we must perform a subsequent casting step. Since the textscan function stores the file data in a cell array, we are not constrained to one data type and do not need to specify dimensions. In practice, you will likely find yourself using textscan rather than fscanf.

% open file for reading
fid = fopen('filename', 'r');

% read with fscanf
mat = fscanf(fid, 'format', dimensions);

% reading with textscan
data = textscan(fid, 'format');

% close file
cr = fclose(fid);
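To make the difference between the two functions concrete, here is a small self-contained sketch. The file contents (a letter followed by a number on each line) and all variable names are invented for the illustration; the file is written to a temporary location first so the example can run on its own.

```matlab
% Create a small example file with mixed character and numeric data.
filename = [tempname '.txt'];
fid = fopen(filename, 'w');
fprintf(fid, 'a 1.5\nb 2.5\nc 3.5\n');
fclose(fid);

% fscanf: everything is stored in one numeric matrix, so the letters
% come back as character codes. [2 Inf] means 2 rows, as many columns
% as needed.
fid = fopen(filename, 'r');
mat = fscanf(fid, '%c %f\n', [2 Inf]);
fclose(fid);
letters = char(mat(1, :));   % casting step to recover the characters
numbers = mat(2, :);

% textscan: each conversion format gets its own cell, so the text and
% the numbers keep their own types.
fid = fopen(filename, 'r');
data = textscan(fid, '%s %f');
fclose(fid);
labels = data{1};   % cell array of character vectors
values = data{2};   % column of doubles

% Remove the temporary file.
delete(filename);
```

With fscanf the extra char call is needed to get the letters back, while textscan returns them directly in the first cell.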

The fgetl and fgets functions are a bit different from the previous functions we have seen. Instead of reading the entire file in one go, fgetl and fgets read only one line at a time. As such, we need to place these functions within a loop if we hope to read an entire file. Fortunately, as we read each line of the file, MATLAB keeps track of our position in the file. We can use the feof function to ask MATLAB whether we have reached the end of the file. This gives us a convenient way to loop through all lines of a file.

% open file for reading
fid = fopen('filename', 'r');

% loop over each line and display
while ~feof(fid)
    % use tline rather than line to avoid shadowing MATLAB's line function
    tline = fgetl(fid);
    disp(tline)
end

% close file
cr = fclose(fid);
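The difference between fgets and fgetl, and what happens once the end of the file is reached, can be seen in a short sketch. The two-line file and the variable names here are made up for the example.

```matlab
% Create a two-line example file in a temporary location.
filename = [tempname '.txt'];
fid = fopen(filename, 'w');
fprintf(fid, 'first\nsecond\n');
fclose(fid);

fid = fopen(filename, 'r');
withNewline = fgets(fid);      % first line, newline character kept
withoutNewline = fgetl(fid);   % second line, newline character removed
atEnd = fgetl(fid);            % nothing left to read, so fgetl returns -1
fclose(fid);

% Compare the lengths: 'first' plus newline versus 'second' alone.
fprintf('fgets returned %d characters\n', length(withNewline));
fprintf('fgetl returned %d characters\n', length(withoutNewline));

% Remove the temporary file.
delete(filename);
```

Because fgetl returns the number -1 rather than text once the file is exhausted, checking feof in the loop condition keeps that value out of your results.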

Writing/Appending

Whether we are writing or appending to a file will depend on the permissions used when calling fopen. Regardless, we can use a familiar function fprintf to perform the write/append. The arguments passed to fprintf are the same as before except we also pass the file identifier.

% open file for writing
fid = fopen('filename', 'w');

% write to file
fprintf(fid, 'format', variable(s));

% close file
cr = fclose(fid);
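As a rough illustration of how the permission changes the result, the sketch below writes three values with 'w', adds a fourth with 'a', and reads the file back to confirm that the appended value follows the original ones. The temporary file and the variable names are assumptions made for the example.

```matlab
% Write three values to a new file, one per line.
filename = [tempname '.txt'];
x = [1 2 3];
fid = fopen(filename, 'w');
fprintf(fid, '%d\n', x);   % the format is reused for each element of x
fclose(fid);

% Reopen with append permission; existing contents are kept.
fid = fopen(filename, 'a');
fprintf(fid, '%d\n', 4);
fclose(fid);

% Read everything back to check the result.
fid = fopen(filename, 'r');
vals = fscanf(fid, '%d');
fclose(fid);
disp(vals')

% Remove the temporary file.
delete(filename);
```

Had the second fopen call used 'w' instead of 'a', the first three values would have been discarded before the new value was written.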

MAT Files

As you begin to use MATLAB for more complex data processing, you might find yourself in a situation where you would like to save all of the variables in your workspace to save yourself the hassle of redoing your work. We can use the save function to save our entire workspace or a specific set of variables. We can then open them later using the load function. This is particularly helpful when you are in the middle of a project and have to leave for a meeting.

% save entire workspace
save filename

% save specific variable
save filename variable

% append variable to MAT file
save filename variable -append
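The commands above use command syntax. A sketch of the same steps in function syntax, followed by loading the data back, might look like this. Function syntax is convenient when the file name is stored in a variable; the names matName, a, b, and s are placeholders for the example.

```matlab
% Build a file name in a temporary location.
matName = [tempname '.mat'];

a = 1:5;
b = 'hello';

% Save one variable, then append a second to the same MAT file.
save(matName, 'a');
save(matName, 'b', '-append');

% Clear the originals to show that the data really comes from the file.
clear a b

% Load into a structure; each saved variable becomes a field.
s = load(matName);
disp(s.a)
disp(s.b)

% Remove the temporary file.
delete(matName);
```

Calling load without an output argument instead places the variables directly back into the workspace.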

Final Words

Today, we have seen file I/O functions that range from high-level to low-level. These should allow you to access data within a wide variety of file formats. Keep in mind that MATLAB has many other specialized functions for reading and writing specific file types (e.g., xlsread for Excel files). In the next lab, we will play around with these functions and attempt to access data in files with unknown extensions.