Finding files
by Mark G. Wiseman
After you have been using C++Builder for a while, the project folders on your hard drive can become really cluttered with unnecessary files. Backup files, debug files, pre-compiled header files, and others can take up a lot of space. To help manage this situation, I’d like to share with you a utility program I’ve written to clean up my hard drive.
This is the first article in a series that will walk through several sections of the program’s code. Future articles will look at the user interface code, the code for storing and retrieving settings, and the code for deleting files from the hard drive. I will also talk about design issues when working with C++Builder and about planning for program enhancements.
Project file organization
Before we can delete those pesky files, we have to find them. So, how do we do it? Some of it will depend on the way the files are organized, so I’ll tell you a little about how I organize my C++Builder project files. (I’m not going to talk about version control software other than to say any project involving more than one developer should always use version control software.)
In the past, I always kept all of my project files on a separate hard drive. My latest computer has only one very large hard drive, though, so now I keep all of my projects in one folder named PROJECTS, with each project in its own folder under PROJECTS.
Organizing projects this way makes backups very easy: just back up everything in the PROJECTS folder and its subfolders.
Or do I really mean everything? What about all those files that clutter up folders and take up space? Do I really need to waste time and backup media backing up files that I don’t need? What I really need is a utility program that will run before my automated backup each night and delete all those unnecessary files. (You do back up your files every night, don’t you?)
A few projects don’t reside in the PROJECTS folder. Those live in subfolders of a folder named ARTICLES; for instance, there is a subfolder named FINDFILES.
So, what I need is code that will search through different folders and their subfolders to find all the files that match certain name patterns.
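FindFirstFile() does the pattern matching for us, but it helps to be clear about what a pattern means: * matches any run of characters (including none) and ? matches exactly one. The portable sketch below, built around a hypothetical MatchesPattern() function, illustrates those rules with the classic backtracking matcher. It is a simplification: like Windows it ignores case, but it does not reproduce Windows quirks such as *.* matching names that contain no dot, or matches against the 8.3 alternate name.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Case-insensitive comparison of two characters, as on Windows.
static bool SameChar(char a, char b)
{
    return std::tolower(static_cast<unsigned char>(a)) ==
           std::tolower(static_cast<unsigned char>(b));
}

// Hypothetical helper: true if name matches pattern, where '*' matches
// any run of characters (including none) and '?' matches exactly one.
bool MatchesPattern(const std::string &pattern, const std::string &name)
{
    std::size_t p = 0, n = 0;
    std::size_t starP = std::string::npos;  // position of the last '*' seen
    std::size_t starN = 0;                  // name position that '*' was tried at
    while (n < name.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            starP = p++;                    // first let '*' match nothing
            starN = n;
        } else if (p < pattern.size() &&
                   (pattern[p] == '?' || SameChar(pattern[p], name[n]))) {
            ++p;                            // one character matched
            ++n;
        } else if (starP != std::string::npos) {
            p = starP + 1;                  // backtrack: '*' absorbs one more char
            n = ++starN;
        } else {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '*')
        ++p;                                // trailing '*'s match the empty rest
    return p == pattern.size();
}
```

With these rules, a pattern such as *.~* matches a name like Unit1.~cpp, while Unit?.obj does not match Unit12.obj because ? stands for exactly one character.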
Finding the best approach
The Windows API provides a structure, WIN32_FIND_DATA, and three functions, FindFirstFile(), FindNextFile() and FindClose() that we will need to search for files. FindFirstFile() searches for the first file that matches a specific file name pattern and, if found, will fill in a WIN32_FIND_DATA structure. FindNextFile() will find the next file that matches the pattern and FindClose() will close the handle returned by FindFirstFile(). These functions are well documented in the Windows API online help.
We could just use these functions and start writing code, but this isn’t very object oriented and there are things we could do to make finding files easier. What about wrapping this structure and these functions in a class or two?
If you read Part 4 of my series of articles, “Hidden Treasures of Sysutils” (Vol. 3, Number 8; August 2000), you know that Borland has written some wrapper code for these functions. You will also know that it is flawed. Borland’s code stores the file size in an int, and a 32-bit int cannot hold a value larger than 2,147,483,647, so it cannot correctly report the size of a file of 2 GB or more. The Windows API, by contrast, stores the file size in two DWORD variables. And, as you’ll see, with very little effort we can develop some wrapper code that is much more useful and easier to use.
Wrapping WIN32_FIND_DATA
First let’s wrap the Windows API structure WIN32_FIND_DATA. Here is the definition for WIN32_FIND_DATA:
typedef struct _WIN32_FIND_DATA
{
    DWORD    dwFileAttributes;
    FILETIME ftCreationTime;
    FILETIME ftLastAccessTime;
    FILETIME ftLastWriteTime;
    DWORD    nFileSizeHigh;
    DWORD    nFileSizeLow;
    DWORD    dwReserved0;
    DWORD    dwReserved1;
    TCHAR    cFileName[ MAX_PATH ];
    TCHAR    cAlternateFileName[ 14 ];
} WIN32_FIND_DATA;
This structure contains a DWORD that holds the file’s attributes (such as read-only or hidden). It also contains three different dates, stored in another Windows API structure, FILETIME, and the file size variables, nFileSizeHigh and nFileSizeLow, that I mentioned earlier. Next come two reserved DWORD variables that we don’t need to worry about and, finally, two forms of the file name: the long file name stored in cFileName and the old DOS 8.3 file name stored in cAlternateFileName.
Since we’re programming in C++Builder, there are two obvious things we can do to make this structure easier to use: we can return the three dates in the VCL TDateTime format, and we can return the two file names as AnsiStrings. And, while we’re at it, let’s combine those two size variables and return them as a single __int64.
Let’s take a look at the class I have named TFFData (for “Find Files Data”) to see how it returns these values. Listings A and B contain the code for TFFData. As you can see from these listings, I did not derive TFFData from WIN32_FIND_DATA. Instead, I made the WIN32_FIND_DATA structure a private member of TFFData named data. I did this for two reasons.
First, users of this class should not be changing the values in data, and making data private prevents this. I could have achieved the same goal by inheriting privately from WIN32_FIND_DATA, but then the code would not be very portable, which is the second reason I made WIN32_FIND_DATA a member of TFFData.
I realize that I’m using the VCL-specific TDateTime and AnsiString, and that this will also limit portability. However, this may not be a problem if I port my code to the upcoming version of C++Builder for Linux. I have my fingers crossed, anyway.
All but one of the functions in TFFData are implemented as inline functions for efficiency.
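Combining the two size variables means treating nFileSizeHigh as the upper 32 bits of a 64-bit number, so the size is high × 2^32 + low. Note that the multiplier is 2^32 (MAXDWORD + 1), not MAXDWORD, and that the high half has to be widened to 64 bits before the arithmetic, or the result overflows 32 bits. A minimal portable sketch, assuming std::uint32_t and std::uint64_t as stand-ins for DWORD and __int64 and using a hypothetical CombineFileSize() function:

```cpp
#include <cstdint>

// Combine the two 32-bit halves that WIN32_FIND_DATA uses for the file
// size (nFileSizeHigh and nFileSizeLow) into one 64-bit value.
// The high half is widened *before* it is shifted; shifting a 32-bit
// value left by 32 bits would lose it.
std::uint64_t CombineFileSize(std::uint32_t high, std::uint32_t low)
{
    return (static_cast<std::uint64_t>(high) << 32) | low;
}
```

A 5,000,000,000-byte file, for example, comes back as high = 1 and low = 705,032,704, and only the combined value gives the real size.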
The one function I did not implement as an inline is FileTimeToDateTime(), which is a private function used to convert a FILETIME structure to a TDateTime.
You might be wondering why this function and the three functions that use it, GetCreationDateTime(), GetLastAccessDateTime() and GetLastWriteDateTime(), return a bool instead of a TDateTime. As it turns out, not all versions of Windows keep track of all three dates for all files, and a date that was not recorded comes back as a FILETIME of zero. As a result, these functions return true if the date exists and false if it does not. The converted date is returned through a TDateTime that the caller passes in by reference.
I want to point out a few more things in TFFData. The constructor fills the WIN32_FIND_DATA member, data, with zeroes as a safety measure. I could have tested within every member function that each instance had been filled with data for a real file, but I thought that was overkill for a simple utility class. Filling the data with zeroes should be good enough.
The functions GetName(), GetAlternateName(), GetAttributes() and GetSize() do exactly what their names imply. Of course, GetSize() returns the combined __int64, and GetName() and GetAlternateName() return the more useful AnsiString.
Besides GetAttributes(), three more functions deal with file attributes. I added them after I found I was constantly calling GetAttributes() to get the attributes for a file and then testing whether it was actually a file or a folder. The functions IsFile() and IsFolder() now do those tests for me. The function HasAttribute() is a little more generic and can be used to test whether the file in question has a specific attribute. You can find a list of file attributes in the Windows API online help for the WIN32_FIND_DATA structure.
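Under the hood, a FILETIME is a 64-bit count of 100-nanosecond intervals since January 1, 1601 (UTC), while a TDateTime holds a double counting days since December 30, 1899, with the time of day as the fraction. The portable sketch below does that conversion arithmetically and follows the same convention of returning a bool and passing the result back through a reference, treating an all-zero FILETIME as “no date”. It is an illustration only, under stated assumptions: FileTimeToSerial() is a hypothetical name, it takes the two DWORD halves as plain integers so it runs anywhere, it stays in UTC (skipping the local-time adjustment FileTimeToLocalFileTime() performs in Listing B), and it is only meant for dates after December 30, 1899.

```cpp
#include <cstdint>

// 100-ns ticks in one day: 86,400 s * 10,000,000 ticks per second.
const std::uint64_t kTicksPerDay = 864000000000ULL;
// Days from 1601-01-01 (FILETIME epoch) to 1899-12-30 (TDateTime epoch).
const std::int64_t kEpochDeltaDays = 109205;

// Hypothetical stand-in for a FILETIME -> TDateTime conversion (UTC only).
// Returns false for a zero time (the file system did not record it);
// otherwise stores days since 1899-12-30 in serial and returns true.
bool FileTimeToSerial(std::uint32_t high, std::uint32_t low, double &serial)
{
    std::uint64_t ticks = (static_cast<std::uint64_t>(high) << 32) | low;
    if (ticks == 0)
        return false;
    // Split into whole days and a remainder so the large tick count is
    // never converted to double directly, which would lose precision.
    std::int64_t days = static_cast<std::int64_t>(ticks / kTicksPerDay);
    double fraction =
        static_cast<double>(ticks % kTicksPerDay) / kTicksPerDay;
    serial = static_cast<double>(days - kEpochDeltaDays) + fraction;
    return true;
}
```

As a check, midnight on January 1, 1970 is 116,444,736,000,000,000 ticks after the FILETIME epoch, which this sketch turns into 25569.0, the TDateTime value for that date.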
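Each attribute is a single bit in dwFileAttributes, so testing for one is a bitwise AND. The portable sketch below uses hypothetical free functions and locally redeclared constants whose values match the Windows FILE_ATTRIBUTE_* definitions. It also shows a subtlety worth knowing if you ever pass HasAttribute() more than one bit at a time: a plain AND answers “any of these bits”, not “all of these bits”.

```cpp
#include <cstdint>

// Attribute bits with the same values as the Windows FILE_ATTRIBUTE_*
// constants, redeclared here so the sketch compiles anywhere.
const std::uint32_t kReadOnly  = 0x01;  // FILE_ATTRIBUTE_READONLY
const std::uint32_t kHidden    = 0x02;  // FILE_ATTRIBUTE_HIDDEN
const std::uint32_t kSystem    = 0x04;  // FILE_ATTRIBUTE_SYSTEM
const std::uint32_t kDirectory = 0x10;  // FILE_ATTRIBUTE_DIRECTORY
const std::uint32_t kArchive   = 0x20;  // FILE_ATTRIBUTE_ARCHIVE

// True if ANY bit of mask is set (the HasAttribute()-style test).
bool HasAnyAttribute(std::uint32_t attributes, std::uint32_t mask)
{
    return (attributes & mask) != 0;
}

// True only if EVERY bit of mask is set.
bool HasAllAttributes(std::uint32_t attributes, std::uint32_t mask)
{
    return (attributes & mask) == mask;
}

// Folder/file tests built on the single-bit check.
bool IsFolder(std::uint32_t attributes)
{
    return HasAnyAttribute(attributes, kDirectory);
}
bool IsFile(std::uint32_t attributes)
{
    return !IsFolder(attributes);
}
```

For single-bit arguments such as FILE_ATTRIBUTE_DIRECTORY the two tests agree; they differ only when the mask combines several attributes.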
I also included the function GetRawData() just in case I need access to the raw WIN32_FIND_DATA structure.
Finally, you will notice that I have declared the class TFindFile as a friend to TFFData. I will explain this class in the next section.
Wrapping the Windows API functions
The TFindFile class not only wraps the Windows API functions FindFirstFile(), FindNextFile() and FindClose(), it also adds the capability to search for files in subfolders. The Windows API functions do not have this capability. Listings C and D contain the definition and implementation for TFindFile.
The constructor for TFindFile initializes a few data members and is declared inline.
The Find() function is the workhorse of this class. It takes an AnsiString argument that can be a file name or a wildcard pattern, optionally preceded by the path of the folder where the search should start. If no path is specified, Find() begins its search in the current directory. Let’s take a closer look at Find().
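Find() separates its argument into a starting folder and a bare file pattern with the VCL functions ExpandFileName(), ExtractFilePath() and ExtractFileName(). The portable sketch below mimics that split for Windows-style paths: everything up to and including the last backslash or colon is the folder, the rest is the pattern, and an argument with no folder falls back to the current directory. SplitPattern() and its explicit currentDir parameter are hypothetical, and unlike ExpandFileName() the sketch does not turn relative folders into absolute paths.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: splits "c:\\projects\\*.bak" into the folder
// "c:\\projects\\" (trailing backslash kept, as ExtractFilePath() does)
// and the pattern "*.bak". With no folder, currentDir is used instead.
std::pair<std::string, std::string>
SplitPattern(const std::string &filePattern, std::string currentDir)
{
    std::string::size_type pos = filePattern.find_last_of("\\:");
    if (pos == std::string::npos) {
        // Keep the "folder ends in a backslash" convention
        if (!currentDir.empty() && currentDir.back() != '\\')
            currentDir += '\\';
        return {currentDir, filePattern};
    }
    return {filePattern.substr(0, pos + 1), filePattern.substr(pos + 1)};
}
```

Keeping the trailing backslash on the folder is what allows simple concatenations such as curDir + fileName later in Find().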
The pitfalls of recursion
If you’ve written code to search through subdirectories or subfolders, you have probably used recursion. On a large disk with deeply nested subfolders, this approach could lead to a stack overflow, because each level of nesting adds another call, and another stack frame, to the call stack. To avoid such a situation, I’ve created a small helper class, local to the TFindFile implementation, named TFFStack. TFFStack uses a TStringList to store folder names. I could have used the STL stack template, but I found this implementation of TFFStack to be a little easier.
The Find() function uses TFFStack and two private methods, FindFiles() and FindDirs(), to search through nested folders without recursion, which avoids any stack problems. It also has the added bonus of presenting folders in alphabetical order, because TFFStack keeps its folder names in a sorted TStringList and always hands back the first one.
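The idea behind TFFStack can be shown without the VCL or a real disk. In the sketch below, a hypothetical in-memory map from each folder to its subfolders stands in for FindDirs(), and a std::set plays the role of the sorted TStringList with dupIgnore: folders waiting to be searched are kept sorted, and the first one is always taken next, so the traversal needs no recursion and visits folders alphabetically. VisitFolders() and FolderTree are names invented for this illustration.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the disk: folder -> its immediate subfolders.
// Folder names end in '\\', matching the convention used by FindDirs().
typedef std::map<std::string, std::vector<std::string> > FolderTree;

// Visits root and every folder below it without recursion. Pending
// folders live in a sorted set; taking the smallest name each time
// yields alphabetical order, and the set ignores duplicates.
std::vector<std::string> VisitFolders(const std::string &root,
                                      const FolderTree &tree)
{
    std::vector<std::string> visited;
    std::set<std::string> pending;
    pending.insert(root);
    while (!pending.empty()) {
        std::string current = *pending.begin();   // "Pop" the first name
        pending.erase(pending.begin());
        visited.push_back(current);               // files would be searched here
        FolderTree::const_iterator it = tree.find(current);
        if (it != tree.end())                     // "Push" its subfolders
            pending.insert(it->second.begin(), it->second.end());
    }
    return visited;
}
```

The pending list lives on the heap, so deep nesting no longer costs call-stack space. One difference from the VCL version: std::set compares strings byte by byte, so its order can differ from TStringList’s locale-aware, case-insensitive sort.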
How do we stop?
Let’s say we call Find() like this:
Find("c:\\*.*");
On my system, the C drive contains about 45,000 files occupying more than 13 gigabytes of disk space. Obviously, Find() may run for a very long time on my machine, so we need a way to stop it. I’ve chosen a very simple one.
TFindFile has a private data member, a bool named stop. When stop is set to true, Find() and its helper functions FindFiles() and FindDirs() stop; they do this by checking the value of stop at various points in their loops. These functions also call the ProcessMessages() method of TApplication, which lets the application handle other events, such as a click on a Stop button, that may change the value of stop. The inline Stop() function sets the private stop variable to true.
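The stop mechanism is ordinary cooperative cancellation: a flag that the long-running loop checks on every pass, and a method that sets it. The portable sketch below, with a hypothetical TScanner class and a std::function handler standing in for the VCL event, shows the pattern. In TFindFile, Application->ProcessMessages() is what gives a Stop button the chance to run; in the sketch the handler calls Stop() itself through the sender pointer. If Stop() were ever called from another thread, as in a threaded design, the flag would need to be a std::atomic<bool> rather than a plain bool.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

class TScanner;  // hypothetical class, not part of the article's code
typedef std::function<void(TScanner *sender, const std::string &name)>
    TItemFound;

class TScanner {
public:
    TItemFound OnItemFound;

    void Stop() { stop = true; }

    // Reports each name until the list is exhausted or Stop() is called.
    void Scan(const std::vector<std::string> &names)
    {
        stop = false;
        for (std::size_t i = 0; i < names.size() && !stop; ++i) {
            if (OnItemFound)
                OnItemFound(this, names[i]);  // the handler may call Stop()
        }
        stop = true;                          // back to the idle state
    }

private:
    bool stop = true;  // true means "not currently scanning"
};
```

Because the flag is checked in the loop condition, the scan ends right after the handler that called Stop() returns.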
A more complicated way to allow Find() to stop would be to use threads. I’ll leave this exercise to you, or possibly a future article.
How do we know Find() has found a file?
Find() uses a C++Builder language extension, the closure, to report each file it finds. A closure is similar to a pointer to a callback function, but it also carries the object the function belongs to, so the handler can be a member function of a form or any other class.
You can assign a closure function that you write to the OnFileFound property of TFindFile. (The combination of this property and the closure function makes OnFileFound an event, in C++Builder programming lingo.) Every time a file is found, Find() calls the OnFileFound function, passing it a pointer to the instance of TFindFile, the name of the file, the path to the folder containing the file and a copy of the TFFData.
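Because __closure is a C++Builder extension, the portable sketch below approximates it in standard C++: a std::function holds a lambda that captures this, so the event carries both the member function and the object to call it on. TFinder, TCleanupForm and their members are hypothetical, and the event’s parameter list is trimmed to the sender, the file name and the folder. With a real closure, the Hook() step shrinks to assigning the form’s member function FileFound directly to the OnFileFound property.

```cpp
#include <functional>
#include <string>
#include <vector>

class TFinder;  // hypothetical event source
typedef std::function<void(TFinder *sender, const std::string &fileName,
                           const std::string &folderName)> TFileFoundEvent;

class TFinder {
public:
    TFileFoundEvent OnFileFound;
    // Stand-in for a search: reports each name as if found in folder.
    void Report(const std::string &folder,
                const std::vector<std::string> &names)
    {
        for (const std::string &name : names)
            if (OnFileFound)
                OnFileFound(this, name, folder);
    }
};

// A "form" whose member function handles the event and keeps state.
class TCleanupForm {
public:
    std::vector<std::string> found;
    void FileFound(TFinder *, const std::string &fileName,
                   const std::string &folderName)
    {
        found.push_back(folderName + fileName);
    }
    // Binds FileFound to this object; the form must outlive the finder's use.
    void Hook(TFinder &finder)
    {
        finder.OnFileFound = [this](TFinder *sender, const std::string &f,
                                    const std::string &d) {
            FileFound(sender, f, d);
        };
    }
};
```

The handler runs as a member of the form, so it can update the form’s own data (here, the found list) without any global variables.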
Conclusion
The main topic of my next article will be what you do with the information given to you by the OnFileFound event. I will also finish discussing the TFindFile class, with particular emphasis on the SearchSubfolders and OnSearchFolder properties.
Listing A: FFData.h
#ifndef FFDataH
#define FFDataH

// Relies on <vcl.h> (String, TDateTime and the Windows API types)
// being included first, as Listings B and D do.

class TFFData {
public:
    TFFData();
    String GetName();
    String GetAlternateName();
    unsigned long GetAttributes();
    bool HasAttribute(DWORD attribute);
    bool IsFolder();
    bool IsFile();
    __int64 GetSize();
    bool GetCreationDateTime(TDateTime &dateTime);
    bool GetLastAccessDateTime(TDateTime &dateTime);
    bool GetLastWriteDateTime(TDateTime &dateTime);
    WIN32_FIND_DATA GetRawData();
private:
    bool FileTimeToDateTime(
        const FILETIME fileTime,
        TDateTime &dateTime);
    WIN32_FIND_DATA data;
    friend class TFindFile;   // TFindFile fills in data directly
};

// Inline Functions --------------------
inline TFFData::TFFData() {
    // Start from an all-zero structure so no accessor reads garbage
    ZeroMemory(&data, sizeof(data));
}
inline String TFFData::GetName() {
    return(String(data.cFileName));
}
inline String TFFData::GetAlternateName() {
    return(String(data.cAlternateFileName));
}
inline unsigned long TFFData::GetAttributes() {
    return(data.dwFileAttributes);
}
inline bool TFFData::HasAttribute(DWORD attribute)
{
    // True if any bit of attribute is set
    return((data.dwFileAttributes & attribute) != 0);
}
inline bool TFFData::IsFolder() {
    return(HasAttribute(FILE_ATTRIBUTE_DIRECTORY));
}
inline bool TFFData::IsFile() {
    return(!IsFolder());
}
inline __int64 TFFData::GetSize() {
    // size = high * 2^32 + low; widen the high DWORD before shifting
    return((static_cast<__int64>(data.nFileSizeHigh) << 32) +
        data.nFileSizeLow);
}
inline bool TFFData::GetCreationDateTime(
    TDateTime &dateTime)
{
    return(FileTimeToDateTime(
        data.ftCreationTime, dateTime));
}
inline bool TFFData::GetLastAccessDateTime(
    TDateTime &dateTime)
{
    return(FileTimeToDateTime(
        data.ftLastAccessTime, dateTime));
}
inline bool TFFData::GetLastWriteDateTime(
    TDateTime &dateTime)
{
    return(FileTimeToDateTime(
        data.ftLastWriteTime, dateTime));
}
inline WIN32_FIND_DATA TFFData::GetRawData() {
    // Returns a copy, so callers cannot modify data
    return(data);
}
#endif // FFDataH
Listing B: FFData.cpp
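#include <vcl.h>
#pragma hdrstop
#include "FFData.h"

// Converts a FILETIME (WIN32_FIND_DATA times are in UTC) to a
// local-time TDateTime. Returns false if the time was not
// recorded or cannot be converted.
bool TFFData::FileTimeToDateTime(
    const FILETIME fileTime, TDateTime &dateTime)
{
    // A zero FILETIME means the file system did not record this date
    if (fileTime.dwLowDateTime == 0 &&
        fileTime.dwHighDateTime == 0)
        return(false);
    // Convert from UTC to local time
    FILETIME localTime;
    if (FileTimeToLocalFileTime(
            &fileTime, &localTime) == 0)
        return(false);
    // Break the time into year, month, day, hour and so on...
    SYSTEMTIME systemTime;
    if (FileTimeToSystemTime(
            &localTime, &systemTime) == 0)
        return(false);
    // ...and let the VCL build the TDateTime
    dateTime = SystemTimeToDateTime(systemTime);
    return(true);
}
#pragma package(smart_init)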
Listing C: FindFile.h
#ifndef FindFileH
#define FindFileH
#include "FFData.h"

class TFindFile;   // needed by the event typedefs below
class TFFStack;    // defined in FindFile.cpp

// Event fired for each file that matches the pattern
typedef void __fastcall (__closure *TFileFound)
    (TFindFile *Sender, String fileName,
     String folderName, TFFData data);
// Event fired before a folder is searched; setting skip
// to true skips the files in that folder
typedef void __fastcall (__closure *TSearchFolder)
    (TFindFile *Sender,
     String folderName, bool &skip);

class TFindFile {
public:
    TFindFile();
    void Find(String filePattern);
    void Stop();
    __property bool SearchSubfolders = {
        read = searchSubfolders,
        write = searchSubfolders};
    __property TFileFound OnFileFound = {
        read = FOnFileFound, write = FOnFileFound};
    __property TSearchFolder OnSearchFolder = {
        read = FOnSearchFolder,
        write = FOnSearchFolder};
private:
    void FindFiles(String filePattern);
    void FindDirs(String baseDir, TFFStack &stack);
    TFileFound FOnFileFound;
    TSearchFolder FOnSearchFolder;
    bool stop;
    bool searchSubfolders;
};

inline TFindFile::TFindFile() {
    FOnFileFound = 0;
    FOnSearchFolder = 0;
    stop = true;          // true means "not currently searching"
    searchSubfolders = false;
}
inline void TFindFile::Stop() {
    stop = true;
}
#endif // FindFileH
Listing D: FindFile.cpp
#include <vcl.h>
#pragma hdrstop
#include "FindFile.h"

// Helper class local to this file. Pending folder names are kept in
// a sorted TStringList, and Pop() always removes the first one, so
// folders are searched in alphabetical order.
class TFFStack {
public:
    TFFStack();
    ~TFFStack();
    void Push(AnsiString name);
    AnsiString Pop();
    void Empty();
    bool Contains(String name);
    bool IsEmpty();
private:
    TStringList *list;
};
inline TFFStack::TFFStack() {
    list = new TStringList;
    list->Sorted = true;
    list->Duplicates = dupIgnore;   // never queue a folder twice
}
inline TFFStack::~TFFStack() {
    delete list;
}
inline void TFFStack::Push(AnsiString name) {
    list->Add(name);
}
inline AnsiString TFFStack::Pop() {
    // Callers check IsEmpty() first
    AnsiString temp = list->Strings[0];
    list->Delete(0);
    return(temp);
}
inline void TFFStack::Empty() {
    list->Clear();
}
inline bool TFFStack::Contains(String name) {
    int index;
    return(list->Find(name, index));
}
inline bool TFFStack::IsEmpty() {
    return(list->Count == 0);
}
// ---------------------------
void TFindFile::Find(String filePattern)
{
    stop = false;
    // Split the argument into the starting folder (with a trailing
    // backslash) and the bare file name pattern
    String curDir = ExtractFilePath(
        ExpandFileName(filePattern));
    String fileName = ExtractFileName(filePattern);
    // Search the starting folder
    bool skip = false;
    if (FOnSearchFolder)
        FOnSearchFolder(this, curDir, skip);
    if (FOnFileFound && skip == false)
        FindFiles(curDir + fileName);
    // Work through the subfolders without recursion
    if (searchSubfolders) {
        TFFStack dirStack;
        while (stop == false) {
            Application->ProcessMessages();
            FindDirs(curDir, dirStack);   // queue curDir's subfolders
            if (dirStack.IsEmpty())
                break;
            curDir = dirStack.Pop();
            skip = false;
            if (FOnSearchFolder)
                FOnSearchFolder(this, curDir, skip);
            if (skip)
                continue;
            if (FOnFileFound)
                FindFiles(curDir + fileName);
        }
    }
    stop = true;
}
// Pushes every subfolder of baseDir (except "." and "..") onto stack
void TFindFile::FindDirs(
    String baseDir, TFFStack &stack)
{
    TFFData ffData;
    String dirPattern = baseDir + "*.*";
    HANDLE handle = FindFirstFile(
        dirPattern.c_str(), &ffData.data);
    if (handle != INVALID_HANDLE_VALUE) {
        bool found = true;
        while (found == true && stop == false) {
            Application->ProcessMessages();
            if (ffData.IsFolder() &&
                ffData.GetName() != "." &&
                ffData.GetName() != "..")
                stack.Push(baseDir + ffData.GetName() + "\\");
            found = (FindNextFile(handle, &ffData.data) != 0);
        }
        FindClose(handle);
    }
}
// Fires OnFileFound for every entry that matches filePattern.
// A handler can end the search early by calling Sender->Stop().
void TFindFile::FindFiles(String filePattern)
{
    TFFData ffData;
    HANDLE handle = FindFirstFile(
        filePattern.c_str(), &ffData.data);
    if (handle != INVALID_HANDLE_VALUE) {
        bool found = true;
        while (found == true && stop == false) {
            Application->ProcessMessages();
            if (FOnFileFound &&
                ffData.GetName() != "." &&
                ffData.GetName() != "..")
                FOnFileFound(this, ffData.GetName(),
                    ExtractFilePath(filePattern), ffData);
            found = (FindNextFile(handle, &ffData.data) != 0);
        }
        FindClose(handle);
    }
}
#pragma package(smart_init)