File System Scanner
Overview
The File System Scanner is a collection of tools that record the history of a file system in a relational database. It consists of:
- Python and Shell scripts stored and executed on a system needing file system scanning
- Web Services hosted on the warehouse web server (views.cira.colostate.edu/tsdw/).
- SQL Server stored procedures.
- Scan the file system to a file
This is a top-level script that executes scans of several directory trees. It still needs to be parameterized. Its main function is to connect each directory tree to be scanned to an instance of ScannerToFile, described below.
This file contains numerous classes that scan directory tree contents into various forms. The class used here is ScannerToFile, whose methods are called from the doScan.py script. This file does the work of scanning the file system and storing the results. It also contains the logic for parsing the .log or .lst files that list the contents of adjacent .tar.gz archive files. The results of the file system scan are stored in a binary file whose format is defined within the ScannerToFile methods.
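As an illustration of the kind of per-item information a scan collects, here is a minimal sketch in Python. It is not the ScannerToFile implementation: the ScanRecord type, the scan_tree function, and the choice of fields are assumptions modeled on the stored procedure parameters described later in this document, and the binary output format is deliberately left out because it is defined only inside ScannerToFile.

```python
import os
from dataclasses import dataclass


@dataclass
class ScanRecord:
    """Hypothetical record type; field names are assumptions, not the real format."""
    path: str
    is_file: bool
    depth: int        # 0 for the scan root
    file_count: int   # directories only: files directly inside (no subdirectories)
    size_bytes: int   # files: own size; directories: sum of direct files
    mod_time: int     # modification time as epoch seconds


def scan_tree(root):
    """Yield one ScanRecord per directory and per file under root."""
    root = os.path.abspath(root)
    base_depth = root.rstrip(os.sep).count(os.sep)
    for dirpath, _dirnames, filenames in os.walk(root):
        depth = dirpath.rstrip(os.sep).count(os.sep) - base_depth
        file_records = []
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            file_records.append(
                ScanRecord(full, True, depth + 1, 0, st.st_size, int(st.st_mtime))
            )
        # The directory record summarizes only the files directly inside it.
        yield ScanRecord(
            dirpath,
            False,
            depth,
            len(file_records),
            sum(r.size_bytes for r in file_records),
            int(os.stat(dirpath).st_mtime),
        )
        yield from file_records
```

Emitting the directory record before its files keeps parents ahead of children in the output, which is convenient if a consumer needs the parent row to exist first.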
- Process the file to the database
This shell script sends the binary scan result files generated in the previous step to the web services. It holds the arguments passed to buildHistoryWeb.py.
This file parses a binary file system scan result file and publishes its contents to the TSDW web services. The process consists of calling the preliminary web services / stored procedures that set up for a scan, then iterating over the scan records in batches and publishing each batch to the web services, and finally calling the methods for post-scan processing.
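The three-phase flow described above (set up, publish in batches, post-process) can be sketched as follows. This is only an outline under stated assumptions: the batches and publish_scan functions, the start, publish_batch, and finish callables, and the default batch size of 500 are all hypothetical and are not taken from buildHistoryWeb.py.

```python
from itertools import islice


def batches(records, size):
    """Yield successive lists of at most `size` records from any iterable."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def publish_scan(records, start, publish_batch, finish, batch_size=500):
    """Run setup, publish records batch by batch, then run post-scan steps.

    `start`, `publish_batch`, and `finish` are hypothetical callables that
    stand in for the web service calls. Returns the number of records sent.
    """
    start()
    sent = 0
    for chunk in batches(records, batch_size):
        publish_batch(chunk)
        sent += len(chunk)
    finish()
    return sent
```

Because batches consumes its input lazily, the scan result file never has to be held in memory as a whole.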
This file contains a class which represents the connection to TSDW web services.
This is the server-side code for the web services. This file contains many web service methods, but only a few are used for file system scanning. Each of the services used here maps directly to a SQL Server stored procedure of the same name. Descriptions of the web services / stored procedures are listed below.
@Timestamp bigint - Epoch timestamp representing the time a scan was performed
Alters an insert trigger on the FileSystemItem table so that insert/update times are set to the epoch time given in @Timestamp.
Sets the Scan bit of all file system item records to 0. As file system items are encountered in the scan result, the database record is updated and its Scan bit is set to 1. After all scan results have been processed, any record that still has a Scan bit of 0 was not encountered in the scan results.
@IsFile bit - 1 if this is a file 0 if it is a directory
@IsDirectory bit - 1 if this is a directory, 0 if it is a file
@FTPPath varchar(400) = null - The path for the file system item
@Depth int - The depth of the item from root
@FileCount int - For a directory, the number of files in this directory (does not include subdirectories)
@TotalFileSizeBytes bigint - Total number of bytes in this directory (does not include subdirectories)
@ModTime bigint - Epoch timestamp representing file modification time.
This plural web service (Items) maps to a singular stored procedure. The stored procedure receives information about one item and sets that item's Scan bit to 1. Insert/update times are managed by triggers.
@Timestamp bigint = null - Timestamp for completion of the file system scan. If given, it is used as the deletion time for unscanned file system items. If not given, the current time is used.
This method processes unscanned items by assuming that they have been deleted. Any file system item record with a Scan bit of 0 has its IsDeleted bit set to 1 and its deletion time set to @Timestamp if given, or otherwise to the current system time.
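The Scan bit lifecycle described above (reset to 0, set to 1 for each item seen, treat leftovers as deleted) can be illustrated with a small, self-contained sketch. It uses Python's sqlite3 module with an in-memory table rather than SQL Server, and the function names, the DeletedTime column, and the table layout are assumptions made for the example, not the actual schema or stored procedures.

```python
import sqlite3
import time


def make_table(conn):
    # Hypothetical, much-reduced stand-in for the FileSystemItem table.
    conn.execute(
        "CREATE TABLE FileSystemItem ("
        " FTPPath TEXT PRIMARY KEY,"
        " Scan INTEGER NOT NULL DEFAULT 0,"
        " IsDeleted INTEGER NOT NULL DEFAULT 0,"
        " DeletedTime INTEGER)"
    )


def start_scan(conn):
    """Clear the Scan bit on every record before results are processed."""
    conn.execute("UPDATE FileSystemItem SET Scan = 0")


def record_item(conn, path):
    """Mark an item as seen, inserting it if it is not yet known."""
    cur = conn.execute(
        "UPDATE FileSystemItem SET Scan = 1 WHERE FTPPath = ?", (path,)
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO FileSystemItem (FTPPath, Scan) VALUES (?, 1)", (path,)
        )


def finish_scan(conn, timestamp=None):
    """Flag every record that was not seen during the scan as deleted."""
    ts = int(time.time()) if timestamp is None else timestamp
    # Assumption: records already flagged as deleted keep their earlier
    # deletion time, hence the extra IsDeleted = 0 condition.
    conn.execute(
        "UPDATE FileSystemItem SET IsDeleted = 1, DeletedTime = ?"
        " WHERE Scan = 0 AND IsDeleted = 0",
        (ts,),
    )
```

A typical sequence would be make_table once, then start_scan, one record_item call per scanned path, and finally finish_scan with the scan's completion time.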
Sets the hierarchy id field for each file system item record which has a null hid and IsDeleted == 0.
Updates the TreeFileCount and TreeTotalSizeBytes fields in the FileSystemItem table. This applies only to directories. The updated fields hold the count and total size of all files in the entire directory tree rooted at the directory's path.
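To make the difference between the per-directory values (FileCount, TotalFileSizeBytes) and the tree-wide values (TreeFileCount, TreeTotalSizeBytes) concrete, here is a minimal roll-up sketch in Python. The tree_totals function and its dictionary input are assumptions for illustration; the stored procedure performs the equivalent computation in SQL and may do so differently.

```python
import posixpath


def tree_totals(dirs):
    """Roll direct per-directory totals up into whole-tree totals.

    `dirs` maps a normalized '/'-separated directory path (no trailing
    slash) to (file_count, total_size_bytes) for the files directly in it.
    Returns a dict mapping each path to (tree_file_count, tree_size_bytes).
    """
    totals = {path: list(values) for path, values in dirs.items()}
    # Visit the deepest directories first so that every child is complete
    # before it is added to its parent.
    for path in sorted(dirs, key=lambda p: p.count("/"), reverse=True):
        parent = posixpath.dirname(path)
        if parent != path and parent in totals:
            totals[parent][0] += totals[path][0]
            totals[parent][1] += totals[path][1]
    return {path: tuple(values) for path, values in totals.items()}
```

For example, a directory holding 1 file of 10 bytes, with one subdirectory holding 2 files totaling 20 bytes, gets a tree count of 3 and a tree size of 30, while the subdirectory's own tree values stay at 2 and 20.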
Resets the insert/update triggers on the file system item table to use the current system time instead of a time which may have been set in StartFileSystemScan.