Fun with Python: File Search in Multiple Directories


Introduction

In this installment of FwP, I will share a simple script that searches multiple directories for specific strings in file names. At the moment, some of its functionality is hardcoded, but it can easily be modified to expand its search capabilities. The current version searches for strings that represent file types (note that it doesn’t actually check the file type, only whether the name contains, for example, “.txt”).

As with all code I present, this is free to use according to the GNU GPL. I make no guarantees or warranties, and cannot be held liable for any misuse or abuse. Use at your own risk.

Important Note: This code accepts user input that is not sanitized, which is a big no-no. In normal practice, don’t do this. I wanted to keep the code and concept simple, so I didn’t add any checks. Also, this script can search any file path, so be careful when searching from a root folder that has many subfolders. I have not tested this at scale, and it is not optimized for scalability and does not handle resource management. Use with caution. When running from Command Prompt, Ctrl+C should kill it if it gets out of hand.
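To give a rough idea of what a basic check could look like, the sketch below confirms that the input points to an existing directory before any search starts. The function name validate_search_dir is hypothetical and not part of the script, and this is only a first line of defense, not full sanitization.

```python
import os

def validate_search_dir(raw_path):
    """Return a normalized absolute path if raw_path is an existing directory, else None."""
    # Trim surrounding whitespace and the double quotes that often come with pasted paths.
    cleaned = raw_path.strip().strip('"')
    if not cleaned:
        return None
    full_path = os.path.abspath(os.path.expanduser(cleaned))
    # Reject anything that is missing or is not a directory.
    return full_path if os.path.isdir(full_path) else None

if __name__ == "__main__":
    print(validate_search_dir(os.getcwd()))
    print(validate_search_dir(""))
```

The prompt loop could keep asking until this returns something other than None.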


Preparation

As always, this article assumes some basic computer knowledge, and the script was developed for Windows (sorry!).

I suggest testing this in a contained folder, and not on any existing folder.

Create a new folder anywhere that is easy to get to from Command Prompt and name it something simple. The script will prompt for this folder’s path.

Now, add a few subfolders to this folder, and toss in some sample files with extensions .txt, .log, and .json.

I created the following structure:

C:\file_analyzer\
-------\folder one
-------\folder two
-------\folder three

In each subfolder I have randomly placed some files called txt_file_001.txt, txt_file_002.txt, log_file_001.log, log_file_002.log, json_file_001.json, txt_file_001.json.
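If creating the folders and files by hand is tedious, a short script can build a similar layout. This is a minimal sketch and not part of the original script: the name build_sample_tree and the choice of which file lands in which folder are my own assumptions, and it writes into a temporary folder rather than C:\file_analyzer.

```python
import os
import tempfile

# Hypothetical layout: which file goes in which folder is an arbitrary choice.
SAMPLE_LAYOUT = {
    "folder one": ["txt_file_001.txt", "txt_file_002.txt"],
    "folder two": ["log_file_001.log", "txt_file_001.json"],
    "folder three": ["log_file_002.log", "json_file_001.json"],
}

def build_sample_tree(base):
    """Create the sample subfolders under base and fill them with empty files."""
    for folder, names in SAMPLE_LAYOUT.items():
        folder_path = os.path.join(base, folder)
        os.makedirs(folder_path, exist_ok=True)
        for name in names:
            # Opening in "w" mode creates an empty file (or truncates an existing one).
            with open(os.path.join(folder_path, name), "w"):
                pass
    return base

if __name__ == "__main__":
    root = build_sample_tree(tempfile.mkdtemp(prefix="file_analyzer_"))
    print("Sample tree created in:", root)
```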

Put the script in the root folder created for this exercise.

Here are screenshots showing my folder structure:


The Code

 

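import os
import sys

# Note: the match statement used in menu() requires Python 3.10 or later.
app_running = True
dir_input = False

def finder(txt_to_find, dir):
    # os.walk() visits dir and every subfolder, yielding one
    # (root, dirs, files) tuple per directory.
    for (root, dirs, files) in os.walk(dir, topdown=True):
        # Quick check: does the search string appear anywhere in this file list?
        txt_to_search = str(files).find(txt_to_find)
        print("Directory: ", root)
        if txt_to_search >= 0:
            for x in files:
                if txt_to_find in x:
                    print("File: ", x)
        else:
            print("File: None found.")
    # Show the menu again so another search can run on the same directory.
    menu(dir)

def menu(dir):
    # Needed so the assignment under case "q" changes the module-level flag.
    global app_running
    ui = ""
    print("\n\n\n\n")
    print("*********** file_finder_v1 ***********")
    print("1) find .txt ")
    print("2) find .log ")
    print("3) find .json ")
    print("q) quit ")
    print("****************************************")
    while app_running:
        ui = input("_: ")
        match ui:
            case "q":
                app_running = False
                sys.exit()
            case "1":
                finder(".txt", dir)
            case "2":
                finder(".log", dir)
            case "3":
                finder(".json", dir)
            case _:
                print("Error: command not recognized.")

# Clear the screen ('cls' on Windows, 'clear' elsewhere).
os.system('cls' if os.name == 'nt' else 'clear')

# Keep asking until the user enters a non-empty path.
while not dir_input:
    ui = input("Search which directory? ")
    if ui:
        dir_input = True
        menu(ui)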

 

Copy and paste the code into Notepad. Save the file as “file_analyzer.py”. Make sure to change “Save as type” to “All Files” so that it saves as a .py file, and not a .txt file.


The Script in Action

Open the Command Prompt and navigate to the folder where the script and subfolders are located.

Run the command:

python file_analyzer.py

It will prompt for a directory path to search. Input the full path to the folder created in the preparation phase. Be careful here. Do not search too deeply, and remember this input is not sanitized.

Press enter and it will present the menu. Choose a file type to search for and input the corresponding number.

Press enter and it will present a list of directories, followed by the names of any files of that type which it has found. It will then immediately return to the main menu so that another file type can be searched in the same directory. Quit the program and restart to search a new directory.

Note that in the output above, no files were found in the root folder, two files were found in folder one, no files were found in folder three, and one file was found in folder two. Also, note that it visited the folders alphabetically in this run. os.walk() does not promise any particular order, so this may vary between file systems.

Try again with the other format options to see the output.

Now, I will explain how it works, though I will not go line-by-line. Most components are fundamental and should be easy to understand with very basic knowledge. Most of the written code is just formatting the menu, and while loops to capture input and keep the program running.


The Code Analyzed

The heart of this script is the function os.walk(). This requires the os module, which is part of the standard library and is loaded with the line:

import os

The primary function is finder(txt_to_find, dir):

def finder(txt_to_find, dir):
    for (root, dirs, files) in os.walk(dir, topdown=True):
        # -1 means the text does not appear anywhere in this directory's file list.
        txt_to_search = str(files).find(txt_to_find)
        print("Directory: ", root)
        if txt_to_search >= 0:
            for x in files:
                if txt_to_find in x:
                    print("File: ", x)
        else:
            print("File: None found.")
    menu(dir)

This function takes two arguments. The first is the string to search for, chosen by the user’s input at the menu prompt. The second is the directory to search, which the user provides at the initial prompt when the script is run (note, it is first passed to the menu, then the menu passes it to this function along with the string).

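This function uses os.walk() to search the directory. For each directory it visits, os.walk() yields a tuple of three items: the path of the current directory as a string (root), a list of the subdirectory names in it (dirs), and a list of the file names in it (files).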
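To see what os.walk() hands back on each pass, the sketch below collects the three items for every directory in a small throwaway tree. The helper name describe_walk and the sample folder and file names are made up for this illustration.

```python
import os
import tempfile

def describe_walk(top):
    """Collect (root, dirs, files) for every directory under top."""
    visited = []
    for root, dirs, files in os.walk(top, topdown=True):
        # root is a string; dirs and files are lists of names (not full paths).
        visited.append((root, sorted(dirs), sorted(files)))
    return visited

if __name__ == "__main__":
    # Build a tiny throwaway tree so the example does not touch real folders.
    top = tempfile.mkdtemp()
    os.makedirs(os.path.join(top, "sub"))
    open(os.path.join(top, "sub", "notes.txt"), "w").close()
    for root, dirs, files in describe_walk(top):
        print(root, dirs, files)
```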

The function then converts the list files into a string so that it can use .find() to search for the text that is passed from the user’s input. This returns the position of the first match, or an integer value of -1 if nothing is found.

The function then checks that integer value. If it is -1, it informs the user that nothing was found in that directory. Otherwise, it loops through the files and prints the name of each one that contains the text.
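The string conversion is easier to picture with a concrete list. The sketch below mirrors the check on a made-up file list and, for comparison, shows a check that looks at each name directly. Both function names are hypothetical. One detail worth knowing: the string form of a non-empty list starts with a bracket and a quote, so a match can never sit at position 0.

```python
def first_match_index(files, txt_to_find):
    """Mirror the script's check: search the list's string form for the text."""
    return str(files).find(txt_to_find)

def any_name_contains(files, txt_to_find):
    """Alternative check that looks at each name directly and returns True or False."""
    return any(txt_to_find in name for name in files)

if __name__ == "__main__":
    sample = ["txt_file_001.txt", "log_file_001.log"]
    print(str(sample))                          # the list as text, brackets and quotes included
    print(first_match_index(sample, ".txt"))    # 0 or greater: a match exists
    print(first_match_index(sample, ".json"))   # -1: no match
    print(any_name_contains(sample, ".log"))
```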

All of this happens inside a pair of nested for loops: the outer loop iterates through each directory, and the inner loop iterates through that directory’s list of files.

After both for loops are exhausted, it returns to the main menu, making sure to send the dir variable back to the menu since the menu expects this input. That could probably be cleaner.
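One way this might be made cleaner is to have the search function return its results and let a single loop own the menu, so the two functions no longer call each other. The sketch below is an assumption about how that could look, not the script as published: find_matches, run_menu, CHOICES, and the read parameter are all names I made up for it.

```python
import os

# Menu choices mapped to the text each one searches for.
CHOICES = {"1": ".txt", "2": ".log", "3": ".json"}

def find_matches(txt_to_find, top):
    """Return {directory: [matching file names]} for every directory under top."""
    results = {}
    for root, dirs, files in os.walk(top):
        results[root] = [name for name in files if txt_to_find in name]
    return results

def run_menu(top, read=input):
    """Loop until the user enters q; read is injectable so the loop can be tested."""
    while True:
        choice = read("_: ")
        if choice == "q":
            break
        if choice not in CHOICES:
            print("Error: command not recognized.")
            continue
        for root, matches in find_matches(CHOICES[choice], top).items():
            print("Directory: ", root)
            # Fall back to a placeholder line when a directory has no matches.
            for name in matches or ["None found."]:
                print("File: ", name)
```

Calling run_menu(path) after the directory prompt would replace the back-and-forth between finder() and menu(), and the search directory only has to be passed in once.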

NOTE: As mentioned in the Introduction, this script searches for strings, not actual file types. So, it could potentially contain false positives in this use case if, for example, a file name contains “.txt” other than at the end. I didn’t design this to look for file types specifically. I designed it to be modular and search for any string in the list of files within multiple directories. I just use file types as a simple example. The menu can be modified to include any search options, or removed entirely. The point is, the primary function is its ability to search multiple folders for specific strings in file names.
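For anyone who does want to match on the extension only, a small helper along these lines could replace the substring test. This is a sketch under my own naming (has_extension is hypothetical); it relies on os.path.splitext(), which splits a name into everything before the last dot and the extension after it.

```python
import os

def has_extension(file_name, extension):
    """True only when file_name ends with extension, ignoring case."""
    return os.path.splitext(file_name)[1].lower() == extension.lower()

if __name__ == "__main__":
    # The substring test matches both names; the extension test matches only the first.
    for name in ["notes.txt", "notes.txt.bak"]:
        print(name, ".txt" in name, has_extension(name, ".txt"))
```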


Summary

I did not optimize or sanitize. There are probably many tweaks to make this more efficient and cleaner. I wanted to work with the os libraries more, and I thought that its ability to search directories could have many uses, so I slapped this together quick and dirty to understand how os.walk() functioned.
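As one example of such a tweak, the search could be kept from wandering too deep into a large folder tree. The sketch below is an assumption on my part, not something the script does: walk_limited and max_depth are made-up names. It relies on documented os.walk() behavior, where editing the dirs list in place during a top-down walk controls which subfolders are visited.

```python
import os

def walk_limited(top, max_depth):
    """Yield (root, dirs, files) like os.walk(), but stop descending below max_depth."""
    top = os.path.abspath(top)
    base_depth = top.rstrip(os.sep).count(os.sep)
    for root, dirs, files in os.walk(top, topdown=True):
        depth = root.rstrip(os.sep).count(os.sep) - base_depth
        if depth >= max_depth:
            # With topdown=True, emptying dirs in place stops os.walk() from going deeper.
            del dirs[:]
        yield root, dirs, files

if __name__ == "__main__":
    # Depth 0 is the starting folder itself; depth 1 adds its immediate subfolders.
    for root, dirs, files in walk_limited(".", 1):
        print(root)
```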

As always, feel free to use it and modify it, following the GNU GPL. Use at your own risk.

Thanks for reading, and have fun.


Daily Cuppa

Today’s cup of tea is Organic Vanilla Rooibos provided by Equal Exchange. Organic, fair trade, and loaded with earthy sweetness.


