Universal, extensible Python library for extracting structured information (groups, dates, times, custom patterns) from file names and paths.
- No hardcoded logic: you choose any number of groups (lists, enums, dicts, strings).
- Automatic date and time search (many formats supported and validated).
- Unlimited custom patterns: add your own regex groups.
- Configurable priority: filename or path takes precedence.
- Supports str and pathlib.Path.
- Returns None if not found or not valid.
- Simple interface: use just two functions, parse() and create_parser().
- Installation
- Supported Date and Time Formats
- Usage Examples
- API Reference
- How It Works
- Notes
- Command-Line Interface (CLI)
- PatternMatcher.find_special (advanced usage)
- Contributing
- Project Board
- FAQ / Known Issues
- Author
- License
pip install file_path_parser
Date examples:
- 20240622 (YYYYMMDD)
- 2024-06-22 (YYYY-MM-DD)
- 2024_06_22 (YYYY_MM_DD)
- 22.06.2024 (DD.MM.YYYY)
- 22-06-2024 (DD-MM-YYYY)
- 220624 (YYMMDD)
- 2024-6-2, 2024_6_2 (YYYY-M-D, YYYY_M_D, without zero padding)
Time examples:
- 154212 (HHMMSS)
- 1542 (HHMM)
- 15-42-12 (HH-MM-SS)
- 15_42_12 (HH_MM_SS)
- 15-42, 15_42 (HH-MM, HH_MM)
All dates and times are validated. E.g. "20241341" is not a date; "246199" is not a time.
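The validation rule above can be illustrated with a minimal sketch that pairs a shape check with datetime.strptime. It is not the library's actual code: the format tables cover only a few of the listed formats, and the names DATE_FORMATS, TIME_FORMATS and is_valid are hypothetical.

```python
import re
from datetime import datetime

# Illustrative subset of the formats listed above (hypothetical tables,
# not the library's internal ones): (shape regex, strptime format).
DATE_FORMATS = [
    (r"\d{8}", "%Y%m%d"),                  # 20240622
    (r"\d{4}-\d{2}-\d{2}", "%Y-%m-%d"),    # 2024-06-22
    (r"\d{2}\.\d{2}\.\d{4}", "%d.%m.%Y"),  # 22.06.2024
]
TIME_FORMATS = [
    (r"\d{6}", "%H%M%S"),  # 154212
    (r"\d{4}", "%H%M"),    # 1542
]


def is_valid(candidate: str, formats) -> bool:
    """Return True if candidate has a known shape and is a real date/time."""
    for shape, fmt in formats:
        if not re.fullmatch(shape, candidate):
            continue  # wrong shape for this format, try the next one
        try:
            datetime.strptime(candidate, fmt)
            return True
        except ValueError:
            continue  # right shape, but not a real calendar date or clock time
    return False


print(is_valid("20240622", DATE_FORMATS))  # True
print(is_valid("20241341", DATE_FORMATS))  # False (month 13 does not exist)
print(is_valid("246199", TIME_FORMATS))    # False (hour 24, minute 61)
```

The shape check runs first because strptime alone is lenient about zero padding; requiring a fixed number of digits keeps short strings from being read as dates.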
All parsing is done through the interface functions; use only these for a clean API:
from file_path_parser.api import parse
result = parse(
"cat_night_cam15_20240619_1236.jpg",
["cat", "dog"], ["night", "day"],
date=True, time=True, patterns={"cam": r"cam\d{1,3}"}
)
print(result)
# {'group1': 'cat', 'group2': 'night', 'date': '20240619', 'time': '1236', 'cam': '15'}
from file_path_parser.api import create_parser
parser = create_parser(
["cat", "dog"], ["night", "day"],
date=True, time=True, patterns={"cam": r"cam\d{1,3}"}
)
result = parser.parse("dog_night_cam22_20240620_0815.jpg")
print(result)
# {'group1': 'dog', 'group2': 'night', 'date': '20240620', 'time': '0815', 'cam': '22'}
from enum import Enum
from file_path_parser.api import parse
class Status(Enum):
    OPEN = "open"
    CLOSED = "closed"
result = parse(
"open_beta_cam21_20231231_2359.txt",
Status, ["beta", "alpha"],
date=True, time=True, patterns={"cam": r"cam\d{2}"}
)
print(result)
# {'status': 'open', 'group1': 'beta', 'date': '20231231', 'time': '2359', 'cam': '21'}
from file_path_parser.api import parse
result = parse(
"/data/prod/archive/test_20240620.csv",
["prod", "test"], date=True, priority="filename"
)
print(result)
# If priority="filename", group1 == "test"
# If priority="path", group1 == "prod"
from file_path_parser.api import parse, create_parser
def parse(full_path: str, *groups, date=False, time=False, separator="_", priority="filename", patterns=None) -> dict:
    '''
    One-line parsing.
    '''
def create_parser(*groups, date=False, time=False, separator="_", priority="filename", patterns=None) -> FilePathParser:
    '''
    Returns a reusable parser object.
    '''
- Group name is auto-generated:
  - Enum: lowercase enum class name.
  - Dict: key as group name.
  - List/tuple/set: groupN (N = order of argument).
  - String: value as group name.
- If group not found or invalid: returns None for that group.
- Date and time always validated (returns None if not real date/time).
- Custom patterns: returns only the captured number (not the full match, e.g. cam15 → 15).
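The naming rules above could be sketched roughly as follows. This is an illustration of the documented behaviour, not the library's implementation: group_name is a hypothetical helper, and the dict case assumes a single-key mapping of the form {"name": [values]}.

```python
from enum import Enum


class Status(Enum):
    OPEN = "open"
    CLOSED = "closed"


def group_name(group, position: int) -> str:
    """Derive a result key for a group (hypothetical helper)."""
    if isinstance(group, type) and issubclass(group, Enum):
        return group.__name__.lower()  # Enum: lowercase class name
    if isinstance(group, dict):
        return next(iter(group))       # Dict: first key (assumes one key)
    if isinstance(group, str):
        return group                   # String: the value itself
    return f"group{position}"          # List/tuple/set: positional name


print(group_name(Status, 1))                    # status
print(group_name(["cat", "dog"], 1))            # group1
print(group_name({"animal": ["cat", "dog"]}, 2))  # animal
```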
- Splits filename and path into "blocks" (by _, -, ., /, etc).
- For each group, tries to find an exact match (for enums, lists, dicts).
- For date and time:
  - Matches all supported formats via regex.
  - Validates with datetime.strptime.
- For custom patterns:
  - Uses provided regex patterns.
  - Returns only the matched number (if the pattern looks like cam\d+, the result is '15' for cam15).
    If you want the full match, add explicit parentheses: patterns={"cam": r"(cam\d+)"}
If both path and filename have a group, the value from priority wins.
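The block-splitting and exact-match steps described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the library's code: SEPARATORS, split_blocks and find_group are hypothetical names, and the separator set is taken from the list given in the FAQ.

```python
import re

# Assumed separator set: _ - . / \ { } and whitespace.
SEPARATORS = r"[_\-./\\\s{}]+"


def split_blocks(text: str) -> list:
    """Split a file name or path into non-empty blocks."""
    return [block for block in re.split(SEPARATORS, text) if block]


def find_group(blocks, allowed):
    """Return the first allowed value that exactly matches a block, else None."""
    lowered = {value.lower(): value for value in allowed}
    for block in blocks:
        if block.lower() in lowered:  # case-insensitive exact match
            return lowered[block.lower()]
    return None


blocks = split_blocks("cat_night_cam15_20240619_1236.jpg")
print(blocks)  # ['cat', 'night', 'cam15', '20240619', '1236', 'jpg']
print(find_group(blocks, ["cat", "dog"]))    # cat
print(find_group(blocks, ["bird", "fish"]))  # None
```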
- A group's value in the result will be None if it is not found or not valid.
- If both path and filename have the group, value from priority wins.
- You can use any number of groups or patterns; there is no hard limit.
- PatternMatcher.find_special:
This internal method is not used by default, but can be handy for advanced scenarios (e.g., direct pattern lookup in a string, testing, or future extensions).
The library supports a convenient command-line interface (CLI) for extracting structured information from file names and paths.
After installing dependencies with Poetry, you can use the file-path-parser utility to parse file names directly from your terminal.
poetry run file-path-parser "cat_night_cam15_20240619_1236.jpg" --groups cat dog --classes night day --date --time --pattern cam "cam\d{1,3}"
poetry run file-path-parser --help
- filepath: Path or file name to parse
- --groups: List of allowed groups (e.g. cat dog)
- --classes: List of allowed classes (e.g. night day)
- --date: Enable date parsing
- --time: Enable time parsing
- --pattern NAME REGEX: Add custom pattern (can be used multiple times)
poetry run file-path-parser "dog_day_cam2_20240701_0800.jpg" --groups cat dog --classes night day --date --time --pattern cam "cam\d{1,3}"
The parsing result will be displayed in the terminal.
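For readers wiring the same options into their own script, the argument layout described above could be reproduced with argparse roughly as follows. This is a hypothetical sketch, not the source of the file-path-parser utility; build_parser and its defaults are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build an argument parser mirroring the documented options (sketch)."""
    parser = argparse.ArgumentParser(prog="file-path-parser")
    parser.add_argument("filepath", help="Path or file name to parse")
    parser.add_argument("--groups", nargs="*", default=[], help="Allowed groups")
    parser.add_argument("--classes", nargs="*", default=[], help="Allowed classes")
    parser.add_argument("--date", action="store_true", help="Enable date parsing")
    parser.add_argument("--time", action="store_true", help="Enable time parsing")
    # nargs=2 with action="append" lets --pattern NAME REGEX repeat.
    parser.add_argument("--pattern", nargs=2, action="append", default=[],
                        metavar=("NAME", "REGEX"), help="Add custom pattern")
    return parser


args = build_parser().parse_args([
    "cat_night_cam15_20240619_1236.jpg",
    "--groups", "cat", "dog",
    "--classes", "night", "day",
    "--date", "--time",
    "--pattern", "cam", r"cam\d{1,3}",
])
patterns = dict(args.pattern)  # [["cam", "cam\\d{1,3}"]] becomes a dict
print(args.groups, args.classes, args.date, args.time, patterns)
```

The resulting values map naturally onto the parse() arguments shown earlier (group lists, date, time, patterns).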
Note
This method is not used by default in the main API.
It is intended for advanced users or integration testing, and may be useful if you need to extract a pattern (date, time, or your custom pattern) from a string directly.
You can use PatternMatcher.find_special to extract either a date, time, or any custom group from a filename or arbitrary string.
For custom patterns, it will return only the numeric part (e.g., "15" for "cam15").
from file_path_parser.file_path_parser import PatternMatcher
matcher = PatternMatcher(user_patterns={"cam": r"cam\d{2}"})
print(matcher.find_special("foo_cam15_20240619.txt", "cam")) # Output: "15"
print(matcher.find_special("foo_20240619.txt", "date")) # Output: "20240619"
print(matcher.find_special("foo_1531bar.txt", "time")) # Output: "1531"
Pull requests, bug reports and feature requests are welcome!
All ongoing development, task tracking, and planning for this library is managed in the Project Board.
- See what's in progress, planned, or completed
- Follow the roadmap and feature development
- Suggest improvements or report issues via Issues, which are linked directly to the board
A: The result depends on the priority parameter:
- If priority="filename" (default), the group value from the filename wins.
- If priority="path", the value from the directory path wins.
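The priority rule can be pictured with a small stand-alone sketch. It is an illustration only: resolve and first_match are hypothetical helpers, PurePosixPath is used so the example behaves the same on any platform, and falling back to the other source when the preferred one has no match is an assumption.

```python
import re
from pathlib import PurePosixPath


def first_match(text: str, allowed: set):
    """Return the first block of text that is an allowed value, else None."""
    for block in re.split(r"[_\-./\\\s]+", text):
        if block.lower() in allowed:
            return block.lower()
    return None


def resolve(full_path: str, allowed, priority: str = "filename"):
    """Pick a group value from the file name or the directory part (sketch)."""
    path = PurePosixPath(full_path)
    allowed_set = {value.lower() for value in allowed}
    from_name = first_match(path.name, allowed_set)
    from_dirs = first_match(str(path.parent), allowed_set)
    if priority == "path":
        return from_dirs or from_name  # assumed fallback to the file name
    return from_name or from_dirs      # assumed fallback to the directories


path = "/data/prod/archive/test_20240620.csv"
print(resolve(path, ["prod", "test"], priority="filename"))  # test
print(resolve(path, ["prod", "test"], priority="path"))      # prod
```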
A: Yes. Groups and blocks are matched in a case-insensitive way and support Unicode.
A: By default, the parser splits by any of these: _, -, ., /, \, {}, or space.
If your files use custom separators, let us know!
A: The parser validates all dates/times. "20241341" (wrong month/day) will not be recognized as a date, etc.
A:
- If you use e.g. patterns={"cam": r"cam\d+"}, you get just the number, e.g. 'cam15' → '15'.
- If you want the full match (e.g. 'cam15'), use explicit parentheses: patterns={"cam": r"(cam\d+)"}.
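One way to picture this behaviour is the sketch below: when the pattern has no capture group, only the digits of the match are returned, and when it has one, the captured text is returned. This is a guess at the rule from the description above, not the library's code, and extract is a hypothetical helper.

```python
import re


def extract(text: str, pattern: str):
    """Return the pattern's value in text, or None if it does not occur."""
    compiled = re.compile(pattern)
    match = compiled.search(text)
    if match is None:
        return None
    if compiled.groups:          # explicit parentheses: return the capture
        return match.group(1)
    digits = re.search(r"\d+", match.group(0))
    # No capture group: return only the numeric part of the match.
    return digits.group(0) if digits else match.group(0)


print(extract("foo_cam15_20240619.txt", r"cam\d+"))    # 15
print(extract("foo_cam15_20240619.txt", r"(cam\d+)"))  # cam15
print(extract("foo_20240619.txt", r"cam\d+"))          # None
```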
- If your separator is unusual (not in the list above), you may need to pre-process filenames.
- Extremely exotic date/time formats (not listed in "Supported formats") are not matched.
- Path parsing supports both str and pathlib.Path, but network/multiplatform paths (e.g., UNC, SMB) are not specifically tested.
MIT