FilePathParser

Universal, extensible Python library for extracting structured information (groups, dates, times, custom patterns) from file names and paths.

No hardcoded logic: you choose any number of groups (lists, enums, dicts, strings).
Automatic date and time search (many formats supported and validated).
Unlimited custom patterns: add your own regex groups.
Configurable priority: filename or path takes precedence.
Supports str and pathlib.Path.
Returns None if not found or not valid.
Simple interface: Use just two functions — parse() and create_parser().

from file_path_parser.api import create_parser

parser = create_parser(
    ["cat", "dog"], ["night", "day"],
    date=True, time=True, patterns={"cam": r"cam\d{1,3}"}
)
result = parser.parse("dog_night_cam22_20240620_0815.jpg")
print(result)
# {'group1': 'dog', 'group2': 'night', 'date': '20240620', 'time': '0815', 'cam': '22'}

More usage: Dicts, Enums, Priority, etc.

from enum import Enum
from file_path_parser.api import parse

class Status(Enum):
    OPEN = "open"
    CLOSED = "closed"

result = parse(
    "open_beta_cam21_20231231_2359.txt",
    Status, ["beta", "alpha"],
    date=True, time=True, patterns={"cam": r"cam\d{2}"}
)
print(result)
# {'status': 'open', 'group1': 'beta', 'date': '20231231', 'time': '2359', 'cam': '21'}

If both the path and filename contain a group, the `priority` parameter wins.

from file_path_parser.api import parse

result = parse(
    "/data/prod/archive/test_20240620.csv",
    ["prod", "test"], date=True, priority="filename"
)
print(result)
# If priority="filename", group1 == "test"
# If priority="path", group1 == "prod"

API Reference

from file_path_parser.api import parse, create_parser

def parse(full_path: str, *groups, date=False, time=False, separator="_", priority="filename", patterns=None) -> dict:
    '''
    One-line parsing.
    '''

def create_parser(*groups, date=False, time=False, separator="_", priority="filename", patterns=None) -> FilePathParser:
    '''
    Returns a reusable parser object.
    '''

Group name is auto-generated:
- Enum: lowercase enum class name.
- Dict: key as group name.
- List/tuple/set: groupN (N = order of argument).
- String: value as group name.
If group not found or invalid: returns None for that group.
Date and time always validated (returns None if not real date/time).
Custom patterns: returns only the captured number (not the full match, e.g. cam15 → 15).

How It Works

Splits filename and path into “blocks” (by _, -, ., /, etc).
For each group, tries to find an exact match (for enums, lists, dicts).
For date and time:
- Matches all supported formats via regex.
- Validates with datetime.strptime.
For custom patterns:
- Uses provided regex patterns.
- Returns only the matched number (if the pattern looks like cam\d+, the result is '15' for cam15).
  If you want the full match, add explicit parentheses: patterns={"cam": r"(cam\d+)"}

If both path and filename have a group, the value from priority wins.

Notes

Group name in the result will be None if not found or not valid.
If both path and filename have the group, value from priority wins.
You can use any number of groups or patterns — no hard limit.
PatternMatcher.find_special:
This internal method is not used by default, but can be handy for advanced scenarios (e.g., direct pattern lookup in a string, testing, or future extensions).

Command-Line Interface (CLI) for FilePathParser

The library supports a convenient command-line interface (CLI) for extracting structured information from file names and paths.

🚀 Quick Start

After installing dependencies with Poetry, you can use the file-path-parser utility to parse file names directly from your terminal.

Example usage

poetry run file-path-parser "cat_night_cam15_20240619_1236.jpg" --groups cat dog --classes night day --date --time --pattern cam "cam\d{1,3}"

Show help

poetry run file-path-parser --help

CLI Options

filepath — Path or file name to parse
--groups — List of allowed groups (e.g. cat dog)
--classes — List of allowed classes (e.g. night day)
--date — Enable date parsing
--time — Enable time parsing
--pattern NAME REGEX — Add custom pattern (can be used multiple times)

Example

poetry run file-path-parser "dog_day_cam2_20240701_0800.jpg" --groups cat dog --classes night day --date --time --pattern cam "cam\d{1,3}"

The parsing result will be displayed in the terminal.

PatternMatcher.find_special (advanced usage)

Note
This method is not used by default in the main API.
It is intended for advanced users or integration testing, and may be useful if you need to extract a pattern (date, time, or your custom pattern) from a string directly.

You can use PatternMatcher.find_special to extract either a date, time, or any custom group from a filename or arbitrary string.
For custom patterns, it will return only the numeric part (e.g., "15" for "cam15").

from file_path_parser.file_path_parser import PatternMatcher

matcher = PatternMatcher(user_patterns={"cam": r"cam\d{2}"})
print(matcher.find_special("foo_cam15_20240619.txt", "cam"))  # Output: "15"
print(matcher.find_special("foo_20240619.txt", "date"))       # Output: "20240619"
print(matcher.find_special("foo_1531bar.txt", "time"))        # Output: "1531"

Contributing

Pull requests, bug reports and feature requests are welcome!

Project Board

All ongoing development, task tracking, and planning for this library is managed in the Project Board.

See what's in progress, planned, or completed
Follow the roadmap and feature development
Suggest improvements or report issues via Issues, which are linked directly to the board

Visit the Project Board →

FAQ / Known Issues

Q: What happens if both the path and filename contain the same group, but with different values?

A: The result depends on the priority parameter:

If priority="filename" (default), the group value from the filename wins.
If priority="path", the value from the directory path wins.

Q: Can I use non-Latin or Unicode characters in group values?

A: Yes. Groups and blocks are matched in a case-insensitive way and support Unicode.

Q: What separators does the parser recognize between blocks?

A: By default, the parser splits by any of these: _, -, ., /, \, {}, or space. If your files use custom separators, let us know!

Q: What if a value looks like a date/time, but is not real?

A: The parser validates all dates/times. "20241341" (wrong month/day) will not be recognized as a date, etc.

Q: What will be returned for custom patterns?

A:

If you use e.g. patterns={"cam": r"cam\d+"}, you get just the number, e.g. 'cam15' → '15'.
If you want the full match (e.g. 'cam15'), use explicit parentheses: patterns={"cam": r"(cam\d+)"}.

Known Issues

If your separator is unusual (not in the list above), you may need to pre-process filenames.
Extremely exotic date/time formats (not listed in "Supported formats") are not matched.
Path parsing supports both str and pathlib.Path, but network/multiplatform paths (e.g., UNC, SMB) are not specifically tested.

Author

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
.idea		.idea
src/file_path_parser		src/file_path_parser
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml
run.py		run.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

FilePathParser

Table of Contents

Installation

Supported Date and Time Formats

Usage Examples

1. Quick one-line parsing (main use case)

2. Create a reusable parser

More usage: Dicts, Enums, Priority, etc.

If both the path and filename contain a group, the `priority` parameter wins.

API Reference

How It Works

Notes

Command-Line Interface (CLI) for FilePathParser

🚀 Quick Start

Example usage

Show help

CLI Options

Example

PatternMatcher.find_special (advanced usage)

Contributing

Project Board

FAQ / Known Issues

Q: What happens if both the path and filename contain the same group, but with different values?

Q: Can I use non-Latin or Unicode characters in group values?

Q: What separators does the parser recognize between blocks?

Q: What if a value looks like a date/time, but is not real?

Q: What will be returned for custom patterns?

Known Issues

Author

License

About

Uh oh!

Releases

Packages

Languages

License

omigutin/file_path_parser

Folders and files

Latest commit

History

Repository files navigation

FilePathParser

Table of Contents

Installation

Supported Date and Time Formats

Usage Examples

1. Quick one-line parsing (main use case)

2. Create a reusable parser

More usage: Dicts, Enums, Priority, etc.

If both the path and filename contain a group, the priority parameter wins.

API Reference

How It Works

Notes

Command-Line Interface (CLI) for FilePathParser

🚀 Quick Start

Example usage

Show help

CLI Options

Example

PatternMatcher.find_special (advanced usage)

Contributing

Project Board

FAQ / Known Issues

Q: What happens if both the path and filename contain the same group, but with different values?

Q: Can I use non-Latin or Unicode characters in group values?

Q: What separators does the parser recognize between blocks?

Q: What if a value looks like a date/time, but is not real?

Q: What will be returned for custom patterns?

Known Issues

Author

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

If both the path and filename contain a group, the `priority` parameter wins.

Packages