It looks like you're referencing a filename pattern from a JAV (Japanese Adult Video) source — possibly an MP4 file naming convention that includes a code (), a site label ( JAVHD ), and dates.
| Feature | Example value | |---------|----------------| | movie_id | SONE-162 | | source | JAVHD | | release_date | 2024-04-19 | | segment | 02 or 23 | | raw_filename | original string | | is_truncated | True | import re from datetime import datetime def parse_jav_filename(filename: str): """Extract structured features from a JAV-style filename.""" features = "raw_filename": filename, "movie_id": None, "source": None, "release_date": None, "segment": None, "is_duplicate_tag": False SONE-162-JAVHD-TODAY-04192024-JAVHD-TODAY02-23-...
# Detect duplicate JAVHD-TODAY pattern if filename.count("JAVHD-TODAY") > 1: features["is_duplicate_tag"] = True It looks like you're referencing a filename pattern