How I Engineered My Own Live TV and EPG System in Jellyfin
When I first spun up Jellyfin, it was supposed to be simple: an open‑source media server holding my library of movies, shows, and some streaming channels.
But simplicity tends to invite curiosity — and curiosity snowballed into a full technical exploration of how to scrape, parse, and automate my own electronic program guide (EPG) for live TV.
Looking back, this wasn’t just about building a guide.
It was about understanding how Jellyfin works under the hood, why metadata and icon mapping matter, and how to integrate an entire system — Docker, networking, scrapers, schedulers — into something that behaves like a professional IPTV platform.
Jellyfin: The Foundation
At its core, Jellyfin is a .NET‑based media server that manages content through three key layers:
- The database layer, where configuration, libraries, and metadata live (config, data, metadata directories).
- The API layer, which serves media data to the web UI, apps, and clients.
- The Live TV subsystem, which consumes playlists (M3U) and guide data (XMLTV) to emulate a television grid.
Unlike Plex or Emby, Jellyfin’s Live TV stack doesn’t rely on commercial API access. It lets you feed your own playlists and guide files directly — and that’s the opening that made this whole thing possible.
Docker makes deployment clean: the container holds Jellyfin’s binaries and runtime dependencies, while you mount the system’s key folders (config, cache, metadata, epg) externally. That isolation gives you the freedom to rebuild Jellyfin anytime without losing its state or the persistent media metadata you've curated.
Once you understand that architecture — container for runtime, host for persistence — Jellyfin becomes a playground for automation.
The Idea That Became a Script
The real challenge was data integration. I had a working IPTV source that provided an M3U file with channel streams but no metadata or icons.
Separately, I had a scraped source that published schedule JSON files and included logos but in a wildly inconsistent structure.
Jellyfin, of course, expects a single clean XMLTV EPG file with neatly formatted <channel> and <programme> tags.
So, I did what any rational person would do: I wrote a Python script to merge those worlds into one.
It started simple — just fetching both sources and attempting to match channel names. But the more I dug in, the more I realized the inconsistencies weren’t bugs; they were structural differences that required proper normalization, fuzzy matching, and a stable data model.
How the System Works (Architecturally)
The final system works like this:
- Input layer:
– The script fetches the live M3U.
– It also fetches or scrapes EPG JSON data.
– It pulls a separate channel‑icon‑mapping.json from my GitHub repo (which I built using channel identifiers and icon matching from a curated logos directory).
- Normalization:
Each data source is parsed and cleaned.
Channel names are normalized to lowercase, stripped of punctuation, and cleared of suffixes like “HD” and “SD”. This makes matching possible even when sources vary (FOX NEWS HD vs Fox News).
- Matching logic:
Matching happens in layers (see the sketch after this list):
– Direct tvg‑id matches.
– Cleaned name comparisons.
– Substring and fuzzy matching via Python’s difflib (SequenceMatcher similarity ratios).
– Manual overrides from a JSON map when all else fails.
- Transformation:
At this stage, the script:
– Converts JSON programme items into <programme> XML nodes.
– Detects sports, news, and movie content based on keyword matching and channel categories.
– Converts all published UTC timestamps into the local timezone automatically.
– Appends <category> and <icon> tags for enrichment.
- Output:
It produces three synchronized files:
– tv_guide.xml → the XMLTV feed consumed by Jellyfin.
– tv_guide_raw.json → a raw cache of all JSON schedule data.
– mapping.json → the link between EPG channels and M3U entries.
Each run is self‑contained, using real‑time data from the web, so the output is always current.
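To make the layered matching concrete, here is a minimal sketch of how it can be structured. The field names (tvg_id, name) and the 0.85 similarity cutoff are illustrative assumptions, not the script's exact values, and it assumes Python 3.10+:

```python
import difflib
import re

def normalize_name(name: str) -> str:
    """Lowercase, strip punctuation, and drop quality suffixes like HD/SD."""
    name = re.sub(r"[^\w\s]", "", name.lower())
    name = re.sub(r"\b(hd|sd|fhd|uhd|4k)\b", "", name)
    return re.sub(r"\s+", " ", name).strip()

def match_channel(epg_channel: dict, m3u_channels: list[dict]) -> dict | None:
    """Layered matching: direct tvg-id, then cleaned names, then fuzzy."""
    # Layer 1: direct tvg-id match
    for ch in m3u_channels:
        if ch.get("tvg_id") and ch["tvg_id"] == epg_channel.get("tvg_id"):
            return ch
    # Layer 2: exact match on normalized names
    target = normalize_name(epg_channel["name"])
    by_name = {normalize_name(ch["name"]): ch for ch in m3u_channels}
    if target in by_name:
        return by_name[target]
    # Layer 3: fuzzy match via difflib's SequenceMatcher-based helper
    close = difflib.get_close_matches(target, by_name.keys(), n=1, cutoff=0.85)
    return by_name[close[0]] if close else None
```

The manual-override map would sit in front of all three layers, so a human decision always wins over heuristics.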
The Automation Layer
Getting the script to work on demand was great, but I wanted it to be automated — no human intervention, no manual rebuilds.
So I used cron inside the host to run it periodically:
0 */6 * * * /usr/bin/python3 /path/to/scrape_thetvapp.py >> /media/external5tb/jellyfin/logs/epg.log 2>&1
Every six hours, the script wakes up, rebuilds the EPG, commits the new version, and Jellyfin picks it up automatically because its Live TV configuration points directly at the /epg/ volume mount that the script writes into.
No downloads, no dragging files — just fresh guide data streaming in through automation.
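One detail worth borrowing if you replicate this handoff: write the guide atomically, so Jellyfin never catches a half-written file mid-refresh. This is a minimal sketch of that pattern, assuming a hypothetical output path under the mounted epg folder:

```python
import os
import tempfile

def write_epg_atomically(xml_text: str,
                         dest: str = "/media/external5tb/jellyfin/epg/tv_guide.xml"):
    """Write to a temp file in the same directory, then rename into place.

    os.replace() is atomic on POSIX filesystems, so a reader always sees
    either the old complete file or the new complete file.
    """
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(dest), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(xml_text)
    os.replace(tmp_path, dest)
```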
Inside the Script Logic
The script is long now — hundreds of lines — but every piece solves something specific I ran into along the way.
Some notable elements:
- Timezone detection:
Automatically identifies the host’s timezone offset so programme start/end times appear consistent locally (see the sketch after this list).
- Dual format detection:
Recognizes when the incoming JSON uses alternate field names (e.g., data-listdatetime or data-duration) and restructures it into the normalized format the script writes out.
- Icon mapping:
Reads a public JSON mapping hosted on my GitHub repo and injects the proper <icon src="…"> values into the generated XML.
- Category tagging:
Keyword analysis of description fields triggers automatic tagging for Sports, Movies, or News, which Jellyfin then uses for faster metadata lookups.
- Error resilience:
If any channel fails parsing, the script logs it and continues. The EPG never breaks because of a single malformed feed.
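Here is a condensed sketch of how those transformation pieces can fit together. The field names (title, desc, start, data-listdatetime) and keyword lists are illustrative stand-ins for the real ones, and it assumes Python 3.10+ with ISO-8601 UTC timestamps:

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Illustrative keyword sets; the real lists are longer and tuned over time.
CATEGORY_KEYWORDS = {
    "Sports": ("nfl", "nba", "match", "game"),
    "News": ("news", "breaking", "report"),
    "Movies": ("movie", "film"),
}

def to_local_xmltv(utc_string: str) -> str:
    """Convert a published UTC timestamp into local time in XMLTV format."""
    dt = datetime.fromisoformat(utc_string).replace(tzinfo=timezone.utc)
    return dt.astimezone().strftime("%Y%m%d%H%M%S %z")

def detect_category(description: str) -> str | None:
    """Keyword scan of the description, mirroring the category-tagging step."""
    text = description.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            return category
    return None

def build_programme(item: dict, channel_id: str) -> ET.Element:
    """Turn one JSON schedule item into a <programme> XMLTV node."""
    # Dual format detection in miniature: fall back to the alternate field name.
    start = item.get("start") or item["data-listdatetime"]
    prog = ET.Element("programme", {
        "start": to_local_xmltv(start),
        "stop": to_local_xmltv(item["stop"]),
        "channel": channel_id,
    })
    ET.SubElement(prog, "title").text = item.get("title", "")
    ET.SubElement(prog, "desc").text = item.get("desc", "")
    if (category := detect_category(item.get("desc", ""))):
        ET.SubElement(prog, "category").text = category
    return prog
```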
When I first wrote this, I was using simple print() statements to debug mismatches. Eventually I added structured logging with reasons — unmatched channels, duplicate TVG IDs, JSON fetch failures, etc.
That visibility became critical for tuning it.
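The structured version looks roughly like this; the reason tags and log filename are my retelling, not the script's exact strings:

```python
import logging

logging.basicConfig(
    filename="epg_build.log",  # hypothetical name; the real log lives under the jellyfin logs folder
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
log = logging.getLogger("epg")

# Each failure mode carries a greppable reason tag plus context.
log.warning("unmatched_channel name=%s", "FOX NEWS HD")
log.warning("duplicate_tvg_id id=%s", "foxnews.us")
log.error("json_fetch_failed url=%s status=%s", "https://example.com/guide.json", 503)
```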
The GitHub Integration
One of the best decisions was moving configuration and mapping data to GitHub.
Anytime the script runs, it pulls:
- The up‑to‑date mapping file (channel‑icon‑mapping.json)
- The current channel scrape definitions (channelsToScrape.xml)
That turns GitHub into a central repository of canonical truth for the build — easily shareable and versioned.
Jellyfin never talks directly to GitHub; it just reads the refreshed EPG file locally.
But the script ensures everything downstream updates cleanly.
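Fetching those files is deliberately boring. A minimal sketch, with placeholder <user>/<repo> values standing in for my actual repo:

```python
import json
import urllib.request

# Placeholder URL; substitute your own user, repo, and branch.
MAPPING_URL = "https://cdn.jsdelivr.net/gh/<user>/<repo>@main/channel-icon-mapping.json"

def fetch_mapping(url: str = MAPPING_URL) -> dict:
    """Pull the canonical channel-to-icon mapping at the start of each run."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)
```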
Refining the Logos
Early on, getting logos working right almost broke me.
Jellyfin’s caching logic took priority over XMLTV updates, so icons wouldn’t refresh no matter what the EPG showed.
I eventually learned that:
- Removing the EPG source entirely and re‑adding it resets the cache.
- Using CDN links from jsDelivr, rather than raw github.com URLs (the ?raw=true kind), ensures the images are directly loadable.
- Once a logo is updated or set manually, it sticks across restarts because Jellyfin stores that field in its local database.
This was one of those small but critical victories — the point where the visual polish finally matched the underlying engineering.
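If you are wiring this up yourself, the rewrite from a GitHub path to a jsDelivr CDN URL is a one-liner; the repo and file names below are hypothetical:

```python
def to_jsdelivr(user: str, repo: str, path: str, ref: str = "main") -> str:
    """Build a jsDelivr CDN URL that serves the file with image-friendly headers."""
    return f"https://cdn.jsdelivr.net/gh/{user}/{repo}@{ref}/{path}"

# to_jsdelivr("someuser", "tv-logos", "icons/fox-news.png")
# -> "https://cdn.jsdelivr.net/gh/someuser/tv-logos@main/icons/fox-news.png"
```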
Where Things Stand Now
Today the system runs like clockwork:
- The script runs automatically.
- EPGs regenerate every few hours.
- Jellyfin loads the latest XMLTV without me lifting a finger.
- Each channel displays its logo correctly from the mapping.
- Programme entries get categorized and timestamped precisely.
- Errors and unmatched channels are quietly logged for later review.
When I open the guide now and see the grid populated — with colored category stripes, detailed descriptions, and working live streams — it feels like a fully professional implementation built on open tooling.
What I Learned Along the Way
- Jellyfin is infinitely capable, but you need to understand its philosophy: it expects clean, structured input and assumes you’ll automate the rest.
- Docker isolates complexity. Once your volumes are right, you can experiment in safety — break Jellyfin, rebuild it in seconds, never lose data.
- Automation is the difference between a hobby and a system. The moment the EPG became self‑updating, the setup stopped being fragile.
- Frustration is part of the fun. I lost patience more than once — including snapping at AI (sorry, GPT) — but those moments meant I was learning real internals.
Closing Thoughts
Building this wasn’t just about seeing what’s on TV — it was about understanding how the data gets there.
By combining web scraping, structured data mapping, and Jellyfin’s open architecture, I ended up with a self‑sufficient broadcast system tailored exactly to my environment.
And the best part? It’s mine — open, transparent, and tweakable in every way.
Each EPG refresh reminds me that open‑source isn’t just about code; it’s about curiosity, iteration, and perseverance.
If this sounds like too much work, it probably is.
But if you’re the kind of person who looks at a static interface and wonders “could I make that better?” — then you already get why I built it.