Building an Intelligent RSS News Aggregator for Discord
How I built a sophisticated RSS news aggregation system with AI-powered summaries, feed diversity, and smart deduplication for my Discord bot
The Problem
A friend of mine created a useful RSS feed that aggregates a bunch of San Diego news feeds. My instinct was to use my Discord bot to post the content to a new channel, which worked, but generated a ton of noise.

Starting Simple: Direct RSS Posting
Initial Implementation
Before I gloss over the initial solution, there is some cool stuff in it. To make it work, I created a new slash command to register a feed for a server. Since my Discord bot lives in multiple servers, this registers RSS feeds on a per-channel, per-server basis. A cron job on the box that runs my Discord bot (a Raspberry Pi on my home network) checks every 10 minutes and posts any new feed items to the channel, comparing against a saved list of already-posted feed items. Like I said, this worked, using a simple ID-based system to prevent duplicate posts, but it wasn't very sophisticated and produced quite a lot of posts every 10 minutes.
Architecture:
User runs /news add
↓
Bot stores feed config in app_state.json
↓
Background poster (Docker service) runs every 10 minutes
↓
Fetches RSS feed with feedparser
↓
Posts new items as Discord embeds
↓
Tracks seen item IDs to prevent duplicates
Key Files:
- bot/app/commands/news/news.py - Slash command handlers
- bot/app/tasks/rss_feed_poster.py - Background polling service
- bot/app/app_state.json - Persistent state storage
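The duplicate-prevention logic is simple enough to sketch. A minimal version of the seen-ID check, where the entry dicts and state shape are assumptions for illustration rather than the bot's exact schema:

```python
def filter_new_entries(entries, seen_ids):
    """Return entries not yet posted and record their IDs.

    `entries` are dicts shaped like feedparser entries; `seen_ids` is the
    persisted set of already-posted item IDs (e.g. kept in app_state.json).
    """
    new_items = []
    for entry in entries:
        # feedparser entries usually carry an `id`; fall back to the link
        item_id = entry.get("id") or entry.get("link")
        if item_id and item_id not in seen_ids:
            seen_ids.add(item_id)
            new_items.append(entry)
    return new_items
```

Each 10-minute cycle, the poster loads the seen-ID set, runs the new entries through this filter, posts each survivor as an embed, and persists the updated set.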
Evolution: From Spam to Signal
The "Too Many Articles" Problem
With a sea of articles flooding the newly created news channel, it was not very useful, as evidenced by how quickly the other server members muted it. I realized I had to change this if I wanted it to be useful to anyone. With some feedback from the friend behind the RSS feed, I decided to implement something a bit more sophisticated: using LLMs to aggregate and summarize content into a few smaller updates throughout the day.
Designing the Summary System
The big tradeoff here was collecting and filtering news without overwhelming my OpenAI API budget, while still producing meaningful news summaries. For this reason I went with a two-stage pipeline: first, collect posts throughout the day; later, when it's time to post the summary, parse the stored entries into an update.
Stage 1: Collection (rss_feed_poster.py)
- Runs every 10 minutes
- Collects new articles from all feeds
- Stores them in pending_news.json
- Tracks what's been seen
Stage 2: Summarization (rss_summary_poster.py)
- Runs on a configurable schedule (defaults to 8am/8pm Pacific)
- Processes pending articles
- Generates AI summary
- Posts to Discord
- Clears pending articles
Why two stages?
- Decouples data collection from presentation
- Allows flexible scheduling
- Enables batching for better AI summaries
- Reduces API costs (fewer LLM calls)
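Stage 1's hand-off to stage 2 is just a JSON file. A minimal sketch of appending collected articles to the pending store; the flat-list layout is an assumption, not necessarily the real pending_news.json schema:

```python
import json
from pathlib import Path

def append_pending(path, new_articles):
    """Append collected articles to the pending store (pending_news.json).

    Returns the number of articles now waiting for the next summary run.
    """
    store = Path(path)
    pending = json.loads(store.read_text()) if store.exists() else []
    pending.extend(new_articles)
    store.write_text(json.dumps(pending, indent=2))
    return len(pending)
```

The collector only ever appends; the summarizer reads the whole list, posts its update, then truncates the file, which is what keeps the two schedules decoupled.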
Direct Mode:
For some RSS feeds that don't update frequently, I added a fallback method that will still post every 10 minutes.
- Good for: Breaking news, time-sensitive feeds
- Posts articles as they arrive
- No AI processing
Summary Mode: Scheduled aggregation (new default)
- Good for: General news, opinion pieces, most content
- Collects articles throughout the day
- Generates contextual summary at scheduled times
- Reduces noise, increases signal
Building the AI Summarization Pipeline
The goal of the AI summarization pipeline is to extract the signal of "what's going on" from multiple similar stories, while avoiding posting the same stories over and over. We also want to make sure the stories are relevant to the specific channel or topic, so we needed a way to cut out stories unrelated to the desired topic.
Step 1: URL Filtering
The first step is really just to eliminate identical stories. This cuts down on noise and reduces the number of LLM summarization and filtering calls we'll need to make.
Initial Limits:
- Start with up to 50 most recent articles
- Configurable per channel
Feed-Specific Filters:
- Allow custom filter instructions per feed
- Example: "only San Diego articles" for broad regional feeds
- AI evaluates each article against filter criteria
URL Deduplication:
- Same article from different feeds? Keep only one
- Prevents redundant processing
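URL deduplication hinges on normalizing links so the same article arriving from two feeds compares equal. A rough sketch; the normalization rules here are illustrative and would need tuning per feed:

```python
from urllib.parse import urlsplit

def canonical_url(url):
    """Normalize a URL so the same article from different feeds matches.

    Drops the scheme, query string (often just tracking params), fragment,
    "www." prefix, and trailing slashes. A simplification: feeds with
    meaningful query parameters would need smarter handling.
    """
    parts = urlsplit(url.strip().lower())
    return parts.netloc.removeprefix("www.") + parts.path.rstrip("/")

def dedupe_by_url(articles):
    """Keep only the first article seen for each canonical URL."""
    seen, unique = set(), []
    for article in articles:
        key = canonical_url(article["link"])
        if key not in seen:
            seen.add(key)
            unique.append(article)
    return unique
```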
Step 2: Relevance Ranking
LLM "scoring" is a pretty flawed concept, but with so many news stories coming in all the time, I wanted some kind of sorting system. The LLM doesn't do a terrible job at this, as long as you recognize it isn't giving scientific scores. It's just a nice gut-check for which stories are more important than others.
AI-Powered Scoring:
- Evaluates each article for importance (1-10 scale)
- Considers: timeliness, significance, local relevance
- Narrows down to the top N articles (default: 18)
- Configurable per channel
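One way to sketch this step: build a scoring prompt, then parse "index: score" lines out of the model's reply and keep the top N. The prompt wording and reply format are assumptions for illustration, not the bot's actual prompts:

```python
import re

def build_scoring_prompt(articles):
    """Ask the model to score each headline 1-10 (illustrative wording)."""
    lines = [f"{i}. {a['title']}" for i, a in enumerate(articles, start=1)]
    return (
        "Score each article 1-10 for importance, considering timeliness, "
        "significance, and local relevance. Reply with one 'index: score' "
        "line per article.\n\n" + "\n".join(lines)
    )

def top_articles(articles, llm_reply, top_n=18):
    """Parse 'index: score' lines from the reply and keep the top N."""
    scores = {}
    for match in re.finditer(r"(\d+)\s*:\s*(\d+)", llm_reply):
        idx, score = int(match.group(1)), int(match.group(2))
        if 1 <= idx <= len(articles):
            scores[idx] = score
    ranked = sorted(scores, key=lambda i: scores[i], reverse=True)
    return [articles[i - 1] for i in ranked[:top_n]]
```

Parsing defensively matters here: the model occasionally skips an index or adds commentary, so anything unmatched simply falls out of the ranking.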
Step 3: Story Clustering
Next we identify stories that are similar in content and write a summary about all of them. This allows us to capture details from multiple sources about the same story instead of repeating the story or leaving out details from different sources.
Similarity Detection:
- AI identifies articles about the same story
- Groups them into clusters
- Each cluster = one story
- Generates single summary per story instead of per article
Example:
Fire breaks out in downtown (CBS8)
Downtown building evacuated (Union-Tribune)
Fire contained, no injuries (Fox5)
→ all three become one "Downtown Fire" cluster
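A sketch of applying cluster assignments, assuming the model replies with a JSON list of index groups (that reply format is an assumption for this sketch, not the bot's actual contract):

```python
import json

def cluster_articles(articles, llm_reply):
    """Group articles using the model's reply: a JSON list of index lists,
    e.g. [[0, 1, 2], [3]] meaning articles 0-2 cover the same story.

    Any article the model leaves unassigned becomes its own cluster.
    """
    groups = json.loads(llm_reply)
    clusters = [
        [articles[i] for i in group if 0 <= i < len(articles)]
        for group in groups
    ]
    assigned = {i for group in groups for i in group}
    clusters += [[a] for i, a in enumerate(articles) if i not in assigned]
    return [c for c in clusters if c]
```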
Step 4: Deduplication Against History
With the clustered and summarized stories in place, we do another check, comparing these summaries against everything posted within the rolling history window so we don't repeat stories we've already covered.
Story History System:
- Tracks summaries posted in last N hours (default: 24)
- Configurable time window per channel (6-168 hours)
- AI compares new stories against recent history
- Filters out stories already covered
Why this matters:
- Breaking news continues for days
- This prevents a sense of "we already told you this yesterday"
- User can adjust based on channel purpose
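The rolling window itself is plain date math. A sketch of pruning story history down to the configured window before handing it to the AI for comparison; the history-entry shape is an assumption:

```python
from datetime import datetime, timedelta, timezone

def prune_history(history, window_hours=24):
    """Drop stored summaries older than the channel's rolling window.

    `history` entries are dicts with an ISO-8601 `posted_at` timestamp and
    a `summary` string (an assumed shape for this sketch). Whatever
    survives is what new story clusters get compared against.
    """
    cutoff = datetime.now(timezone.utc) - timedelta(hours=window_hours)
    return [
        entry for entry in history
        if datetime.fromisoformat(entry["posted_at"]) >= cutoff
    ]
```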
Step 5: Final Summary Generation
Now we're ready to post the full news summary for this period. At this stage the LLM reviews the story clusters and posts one update covering all the stories; then we clear the pending stories from state so the system can start collecting for the next update.
Contextual Summarization:
- Groups all remaining stories
- Generates cohesive narrative
- Includes relevant details from all sources
- Natural language, not bullet points
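A sketch of assembling the final summarization prompt from the surviving clusters; the wording and field names are illustrative, not the bot's actual prompt:

```python
def build_summary_prompt(clusters):
    """Build one prompt covering every surviving story cluster.

    Each cluster contributes its headlines and sources so the model can
    weave details from multiple outlets into a single narrative.
    """
    sections = []
    for n, cluster in enumerate(clusters, start=1):
        lines = [f"- {a['title']} ({a['source']})" for a in cluster]
        sections.append(f"Story {n}:\n" + "\n".join(lines))
    return (
        "Write one cohesive news update in natural language (no bullet "
        "points), covering every story below and folding in details from "
        "all listed sources.\n\n" + "\n\n".join(sections)
    )
```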
Advanced Features
Feed Diversity & Fairness
After creating several channels with multiple feeds, it was apparent that some feeds post so often that they drown out other feeds in the same channel. To regulate how much space one feed can take up in a given news update, I created this feed diversity solution. Here's how it works:
- Each channel already caps how many stories are collected into the pending_stories state (default: 50)
- Each channel can be configured with a minimum and maximum number of stories per feed
- As stories are collected, the minimum is taken from each feed first, then each feed is topped up toward its per-feed maximum
- Once a feed hits its maximum in this pass, no more stories are taken from it
- If slots remain after every feed has been capped or exhausted, the leftover slots in pending_stories are filled from feeds that still have extra stories, using a round-robin approach
Example: News Channel with 5 feeds
Collection Limit of 50 stories
Minimum per feed is 2 stories
Maximum per feed is 8 stories
Feed 1: 125 new stories
Feed 2: 8 new stories
Feed 3: 2 new stories
Feed 4: 12 new stories
Feed 5: 220 new stories
(Without distribution, it's possible that all stories would come from Feed 1 and no other feeds would be represented in the update.)
With distribution using the values above, we first collect 2 stories from each feed (10 total). Feeds 1, 2, 4, and 5 still have leftover stories, so we collect up to the maximum from each: 6 more per feed (24 total). Of our 50-story limit, 16 slots remain, so we pull one story from each feed that still has extras, round-robin style, until we reach the limit. We end up with the following distribution:
Feed 1: 2 + 6 + 6 = 14/125 stories
Feed 2: 2 + 6 + 0 = 8/8 stories
Feed 3: 2 + 0 + 0 = 2/2 stories
Feed 4: 2 + 6 + 4 = 12/12 stories
Feed 5: 2 + 6 + 6 = 14/220 stories
(up to min + up to max + round robin additions = total stories)
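The three passes in the worked example can be sketched as a single function; running it on the feed counts above reproduces the distribution shown. Function and parameter names are mine, not the bot's:

```python
def distribute(feed_counts, limit=50, min_per_feed=2, max_per_feed=8):
    """Decide how many stories to take from each feed for one collection.

    Pass 1: guarantee each feed its minimum.
    Pass 2: top each feed up to its per-feed maximum.
    Pass 3: round-robin any leftover slots among feeds with extra stories.
    """
    taken = {feed: 0 for feed in feed_counts}
    remaining = limit

    # Pass 1: minimum per feed (or fewer, if the feed has fewer stories).
    for feed, available in feed_counts.items():
        take = min(min_per_feed, available, remaining)
        taken[feed] += take
        remaining -= take

    # Pass 2: top up each feed toward its maximum.
    for feed, available in feed_counts.items():
        take = min(max_per_feed - taken[feed], available - taken[feed], remaining)
        if take > 0:
            taken[feed] += take
            remaining -= take

    # Pass 3: round-robin fill from feeds that still have stories left.
    while remaining > 0:
        candidates = [f for f, avail in feed_counts.items() if taken[f] < avail]
        if not candidates:
            break
        for feed in candidates:
            if remaining == 0:
                break
            taken[feed] += 1
            remaining -= 1
    return taken
```

With the example's numbers this yields Feed 1: 14, Feed 2: 8, Feed 3: 2, Feed 4: 12, Feed 5: 14, matching the table above. Note the per-feed maximum only binds during the top-up pass; the round-robin pass can exceed it, which is what lets busy feeds absorb otherwise-wasted slots.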
Configuration:
/news diversity configure
strategy:balanced
max_per_feed:4
min_per_feed:1
On-Demand Summaries
Sometimes we want an ad-hoc summarization of the pending stories. This is helpful for testing and for very busy feeds with lots of stories.
The /news summary Command:
- Generate summary immediately
- Don't wait for scheduled time
- Useful for checking "what did I miss?"
- Clears pending articles after posting
Article Browsing
It's also helpful to see what posts are in the pending stories before they are summarized. It can be entertaining to browse them and also helps validate that story collection is working as expected.
The /news latest Command:
- Paginated view of pending articles
- Filter by specific feed
- See what's been collected
- Decide if you want to trigger summary early
This bot is running in production and serving daily news summaries to my Discord community. If you're interested in the code or want to discuss the architecture, feel free to reach out. If you want to add a feature-rich Discord bot with tons of AI features, you can download and run your own copy of the bot here.