headline/README.md
2022-08-25 17:07:56 +02:00

# Headline
Monitor how article titles change over time on news websites.
___
This tool is probably not production-ready because it was written in two afternoons by an amateur (I'm not a professional programmer). If you want to run it, at least put a reverse proxy between it and the public network, or run it locally.
I didn't do any research on the legality of analysing RSS feeds, and it's possible you could get into legal issues by presenting the results publicly.
___
## Architecture
The "processor" script fetches the RSS feeds configured in `processor/config.yaml` every 5 minutes (the schedule is set in `processor/crontab`), stores the articles in Redis, and compares the new and old versions of each article to find changes in the title.
When a change is found, it generates a visual diff and stores it, together with other information (detection time, article link, new/old title, etc.), in a permanent database (sqlite3 for now).
The "view" script reads data from the permanent database (sqlite3) and presents it to the user.
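
The compare-and-store step described above could look roughly like the sketch below. It is not the project's actual code: the function names, the `changes` table and its columns are made up for illustration, and a plain dict stands in for Redis so the example runs without a server. The diff is a simple word-level one built with Python's `difflib`.

```python
import difflib
import html
import sqlite3
from datetime import datetime, timezone


def html_diff(old_title: str, new_title: str) -> str:
    """Word-level diff of two titles, marked up with <del>/<ins> tags."""
    old_words, new_words = old_title.split(), new_title.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        removed = html.escape(" ".join(old_words[i1:i2]))
        added = html.escape(" ".join(new_words[j1:j2]))
        if tag == "equal":
            parts.append(removed)
        if tag in ("delete", "replace"):
            parts.append(f"<del>{removed}</del>")
        if tag in ("insert", "replace"):
            parts.append(f"<ins>{added}</ins>")
    return " ".join(parts)


def init_db(db: sqlite3.Connection) -> None:
    # Hypothetical schema; the project's real table layout may differ.
    db.execute(
        "CREATE TABLE IF NOT EXISTS changes ("
        "detected_at TEXT, link TEXT, old_title TEXT, new_title TEXT, diff_html TEXT)"
    )


def record_if_changed(db: sqlite3.Connection, cache: dict, link: str, title: str) -> bool:
    """Compare a fetched title with the cached one; store a diff if it changed."""
    old_title = cache.get(link)
    cache[link] = title  # a plain dict stands in for Redis here
    if old_title is None or old_title == title:
        return False  # first sighting or unchanged title, nothing to record
    db.execute(
        "INSERT INTO changes VALUES (?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), link, old_title, title,
         html_diff(old_title, title)),
    )
    db.commit()
    return True
```

For example, `html_diff("Mayor wins election", "Mayor loses election")` returns `Mayor <del>wins</del> <ins>loses</ins> election`, which a view can style with CSS.
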
## Installation
Run `docker-compose up -d` and everything should start. To change the RSS sources, edit `./processor/config.yaml`.
After the first start, you have to wait about 5 minutes for the "processor" to create the first empty database. The web server will throw an error until then.
## To-do
* Collect the creation time of the original/new article, write it to permanent storage (sqlite3 for now) and display it.
* Write a better README and a little more documentation.
* Create a view with some more info and stats (list of feeds, articles in Redis, etc.).
* Create a routine to clear old articles from Redis (otherwise it will just fill up the disk space at some point...).
* IDEA: Figure out how to monitor changes in article descriptions (maybe just compare hashes?) and how to present them. (Right now, the code can store descriptions in Redis, but nothing else.)
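
As a rough illustration of the "compare hashes" idea from the last item, here is a minimal sketch using Python's `hashlib`. The function names are hypothetical and nothing like this exists in the code yet. It normalizes whitespace before hashing, on the assumption that feeds may reformat a description without really changing it.

```python
import hashlib
from typing import Optional


def description_fingerprint(description: str) -> str:
    """SHA-256 of the description with runs of whitespace collapsed."""
    normalized = " ".join(description.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def description_changed(stored_fingerprint: Optional[str], description: str) -> bool:
    """True if a fingerprint was stored before and the new text no longer matches it."""
    if stored_fingerprint is None:
        return False  # first sighting, nothing to compare against
    return stored_fingerprint != description_fingerprint(description)
```

A hash only says that something changed, not what changed, so presenting a diff would still require keeping the old description text around.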