Best way to parse RSS/Atom feeds with PHP
Solution 1
Your other options include:
Solution 2
I've always used the SimpleXML functions built into PHP to parse XML documents. It's one of the few generic parsers with an intuitive structure, which makes it extremely easy to build a meaningful class for something specific like an RSS feed. Additionally, it detects XML warnings and errors, and when it finds any you can run the source through something like HTML Tidy (as ceejayoz mentioned) to clean it up and try again.
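To make the error detection concrete, here is a minimal sketch of collecting libxml errors instead of letting them surface as PHP warnings. The helper name loadFeedXml() and the sample XML strings are made up for illustration; only the libxml_* and simplexml_load_string() calls are PHP built-ins.

```php
<?php
// Hypothetical helper (not from the answer): returns a SimpleXMLElement,
// or null when the input is not well-formed XML. Any parser messages are
// written to $errors.
function loadFeedXml(string $xml, array &$errors = []): ?SimpleXMLElement
{
    // Collect parser errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = simplexml_load_string($xml);

    $errors = [];
    foreach (libxml_get_errors() as $error) {
        $errors[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    // Restore the previous error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc === false ? null : $doc;
}

// Illustrative inputs: the second one has a mismatched closing tag.
$good = '<rss version="2.0"><channel><title>Example</title></channel></rss>';
$bad  = '<rss version="2.0"><channel><title>Example</channel></rss>';

$errors = [];
$feed = loadFeedXml($good, $errors);
echo (string) $feed->channel->title, "\n";

$feed = loadFeedXml($bad, $errors);
if ($feed === null) {
    echo 'Rejected: ', $errors[0], "\n";
}
```

At the point where null comes back you could either reject the feed outright or try a cleanup pass and parse again, depending on how strict you want to be.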
Consider this very rough, simple class using SimpleXML:
class BlogPost
{
    public $date;
    public $ts;
    public $link;
    public $title;
    public $text;
    public $summary;
}

class BlogFeed
{
    public $posts = array();

    function __construct($file_or_url)
    {
        $file_or_url = $this->resolveFile($file_or_url);
        if (!($x = simplexml_load_file($file_or_url)))
            return;

        foreach ($x->channel->item as $item)
        {
            $post = new BlogPost();
            $post->date  = (string) $item->pubDate;
            $post->ts    = strtotime((string) $item->pubDate);
            $post->link  = (string) $item->link;
            $post->title = (string) $item->title;
            $post->text  = (string) $item->description;

            // Create summary as a shortened body and remove images,
            // extraneous line breaks, etc.
            $post->summary = $this->summarizeText($post->text);

            $this->posts[] = $post;
        }
    }

    // Treat anything that is not an http(s) URL as a file under /shared/xml/
    private function resolveFile($file_or_url) {
        if (!preg_match('|^https?:|', $file_or_url))
            $feed_uri = $_SERVER['DOCUMENT_ROOT'] .'/shared/xml/'. $file_or_url;
        else
            $feed_uri = $file_or_url;

        return $feed_uri;
    }

    private function summarizeText($summary) {
        $summary = strip_tags($summary);

        // Truncate summary line to 100 characters
        $max_len = 100;
        if (strlen($summary) > $max_len)
            $summary = substr($summary, 0, $max_len) . '...';

        return $summary;
    }
}
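As a comment below points out, the class above only reads RSS (channel/item) and will not pick up Atom entries. A rough sketch of handling both shapes with SimpleXML follows. The function name readFeedItems() and the sample feeds are hypothetical, and it assumes an Atom feed with a default namespace and takes the first link element of each entry, which may not be the one you want when an entry carries several.

```php
<?php
// Hypothetical helper (not from the answer): normalizes RSS 2.0 items and
// Atom entries into plain arrays with title, link and a Unix timestamp.
function readFeedItems(SimpleXMLElement $x): array
{
    $items = [];

    if (isset($x->channel->item)) {
        // RSS 2.0: <rss><channel><item>...
        foreach ($x->channel->item as $item) {
            $items[] = [
                'title' => (string) $item->title,
                'link'  => (string) $item->link,
                'ts'    => strtotime((string) $item->pubDate),
            ];
        }
    } elseif ($x->getName() === 'feed') {
        // Atom: <feed><entry>..., where the link lives in an href attribute.
        foreach ($x->entry as $entry) {
            $items[] = [
                'title' => (string) $entry->title,
                'link'  => (string) $entry->link['href'], // first <link> only
                'ts'    => strtotime((string) $entry->updated),
            ];
        }
    }

    return $items;
}

// Illustrative inline feeds so the sketch runs without network access.
$rss = '<rss version="2.0"><channel><title>Example</title>'
     . '<item><title>Hello</title><link>http://example.org/hello</link>'
     . '<pubDate>Sat, 13 Dec 2003 18:30:02 GMT</pubDate></item></channel></rss>';

$atom = '<feed xmlns="http://www.w3.org/2005/Atom"><title>Example</title>'
      . '<entry><title>Hello</title><link href="http://example.org/hello"/>'
      . '<updated>2003-12-13T18:30:02Z</updated></entry></feed>';

foreach ([$rss, $atom] as $source) {
    $items = readFeedItems(simplexml_load_string($source));
    echo $items[0]['title'], ' ', $items[0]['link'], "\n";
}
```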
Solution 3
With four lines, I import an RSS feed into an array.
$feed = implode(file('http://yourdomains.com/feed.rss'));
$xml = simplexml_load_string($feed);
$json = json_encode($xml);
$array = json_decode($json,TRUE);
For a more complex solution, use DOMDocument:
$feed = new DOMDocument();
$feed->load('file.rss');
$json = array();
$json['title'] = $feed->getElementsByTagName('channel')->item(0)->getElementsByTagName('title')->item(0)->firstChild->nodeValue;
$json['description'] = $feed->getElementsByTagName('channel')->item(0)->getElementsByTagName('description')->item(0)->firstChild->nodeValue;
$json['link'] = $feed->getElementsByTagName('channel')->item(0)->getElementsByTagName('link')->item(0)->firstChild->nodeValue;
$items = $feed->getElementsByTagName('channel')->item(0)->getElementsByTagName('item');
$json['item'] = array();
// $key is the item's position in the node list
foreach($items as $key => $item) {
    $title = $item->getElementsByTagName('title')->item(0)->firstChild->nodeValue;
    $description = $item->getElementsByTagName('description')->item(0)->firstChild->nodeValue;
    $pubDate = $item->getElementsByTagName('pubDate')->item(0)->firstChild->nodeValue;
    $guid = $item->getElementsByTagName('guid')->item(0)->firstChild->nodeValue;
    $json['item'][$key]['title'] = $title;
    $json['item'][$key]['description'] = $description;
    $json['item'][$key]['pubdate'] = $pubDate;
    $json['item'][$key]['guid'] = $guid;
}
echo json_encode($json);
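The four-line conversion at the top of this answer has no error handling, which one of the comments asks about. Here is one possible sketch that returns null when the XML cannot be parsed. The name feedToArray() and the sample XML are invented for illustration. It also passes LIBXML_NOCDATA, on the assumption that the feed wraps text in CDATA sections, which otherwise tend to come out empty after the JSON round trip.

```php
<?php
// Hypothetical helper (not from the answer): converts feed XML into a nested
// array, or returns null if parsing or the JSON round trip fails.
function feedToArray(string $xml): ?array
{
    // Keep parser errors internal so a broken feed does not emit warnings.
    $previous = libxml_use_internal_errors(true);

    // LIBXML_NOCDATA merges CDATA sections into ordinary text nodes.
    $doc = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($doc === false) {
        return null;
    }

    $array = json_decode(json_encode($doc), true);

    return is_array($array) ? $array : null;
}

// Illustrative inline feed with a CDATA title.
$xml = '<rss version="2.0"><channel><title>Example</title>'
     . '<item><title><![CDATA[First <b>post</b>]]></title></item>'
     . '</channel></rss>';

$feed = feedToArray($xml);
if ($feed === null) {
    echo "Could not parse feed\n";
} else {
    echo $feed['channel']['title'], "\n";
}
```

One quirk to keep in mind with this approach: a channel with a single item yields an associative array for item, while several items yield a list, so code reading the result has to handle both.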
Solution 4
I would like to introduce a simple script to parse RSS:
$i = 0; // counter
$url = "http://www.banki.ru/xml/news.rss"; // url to parse
$rss = simplexml_load_file($url); // XML parser
// channel title + img with src
print '<h2><img style="vertical-align: middle;" src="'.$rss->channel->image->url.'" /> '.$rss->channel->title.'</h2>';
// RSS items loop
foreach($rss->channel->item as $item) {
    if ($i < 10) { // print only the first 10 items
        print '<a href="'.$item->link.'">'.$item->title.'</a><br />';
    }
    $i++;
}
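The script above prints feed values straight into HTML. Since feed content comes from a third party, a slightly more defensive variant might escape it first. This is only a sketch: the function renderFeedLinks() and the sample feed are hypothetical, while htmlspecialchars() is the PHP built-in.

```php
<?php
// Hypothetical helper (not from the answer): builds the link list as a
// string, escaping feed values and stopping after $limit items.
function renderFeedLinks(SimpleXMLElement $rss, int $limit = 10): string
{
    $html = '';
    $count = 0;

    foreach ($rss->channel->item as $item) {
        if ($count++ >= $limit) {
            break; // stop once the limit is reached
        }
        $html .= sprintf(
            '<a href="%s">%s</a><br />' . "\n",
            htmlspecialchars((string) $item->link, ENT_QUOTES, 'UTF-8'),
            htmlspecialchars((string) $item->title, ENT_QUOTES, 'UTF-8')
        );
    }

    return $html;
}

// Illustrative inline feed; a real script would use simplexml_load_file($url).
$sample = '<rss version="2.0"><channel><title>News</title>'
        . '<item><title>Tom &amp; "Jerry"</title><link>http://example.org/1</link></item>'
        . '<item><title>Second</title><link>http://example.org/2</link></item>'
        . '</channel></rss>';

print renderFeedLinks(simplexml_load_string($sample), 10);
```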
Solution 5
If a feed isn't well-formed XML, you're supposed to reject it, no exceptions. You're entitled to call the feed creator a bozo.
Otherwise you're paving the way to the mess that HTML ended up in.
carson
Updated on May 07, 2020
Comments
-
carson about 4 years
I'm currently using Magpie RSS but it sometimes falls over when the RSS or Atom feed isn't well formed. Are there any other options for parsing RSS and Atom feeds with PHP?
-
Helen Neely over 14 years
+1, you should not try to work around any XML that is not well-formed. We've had bad experiences with them, trust me, it was a big pain :(
-
artur over 14 years
-
Talvi Watia almost 14 years
you have an end-tag with no start tag. ;)
-
Brian Cline almost 14 years
Well, I had one, but it was being eaten by SO's code formatter since it had no empty line above it. On a related note, you did not start your sentence with a capital letter. ;)
-
Kevin Pastor about 13 years
However, programmers do not get to choose business partners and have to parse what they are given.
-
duality_ almost 13 years
I don't like such "answers" that give links without any comments. It looks like you googled it and linked to a few top results, especially since the asker has some RSS experience and needs a better parser.
-
Tim over 12 years
Please change $feed_uri = $feed_or_url; to $feed_uri = $file_or_url; ... other than that, thank you for this code! It works great!
-
András Szepesházi almost 12 years
Note that while this solution is great, it'll only parse RSS feeds in its current form. Atom feeds will not be parsed due to their different schema.
-
ITS Alaska about 11 years
-
yPhil almost 11 years
What if you're building a universal RSS/Atom feed reader? If any ill-formed XML file can "mess" your HTML, who is the bozo? ;) Be liberal in what you receive.
-
vladkras over 10 years
I don't understand what cookHtmlSummarySoup() is for. Why not use strip_tags()?
-
Brian Cline over 10 years
@ITSAlaska Thanks for the reminder. I think even back when I posted this in 2008 it was old code. I've updated it with preg_match accordingly.
-
Brian Cline over 10 years
@vladkras Good question. Not sure where that wacky method name came from, looks like someone here edited it. I much prefer a built-in, so I've updated this to use strip_tags(). Thanks for the tip.
-
samayo over 10 years
I just tried it. It does not give an array.
-
PJunior over 10 years
Can you give me the RSS feed that you are using?
-
andrewk about 10 years
In case you're wondering, it looks like he's using a Tumblr RSS feed. Anytumblrsite.com/rss would give you the same output.
-
Raptor about 10 years
In case somebody needs a bit of advice, Last RSS is the easiest among the three listed above. There is only 1 file to "require", and it can fetch the RSS within 5 lines, with a decent array output.
-
Guidouil about 10 years
Used the 4 lines, did a great job :) but then I rewrote the 1st line: $feed = file_get_contents('http://yourdomains.com/feed.rss'); which might be less intensive than file + implode.
-
Will Bowman almost 10 years
Can't say it's "great" using gzinflate and base64_decode, which are typically disabled for security.
-
Will Bowman almost 10 years
One line: $feed = json_decode(json_encode(simplexml_load_file('news.google.com/?output=rss')), true);
-
Fluchtpunkt about 9 years
I really like the one-liner; I was looking for something like that. What about error handling?
-
gadelat about 7 years
picoFeed: github.com/fguillot/picoFeed
-
noob about 7 years
I've used two of them: LastRss doesn't seem good enough as a fully functional helper, and SimplePie is a bit too complicated. I would like to try some others, but comments on those libraries would help people understand them better than bare links.
-
musicin3d about 6 years
Why on earth are we converting an object into an array???
-
John T over 4 years
Clear and simple solution! Works nicely.
-
Sagive almost 4 years
It's a dead link for marketing purposes.
-
Srinivas08 over 3 years
Rather than using $xml = simplexml_load_string($feed), this works pretty simply, in printing the data too ...