Convert HTML output into plain text using PHP

Solution 1

Use PHP's strip_tags().

If strip_tags() is not working for you, you can use a regex to extract the info you want.

Try PHP's preg_match_all() with /(<td>.*?<\/td>)/ as the pattern (preg_match() would return only the first cell).
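Here is a minimal sketch of the regex route, assuming the page source is already in a string. The markup is a shortened copy of the table from the question, and `$html`, `$cells`, and `$text` are just illustrative names. The capture group is placed inside the tags here so that only the cell text is captured.

```php
<?php
// Sample markup copied from the question; in practice this would come
// from file_get_contents().
$html = '<table>
    <tr><td>Name:</td><td> John Dela Cruz</td></tr>
    <tr><td>Age:</td><td>15</td></tr>
</table>';

// Capture the text inside every <td>...</td> pair.
// "s" lets "." match newlines, "i" ignores tag case.
preg_match_all('/<td[^>]*>(.*?)<\/td>/si', $html, $matches);

// $matches[1] holds the captured cell contents; trim stray whitespace.
$cells = array_map('trim', $matches[1]);

// Cells come in label/value pairs, so join them two at a time.
$lines = [];
foreach (array_chunk($cells, 2) as $pair) {
    $lines[] = implode(' ', $pair);
}
$text = implode("\n", $lines);

echo $text, "\n";
```

This relies on every row having exactly two cells, which holds for the sample table but may not hold for other pages.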

Solution 2

Have a look at simplexml_load_file():

http://www.php.net/manual/en/function.simplexml-load-file.php

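It will allow you to load the HTML data into an object (SimpleXMLElement) and traverse that object like a tree. Keep in mind that it is an XML parser, so the markup has to be well-formed; an unclosed tag such as the <link> in profiles.php will make the load fail.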
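A rough sketch of the tree traversal, assuming a well-formed fragment. It uses simplexml_load_string() on a hard-coded copy of the question's markup instead of simplexml_load_file(), so that it runs without a server; the variable names are illustrative only.

```php
<?php
// Well-formed fragment based on the question's markup. Every tag is
// closed, which SimpleXML requires.
$xml = '<div id="profile-wrapper">
    <h2>Profile</h2>
    <table>
        <tr><td>Name:</td><td> John Dela Cruz</td></tr>
        <tr><td>Age:</td><td>15</td></tr>
    </table>
</div>';

$root = simplexml_load_string($xml);
if ($root === false) {
    exit("Could not parse the markup as XML\n");
}

// Start with the heading, then add one line per table row.
$lines = [(string) $root->h2];
foreach ($root->table->tr as $row) {
    $cells = [];
    foreach ($row->td as $cell) {
        // Casting a SimpleXMLElement to string yields its text content.
        $cells[] = trim((string) $cell);
    }
    $lines[] = implode(' ', $cells);
}
$text = implode("\n", $lines);

echo $text, "\n";
```

For real-world HTML that is not valid XML, DOMDocument::loadHTML() is usually more forgiving, and its result can be handed to simplexml_import_dom() if you prefer the SimpleXML interface.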

Solution 3

Try PHP's strip_tags() function.
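As a sketch of how strip_tags() could produce the output asked for in the question: the tags go away, but the indentation and blank lines of the source stay behind, so some whitespace cleanup is needed afterwards. The sample string and variable names below are assumptions for illustration.

```php
<?php
// Shortened copy of the markup from the question.
$html = '<div id="profile-wrapper">
    <h2>Profile</h2>
    <table>
        <tr>
            <td>Name:</td><td> John Dela Cruz</td>
        </tr>
        <tr>
            <td>Age:</td><td>15</td>
        </tr>
    </table>
</div>';

// Put a space between adjacent cells so "Age:" and "15" do not run together.
$html = str_replace('</td><td>', '</td> <td>', $html);

// Remove all tags, leaving only text and the original whitespace.
$plain = strip_tags($html);

// Split into lines, collapse runs of spaces/tabs, and drop empty lines.
$lines = [];
foreach (preg_split('/\R/', $plain) as $line) {
    $line = trim(preg_replace('/[ \t]+/', ' ', $line));
    if ($line !== '') {
        $lines[] = $line;
    }
}
$text = implode("\n", $lines);

echo $text, "\n";
```

Note that strip_tags() keeps the text of every element, so on the full page the contents of <title> would show up in the result as well.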

Solution 4

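Try this one:

<?php
// Load the page source ("your_file" is a placeholder for the path or URL).
$data = file_get_contents("your_file");
// Match every <div>...</div>; "s" lets "." span newlines, "i" ignores case.
preg_match_all('|<div[^>]*?>(.*?)</div>|si',$data, $result);
// $result[0][0] is the first match with its <div> tags; $result[1][0] is only its inner content.
print_r($result[0][0]);
?>

I have tried this and it seems to work for me; I hope it works for you too.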
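To make the difference between the two result arrays concrete, here is a small sketch using a hard-coded string in place of the file (the one-line sample markup and the names `$whole`, `$inner`, and `$plain` are assumptions for illustration):

```php
<?php
// Hard-coded stand-in for the page source; normally from file_get_contents().
$data = '<body><div id="profile-wrapper"><h2>Profile</h2></div></body>';

preg_match_all('|<div[^>]*?>(.*?)</div>|si', $data, $result);

// Index 0 holds the full matches, <div> tags included.
$whole = $result[0][0];

// Index 1 holds only what the (.*?) group captured.
$inner = $result[1][0];

// Stripping the remaining tags leaves plain text.
$plain = strip_tags($inner);

echo $whole, "\n", $inner, "\n", $plain, "\n";
```

Because the pattern is non-greedy, it stops at the first </div> it finds, so a page with nested divs would need a different approach, such as a DOM parser.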

Author: Dan

Updated on July 15, 2022

Comments

  • Dan
    Dan almost 2 years

    I'm trying to convert my sample HTML output into plain text, but I don't know how. I use file_get_contents, but what it returns for the page I'm trying to convert is mostly the same as the original HTML.

    $raw = "http://localhost/guestbook/profiles.php";
    $file_converted = file_get_contents($raw);
    echo $file_converted;
    

    profiles.php

    <html>
        <head>
            <title>Profiles - GuestBook</title>
            <link rel="stylesheet" type="text/css" href="css/style.css">
        </head>
    <body>
        <!-- Some Divs -->
        <div id="profile-wrapper">
            <h2>Profile</h2>
            <table>
                <tr>
                    <td>Name:</td><td> John Dela Cruz</td>
                </tr>
                <tr>
                    <td>Age:</td><td>15</td>
                </tr>
                <tr>
                    <td>Location:</td><td> SomewhereIn, Asia</td>
                </tr>
            </table>
        </div>
    </body>
    </html>
    

    Basically, I'm trying to echo out something like this (plain text, no styles):

    Profile
    Name: John Dela Cruz
    Age: 15
    Location: SomewhereIn, Asia
    

    but I don't know how. :-( Please help me, guys. Thank you in advance.

    EDIT: Since I am only after the content of the page, whether it's styled or just plain text, is there a way to select only this part (see code below) using file_get_contents()?

     <h2>Profile</h2>
            <table>
                <tr>
                    <td>Name:</td><td> John Dela Cruz</td>
                </tr>
                <tr>
                    <td>Age:</td><td>15</td>
                </tr>
                <tr>
                    <td>Location:</td><td> SomewhereIn, Asia</td>
                </tr>
            </table>