How to read and parse HTML in Node.js?

11,067

I would recommend using Cheerio. It implements a subset of core jQuery for use in Node.js.

const cheerio = require('cheerio');

const html = "<p>In practice, it is usually a bad idea to modify global variables inside the function scope since it often is the cause of confusion and weird errors that are hard to debug.<br />If you want to modify a global variable via a function, it is recommended to pass it as an argument and reassign the return-value.<br />For example:</p>";

const $ = cheerio.load(html);
const paragraph = $('p').html(); // Inner HTML of the first <p>. You can manipulate this in any other way you like

// ...You would do the same for any other element you require

You should check out Cheerio and read its documentation. I find it really neat!

Edit: for the new part of your question

You can iterate over every element and push it into an array of objects (one object per element, ready to be serialized as JSON) like this:

var jsonObject = []; // An array of objects that will hold everything
$('p').each(function() { // Loop over each paragraph
    // Put this paragraph's content into its own object and add it to the array
    jsonObject.push({"paragraph": $(this).html()});
});

So the resulting array, once serialized as JSON, should look something like this:

[
  {
    "paragraph": "text"
  },
  {
    "paragraph": "text 2"
  },
  {
    "paragraph": "text 3"
  }
]
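The array built in the loop holds plain JavaScript objects; it only becomes JSON text once it is serialized. A minimal sketch of that step, using sample data in place of the real paragraphs (the variable names here are illustrative, not from the original code):

```javascript
// Sample data standing in for the array built in the loop above.
const paragraphs = [
  { paragraph: 'text' },
  { paragraph: 'text 2' },
  { paragraph: 'text 3' },
];

// Serialize to a JSON string with 2-space indentation.
const jsonText = JSON.stringify(paragraphs, null, 2);
console.log(jsonText);

// Parse the string back into JavaScript objects.
const roundTrip = JSON.parse(jsonText);
console.log(roundTrip.length);
```

The string in `jsonText` is what you would write to a file or send over the network.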

I believe you should also read up on JSON and how it works.
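The question also asks for code blocks to be kept apart from text. The following is only a sketch of one way to do that with Cheerio, not a tested solution for the asker's file. The helper names `detectLanguage` and `collectBlocks` are hypothetical, and the `text`/`code`/`language` keys are an assumed output shape based on the question.

```javascript
// Hypothetical helper: pull a language name out of a class attribute such as
// "{python} language-{python}" (the format shown in the question's <code> tag).
function detectLanguage(className) {
  const match = /language-\{?([\w+#-]+)\}?/.exec(className || '');
  return match ? match[1] : 'unknown';
}

// Hypothetical helper: takes a loaded Cheerio instance ($) and returns one
// object per <p> or <pre> element.
function collectBlocks($) {
  const blocks = [];
  $('p, pre').each(function () {
    const el = $(this);
    if (el.is('pre')) {
      // Code block: keep the raw text and the language from the <code> class.
      blocks.push({
        code: el.text(),
        language: detectLanguage(el.find('code').attr('class')),
      });
    } else {
      // Paragraph: .text() drops tags such as <br />; use .html() to keep them.
      blocks.push({ text: el.text() });
    }
  });
  return blocks;
}

// Usage (assumes cheerio is installed and `html` holds the converted markup):
//   const cheerio = require('cheerio');
//   const blocks = collectBlocks(cheerio.load(html));
//   console.log(JSON.stringify(blocks, null, 2));
```

Passing `$` in as a parameter keeps the traversal logic separate from loading the document, so the same function works whether the HTML comes from a file or from a Markdown converter.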

Author: Kucka Prozova

Updated on June 04, 2022

Comments

  • Kucka Prozova
    Kucka Prozova almost 2 years

    I have a simple project and need help with it. I need to read an HTML file and convert it to JSON format, capturing the matches as either code or text. How can I achieve this?

    For example, I have these two HTML elements:

    <p>In practice, it is usually a bad idea to modify global variables inside the function scope since it often is the cause of confusion and weird errors that are hard to debug.<br />
    If you want to modify a global variable via a function, it is recommended to pass it as an argument and reassign the return-value.<br />
    For example:</p>
    
    <pre><code class="{python} language-{python}">a_var = 2
    
    def a_func(some_var):
        return 2**3
    
    a_var = a_func(a_var)
    print(a_var)
    </code></pre>
    

    My code:

    const fs = require('fs')
    const showdown  = require('showdown')
    
    var read =  fs.readFileSync('./test.md', 'utf8')
    
    function importer(mdFile) {
    
        var result = []
        let json = {}
    
        var converter = new showdown.Converter()
        var text      = mdFile
        var html      = converter.makeHtml(text);
    
        for (var i = 0; i < html.length; i++) {
            htmlRead = html[i]
            if(html == html.match(/<p>(.*?)<\/p>/g))
                json.text = html.match(/<p>(.*?)<\/p>/g)
    
           if(html == html.match(/<pre>(.*?)<\/pre>/g))
                json.code = html.match(/<pre>(.*?)<\/pre>/g
    
        }
    
        return html
    }
    console.log(importer(read))
    

    How do I get these matches in my code?

    New code: this writes all the p tags into the same JSON object. How do I write each p tag into a separate JSON block?

    $('html').each(function(){
        if ($('p').text != undefined) {
            json.code = $('p').text()
            json.language = "Text"
        }
    })
    
  • Kucka Prozova
    Kucka Prozova over 5 years
    Yeah, that's exactly what I did. But I have a question: I write all the p tags into the same JSON object. How do I write each p tag into a separate JSON block? I updated the question.
  • Ayo Reis
    Ayo Reis over 2 years
    Does anyone know a JS-only alternative?