SQL like group by and sum for text files in command line?

1,373

Solution 1

In AWK, you could do something like this:

awk '($1 == last) || (last == "") {sum += $2}
     ($1 != last) && (last != "") {print last " " sum; sum = $2}
                                  {last = $1}
     END                          {print last " " sum}' huge_text_file.txt

Solution 2

This type of queries are most easily and flexibly done in SQL. You can use SQLet and SQLite for that purpose:

sqlet.py -d' ' -A file1.txt 'select a1, sum(a2) from A group by a1' | sqlite3

Solution 3

You could a use a few lines of Lua to acheive this. Lua is available on a wide range of platforms including Windows and Linux.

-- Quick and dirty - no error checking, unsorted output

io.input('huge_text_file.txt')

results = {}

for line in io.lines() do
    for text, number in string.gmatch(line, '(%w+)%s+(%d+)') do
        results[text] = (results[text] or 0) + number
    end
end

for text, number in pairs(results) do
    print(text, number)
end

You can sort the output using any sort utility or a few more lines of Lua.

Share:
1,373

Related videos on Youtube

RAM
Author by

RAM

Updated on September 17, 2022

Comments

  • RAM
    RAM over 1 year

    I have a LinkButton in aspx page.

    <asp:TextBox ID="textBoxNote" runat="server" />
        <asp:LinkButton ID="linkButtonUpdateNote" Text="Update" OnClick="ButtonUpdateNoteClicked" runat="server" />  
    

    the click event handler has the following code

     protected void ButtonUpdateNoteClicked(object sender, EventArgs e)
            {
                var note = textBoxNote.Text;
            }
    

    On Postback textBoxNote.Text is empty. I don't get the posted value. How to get the value?

    • Tim Schmelter
      Tim Schmelter over 11 years
      Does the TextBox gets an initial value somewhere? Remember to do that only on the first load and not on postbacks. Otherwise you would override the value always. The initial value does not need to be set on postbacks if you have enabled ViewState(default).
    • RAM
      RAM over 11 years
      I set the value to the textbox only on if it is not postback.
    • chbrown
      chbrown over 8 years
      Duplicate question at stackoverflow.com/q/10286522/424651 with similar answers, but the top-voted answer there is more concise than the accepted answer below (both are awk programs).
  • dnkb
    dnkb about 14 years
    Thanks so much! I had to tweak a little to use TAB as delimiters but it got me the kickstart!
  • Dennis Williamson
    Dennis Williamson about 14 years
    @dnkb: Using a comma in your print statements allows you to set the output field separator and then print will use it automatically. This may be what you did: awk 'BEGIN {OFS="\t"} ($1 == last || last == "") {sum += $2} ($1 != last && last != "") {print last, sum; sum = $2} {last = $1} END {print last, sum}' huge_text_file.txt