SQL like group by and sum for text files in command line?
1,373
Solution 1
In AWK, you could do something like this:
awk '($1 == last) || (last == "") {sum += $2}
($1 != last) && (last != "") {print last " " sum; sum = $2}
{last = $1}
END {print last " " sum}' huge_text_file.txt
Solution 2
This type of queries are most easily and flexibly done in SQL. You can use SQLet and SQLite for that purpose:
sqlet.py -d' ' -A file1.txt 'select a1, sum(a2) from A group by a1' | sqlite3
Solution 3
You could a use a few lines of Lua to acheive this. Lua is available on a wide range of platforms including Windows and Linux.
-- Quick and dirty - no error checking, unsorted output io.input('huge_text_file.txt') results = {} for line in io.lines() do for text, number in string.gmatch(line, '(%w+)%s+(%d+)') do results[text] = (results[text] or 0) + number end end for text, number in pairs(results) do print(text, number) end
You can sort the output using any sort utility or a few more lines of Lua.
Related videos on Youtube
Author by
RAM
Updated on September 17, 2022Comments
-
RAM over 1 year
I have a LinkButton in aspx page.
<asp:TextBox ID="textBoxNote" runat="server" /> <asp:LinkButton ID="linkButtonUpdateNote" Text="Update" OnClick="ButtonUpdateNoteClicked" runat="server" />
the click event handler has the following code
protected void ButtonUpdateNoteClicked(object sender, EventArgs e) { var note = textBoxNote.Text; }
On Postback textBoxNote.Text is empty. I don't get the posted value. How to get the value?
-
Tim Schmelter over 11 yearsDoes the
TextBox
gets an initial value somewhere? Remember to do that only on the first load and not on postbacks. Otherwise you would override the value always. The initial value does not need to be set on postbacks if you have enabledViewState
(default). -
RAM over 11 yearsI set the value to the textbox only on if it is not postback.
-
chbrown over 8 yearsDuplicate question at stackoverflow.com/q/10286522/424651 with similar answers, but the top-voted answer there is more concise than the accepted answer below (both are awk programs).
-
-
dnkb about 14 yearsThanks so much! I had to tweak a little to use TAB as delimiters but it got me the kickstart!
-
Dennis Williamson about 14 years@dnkb: Using a comma in your print statements allows you to set the output field separator and then print will use it automatically. This may be what you did:
awk 'BEGIN {OFS="\t"} ($1 == last || last == "") {sum += $2} ($1 != last && last != "") {print last, sum; sum = $2} {last = $1} END {print last, sum}' huge_text_file.txt