Print lines where first field has only four characters using regex in awk?

bash shell-script awk regular-expression

9,249

Solution 1

Fields in awk are by default delimited by whitespace (the default field separator is a single space, " "), so $1 never contains a space. The correct regex for $1 is therefore:

awk '$1 ~ /^[a-zA-Z0-9]{4}$/ {print}' file

If you want to keep your original approach, you can match against $0 instead (note that \s is a GNU awk extension):

awk '$0 ~ /^[a-zA-Z0-9]{4}\s/ {print}' file

To simplify things, you can also use \w (another GNU awk extension, which additionally matches the underscore) instead of spelling out the word characters:

awk '$0 ~ /^\w{4}\s/ {print}' file

If you want to match only a space and not other whitespace such as a tab, replace \s with a literal space.

Another issue with your original approach is the missing anchors. As you specified neither ^ nor $, your pattern can match anywhere; for example, against the whole line it would match Elizabeth Stachelin at beth followed by the space.
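To see where an unanchored pattern lands, the following sketch uses awk's standard match() function, which sets RSTART and RLENGTH to the position and length of the leftmost match. The variable names line and matched are only for illustration, and the pattern is the question's four-character regex with a literal space in place of the quoted one.

```shell
# One record from the question's sample data.
line='Elizabeth Stachelin:(916) 440-1763:175:75:300'

# Print the matched substring in brackets so the trailing space is visible.
matched=$(printf '%s\n' "$line" | awk '{
    if (match($0, /[a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9] /))
        printf "[%s]\n", substr($0, RSTART, RLENGTH)
}')
echo "$matched"
```

This prints [beth ], confirming that without ^ the match starts in the middle of the name.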

Solution 2

In AWK, a regular expression can serve as a pattern by itself, in the same position as the BEGIN or END patterns you often see in AWK scripts. A simplified version could be:

awk '/^[[:alnum:]]{4}\>/'

This is all you need. You do not need an action: {print} is the default action when a pattern matches, and it prints the entire record, i.e. the entire line.

[[:alnum:]] is basically a synonym for [a-zA-Z0-9], depending on locale. You can also use \w, which additionally matches the underscore _; it is shorthand for [[:alnum:]_]:

awk '/^\w{4}\>/'

\> matches the end of a word (it is a GNU awk operator). With it, you also correctly match a string like John:(###)... if some records do not contain the full name.
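Where \> is unavailable, a rough approximation in plain POSIX syntax is to require a non-word character or the end of the line after the four characters. This is a swapped-in technique, not the operator itself, and the records below are hypothetical variants of the question's data in which the first record has no surname.

```shell
# Hypothetical sample records; the first one has no surname.
records='John:(916) 348-4278:250:100:175
Chet Main:(510) 548-5258:50:95:135
Tom Savage:(408) 926-3456:250:168:200
Elizabeth Stachelin:(916) 440-1763:175:75:300'

# Four alphanumerics at the start of the line, then either a character
# that is not a letter, digit or underscore, or the end of the line.
short_names=$(printf '%s\n' "$records" |
    awk '/^[a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9]([^a-zA-Z0-9_]|$)/')
echo "$short_names"
```

Only the John and Chet records are printed. The bracket expression is written out four times because some older awk implementations do not accept the {4} interval syntax.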

Although you are asking about AWK, I would suggest using sed; it runs almost twice as fast as AWK in this case:

sed -n '/^[[:alnum:]]\{4\}\b/p'

\b in GNU sed matches at either edge of a word, like \< and \> in GNU AWK. I tested on a file of 500K lines, 100K of which matched: AWK took around 1.7 seconds, sed only 0.9 seconds. But the test case is extreme; this is just a nitpick suggestion.

I would also suggest you read man 7 regex as well as man awk and info awk.

Solution 3

The first field is $1, and its length is length($1), so:

awk 'length($1) == 4 {print}'

or more succinctly

awk 'length($1) == 4'
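As a quick check, the sketch below runs that command against the four sample records from the question. The temporary file and the variable names sample and four_char are assumptions made for the example.

```shell
# Write the sample records from the question to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
John Goldenrod:(916) 348-4278:250:100:175
Chet Main:(510) 548-5258:50:95:135
Tom Savage:(408) 926-3456:250:168:200
Elizabeth Stachelin:(916) 440-1763:175:75:300
EOF

# Keep records whose first whitespace-delimited field is exactly four characters long.
four_char=$(awk 'length($1) == 4' "$sample")
echo "$four_char"

# Clean up the temporary file.
rm -f "$sample"
```

It prints the John Goldenrod and Chet Main records and skips the other two.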

What you wrote doesn't work for two reasons. First, you have an extra " " in your regexp, so you're requiring that the field contain double quote, space, double quote. Second, if you fix that, you get /[a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9]/, which matches a field that contains at least four consecutive ASCII letters or digits, but may contain more, so it will match Elizabeth as well as John, but not Tom. You can write /^[a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9]$/ to anchor the regexp at the start and end, but if what you're after is the length of the field, just write that.
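The difference between the two regexps can be checked on the first names alone. In this sketch the variable names names, unanchored and anchored are illustrative; the patterns are the ones quoted above.

```shell
# First names from the sample data, one per line, so each is its own $1.
names='John
Chet
Tom
Elizabeth'

# Without anchors: any field containing four consecutive letters or digits matches.
unanchored=$(printf '%s\n' "$names" |
    awk '$1 ~ /[a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9]/')

# With anchors: the field must consist of exactly four letters or digits.
anchored=$(printf '%s\n' "$names" |
    awk '$1 ~ /^[a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9]$/')

printf 'unanchored:\n%s\n' "$unanchored"
printf 'anchored:\n%s\n' "$anchored"
```

The unanchored pattern selects John, Chet and Elizabeth; the anchored one selects only John and Chet.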


Ezequiel

Updated on September 18, 2022

Comments

Ezequiel over 1 year

John Goldenrod:(916) 348-4278:250:100:175

Chet Main:(510) 548-5258:50:95:135

Tom Savage:(408) 926-3456:250:168:200

Elizabeth Stachelin:(916) 440-1763:175:75:300

The output should contain the lines whose names have only four characters (John, Chet):

awk '$1 ~ /[a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9]" "/ {print}' file

This doesn't seem to work for me. Can I do it without using any of the awk functions?
