Using regular expression in spark
Your regex only matches strings made of one lowercase letter followed by one uppercase letter, e.g. aA, bA, rF. So it shouldn't discard any of the items in your list.
You may want to change it to this:
[a-zA-Z]*
This matches any word composed only of letters (both lowercase and uppercase).
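To make the difference between the two patterns concrete, here is a minimal sketch in plain Scala (no Spark needed). The object name `PatternComparison` and the helper `fullMatch` are made up for illustration; only `scala.util.matching.Regex` and the underlying `java.util.regex` matcher come from the standard library.

```scala
import scala.util.matching.Regex

object PatternComparison {
  // Pattern from the question: exactly one lowercase letter followed by one uppercase letter.
  val original: Regex = "[a-z][A-Z]".r
  // Suggested pattern: zero or more letters of either case.
  val suggested: Regex = "[a-zA-Z]*".r

  // Hypothetical helper: true when the WHOLE string matches the regex.
  def fullMatch(regex: Regex, s: String): Boolean =
    regex.pattern.matcher(s).matches

  def main(args: Array[String]): Unit = {
    // Print how each pattern treats a few sample words.
    for (word <- Seq("aA", "hello", "12")) {
      println(s"$word original=${fullMatch(original, word)} suggested=${fullMatch(suggested, word)}")
    }
  }
}
```

Note that `*` also accepts the empty string; if empty lines should not count as a match, `[a-zA-Z]+` may be the better choice.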
As for the type mismatch, startsWith expects a String, not a Regex, so it is the wrong method here. To test a line against a regex, use something like this:
val pattern = """[a-zA-Z]*""".r
val filtered = rdd.filter(line => !pattern.pattern.matcher(line).matches)
And here is the output:
scala> filtered.foreach(println)
12
13
For more details, check the API documentation for scala.util.matching.Regex.
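The filter above can be tried without a cluster. The sketch below uses a plain `List` as a stand-in for the RDD (an assumption for illustration; the same predicates can be passed to `rdd.filter`). The names `FilterSketch`, `dropLetterOnly` and `keepWithDigit` are hypothetical. The second function is an alternative that is not part of the answer: it keeps any line containing at least one digit, using `Regex.findFirstIn`.

```scala
object FilterSketch {
  // Hypothetical stand-in for the RDD's contents.
  val lines: List[String] = List("12", "13", "hello", "hiiii", "hhhhh")

  private val lettersOnly = "[a-zA-Z]*".r
  private val digit = "[0-9]".r

  // Same predicate as in the answer: drop lines made up only of letters.
  def dropLetterOnly(in: List[String]): List[String] =
    in.filter(line => !lettersOnly.pattern.matcher(line).matches)

  // Alternative (assumption, not from the answer): keep lines with a digit anywhere.
  def keepWithDigit(in: List[String]): List[String] =
    in.filter(line => digit.findFirstIn(line).isDefined)
}
```

One thing to keep in mind: a line with spaces, such as a full sentence, is not matched by `[a-zA-Z]*`, so `dropLetterOnly` would keep it. If the goal is to keep only lines that contain a number, the digit-based variant may be closer to what is wanted.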
Abhishek Allamsetty
Updated on June 04, 2022
Comments
-
Abhishek Allamsetty almost 2 years
I have trouble in using regular expression. My sample data is:
12 13 hello hiiii hhhhh this doesnt have numeric so should be removed Even this line should be excluded `12` this line contains numeric shouldn't exclude Hope even this line should be excluded
scala> val pattern = "[a-z][A-Z]".r
pattern: scala.util.matching.Regex = [a-z][A-Z]
scala> val b = a.filter(line => !line.startsWith(pattern))
<console>:31: error: type mismatch;
 found   : scala.util.matching.Regex
 required: String
       val b = a.filter(line => !line.startsWith(pattern))
                                                 ^
or if I use
scala> val b = a.filter(line => !line.startsWith("[a-z][A-Z]".r)).take(3)
<console>:29: error: type mismatch;
 found   : scala.util.matching.Regex
 required: String
       val b = a.filter(line => !line.startsWith("[a-z][A-Z]".r)).take(3)
                                                 ^
I am actually not sure how exactly to use regex in spark. Please help me out.
-
Arsinux about 6 years
This answer works!