Python Regular Expressions, find Email Domain in Address

Solution 1

Here's something I think might help

import re
s = 'My name is Conrad, and [email protected] is my email.'
# raw string so that \w reaches the regex engine unchanged
domain = re.search(r"@[\w.]+", s)
print(domain.group())

outputs

@gmail.com

How the regex works:

@ - scan until you see this character

[\w.] - a set of characters to potentially match: \w is any alphanumeric character or underscore, and the period . adds a literal dot to that set.

+ - one or more of the previous set.

Because this regex matches the period character and every alphanumeric character after an @, it will match email domains even in the middle of sentences.
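
As a rough sketch of the same pattern applied to a longer text, re.findall returns every match instead of only the first one. The sample text, its addresses, and the helper name find_domains are made up for illustration. One caveat worth showing: because . is part of the character set, a sentence-ending period directly after an address is captured too, so this sketch strips it.

```python
import re

# Hypothetical sample text; the addresses are invented for this example.
TEXT = "Write to conrad@example.com or to admin@mail.example.org. Thanks!"

def find_domains(text):
    """Return every '@domain' found in text, using the pattern from this answer."""
    matches = re.findall(r"@[\w.]+", text)
    # '.' is in the character set, so a sentence-ending period is captured;
    # strip trailing periods from each match.
    return [m.rstrip(".") for m in matches]

print(find_domains(TEXT))
```

Note that \w does not include the hyphen, so a domain such as my-site.com would be cut off at the hyphen unless - is added to the set.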

Solution 2

OK, so why not use split (or partition)?

"@"+'[email protected]'.split("@")[-1]

Or you can use other string methods like find

>>> s="[email protected]"
>>> s[ s.find("@") : ]
'@gmail.com'
>>>

And if you are going to extract email addresses from some other text:

with open("file") as f:
    for line in f:
        # line.split() returns a list, so test each word, not the list itself
        for word in line.split():
            if "@" in word:
                print("@" + word.split("@")[-1])
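
Solution 2 mentions partition without showing it. Here is a minimal sketch of how it might look; the function names and the sample lines are hypothetical, and the list of strings stands in for the lines of a file.

```python
def domain_with_partition(address):
    """Return '@domain' for address, or '' if it contains no '@'."""
    # partition splits at the first '@' into (before, separator, after);
    # separator and after are both '' when '@' is absent.
    _, at, domain = address.partition("@")
    return at + domain

def domains_in_lines(lines):
    """Collect '@domain' from every whitespace-separated word containing '@'."""
    found = []
    for line in lines:
        for word in line.split():
            if "@" in word:
                found.append(domain_with_partition(word))
    return found

# Hypothetical input lines standing in for the contents of a file.
print(domains_in_lines(["junk conrad@example.com junk", "nothing here"]))
```

Unlike indexing the result of split, partition never raises an IndexError when the separator is missing, which can be convenient with noisy input.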

Solution 3

Using regular expressions:

>>> re.search('@.*', test_string).group()
'@gmail.com'

A different way:

>>> '@' + test_string.split('@')[1]
'@gmail.com'

Solution 4

You can try using urllib

from urllib import parse
email = '[email protected]'
domain = parse.splituser(email)[1]

Output will be

'mydomain.com'
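
Since splituser is deprecated, one possible alternative that stays inside the standard library is email.utils.parseaddr followed by str.rpartition. This is only a sketch and not the answerer's method; the function name domain_of and the sample addresses are hypothetical.

```python
from email.utils import parseaddr

def domain_of(address):
    """Return the part after the last '@', or '' if there is none."""
    # parseaddr returns (display name, address) and also accepts
    # a bare address without a display name.
    _, addr = parseaddr(address)
    # rpartition splits at the last '@'; the third item is the domain.
    _, _, domain = addr.rpartition("@")
    return domain

print(domain_of("conrad@mydomain.com"))
print(domain_of("Conrad <conrad@mydomain.com>"))
```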

Solution 5

Just wanted to point out that chrisaycock's method (the '@.*' pattern in Solution 3) would also match invalid email addresses of the form

herp@

To make sure you only match a possibly valid email with a domain, you need to alter it slightly so that at least one character must follow the @.

Using regular expressions:

>>> re.search('@.+', test_string).group()
'@gmail.com'
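
The difference between the two patterns can be checked with a small sketch like the one below. The helper name domain_or_none and the sample strings are hypothetical. It also guards against re.search returning None, since calling .group() on None raises an AttributeError.

```python
import re

def domain_or_none(text, pattern):
    """Return the matched text for pattern, or None when nothing matches."""
    match = re.search(pattern, text)
    return match.group() if match else None

# '@.*' accepts zero characters after '@', so a bare '@' still matches.
print(domain_or_none("herp@", r"@.*"))

# '@.+' requires at least one character after '@', so 'herp@' is rejected.
print(domain_or_none("herp@", r"@.+"))

# A hypothetical address with a domain matches as expected.
print(domain_or_none("herp@example.com", r"@.+"))
```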

Author: PatentDeathSquad

Hobbyist not too good at programming.

Updated on July 09, 2022

Comments

  • PatentDeathSquad
    PatentDeathSquad almost 2 years

    I know I'm an idiot, but I can't pull the domain out of this email address:

    '[email protected]'
    

    My desired output:

    '@gmail.com'
    

    My current output:

    .
    

    (it's just a period character)

    Here's my code:

    import re
    test_string = '[email protected]'
    domain = re.search('@*?\.', test_string)
    print domain.group()
    

    Here's what I think my regular expression says ('@*?\.', test_string):

     ' # begin to define the pattern I'm looking for (also tell python this is a string)
    
      @ # find all patterns beginning with the at symbol ("@")
    
      * # find all characters after the at symbol
    
      ? # find the last character before the period
    
      \ # breakout (don't use the next character as a wild card, use it as a string character)
    
      . # find the "." character
    
      ' # end definition of the pattern I'm looking for (also tell python this is a string)
    
      , test string # run the preceding search on the variable "test_string," i.e., '[email protected]'
    

    I'm basing this off the definitions here:

    http://docs.activestate.com/komodo/4.4/regex-intro.html

    Also, I searched but other answers were a bit too difficult for me to get my head around.

    Help is much appreciated, as usual. Thanks.

    My stuff if it matters:

    Windows 7 Pro (64 bit)

    Python 2.6 (64 bit)


    PS. StackOverflow question: My posts don't include new lines unless I hit "return" twice in between them. For example (these are all on a different line when I'm posting):

    @ - find all patterns beginning with the at symbol ("@") * - find all characters after the at symbol ? - find the last character before the period \ - breakout (don't use the next character as a wild card, use it as a string character) . - find the "." character , test string - run the preceding search on the variable "test_string," i.e., '[email protected]'

    That's why I got a blank line b/w every line above. What am I doing wrong? Thx.

  • PatentDeathSquad
    PatentDeathSquad about 13 years
    Thanks for the response. Why regex and not regular string methods? I have 40 megs of string with email addresses intermingled with junk text that I am trying to extract. I'm a hobbyist programmer and I try to keep things simple and to play with the regex so I could understand it, so I didn't go into it here. Sorry if that was confusing.
  • PatentDeathSquad
    PatentDeathSquad about 13 years
    Ahh. I see I needed the other '.' Thanks!! (not sure why though).
  • Rachel Shallit
    Rachel Shallit about 13 years
    @AquaT33nFan: "@*" means 0 or more occurrences of "@". "@.*" means one occurrence of "@" followed by 0 or more occurrences of any character (except a newline). In other words, * here is a Kleene star, not a wildcard.
  • Dan Yishai
    Dan Yishai over 2 years
    The splituser function is deprecated. bugs.python.org/issue35891
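
Rachel Shallit's explanation of @* versus @.* can be checked directly. In this sketch the address in test_string is a hypothetical stand-in for the one in the question, and the variable names are made up for illustration.

```python
import re

# Hypothetical address standing in for the one in the question.
test_string = "someone@example.com"

# '@*?' lazily matches zero or more '@', so the whole pattern means
# "an optional run of '@' followed by a literal dot"; the first place
# where that succeeds is the dot itself.
lazy = re.search(r"@*?\.", test_string).group()

# '@*' alone is satisfied by the empty string at position 0.
star = re.search(r"@*", test_string).group()

# '@.*' needs one '@', then takes every following character on the line.
fixed = re.search(r"@.*", test_string).group()

print(repr(lazy), repr(star), repr(fixed))
```

This reproduces the single period from the question and shows why adding the . before * fixes it.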
