Is there a good method for parsing the user-agent string?

35,912

Solution 1

For Java, take a look at User-Agent-Utils. It's fairly compact (< 50kB) and has no dependencies.

Note that although the latest release is fairly recent (1.21, released 2018-01-24), the library's page states:

Warning: This project is end-of-life and will not be updated regularly any longer

And the GitHub page says:

EOL WARNING

This library has reached end-of-life and will not see regular updates any longer.

Version 1.21 was the last official release in 2018.

Solution 2

Have a look at the Java library I wrote for this purpose: Yauaa

I made a very simple servlet where you can try it out to see if it gives the answers you are looking for: https://try.yauaa.basjes.nl/

It is Apache 2 licensed and published into Maven so using it in a Java application is really easy. It is currently used in production on one of the busiest websites of the Netherlands (where I work).

See this blog post about it: https://techlab.bol.com/making-sense-user-agent-string/

Solution 3

  1. Is the structure of the User-Agent well defined? If yes - where can I find it exactly? (From my understanding of the RFC there is not much standardization here).

No, the structure of a User-Agent string is not standardized, but it is very similar across different agents. Despite the similarity, detection still requires multiple patterns.
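To illustrate what "multiple patterns" can look like in practice, here is a minimal Java sketch using only `java.util.regex`. The class name `UserAgentSketch`, the method `detectBrowser`, and the specific regular expressions are illustrative assumptions, not taken from any of the libraries mentioned here; real libraries maintain far larger and more carefully curated pattern sets.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a handful of hand-written patterns, checked in order.
public class UserAgentSketch {

    // Insertion order matters: Edge and Opera strings also contain "Chrome/",
    // and Chrome strings also contain "Safari/", so the more specific tokens come first.
    private static final Map<String, Pattern> BROWSERS = new LinkedHashMap<>();
    static {
        BROWSERS.put("Edge", Pattern.compile("Edg(?:e|A|iOS)?/(\\d+(?:\\.\\d+)*)"));
        BROWSERS.put("Opera", Pattern.compile("OPR/(\\d+(?:\\.\\d+)*)"));
        BROWSERS.put("Firefox", Pattern.compile("Firefox/(\\d+(?:\\.\\d+)*)"));
        BROWSERS.put("Chrome", Pattern.compile("Chrome/(\\d+(?:\\.\\d+)*)"));
        BROWSERS.put("Safari", Pattern.compile("Version/(\\d+(?:\\.\\d+)*).*Safari/"));
        BROWSERS.put("Internet Explorer",
                Pattern.compile("(?:MSIE |Trident/.*rv:)(\\d+(?:\\.\\d+)*)"));
    }

    /** Returns {browserName, version}, or {"Unknown", ""} when no pattern matches. */
    public static String[] detectBrowser(String userAgent) {
        if (userAgent == null) {
            return new String[] {"Unknown", ""};
        }
        for (Map.Entry<String, Pattern> entry : BROWSERS.entrySet()) {
            Matcher m = entry.getValue().matcher(userAgent);
            if (m.find()) {
                return new String[] {entry.getKey(), m.group(1)};
            }
        }
        return new String[] {"Unknown", ""};
    }

    public static void main(String[] args) {
        String ua = "Mozilla/5.0 (Windows NT 6.1; rv:7.0) Gecko/20100101 Firefox/7.0";
        String[] result = detectBrowser(ua);
        System.out.println(result[0] + " " + result[1]);
    }
}
```

Even this small sketch shows why ordering and special cases pile up quickly, which is the main argument for using a maintained library instead of hand-rolled patterns.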

  2. Assuming the question for #1 is No - is there a proper way to parse it to get the info I need?

You can try the library UADetector. It is a wrapper for the User-Agent-Database of user-agent-string.info.

  3. Is there a better way to get the info I need other than the User-Agent string?

I would not say it is better or worse, but another way to detect user agents is to use client-side JavaScript to collect information about the user agent and submit it to your backend via hidden HTML inputs or XMLHttpRequest. It all depends on what you want to identify. For accurate detection of web crawlers, JavaScript won't help.

Author: RonK

Updated on February 20, 2020

Comments

  • RonK about 4 years

    I have a Java module that receives the User-Agent string from an end user's browser and needs to behave slightly differently depending on the type of browser, the version of the browser, and maybe even the operating system. E.g.: {"FireFox", "7.0", "Win7"}, {"Safari", "3.2", "iOS9"}

    I understand that the User-Agent string can vary in its format for the exact same configuration due to different plug-in installations etc.

    My questions:

    1. Is the structure of the User-Agent well defined? If yes - where can I find it exactly? (From my understanding of the RFC there is not much standardization here).
    2. Assuming the question for #1 is No - is there a proper way to parse it to get the info I need?
    3. Is there a better way to get the info I need other than the User-Agent string?

    Important note - I'm talking about a web app, so my data collection abilities are limited to JavaScript.