What is a User-Agent?
A user-agent is a string of text, numbers, and symbols that specifies the details of a client accessing a server. The client could be a browser, crawler bot, scraper bot, email client, download manager, or web app.
The server, on the other hand, houses the resources that the clients want to access. For example, a web server contains the HTML, CSS, and JavaScript files, images, and other media content that make up a webpage.
When a client accesses a server, it sends an HTTP request header containing its user-agent string. The specific information included in the user-agent string varies by client but typically includes:
- The client’s application name
- The application’s version
- The operating system and its version
- The device type (such as mobile or desktop)
For example, this is a browser’s user-agent string:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36
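To see how those details can be pulled out of a user-agent string, here is a rough Python sketch that extracts the platform segment and the browser version from the example string above with simple regular expressions. Real-world user-agent parsing is messy, so production code usually relies on a dedicated parser library rather than hand-written patterns like these:

```python
import re

ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
      "AppleWebKit/537.36 (KHTML, like Gecko) "
      "Chrome/89.0.4389.82 Safari/537.36")

# The first parenthesized segment usually carries the OS / platform details.
platform = re.search(r"\(([^)]*)\)", ua)

# The "Chrome/<version>" token identifies the browser and its version.
browser = re.search(r"Chrome/([\d.]+)", ua)

print(platform.group(1))  # Windows NT 10.0; Win64; x64
print(browser.group(1))   # 89.0.4389.82
```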
This is what the same user-agent string looks like in the HTTP request header the browser sends to a web server:
GET / HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
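Browsers set this header automatically, but other clients can set it explicitly. As a minimal illustration, assuming the third-party requests library is installed, the Python snippet below sends a GET request to www.example.com with a user-agent supplied by hand:

```python
import requests

# User-agent string taken from the example above.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/89.0.4389.82 Safari/537.36"
}

# The header is sent to the server along with the rest of the request.
response = requests.get("https://www.example.com", headers=headers)
print(response.status_code)
```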
Importance of User-Agents
Just as we would want to identify strangers who visit our homes, servers use user-agents to recognize the clients visiting them. This is helpful in several situations, such as when a server needs to identify the client, apply client-specific configurations, or decide what kind of content to deliver.
This makes user-agents useful for analytics, tracking, and content adaptation. For example, some servers use the user-agent to detect the type of device making the request and then return a mobile page to mobile devices and a desktop site to desktops.
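As a minimal sketch of that idea, assuming a Flask application, the handler below inspects the User-Agent header and returns a different page for mobile and desktop visitors. The check for the substring "Mobi" is only a rough heuristic, not a robust device-detection method:

```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/")
def index():
    ua = request.headers.get("User-Agent", "")
    # Most mobile browsers include "Mobi" somewhere in their user-agent,
    # which is enough to illustrate the idea here.
    if "Mobi" in ua:
        return "mobile version of the page"
    return "desktop version of the page"
```

Production sites more often rely on responsive design or a dedicated device-detection library rather than a substring check like this.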
Many websites also set up robots.txt files that specify which crawler bots can or cannot crawl their site. In this case, a crawler bot like Googlebot or Bingbot will fetch the robots.txt file to check whether it contains rules instructing it not to crawl the site.
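Python's standard library can perform the same check a well-behaved crawler does. The sketch below uses urllib.robotparser to ask whether a bot identifying itself as Googlebot is allowed to fetch a given page (the URLs are placeholders):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()  # download and parse the robots.txt file

# Check whether a crawler identifying itself as Googlebot may fetch this page.
print(rp.can_fetch("Googlebot", "https://www.example.com/private/page.html"))
```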
User-agent strings are also helpful for troubleshooting issues a visitor encounters while trying to access the server. In this case, they help developers diagnose configuration issues and fix bugs that may have prevented a client from accessing a resource on the server.
Limitations of User-Agents
User-agents are not a reliable method of identifying the client that accesses your server. Some browsers, bots, and download managers may use a fake user-agent string to pose as another client.
This is called user-agent spoofing and is quite common, especially with search engine crawlers. Many bad bots misidentify themselves by pretending to be Googlebot or Bingbot. This allows them to crawl your site and use up your bandwidth without you noticing.
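One common countermeasure is to verify the requesting IP address with a reverse-DNS lookup instead of trusting the user-agent string: genuine Googlebot requests come from hosts whose reverse DNS resolves to a googlebot.com or google.com hostname that in turn resolves back to the same IP. The Python sketch below illustrates the idea; the domain list and error handling are kept deliberately simple:

```python
import socket

def looks_like_real_googlebot(ip_address: str) -> bool:
    """Rough check: reverse-resolve the IP, confirm the hostname belongs to
    Google's crawler domains, then forward-resolve it back to the same IP."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip_address)
    except socket.herror:
        return False
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        return socket.gethostbyname(hostname) == ip_address
    except socket.gaierror:
        return False
```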
User-agents also do not identify the individual person accessing your site. So, you cannot tell who the user is from their user-agent, even if it is accurate and was not spoofed; it is perfectly normal for many different people to share the same user-agent string.