
How Tech Companies Track Users

This paper serves as an introduction to techniques used by companies to track their user base, while also providing additional information on the ramifications for technology users, the legality of tracking, and techniques users can employ to defend themselves against it. The main motivation behind its development is to bring awareness to the hidden ways companies track their users throughout the internet and the broader world.

Another goal is to promote methods that can be used to hinder or prevent tracking of users by companies. Some web application consumers may not be inclined to remove tracking services, but it is important to be aware of such techniques in any case.

The authors’ contributions include introducing the issue and providing background information regarding the logic and justification of tracking users. Additionally, insight into four key methods companies use to track users is shared, including the use of HTTP Cookies, Tracking Pixels, Smart Devices, and Cross-Device Tracking. Some of these various strategies break down into smaller sub-methods, such as functional cookies, analytics cookies, advertising cookies, deterministic tracking, and probabilistic tracking. Each of these tracking methods will be explained in depth in the third section of the paper.

The next contribution covers the key takeaways from these methods for web users. In other words, what does it mean for the user that companies track them? The online tracking experience has various drawbacks and benefits, which will be explored in further detail in section four.

Following the risks of user tracking, section five will discuss its legality. Several crucial laws will be highlighted to draw attention to certain areas or demographics that companies cannot exploit, in an effort to bring awareness to users.

Lastly, the paper sheds light on possible solutions to tracking that consumers can leverage. Possible options include opting out of personalized advertisements, using an ad blocker or VPN, or using a DNS ad blocker such as a Pi-hole. Each method will be thoroughly explained in section six. Following that, section seven will conclude matters with closing remarks. Overall, this paper seeks to educate readers on the possible ways in which companies can track them on the internet, while arming them with defense mechanisms against such tactics.

Background

Large technology companies have a multitude of ways to track their user base through sites owned by them, and sites associated with them. The authors elected to write about this topic to bring awareness to the issue of user tracking, while also providing further information into its legality and possible solutions users may leverage.

Additionally, the user may wonder why companies engage in these sorts of data collecting actions. The main driver is revenue, which is a large motivator for any number of business decisions. For instance, Google’s parent company Alphabet “generated around $257 billion in revenue in the 2021 fiscal year. Of that $257 billion, nearly 81%, or $209 billion, stemmed from its Google Advertising product suite.” These products include Google Search, YouTube Ads, and Network Members sites. An even more striking example is Meta, whose primary products include social media giants Instagram, Facebook, and WhatsApp. “98.66% of their revenue in 2019 was driven by advertising, which demonstrates a clear motivation for the company to continue collecting and repurposing user data.”

At the same time, the user experience is also a contributing factor to companies’ desire to collect data. Consider a user who touches certain parts of the screen in an app on their phone. The developing company can use this information to improve the app’s fluidity and increase user engagement, thus increasing revenue in tandem.

As data is collected by technology companies, it is grouped into profiles for each user detected. These profiles can then be leveraged to track users across multiple related devices, such as those found in the Apple ecosystem. A user can log in to a website on their iPad before continuing on their MacBook or iPhone. Cross-device tracking will be examined closely in section three.

In addition to this form of tracking, companies can also hire mercenary third party companies who collect and sell user data. This may be particularly appealing to smaller companies who do not possess the resources to engage in their own data collection services.

Nearly all companies attempt to collect user data to support their business needs, with the notable exception of a few outliers such as DuckDuckGo. As noted above, however, not all companies can build a tracking system on their own. Outsourcing to third-party companies is a popular alternative and a big reason why almost all corporations can collect at least some of their users’ data.

Tracking Methods

There are many different methods that can be used to track users on the internet. The most commonly used one, and the one with the longest history, is the use of cookies to monitor and track users. Lou Montulli, the inventor of the cookie and of many other internet innovations, named the cookie after the “magic cookie”, defined as “something passed between routines or programs that enables the receiver to perform some operation; a capability ticket or opaque identifier.” And that was initially their intended use: cookies were first used to verify that a user had visited the Netscape site, to let e-commerce websites remember your cart, and to allow sites to remember preferences. However, this original intent started to unravel as companies discovered that the technology could also be used to label and identify users.

Overall, we can classify cookies into three general types: functional cookies, analytics cookies, and advertising cookies. Functional cookies, such as those that store a shopping cart or user preferences, are necessary for site function and generally harmless. Analytics cookies are used by sites to monitor and understand user behavior in order to improve the site experience. Advertising cookies, however, are used by companies that place ads on sites in order to track users across sites and target advertisements to them.

The way that these cookies function is both simple and clever. Most third-party advertising cookies consist only of a unique ID for that user. When a user visits a site that has ads from that provider running on it, the code that fetches the ads first uses the unique ID in the cookie to look up the information that the company has on the user, then fetches what it believes to be the most relevant ad for that user. Finally, the code logs information about the site visit (such as an article title, site URL, or product name) and stores that information to help determine future ads.
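
The lookup, ad selection, and logging steps described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not any real ad network's code: the `AdServer` class, its `handle_ad_request` method, the "most-viewed topic" selection rule, and the sample ad inventory are all hypothetical.

```python
import uuid
from collections import Counter, defaultdict


class AdServer:
    """Toy model of a third-party ad provider (hypothetical, for illustration)."""

    def __init__(self, inventory):
        # inventory maps a topic keyword to the ad shown for that topic
        self.inventory = inventory
        # profiles maps a cookie ID to a count of topics that ID has viewed
        self.profiles = defaultdict(Counter)

    def handle_ad_request(self, cookie_id, page_topic):
        """Return (cookie_id, ad) for one page view on a partner site."""
        # A first-time visitor has no cookie yet, so issue a new unique ID
        if cookie_id is None:
            cookie_id = uuid.uuid4().hex

        # Step 1: look up what is already known about this ID
        profile = self.profiles[cookie_id]

        # Step 2: pick the ad for the most-viewed topic, if there is one
        ad = "generic ad"
        if profile:
            top_topic, _ = profile.most_common(1)[0]
            ad = self.inventory.get(top_topic, "generic ad")

        # Step 3: log this visit so it influences future ads
        profile[page_topic] += 1
        return cookie_id, ad


server = AdServer({"hiking": "hiking boots ad", "cooking": "cookware ad"})

# First visit: no cookie, so the user gets a generic ad and a new ID
cookie, first_ad = server.handle_ad_request(None, "hiking")
# Later visit to a different site that uses the same ad provider
cookie, second_ad = server.handle_ad_request(cookie, "news")
print(first_ad)   # generic ad
print(second_ad)  # hiking boots ad
```

Note that the cookie itself carries nothing but the ID; everything the provider knows lives on its own servers, which is why clearing cookies only severs the link to the profile rather than deleting it.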

Another commonly used tracking method is the tracking pixel, also known as a marketing pixel. It is called a pixel because it is typically a 1x1 image, which is embedded in sites or emails to track user behavior. When the graphic is loaded, a request is sent to the server where the image is stored. Based on the request, the server can then log information about the user who requested the pixel, similar to how cookies work. In effect, the user is not just receiving an image but sending their data to that server.
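
To make the mechanism concrete, here is a minimal sketch of the server side of a tracking pixel using only Python's standard library. The query parameter names (`campaign`, `uid`), the `tracker.example` host, and the helper `build_log_record` are assumptions made for illustration; real trackers differ in what they record and how.

```python
import base64
from http.server import BaseHTTPRequestHandler
from urllib.parse import parse_qs, urlparse

# A 1x1 transparent GIF, the classic tracking-pixel payload
PIXEL_GIF = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)


def build_log_record(path, headers, client_ip):
    """Collect what a single pixel request reveals about the requester."""
    query = parse_qs(urlparse(path).query)
    return {
        "ip": client_ip,
        "user_agent": headers.get("User-Agent"),
        "referer": headers.get("Referer"),
        # Hypothetical query parameters naming the campaign and recipient
        "campaign": query.get("campaign", [None])[0],
        "recipient": query.get("uid", [None])[0],
    }


class PixelHandler(BaseHTTPRequestHandler):
    """Serves the pixel and logs every request for it."""

    def do_GET(self):
        record = build_log_record(self.path, self.headers, self.client_address[0])
        print("pixel loaded:", record)  # a real tracker would store this
        self.send_response(200)
        self.send_header("Content-Type", "image/gif")
        self.send_header("Content-Length", str(len(PIXEL_GIF)))
        self.end_headers()
        self.wfile.write(PIXEL_GIF)


# Simulate the request an email client would make when rendering
# <img src="https://tracker.example/p.gif?campaign=spring&uid=42">
record = build_log_record(
    "/p.gif?campaign=spring&uid=42",
    {"User-Agent": "ExampleMail/1.0"},
    "203.0.113.7",
)
print(record)
```

To try the handler locally, it can be passed to `http.server.HTTPServer(("127.0.0.1", 8000), PixelHandler)` and started with `serve_forever()`. The image bytes are incidental; the request itself is the payload the tracker cares about.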

Why would someone use a tracking pixel, though? The answer is that unlike a cookie, which is relatively easy to block using anonymous browsing, users don’t typically get to choose whether or not to load a tracking pixel, unless they want to disable images across all websites that they browse. This is also why Outlook and other email clients don’t display images by default: most advertising and spam email has embedded tracking pixels that can be used to track engagement or, in the case of spam, to confirm that a target is reading the email, which encourages the spammer to continue sending emails.

Similar to cookies, there are also different kinds of tracking pixels. The first is the retargeting pixel, which, like the advertising cookies described above, is primarily used by online advertisers to target ads to a user, resulting in the commonly observed behavior of ads “following” you across different sites and social media platforms. Another type is the conversion pixel. This type of pixel is primarily used to track the results of an ad campaign, and is typically embedded in order confirmation pages on sites to verify that a user has bought a product.

By combining these forms of tracking, along with some other forms that we won’t get to in this article, companies attempt to perform cross-device tracking, which forms a complete profile of you across all of your devices. This could include your desktop PC and your laptop, or even smart TVs. Even more concerning, ads can follow you from your home computer to your work computer, resulting in potentially embarrassing or unexpected advertising.

There are two primary types of cross-device tracking: deterministic and probabilistic. Deterministic tracking uses unique user identifiers, such as user login data, to verify that two devices are being used by the same user. However, many sites, and in fact almost every site on the web, do not require you to log in. When a site doesn’t require a login, companies instead rely on probabilistic tracking, which leverages AI/ML algorithms with data such as device type, operating system, user behavior, and IP address to infer whether devices belong to the same user.
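
The difference between the two approaches can be illustrated with a short Python sketch. It is deliberately simplified: real probabilistic systems use trained models rather than hand-picked weights, and the `DeviceRecord` fields, the weights, and the `MATCH_THRESHOLD` value below are all assumptions chosen for the example.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

MATCH_THRESHOLD = 0.7  # hypothetical cutoff for "probably the same user"


@dataclass
class DeviceRecord:
    device_id: str
    login_email: Optional[str]  # None when the user never logged in
    ip_address: str
    timezone: str
    sites_visited: frozenset


def hashed(email: str) -> str:
    """Normalize and hash an email so it can serve as a stable identifier."""
    return hashlib.sha256(email.strip().lower().encode()).hexdigest()


def deterministic_match(a: DeviceRecord, b: DeviceRecord) -> bool:
    """Two devices match if the same account logged in on both."""
    if a.login_email is None or b.login_email is None:
        return False
    return hashed(a.login_email) == hashed(b.login_email)


def probabilistic_score(a: DeviceRecord, b: DeviceRecord) -> float:
    """Hand-weighted similarity in [0, 1]; the weights are illustrative only."""
    score = 0.0
    if a.ip_address == b.ip_address:
        score += 0.5  # same network, for example the same home router
    if a.timezone == b.timezone:
        score += 0.1
    union = a.sites_visited | b.sites_visited
    if union:
        # Jaccard overlap of browsing history
        score += 0.4 * len(a.sites_visited & b.sites_visited) / len(union)
    return score


laptop = DeviceRecord("laptop", "Pat@example.com", "198.51.100.4", "UTC-5",
                      frozenset({"news.example", "shop.example", "mail.example"}))
phone = DeviceRecord("phone", "pat@example.com ", "203.0.113.9", "UTC-5",
                     frozenset({"shop.example"}))
tv = DeviceRecord("tv", None, "198.51.100.4", "UTC-5",
                  frozenset({"news.example", "shop.example"}))

print(deterministic_match(laptop, phone))                  # True: same login
print(deterministic_match(laptop, tv))                     # False: no login on the TV
print(probabilistic_score(laptop, tv) >= MATCH_THRESHOLD)  # True under these weights
```

The deterministic check is exact but only works where a login exists; the probabilistic score covers the remaining devices at the cost of possible false matches, such as two different people sharing one household network.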

One thing that is common to the previous methods is that they are primarily used on “free” sites that use advertising to sustain themselves. As a result, users are often relatively accepting of tracking, since they are not paying for the services themselves. However, a more controversial way that corporations track us is through smart home devices purchased by users. Such devices include smart home speakers such as Amazon Alexa, IoT doorbells such as those in the Nest product suite, or even smart light bulbs like the Philips Hue. They collect a wide range of data about users, ranging from the obvious, such as video and audio recordings, to the less obvious: smart TVs monitor your viewing habits, smart light bulbs track sleep and heart rate, and smart vacuum cleaners record and store mapping information about your house. Though this information may not always be used for targeted advertising, there are other hazards to its collection. Companies such as Amazon may have workers listen to conversations with Alexa to manually transcribe and annotate them, which may make some users uncomfortable. Additionally, these devices represent points of vulnerability to cyber attacks. Not only can they be compromised locally, but in the case of a breach of one of the systems storing the user information, hackers could gain deeply compromising information from video and audio streams along with the rest of the stored data.

Risks and Implications for the User

As the amount of user information tracked by companies increases, so does the risk of large amounts of collected data falling into the wrong hands or being accessed without user permission. Data breaches, for example, can lead to the information that companies hold on their users being used for identity theft, credit card fraud, financial losses, and more. When hackers get access to a person’s name, address, email, and phone number, it can become extremely dangerous and lead to real-life consequences. VTech is a company that makes tablets and electronic devices designed for children. In 2015, the Hong Kong-based company experienced a data breach that exposed millions of children’s information: their names, genders, parents, and even addresses were revealed, even though VTech later claimed to have everything under control. Data obtained from breaches is a valuable asset on the dark web. For example, a verified username/email and password combination can be sold for a high price, given that there is a chance the same combination can be used to unlock the target’s other accounts.

In 2006, the American Civil Liberties Union (ACLU) filed a case against the U.S. government. They wanted to challenge the NSA’s warrantless wiretapping, a program introduced by the Bush administration after the events of 9/11. The case was later dismissed under the state secrets privilege, making it hard to challenge invasion of privacy by the government. Government surveillance is one of the biggest concerns in discussions of information privacy. It has been shown that companies collaborate with governments to provide their users’ information and communications.

For the most part, the NSA cares about metadata. That is, it wants to see who sent a communication, to whom it was sent, and when and from where it was sent. It uses this to find suspicious communications and surveil its targets. Although that information doesn’t sound like much, the NSA keeps all the metadata it collects for up to a year. This can help governments build a map of the people whose information was collected, figure out who belongs to which social circles, learn their locations, and uncover more private information.

Several methods are used to obtain this information from companies. National Security Letters, administrative subpoenas issued by the FBI, allow the government to request information from companies about their users’ communications. Sometimes, though, companies are asked or even coerced into installing backdoors in their own systems for government access. One such case became public in February 2016, when the U.S. government asked Apple to provide a backdoor to the iPhone, claiming they only needed access to one particular target’s information. The company refused to provide the technology, and posted a statement on their official website saying that “while the government may argue that its use would be limited to this case, there is no way to guarantee such control.”

Although governments claim to use surveillance for national security, that isn’t always the case. Sometimes, the targets of surveillance happen to be dissidents or critics of governments. In 2013, the Electronic Frontier Foundation (EFF) found spyware on Irina Petrusha’s devices. She was a Kazakhstani journalist and political activist. The malware was later linked back to the government of Kazakhstan. She had become her government’s target after publishing evidence exposing cases of corruption in the country.

Laws and Privacy Protection

With the amount of risk involved in having our information collected, it is also important to know which laws protect users and what exactly they protect against. There are a few, such as the Children’s Online Privacy Protection Act (COPPA), the California Consumer Privacy Act (CCPA), the ePrivacy Directive, and the General Data Protection Regulation (GDPR). GDPR is often described as “one of the biggest shake-ups of data protection laws in a generation.” It was put in place to protect the personal data of citizens of the European Union, that is, usernames, emails, phone numbers, and any other information that can be linked to an individual. Even though GDPR is an EU regulation, it extends beyond companies in the EU. Non-EU companies that process data from an EU citizen (this includes Google, Instagram, and many more) must also comply with GDPR. Users’ information must be processed legally, and users must be told what their data will be used for in a way they can understand. Only necessary information should be recorded, and it can only be kept for as long as necessary. Additionally, the regulation requires companies to keep user data safe. Under the GDPR, user consent became of paramount importance. It gives users the right to decide what information is kept about them and to have it deleted when they wish.

The regulation became enforceable in May 2018, and there still isn’t enough data to show how big an impact it has had so far. However, a few studies have shown that although the GDPR has increased overall transparency on the web, the standards it set aren’t fulfilled by the majority of affected parties. Many companies do not reveal whether they use data-at-rest encryption, and fail to say whether they report data breaches to the right authorities.

Mitigation Techniques

Even though there are regulations that are working to protect us from abuse, the best way to defend from both legal and illegal attacks on your privacy is to take your privacy into your own hands. There is no way to completely prevent tracking without compromises, but the following solutions strike a good balance between prioritizing privacy and minimizing impact to your internet browsing.

One common solution to help block tracking is to install an ad-blocking extension on your web browser of choice. Some browsers, such as Firefox or Brave, have integrated ad and tracker blocking technology, whereas with Chrome you will need to install an extension to block ads for you. This will help greatly on many sites, but this technique is detectable by sites, and some of them have implemented “ad blocker blockers” which detect your ad blocking and will prevent you from using the site until it is disabled. Additionally, an upcoming change to Chrome extensions called Manifest V3 will severely hinder or entirely prevent many ad blockers from working as effectively as they do now, further limiting the ability of these extensions.

Another option is to set up a DNS-based ad blocking solution. These are custom DNS servers that block the domains that host and serve ads to devices. When they receive a lookup for one of these domains, they refuse to resolve it instead of passing the query on to an upstream DNS server. This protection is better than the extension-based blockers, because the failed lookups surface to the browser as if the ad servers were misconfigured or down, which makes the blocking more difficult for sites to detect and counter. However, these require technical know-how to set up, because they are typically self-hosted solutions such as Pi-hole that require manual configuration of your home internet’s router and hosting on a constantly running server.
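
The core decision a DNS-based blocker makes can be sketched as follows. This is a minimal illustration of the idea, not how Pi-hole or any specific product is implemented: the blocklist entries, the `resolve` and `is_blocked` helpers, and the stand-in upstream resolver are hypothetical, and answering with an unroutable address such as `0.0.0.0` is just one common way to make a blocked lookup fail.

```python
from typing import Callable

BLOCKLIST = {"ads.example", "tracker.example"}  # hypothetical blocklist entries
SINKHOLE_ADDRESS = "0.0.0.0"  # unroutable address returned for blocked names


def is_blocked(domain: str, blocklist: set) -> bool:
    """True if the domain or any parent domain is on the blocklist."""
    labels = domain.lower().rstrip(".").split(".")
    # Check "a.b.ads.example", then "b.ads.example", then "ads.example", ...
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))


def resolve(domain: str, upstream: Callable[[str], str],
            blocklist: set = BLOCKLIST) -> str:
    """Answer blocked names locally; forward everything else upstream."""
    if is_blocked(domain, blocklist):
        return SINKHOLE_ADDRESS
    return upstream(domain)


def fake_upstream(domain: str) -> str:
    """Stand-in for a real upstream resolver so the sketch runs offline."""
    return "192.0.2.10"


print(resolve("news.example", fake_upstream))     # 192.0.2.10
print(resolve("cdn.ads.example", fake_upstream))  # 0.0.0.0
```

For a live experiment, `fake_upstream` could be swapped for `socket.gethostbyname` from the standard library. Because the block happens at name resolution, it applies to every device and application that uses the resolver, not just to one browser.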

A final option that is frequently advertised is using a VPN provider. VPN stands for Virtual Private Network. When you connect to a VPN, all of your internet traffic is routed through the VPN server before being sent to your destination, and is encrypted on its way to that server. This means that entities such as your ISP or people monitoring unencrypted public Wi-Fi can’t track your internet activity or sell it to other companies. This is a dimension of protection that can’t be replicated by the above two solutions, because even with ad blocking, your ISP can see all traffic that is sent by you. One important caveat, however, is that with this solution you end up trusting your traffic to the VPN provider that you choose. If this VPN is untrustworthy, they might end up selling your data to advertisers anyway, or, if they store that traffic, they may be forced to produce that data for the government if they are subpoenaed.

Though there is no perfect solution to preventing tracking on the internet, these three solutions in tandem provide a reasonable amount of protection for most users, while not limiting their ability to use the internet.

Concluding Remarks

Overall, it is evident from this paper that user tracking will continue to exist and become increasingly prevalent in the near future. As such, the authors’ hope in expanding upon this topic is to give today’s technology users the opportunity to, in essence, hide themselves if they desire. As more and more people in the industry discover how valuable user data really is, the hope is that the pendulum may even swing in the user’s favor. Instead of freely handing over access to one’s search history or day-to-day dietary habits, consider using one of the mitigation techniques highlighted in the preceding section. If enough consumers decline to share their information, it could swing the pendulum back in favor of large technology companies’ user bases.

This is mere speculation, though, so until that comes to fruition it is also beneficial to remain vigilant and aware of the laws that work in one’s favor. Refer back to section five to read more about these laws.

When deciding whether or not to accept the terms and conditions on a webpage to allow for tracking, keep in mind the possible benefits and detriments of doing so. Section four expanded upon these in depth, offering readers a chance to perform their own analysis of which aspects matter to them the most when deciding on a course of action.

Most fundamentally of all, ensure that the different tactics companies use to track users are at the forefront of one’s knowledge base. Being unaware of exactly which methods a company is using to track a user simply makes it easier for them to do so, and as a result further incentivizes tracking because the barrier to entry is lower. Being cognizant of the four methods described in section three will allow technology users to feel confident when using any websites or apps that they feel may track their whereabouts.

In this paper, the techniques used by companies to track their user base were introduced, and additional information regarding the implications for technology users, the legality of tracking, and techniques users can employ to defend themselves was discussed. It is the authors’ hope that in doing so, awareness was brought to users with respect to the often secretive ways in which companies track their users throughout the internet.

