IN DEPTH: Should We Fear Big Brother, Big Data or Both?

Should you fear Big Brother or Big Data? Or both? Industry observers increasingly recognize that the Internet has been taken over by giant data companies that have lost, or will soon lose, any regard for their customers. Calls for regulation are getting louder. The European Union and California have taken steps to protect Internet users. But real solutions remain elusive because Big Brother and Big Data are becoming indistinguishable, and that is very worrisome.

We strongly urge you to begin breaking Big Data’s and Big Brother’s grip on your personal data by using alternative search engines. As we’ll explain below, currently the only way to wrest control over YOU from Big Data is for YOU to draw your information from a variety of sources. A good place to start is DuckDuckGo. But first, what is the problem with Big Data?


“Big Brother” was a fictional character in George Orwell’s 1949 dystopian novel 1984. But “Big Brother” has morphed into a figure increasingly used by Big Data to separate you from your hard-earned money. They portray him as the epitome of an authoritarian government watching every move of every citizen, and they offer you information or devices that will supposedly protect you from this dreaded authority. However, and here’s where it gets interesting, there really is a “big brother” that has crept into the lives of most Americans, and he isn’t the government. It turns out that he is many of the same companies that are warning you about “Big Brother.” They are him, or vice versa. Confused? You should be. That’s part of their objective.

That real big brother is Big Data: the companies that collect and mine your personal information. There is a slew of real and potential abuses by these largely unregulated companies, which collect personal data and use it in ways you’d never imagine, or would never permit had they asked for your permission. But they don’t ask, and they won’t. Collecting data about customers is virtually as old as marketing itself, but the trillions of data points now available online make it a sophisticated weapon. Companies such as Google, Facebook and Microsoft are the highly visible members of Big Data. But some of the biggest data companies are little known to the public: Acxiom, CoreLogic, Nielsen, Datalogix, Experian, Intelius and Equifax are but a few of these giga-giants.

These companies collect and sell your personal data. VERY personal data. For example: your health data (including which websites you use to research medical matters), purchase behavior, travel, vehicle, financial, court and public record data, demographics, and even your Social Security number, driver’s license number and birthday. (See the exhaustive list below.)

These data miners can map a consumer’s journey across the web and potentially augment their findings with Facebook data collected by apps that reveal the minutest details of people’s likes, dreams, interests and activities. Advertisers can enlist the services of a startup such as Tapad, which can follow users across their phones and tablets. Traditional data brokers sell offline data culled from public records and survey results to marketers, who can then overlay it with their own purchase data, with the data they’ve already mined online, and with private aggregated data from other Big Data companies. Because the industry is largely unregulated, little is known about how this shadowy business operates.
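
To make the “overlay” idea concrete, here is a minimal Python sketch of how two datasets could be joined on a hashed email address. It is an illustration under stated assumptions, not how Tapad or any named broker actually works: the field names, the normalization rule (trim and lowercase) and the choice of SHA-256 are all hypothetical.

```python
import hashlib

def match_key(email: str) -> str:
    """Normalize an email address, then hash it with SHA-256.

    The normalization (trim + lowercase) is a hypothetical choice for this sketch.
    """
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def overlay(offline_records: list, online_profiles: list) -> list:
    """Layer offline attributes onto online profiles that share a match key."""
    offline_by_key = {match_key(r["email"]): r for r in offline_records}
    merged = []
    for profile in online_profiles:
        combined = dict(profile)
        extra = offline_by_key.get(match_key(profile["email"]))
        if extra is not None:
            # Copy every offline attribute except the raw email used for matching.
            combined.update({k: v for k, v in extra.items() if k != "email"})
        merged.append(combined)
    return merged

# Hypothetical sample data: one broker record and one online browsing profile.
offline = [{"email": " Jane@Example.com", "home_value": 350000, "party": "Independent"}]
online = [{"email": "jane@example.com", "visited": ["travel-deals", "running-shoes"]}]
profiles = overlay(offline, online)
```

Note that hashing the address does not anonymize anyone: whoever holds the same email can compute the same hash, which is exactly what makes the join possible.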

Originally, the Internet was viewed as a level playing field where everyone’s voice would be heard, ungoverned by national laws and free from the need to make money. Almost five years ago, one insider, Jaron Lanier, went public with his well-documented concerns, but his proposed solutions were rejected as unworkable. The Net’s promise of anonymous discourse has devolved into abusive trolling and a haven for cybercriminals (and hostile state attacks). Fake news, whether created for ideology or profit, runs rampant. Four out of 10 adult internet users said in a Pew survey that they had been harassed online. An expert consensus appears to be emerging: the Internet is a 30-year-old mess.

Lanier warned that Internet commerce has been built on a model of content creation, distribution and monetization that threatens to stifle innovation and independent thought. In his 2013 treatise “Who Owns the Future?”, Lanier says the public has been seduced by “Siren Servers” (Google, Facebook, Amazon and a few other behemoths) that accumulate and control consumer data without paying the people who provide all of this “free” information. He argues that the Web has become a Faustian bargain in which social media users willingly hand over their personal data and imaginative content in exchange for “free” services.


New privacy regulations have already taken effect in Europe, and California has passed its own (taking effect in 2020). California’s effort is AB 375, the California Consumer Privacy Act of 2018, the state’s equivalent of the GDPR, which mirrors the EU law in many ways. The law will give the state’s 40 million residents the right to view the data that companies hold on them, make corrections to it, and request that it be deleted and not sold to third parties. Any company that holds data on more than 50,000 people is subject to California’s law, and each violation carries a fine of up to $7,500. Consumers will begin to see a “Do Not Sell My Personal Information” link on websites. They’ll also have recourse if data is not accurate. But the issues surrounding Big Data are far more complex than data accuracy.

Recently, Apple CEO Tim Cook described some of these bigger concerns in his keynote speech at the 40th International Conference of Data Protection and Privacy Commissioners in Brussels. Describing it as “unchecked surveillance,” he pointed to processes in which enduring digital profiles are being created. Ultimately, these unregulated data companies “know you better than you may know yourself,” aided by proprietary algorithms that serve up increasingly extreme content. Cook called for four regulatory priorities:

  1. data minimization: “the right to have personal data minimized,” saying companies should “challenge themselves” to de-identify customer data or not collect it in the first place
  2. transparency: “the right to knowledge,” saying users should “always know what data is being collected and what it is being collected for,” which he called the only way to “empower users to decide what collection is legitimate and what isn’t.” “Anything less is a sham,” he added
  3. the right to access: saying companies should recognize that “data belongs to users,” and that it should be easy for users to get a copy of, correct and delete their personal data
  4. the right to security: saying “security is foundational to trust and all other privacy rights”
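
Cook’s first priority, de-identifying data, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Apple’s or anyone else’s method: the list of “direct identifier” fields is hypothetical, a keyed hash (HMAC-SHA256) stands in for the company’s internal pseudonym scheme, and birth dates are assumed to be in “YYYY-MM-DD” form.

```python
import hashlib
import hmac

# Hypothetical set of fields treated as direct identifiers in this sketch.
DIRECT_IDENTIFIERS = {"name", "email", "phone", "ssn", "drivers_license", "birth_date"}

def minimize(record: dict, secret_key: bytes) -> dict:
    """Return a reduced copy of a customer record.

    - Direct identifiers are dropped.
    - If an email is present, it is replaced by a keyed pseudonym
      (HMAC-SHA256), so the company can still link one customer's records
      without keeping the address itself.
    - A full birth date (assumed "YYYY-MM-DD") is generalized to the year.
    """
    reduced = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "email" in record:
        reduced["pseudonym"] = hmac.new(
            secret_key, record["email"].strip().lower().encode("utf-8"), hashlib.sha256
        ).hexdigest()
    if "birth_date" in record:
        reduced["birth_year"] = record["birth_date"][:4]
    return reduced
```

Pseudonymized data is not anonymous: the remaining fields (a ZIP code, a birth year, an order history) can still re-identify people when combined with other datasets, which is why not collecting the data in the first place is the stronger option.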

Yet Cook doesn’t go far enough. As Sir Tim Berners-Lee, the inventor of the World Wide Web, has noted, the problem is far more ominous than privacy and transparency. Berners-Lee wrote, in part:

“The web that many connected to years ago is not what new users will find today. What was once a rich selection of blogs and websites has been compressed under the powerful weight of a few dominant platforms. This concentration of power creates a new set of gatekeepers, allowing a handful of platforms to control which ideas and opinions are seen and shared. These dominant platforms are able to lock in their position by creating barriers for competitors. They acquire startup challengers, buy up new innovations and hire the industry’s top talent. Add to this the competitive advantage that their user data gives them and we can expect the next 20 years to be far less innovative than the last. What’s more, the fact that power is concentrated among so few companies has made it possible to weaponise the web at scale. In recent years, we’ve seen conspiracy theories trend on social media platforms, fake Twitter and Facebook accounts stoke social tensions, external actors interfere in elections, and criminals steal troves of personal data. We’ve looked to the platforms themselves for answers. Companies are aware of the problems and are making efforts to fix them — with each change they make affecting millions of people. The responsibility — and sometimes burden — of making these decisions falls on companies that have been built to maximise profit more than to maximise social good. A legal or regulatory framework that accounts for social objectives may help ease those tensions.”

This is the man credited with inventing the World Wide Web! Berners-Lee is responding to the fact that Google now accounts for about 87% of online searches worldwide. Facebook has more than 2.2 billion monthly active users, more than 20 times as many as MySpace at its peak. Together, the two companies (including their subsidiaries Instagram and YouTube) rake in 60% of digital advertising spend worldwide. His concern is that concentrating power in the hands of a few gatekeepers allows them “to control which ideas and opinions are seen and shared.” Without “socially-minded” regulation, the promise of the Internet will not be realized. Worse yet, the innovations spawned by the Internet will be stifled.


Google now handles 63% of all U.S. search queries and over 93% of U.S. searches on mobile devices, according to The Statistics Portal’s July 2018 report. Add Microsoft’s Bing and Oath (formerly Yahoo), and three corporations control 98% of U.S. search results, according to Bob Rankin. This consolidation of search services is highly problematic, because these companies have the ability to control what we know. By controlling whom we can find and what we can learn about them, they can control what we think about and, in many ways, what we think. Ominously, Google recently dropped its 18-year-old prime directive, “Don’t Be Evil,” from the preamble of its code of conduct. That can’t be good. It might even be evil.

Most consumers are likely blithely unaware that Google and some other major platforms have blocked advertising for cryptocurrencies and cryptocurrency exchanges. And, citing “fake news” concerns, Google and Facebook have begun culling hundreds of blogs and websites from their search results and feeds. Voices are already being silenced, even if they are not particularly welcome voices. In countries headed by authoritarian regimes, such as China, Russia and Turkey, search results and Net access are routinely scrubbed by these search companies.

Large data companies such as YouTube, Facebook and Twitter argue that they aren’t publishers but are more like the phone company, providing a service to people. Increasingly, however, these companies are monitoring and then cutting off calls, or your phone service entirely, when they detect the wrong sort of content. And they are under growing pressure to let the police and intelligence services monitor what goes on, given a court order. This is the surveillance society Tim Cook warned about.

So what can we do? You might remember how oil companies resisted vehicle emissions controls and any kind of carbon tax. Or, if you’ve got some gray hairs, you’ll recall tobacco companies insisting that medical research linking cigarettes to cancer was flawed. You’ll begin to hear the same warnings and denials from the Big Data companies. As Cook said in his keynote address, some companies will “endorse reform in public and then resist and undermine it behind closed doors. They may say to you our companies can never achieve technology’s true potential if there were strengthened privacy regulations.” He’s right. These big companies have the ability to mount big resistance to any kind of regulation.

Enter the movement for search decentralization. Standard centralized search engines have a single point of control and are run by the search engine companies. A decentralized search engine, by contrast, performs the same basic function but without a single point of control. A number of decentralized, peer-to-peer search engines have been proposed or actually built, including InfraSearch, OpenCola, YaCy, FAROO and others. Their goal is to offer an alternative that many people will prefer to centralized search engines. However, they lack access to the large databases accumulated by Google, Bing, Oath and Facebook.
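
How can a search engine work “without a single point of control”? One common approach in peer-to-peer systems is to spread the index across peers by hashing each word, so that no central server decides where anything lives. The Python sketch below is a toy illustration of that idea; it is not the actual design of YaCy, FAROO or any other engine named above, and the `Peer`, `owner`, `publish` and `search` names are hypothetical.

```python
import hashlib
from collections import defaultdict

class Peer:
    """One node in a toy peer-to-peer search network; it stores part of the index."""

    def __init__(self, name: str):
        self.name = name
        self.postings = defaultdict(set)  # term -> set of URLs containing it

def owner(term: str, peers: list) -> Peer:
    """Choose the peer responsible for a term by hashing the term itself."""
    digest = int(hashlib.sha1(term.encode("utf-8")).hexdigest(), 16)
    return peers[digest % len(peers)]

def publish(url: str, text: str, peers: list) -> None:
    """Store each distinct word of a page on whichever peer owns that word."""
    for term in set(text.lower().split()):
        owner(term, peers).postings[term].add(url)

def search(query: str, peers: list) -> set:
    """Ask each query term's owner for its URLs and keep pages matching every term."""
    results = None
    for term in query.lower().split():
        urls = owner(term, peers).postings.get(term, set())
        results = set(urls) if results is None else results & urls
    return results or set()
```

The simple modulo assignment reshuffles ownership whenever a peer joins or leaves; real distributed hash tables use schemes such as consistent hashing to avoid that.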

Blockchain technology holds the promise of opening up these massive databases to competing, open-source search engines. Under this model, the data would be encrypted and stored on a blockchain. Instead of a search engine company owning the data, users would own it and control access to it with a private key. In the scenario envisioned by some reformers, if users decide to share their private keys with companies, those companies can access the data and market to the user based on that information. If users decline to share, companies cannot access the data or use it to build marketing schemes.
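
The core data structure behind this idea, a tamper-evident, append-only chain of records, can be sketched with nothing more than a hash function. The Python toy below records which companies a user has granted access to. It deliberately simplifies the model described above: there is no encryption, no private-key signing and no network consensus (those would require a real cryptography library and a distributed system), and the `ConsentLedger` class and its methods are hypothetical.

```python
import hashlib
import json

class ConsentLedger:
    """Toy append-only log of data-access decisions. Each entry stores the
    SHA-256 hash of the previous entry, so editing history is detectable."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self):
        self.entries = []

    @staticmethod
    def _hash(body: dict) -> str:
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()

    def record(self, user: str, company: str, granted: bool) -> None:
        """Append a grant (True) or revocation (False) linked to the prior entry."""
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = {"user": user, "company": company, "granted": granted, "prev": prev}
        self.entries.append({**body, "hash": self._hash(body)})

    def can_access(self, user: str, company: str) -> bool:
        """The user's most recent decision wins; the default is no access."""
        allowed = False
        for e in self.entries:
            if e["user"] == user and e["company"] == company:
                allowed = e["granted"]
        return allowed

    def verify(self) -> bool:
        """Check that every entry hashes correctly and points at its predecessor."""
        prev = self.GENESIS
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            if e["prev"] != prev or self._hash(body) != e["hash"]:
                return False
            prev = e["hash"]
        return True
```

In this sketch, access defaults to “no” and only the user’s own entries can change that, which mirrors the opt-in model the reformers describe.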

Unfortunately, consumers have become extremely reliant on, and accustomed to, Google, Oath and Bing for the majority of their searches. It may take time for people to adjust to new search engines and for decentralized search to take hold. For now, no decentralized search engine can compete with the Big 3. But there are alternative search engines worth considering. If nothing else, they’ll reduce Big Data’s control over your personal information.


A number of alternative search services have begun to emerge.   Some of the ones you should consider include:

DuckDuckGo is probably the best-known alternative search engine. Its CEO, Gabriel Weinberg, has said, “if the FBI comes to us, we have nothing to tie back to you.” Searches are sourced mostly from Yahoo. It is user-friendly thanks to features like ‘zero-click’ information (answers appear right on the results page, with no need to click through), infinite scroll and prompts to clarify your questions. It also shows far less ad spam than Google. Thanks to its partnerships with Mozilla, Vivaldi and Apple, the site has grown steadily since its inception, from an average of 79,000 daily searches in 2010 to 23.5 million daily searches, and 16 billion searches in total, as of April 2018. One of its more attractive features is called Bangs. Bangs let you search third-party sites directly from DuckDuckGo. Say you want to search a particular website: on Google you would run a site search with the “site:” operator, while with DDG’s bangs you simply type the site’s bang, such as “!muo”, followed by your query. Searching with any of the thousands of available bangs takes you directly to that site rather than to the search engine’s results. And if you find yourself missing Google’s tailored results, adding ‘!g’ before your query will take you straight there.
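
Because a bang is just part of the query text, a bang search can be expressed as an ordinary DuckDuckGo search URL. The short Python sketch below builds such a URL, assuming DuckDuckGo’s standard `q` query parameter; the `bang_url` helper is hypothetical and only shows how the text is encoded, not how DuckDuckGo processes it.

```python
from urllib.parse import urlencode

def bang_url(bang: str, query: str) -> str:
    """Build a DuckDuckGo search URL whose query starts with a bang such as "!g".

    The bang travels as ordinary query text; DuckDuckGo interprets it and
    redirects to the target site's own search.
    """
    if not bang.startswith("!"):
        bang = "!" + bang
    return "https://duckduckgo.com/?" + urlencode({"q": f"{bang} {query}"})
```

For example, `bang_url("g", "privacy tools")` yields `https://duckduckgo.com/?q=%21g+privacy+tools`, where `%21` is the URL-encoded “!”.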

StartPage uses results from Google, which is a good thing if you prefer Google’s results without the tracking. Ixquick, an independent search engine that uses its own results, developed StartPage to include results from Google. Its features include a proxy service, a URL generator and HTTPS support, and it remembers your settings in a privacy-friendly way. The site is very popular in Europe, less so in the U.S.

Blekko has a unique interface that serves results by category. It uses “slashtags” (text tags preceded by a ‘/’ character, much like Twitter hashtags) to search its database for related keywords within categories. Developed by ex-Googlers, it bills itself as the ‘spam-free search engine.’ Although it logs user-specific information, it claims to delete these logs within 48 hours.
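
The slashtag idea is easy to picture as a two-step process: split the query into ordinary search terms and category tags, then keep only results in the requested categories. The Python sketch below illustrates that under stated assumptions; Blekko’s real implementation is not described here, and the function names and the `category` field on each result are hypothetical.

```python
def parse_slashtags(query: str):
    """Split a query like "climate change /news" into search terms and slashtags."""
    terms, tags = [], []
    for token in query.split():
        if token.startswith("/") and len(token) > 1:
            tags.append(token[1:].lower())  # "/News" -> "news"
        else:
            terms.append(token)
    return " ".join(terms), tags

def filter_by_tags(results: list, tags: list) -> list:
    """Keep only results whose (hypothetical) category matches a requested slashtag."""
    if not tags:
        return list(results)
    return [r for r in results if r["category"] in tags]
```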

Gibiru sources its search results from a modified Google algorithm. Gibiru’s CEO, Steve Marshall, claimed in a press release that his service is what Google was in its early days: reliable search results without all the tracking that Google does today.

Searx is a private metasearch engine that is completely open source and pulls results from 70 different search engines. Because it is open source, anyone can run their own instance. However, if you run a private instance that only you use, every query it forwards to those engines is yours, which diminishes your privacy. Importantly, Searx does not store your search data. Its results page offers many search categories, which helps narrow down your results. It also has an IT tab, with results from sites like Stack Overflow and GitHub, which is especially useful for developers.
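
A metasearch engine has to merge several engines’ ranked lists into one. Searx’s own ranking formula is not described here, so the Python sketch below substitutes reciprocal rank fusion, a simple and widely used way to combine ranked lists. The engine names and URLs in the usage example are hypothetical.

```python
from collections import defaultdict

def merge_results(engine_results: dict, k: int = 60) -> list:
    """Combine ranked URL lists from several engines into one list.

    Reciprocal rank fusion: each engine adds 1 / (k + rank) to the score of
    every URL it returns, so URLs ranked highly by several engines rise to
    the top, and duplicates across engines are merged automatically.
    """
    scores = defaultdict(float)
    for urls in engine_results.values():
        for rank, url in enumerate(urls, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda u: (-scores[u], u))
```

For instance, if one engine returns `["x.org", "y.org", "z.org"]`, a second returns `["y.org", "x.org"]` and a third returns only `["y.org"]`, the merged order is `["y.org", "x.org", "z.org"]`, because y.org is ranked highly by all three.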

This much is known: the impacts of Big Data are far-reaching and sometimes shocking. The New York Times published a widely circulated story about an angry father who stormed into his local Target after his teenage daughter received coupons for maternity items. The retailer’s data crunchers had tried to identify which of its customers were pregnant so that it could market pregnancy products more effectively. The company used data from shoppers’ purchase histories, not from online snooping, and inadvertently revealed that the teenager was pregnant. This is just one of many examples of the unanticipated consequences of unregulated data use.
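
The unsettling part of the Target story is how little it takes: a weighted tally of purchases can produce a sensitive inference without any online tracking. The Python toy below illustrates the principle only. The product list, weights and threshold are hypothetical placeholders, not Target’s actual model, which was never published in full.

```python
# Hypothetical signal products and weights; NOT the retailer's real model.
SIGNAL_WEIGHTS = {
    "unscented lotion": 0.30,
    "vitamin supplements": 0.25,
    "cotton balls (large)": 0.15,
    "hand sanitizer": 0.10,
}
THRESHOLD = 0.5  # hypothetical cut-off for sending targeted coupons

def propensity_score(purchases: list) -> float:
    """Sum the weights of distinct signal products in a purchase history."""
    return sum(SIGNAL_WEIGHTS.get(item, 0.0) for item in set(purchases))

def should_target(purchases: list) -> bool:
    """Flag a shopper for targeted mailings once the score crosses the threshold."""
    return propensity_score(purchases) >= THRESHOLD
```

No single purchase here is revealing, yet the combination crosses the threshold, which is exactly why purchase data alone can expose something a customer never chose to share.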

The power of megadata companies like Google, Amazon and Facebook, along with giga-giants such as Acxiom and Experian, is frightening. Their consolidated power sorely needs to be checked. Regulation is inevitable, but it will be largely inadequate. Recently, the Federal Trade Commission called on Congress to pass legislation that would give consumers access to the data that data brokers hold on them, but no one expects the Republican-controlled Congress to act. In the meantime, YOU have the power to loosen these companies’ control over your personal data. It begins with using alternative search engines, and with keeping the pressure on your elected officials to protect you from Big Data and Big Brother abuses.




Identifying Data

• Name
• Previously Used Names
• Address
• Address History
• Longitude and Latitude
• Phone Numbers
• Email Address

Sensitive Identifying Data

• Social Security Number
• Driver’s License Number
• Birth Date
• Birth Dates of Each Child in Household
• Birth Date of Family Members in Household

Demographic Data

• Age
• Height
• Weight
• Gender
• Race & Ethnicity
• Country of Origin
• Religion (by Surname at the Household Level)
• Language
• Marital Status
• Presence of Elderly Parent
• Presence of Children in Household
• Education Level
• Occupation
• Family Ties
• Demographic Characteristics of Family Members in Household
• Number of Surnames in Household
• Veteran in Household
• Grandparent in House
• Spanish Speaker
• Foreign Language Household (e.g., Russian, Hindi, Tagalog, Cantonese)
• Households with a Householder who is Hispanic Origin or Latino
• Employed – White Collar Occupation
• Employed – Blue Collar Occupation
• Work at Home Flag
• Length of Residence
• Household Size
• Congressional District
• Single Parent with Children
• Ethnic and Religious Affiliations

Court and Public Record Data

• Bankruptcies
• Criminal Offenses and Convictions
• Judgments
• Liens
• Marriage Licenses
• State Licenses and Registrations (e.g., Hunting, Fishing, Professional)
• Voting Registration and Party Identification

Social Media and Technology Data

• Electronics Purchases
• Friend Connections
• Internet Connection Type
• Internet Provider
• Level of Usage
• Heavy Facebook User
• Heavy Twitter User
• Twitter User with 250+ Friends
• Is a Member of over 5 Social Networks
• Online Influence
• Operating System
• Software Purchases
• Type of Media Posted
• Uploaded Pictures
• Use of Long Distance Calling Services
• Presence of Computer Owner
• Use of Mobile Devices
• Social Media and Internet Accounts including: Digg, Facebook, Flickr, Flixster, Friendster, hi5, Hotmail, LinkedIn, Live Journal, MySpace, Twitter, Amazon, Bebo, CafeMom, DailyMotion, Match, myYearbook, Pandora, Photobucket, WordPress, and Yahoo

Home and Neighborhood Data

• Census Tract Data
• Address Coded as Public/Government Housing
• Dwelling Type
• Heating and Cooling
• Home Equity
• Home Loan Amount and Interest Rate
• Home Size
• Lender Type
• Length of Residence
• Listing Price
• Market Value
• Move Date
• Neighborhood Criminal, Demographic, and Business Data
• Number of Baths
• Number of Rooms
• Number of Units
• Presence of Fireplace
• Presence of Garage
• Presence of Home Pool
• Rent Price
• Type of Owner
• Type of Roof
• Year Built

General Interest Data

• Apparel Preferences
• Attendance at Sporting Events
• Charitable Giving
• Gambling – Casinos
• Gambling – State Lotteries
• Thrifty Elders
• Life Events (e.g., Retirement, Newlywed, Expectant Parent)
• Magazine and Catalog Subscriptions
• Media Channels Used
• Participation in Outdoor Activities (e.g., Golf, Motorcycling, Skiing, Camping)
• Participation in Sweepstakes or Contests
• Pets
• Dog Owner
• Political Leanings
• Assimilation Code
• Preferred Celebrities
• Preferred Movie Genres
• Preferred Music Genres
• Reading and Listening Preferences
• Donor (e.g., Religious, Political, Health Causes)
• Financial Newsletter Subscriber
• Upscale Retail Card Holder
• Affluent Baby Boomer
• Working-Class Moms
• Working Woman
• African-American Professional
• Membership Clubs – Self-Help
• Membership Clubs – Wines
• Exercise – Sporty Living
• Winter Activity Enthusiast
• Participant – Motorcycling
• Outdoor/Hunting & Shooting
• Biker/Hell’s Angels
• Santa Fe/Native American Lifestyle
• New Age/Organic Lifestyle
• Is a Member of over 5 Shopping Sites
• Media Channel Usage – Daytime TV
• Bible Lifestyle
• Leans Left
• Political Conservative
• Political Liberal
• Activism & Social Issues

Financial Data

• Ability to Afford Products
• Credit Card User
• Presence of Gold or Platinum Card
• Credit Worthiness
• Recent Mortgage Borrower
• Pennywise Mortgagee
• Financially Challenged
• Owns Stocks or Bonds
• Investment Interests
• Discretionary Income Level
• Credit Active
• Credit Relationship with Financial or Loan Company
• Credit Relationship with Low-End Standalone Department Store
• Number of Investment Properties Owned
• Estimated Income
• Life Insurance
• Loans
• Net Worth Indicator
• Underbanked Indicator
• Tax Return Transcripts
• Type of Credit Cards

Vehicle Data

• Brand Preferences
• Insurance Renewal
• Make & Model
• Vehicles Owned
• Vehicle Identification Numbers
• Vehicle Value Index
• Propensity to Purchase a New or Used Vehicle
• Propensity to Purchase a Particular Vehicle Type (e.g., SUV, Coupe, Sedan)
• Motorcycle Owner (e.g., Harley, Off-Road Trail Bike)
• Motorcycle Purchased 0-6 Months Ago
• Boat Owner
• Purchase Date
• Purchase Information
• Intend to Purchase – Vehicle

Travel Data

• Read Books or Magazines About Travel
• Travel Purchase – Highest Price Paid
• Date of Last Travel Purchase
• Air Services – Frequent Flyer
• Vacation Property
• Vacation Type (e.g., Casino, Time Share, Cruises, RV)
• Cruises Booked
• Preferred Vacation Destination
• Preferred Airline

Purchase Behavior Data

• Amount Spent on Goods
• Buying Activity
• Method of Payment
• Number of Orders
• Buying Channel Preference (e.g., Internet, Mail, Phone)
• Types of Purchases
• Military Memorabilia/Weaponry
• Shooting Games
• Guns and Ammunition
• Christian Religious Products
• Jewish Holidays/Judaica Gifts
• Kwanzaa/African-Americana Gifts
• Type of Entertainment Purchased
• Type of Food Purchased
• Average Days Between Orders
• Last Online Order Date
• Last Offline Order Date
• Online Orders $500-$999.99 Range
• Offline Orders $1000+ Range
• Number of Orders – Low-Scale Catalogs
• Number of Orders – High-Scale Catalogs
• Retail Purchases – Most Frequent Category
• Mail Order Responder – Insurance
• Mailability Score
• Dollars – Apparel – Women’s Plus Sizes
• Dollars – Apparel – Men’s Big & Tall
• Books – Mind & Body/Self-Help
• Internet Shopper
• Novelty Elvis

Health Data

• Ailment and Prescription Online Search Propensity
• Propensity to Order Prescriptions by Mail
• Smoker in Household
• Tobacco Usage
• Over the Counter Drug Purchases
• Geriatric Supplies
• Use of Corrective Lenses or Contacts
• Allergy Sufferer
• Have Individual Health Insurance Plan
• Buy Disability Insurance
• Buy Supplemental to Medicare/Medicaid Individual Insurance
• Brand Name Medicine Preference
• Magazines – Health
• Weight Loss & Supplements
• Purchase History or Reported Interest in Health Topics including: Allergies, Arthritis, Medicine Preferences, Cholesterol, Diabetes, Dieting, Body Shaping, Alternative Medicine, Beauty/Physical Enhancement, Disabilities, Homeopathic Remedies, Organic Focus, Orthopedics, and Senior Needs

*Compiled and Published by Quartz
