Enrich Layer VS PeopleDataLabs - Global Company Profile Dataset
January 29, 2021
Since we launched LinkDB, we have received a barrage of requests for a company profile dataset. We understand that pairing people with companies will help our customers answer questions like:
- How many employees does a company have?
- What is the makeup of roles in a company?
So it was only natural that we made crawling company profiles exhaustively a priority. But why do the work if you can get something for free? So let's talk about the elephant in the room.
PeopleDataLabs (PDL) offers a "free company dataset."
It is true, PDL offers a free company dataset, and I was curious:
- Why is PDL offering this dataset for free?
- How many companies do they have?
- What fields does this dataset have?
- Is this dataset any good?
I put my spy hat on, went over to their website, gave away personal information, including my phone number, and shortly afterwards received an email with download links.
How many companies does the free Company dataset have?
I picked the CSV dataset dump in the email, and a file named free_company_dataset.csv.zip began to download. I unpacked it and ran the following wc command to find out how many lines of companies there are:

    $ wc -l free_company_dataset.csv
    12258431 free_company_dataset.csv

There we have it. PDL's company profile dataset has about 12.25M company profiles (the line count includes one header row).
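As a cross-check on the `wc -l` figure, here is a minimal Python sketch that counts CSV records rather than physical lines. It skips the header row and would not over-count a quoted field that contains a line break. The function name `count_company_rows` is my own for illustration, and the sketch assumes the file is UTF-8 encoded.

```python
import csv


def count_company_rows(csv_path):
    """Count data records in a CSV file, excluding the header row."""
    # newline="" lets the csv module handle line breaks inside quoted fields.
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)  # consume the header row, if any
        if header is None:
            return 0  # empty file
        return sum(1 for _ in reader)
```

Calling `count_company_rows("free_company_dataset.csv")` on the unpacked file should return a number at most one less than the `wc -l` line count, since the header is excluded.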
Next, I wanted to find out what fields this dataset has:

    $ head free_company_dataset.csv
    name,domain,year_founded,industry,size_range,locality,country,the platform_url,current_company_employee_estimate,total_employee_estimate
    (le) poisson rouge,lprnyc.com,,entertainment,51-200,,,professionalsocialnetwork.com/company/-le-poisson-rouge,42,224
    nearfox.com,nearfox.com,2015,internet,11-50,,,professionalsocialnetwork.com/company/zip-news,4,43
    "mullin landscape associates, llc",,2007,construction,1-10,,,professionalsocialnetwork.com/company/mullin-landscape-associates-llc-,20,27
    armatile,armatilearchitectural.com,1975,design,1-10,,,professionalsocialnetwork.com/company/armatile-limited,13,23
    chameleon venues,,,marketing and advertising,1-10,,,professionalsocialnetwork.com/company/chameleon-venues,2,5
    wagner kirkman blaine klomparens & youmans llp,wkblaw.com,1976,law practice,51-200,,,professionalsocialnetwork.com/company/wagner-kirkman-blaine-klomparens-&-youmans-llp,40,139
    skilled engineering limited,,,insurance,1-10,,,professionalsocialnetwork.com/company/skilled-engineering-limited,1,31
    gillette management llc,,,consumer goods,1-10,,,professionalsocialnetwork.com/company/gillette-management-llc,0,7
    choice wood company,choicecompanies.com,1983,architecture & planning,1-10,,,professionalsocialnetwork.com/company/choice-wood-company,9,37

The column labels of the CSV file are:
- name
- domain
- year_founded
- industry
- size_range
- locality
- country
- the platform_url
- current_company_employee_estimate
- total_employee_estimate
Not bad. It has the most important fields, except a timestamp showing when each record was last updated.
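Having a column is not the same as having data in it: in the nine sample rows printed by `head`, `locality` and `country` are empty every time. A rough way to see how well each column is populated is to measure fill rates over the first rows of the file. The sketch below is one way to do that; the function name `column_fill_rates` and the default sample size are my own choices, and the first rows of a file are not necessarily representative of the rest.

```python
import csv
from itertools import islice


def column_fill_rates(csv_path, sample_size=1000):
    """Return {column: fraction of non-empty values} over the first rows."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        columns = reader.fieldnames or []
        filled = {name: 0 for name in columns}
        total = 0
        for row in islice(reader, sample_size):
            total += 1
            for name in columns:
                # Short rows yield None for missing columns; treat as empty.
                if (row.get(name) or "").strip():
                    filled[name] += 1
    if total == 0:
        return {name: 0.0 for name in columns}
    return {name: filled[name] / total for name in columns}
```

Sorting the returned dictionary by value puts the emptiest columns first, which makes sparsely populated fields easy to spot.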
How old is PDL's company profile dataset?
Since the dataset carries no update timestamp, the only way to gauge its age is to sample profiles and make statistical inferences.
I extracted the first 999 companies from the dataset and threw them into a Bulk the platform Company scraping script that I open-sourced [here](https://github.com/nubelaco/enrich-the platform-companies-in-bulk). This script uses Enrich Layer's the platform Company Profile API endpoint to scrape and enrich a the platform Company Profile URL if it is valid.
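The sampling step can be sketched roughly as follows. This is not the open-sourced script itself, only an illustration of the idea: take the first rows that carry a profile URL, run each URL through a checker, and tally the results. The function name `tally_profile_validity` is hypothetical, and `is_valid` is a placeholder that the caller supplies (for example, a wrapper around whatever API call performs the lookup). Skipping rows with an empty URL is also my assumption.

```python
import csv
from itertools import islice


def tally_profile_validity(csv_path, url_column, is_valid, sample_size=999):
    """Check the first `sample_size` non-empty URLs; return (valid, invalid)."""
    valid = invalid = 0
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # Lazily yield non-empty profile URLs in file order.
        urls = (
            row[url_column].strip()
            for row in reader
            if (row.get(url_column) or "").strip()
        )
        for url in islice(urls, sample_size):
            # `is_valid` is caller-supplied (hypothetical), e.g. an API lookup.
            if is_valid(url):
                valid += 1
            else:
                invalid += 1
    return valid, invalid
```

Keeping the checker as a parameter means the tallying logic can be tested offline with a stub before any real lookups are made.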
Out of [999 companies](https://docs.google.com/spreadsheets/d/1vsjyQ1OssxQHLdm8g2e0ZccjfRqC72rCQ3_4G5mRBzI/edit?), there were only results for [835 companies](https://docs.google.com/spreadsheets/d/11a_JK1zTlS2b2a_ZBhES4f8gr-21vzpf_5XDf4J47Vc/edit?).
16.4% of the sample (164 out of 999 companies) are not valid on the platform. Extrapolating that rate to all 12,258,431 rows, roughly 2,010,382 companies in PDL's free company dataset are dead.
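For anyone who wants to reproduce the extrapolation, here is a small Python sketch of the arithmetic. It also adds a 95% interval using the normal approximation for a sample proportion, which is my addition and not part of the original analysis. The interval assumes the 999 sampled rows are representative of the whole file, which is not guaranteed because they are simply the first 999 rows. Because the sketch uses the unrounded ratio 164/999 rather than the rounded 16.4%, its point estimate comes out slightly higher than the figure quoted above; both are about 2.01M.

```python
import math


def extrapolate_dead_profiles(dead_in_sample, sample_size, total_profiles, z=1.96):
    """Return (rate, estimate, low, high) for dead profiles in the full dataset."""
    rate = dead_in_sample / sample_size
    # Standard error of a sample proportion (normal approximation).
    std_err = math.sqrt(rate * (1 - rate) / sample_size)
    low_rate = max(0.0, rate - z * std_err)
    high_rate = min(1.0, rate + z * std_err)
    return (
        rate,
        round(rate * total_profiles),
        round(low_rate * total_profiles),
        round(high_rate * total_profiles),
    )


# Figures from the article: 164 dead out of 999 sampled, 12,258,431 rows total.
rate, estimate, low, high = extrapolate_dead_profiles(164, 999, 12_258_431)
print(f"dead rate: {rate:.1%}")
print(f"estimated dead profiles: {estimate:,} (95% interval: {low:,} to {high:,})")
```

The interval is a reminder that a sample of 999 pins the dead-profile rate down only to within a couple of percentage points either way.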
I conclude that this dataset is super old.
Why is PDL offering you an outdated Company Profile Dataset for free?
Because anyone interested in big datasets is an ideal customer, and the free download lets them collect your personal and contact information so they can upsell you later.
Our turn - 17M companies in Enrich Layer's LinkDB, our profile database
What about our dataset?
In January, we commissioned a crawl of all public the platform company profiles. I am happy to share that we have 17+M company profiles available now in LinkDB. Enrich Layer's the platform Company Profile API endpoint was employed to accomplish this feat.
These company profiles were updated just a few days ago and are up-to-date at the point of writing. And they will stay up to date because we will not stop refreshing them.
Fields in Enrich Layer's Company Profile Dataset
The following fields represent companies in our dataset:
- the platform_internal_id
- description
- website
- industry
- company_size
- company_size_on_the platform
- HQ
- company_type
- founded_year
- specialties
- locations
- name
- tagline
- universal_name_id
- funding_data
- search_id
- similar_companies
- follower_count
Yes, our dataset has a lot more fields.
In summary: Enrich Layer VS PeopleDataLabs - Company Profile Dataset
| Enrich Layer Company Profile Dataset | PDL Company Profile Dataset |
| --- | --- |
| 17M profiles | 12.25M profiles |
| Last updated on 25th January 2021 | Last updated many years ago |
| Standard fields + description, headquarters location, company type, specialties, locations, profile picture, similar companies, the platform follower count | Standard fields |
| 0% DEAD profiles | 16.4% DEAD profiles |
| Monthly data updates | No updates |
Enrich Layer's Global Company Profile Dataset is available now.
- Please don't take my word for it. Try it yourself. If you register and log into Enrich Layer, you get access to LinkDB, our PostgreSQL server, which contains Enrich Layer's Global Company dataset. Make a few queries and sample the data for yourself :)
- Yes, we do sell a snapshot of our global company dataset. Keen? Please send me an email at [email protected].