How we enriched 7084 records for two professors and a PhD student at the Hong Kong University of Science and Technology (HKUST)
January 20, 2021
5 min read
A few weeks ago, a student at the Hong Kong University of Science and Technology (HKUST) contacted me. The student represented a group of academics who needed to enrich a list of people with their work experience and education history. This article shares how I did this profile enrichment exercise with the Enrich Layer API.
The problem
Given a list of people, my job is to find their corresponding user profiles and enrich the list with work and education history.
As it turns out, the data that HKUST provided is outdated.
Resolving people to their professional network profiles
HKUST provided a list of people with general but identifiable information about each of them: first name, last name, employer, and role in their organization. These fields map directly onto the input parameters of Enrich Layer's Profile General Resolution Endpoint.
To resolve these loose bits of information about a person to their professional network profile, I wrote the following Python function.
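A minimal sketch of that mapping, assuming the list arrives as a CSV with the hypothetical column names first_name, last_name, employer and role (the real headers are not shown in this post). The query keys follow the parameters of the resolution function; using company_domain for the employer is an assumption, and empty cells are dropped so they are not sent as blank parameters.

```python
import csv
import io


def build_resolve_params(row: dict) -> dict:
    """Map one input row to resolution query parameters (hypothetical helper).

    Column names ('first_name', 'last_name', 'employer', 'role') are assumed;
    adjust them to the actual CSV header.
    """
    params = {
        'first_name': row.get('first_name', '').strip(),
        'last_name': row.get('last_name', '').strip(),
        'company_domain': row.get('employer', '').strip(),  # assumed key for the employer
        'title': row.get('role', '').strip(),
    }
    # Drop empty values so blank cells are not sent to the API.
    return {key: value for key, value in params.items() if value}


sample_csv = io.StringIO(
    "first_name,last_name,employer,role\n"
    "Jane,Doe,Example Corp,Professor\n"
)
for row in csv.DictReader(sample_csv):
    print(build_resolve_params(row))
```

Dropping empty fields keeps a partially filled row usable: the request is still made with whatever identifying details are present.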
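```python
import httpx

# RETRY_COUNT, RESOLVE_ENDPOINT and HEADERS are defined elsewhere in the script.
# The full endpoint URL (ending in /profile/resolve) and the auth header are not
# reproduced here.


async def resolve_profile_url(first_name, last_name, title, country, city, coy_name, company_domain):
    last_exc = None
    for _ in range(RETRY_COUNT):
        try:
            # httpx's default timeout is 5 s, shorter than a typical request here,
            # so a longer timeout is set (60 s is an illustrative value).
            async with httpx.AsyncClient(timeout=60.0) as client:
                params = {
                    # Query key reconstructed; only the value survived in the original.
                    'company_domain': f"{coy_name} {company_domain}",
                    'title': title,
                    'first_name': first_name,
                    'last_name': last_name,
                    'location': f"{country} {city}",
                }
                resp = await client.get(RESOLVE_ENDPOINT, params=params, headers=HEADERS)
                if resp.status_code != 200:
                    print(resp.status_code)
                    raise RuntimeError(f"unexpected status {resp.status_code}")
                return resp.json()['url']
        except Exception as exc:
            # Any failure (network error, bad status, missing key) triggers a retry.
            # KeyboardInterrupt is not an Exception subclass, so Ctrl+C still aborts.
            last_exc = exc
    raise last_exc
```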
With the profile resolution code written, I iterated through the CSV of people provided by HKUST and resolved each person to a matching profile URL.
Bulk scraping user profiles
The next step is the easy one, and it is Enrich Layer's core competency. Now that I have a list of profile URLs, all I need to do is send them to Enrich Layer's Person Profile Endpoint.
As with the resolution endpoint, I wrote a function that takes a professional network profile URL and returns the profile as structured data.
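```python
import httpx

# RETRY_COUNT, PROFILE_ENDPOINT and HEADERS are defined elsewhere in the script;
# the full endpoint URL is not reproduced here.


async def get_person_profile(url: str) -> dict:
    last_exc = None
    for _ in range(RETRY_COUNT):
        try:
            # Longer than httpx's 5 s default; 60 s is an illustrative value.
            async with httpx.AsyncClient(timeout=60.0) as client:
                # The profile URL is passed as the 'url' query parameter.
                resp = await client.get(PROFILE_ENDPOINT, params={'url': url}, headers=HEADERS)
                if resp.status_code != 200:
                    raise RuntimeError(f"unexpected status {resp.status_code}")
                return resp.json()
        except Exception as exc:
            last_exc = exc  # remember the error and retry
    raise last_exc
```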
Enrich Layer API tips
You will notice that the functions I wrote are:
- asynchronous
- tolerant of unexpected exceptions, retrying by default
The functions are asynchronous because each request takes an average of 10 seconds. To maximize throughput, I followed Enrich Layer's best practice of sending concurrent requests: my Python script ran 100 workers that sent asynchronous API requests concurrently.
Each request sent to Enrich Layer's API is an on-demand scrape job, so there is a non-zero chance that it fails with a client-side or network error. When this happens, the right thing to do is to retry.
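A minimal sketch of a fixed-size worker pool with asyncio, assuming the handler is a coroutine such as get_person_profile. The helper name run_with_workers is hypothetical, and the usage below swaps the real API call for a stand-in that just sleeps.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Sequence


async def run_with_workers(
    items: Sequence[Any],
    handler: Callable[[Any], Awaitable[Any]],
    num_workers: int = 100,
) -> List[Any]:
    """Run handler over items with num_workers concurrent tasks.

    Results are returned in the same order as the input items.
    """
    queue: asyncio.Queue = asyncio.Queue()
    for index, item in enumerate(items):
        queue.put_nowait((index, item))
    results: List[Any] = [None] * len(items)

    async def worker() -> None:
        # Each worker keeps pulling items until the queue is drained.
        while True:
            try:
                index, item = queue.get_nowait()
            except asyncio.QueueEmpty:
                return
            results[index] = await handler(item)

    await asyncio.gather(*(worker() for _ in range(num_workers)))
    return results


async def fake_fetch(url: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for a slow API call
    return url.upper()


print(asyncio.run(run_with_workers(['a', 'b', 'c'], fake_fetch, num_workers=2)))
# ['A', 'B', 'C']
```

The queue is filled before the workers start, so a worker can stop as soon as it finds the queue empty. Because the handler is expected to do its own retrying, an exception that still escapes it propagates out of gather.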
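The retry loop in both functions can be factored into one helper. This is a minimal sketch with a hypothetical name, with_retries; the optional fixed delay between attempts is an addition that the original functions do not have (they retry immediately).

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar('T')


async def with_retries(
    make_call: Callable[[], Awaitable[T]],
    attempts: int = 3,
    delay_seconds: float = 0.0,
) -> T:
    """Await make_call() until it succeeds or the attempts run out.

    make_call must return a fresh coroutine each time, because a coroutine
    object can only be awaited once.
    """
    last_exc: Exception = RuntimeError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await make_call()
        except Exception as exc:  # network errors, bad status codes, etc.
            last_exc = exc
            if attempt < attempts - 1 and delay_seconds:
                await asyncio.sleep(delay_seconds)
    raise last_exc


calls = {'n': 0}


async def flaky() -> str:
    # Fails twice with a simulated network error, then succeeds.
    calls['n'] += 1
    if calls['n'] < 3:
        raise ConnectionError("simulated network error")
    return "ok"


print(asyncio.run(with_retries(flaky, attempts=5)))  # ok
```

With this helper, a call site would pass a zero-argument factory such as `lambda: client.get(...)` so that every attempt sends a new request.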
With these two tips, I was able to scrape the entire HKUST file in one go.
Massaging data for output
The team at HKUST needed the output in a specific (Excel) format. After iterating through the list to resolve user profiles and fetching the corresponding profile data, I had all the raw data I needed. All that remained was to massage it into the CSV format that HKUST wanted.
And this is how I did it:
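```python
import csv
import json

# PATH_2_INPUT, PATH_2_OUTPUT and PATH_2_EXP are file paths defined elsewhere.
# Assumed layouts (not shown in the original): the first column of both CSVs is
# the record ID, and the second column of the output CSV holds the profile JSON.


def massage_data_for_experience():
    # Read the IDs from HKUST's input list once instead of once per output row.
    with open(PATH_2_INPUT, 'r', newline='') as input_f:
        input_ids = {row[0] for row in csv.reader(input_f)}

    counter = 0
    with open(PATH_2_OUTPUT, 'r', newline='') as output_f, \
            open(PATH_2_EXP, 'a+', newline='') as exp_f:
        writer = csv.writer(exp_f)
        for output_row in csv.reader(output_f):
            output_id = output_row[0]
            profile = json.loads(output_row[1])
            counter += 1
            if output_id not in input_ids:
                continue
            # One row per work-experience entry. The field names are assumptions;
            # the original extraction lines were lost.
            for exp in profile['experiences']:
                coy_name = exp.get('company')
                title = exp.get('title')
                employment_type = exp.get('employment_type')
                location = exp.get('location')
                starts_at = exp.get('starts_at')
                ends_at = exp.get('ends_at')
                description = exp.get('description')
                platform_profile_url = exp.get('platform_profile_url')
                writer.writerow([output_id, coy_name, title, employment_type, location,
                                 starts_at, ends_at, description, platform_profile_url])
            print(f"{counter}: Done.")
```
This is what the final result looks like.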
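The massaging code writes starts_at and ends_at exactly as they come back from the API. If those values are nested date objects with day, month and year keys (an assumption; the response format is not shown in this post), a small hypothetical helper like format_date could turn them into spreadsheet-friendly strings before each writer.writerow call.

```python
from typing import Optional


def format_date(date: Optional[dict]) -> str:
    """Turn an assumed {'day': ..., 'month': ..., 'year': ...} dict into 'YYYY-MM-DD'.

    Missing parts are dropped, and None (for example, an ongoing role with no
    end date) becomes an empty string.
    """
    if not date or not date.get('year'):
        return ''
    parts = [f"{date['year']:04d}"]
    if date.get('month'):
        parts.append(f"{date['month']:02d}")
        if date.get('day'):
            parts.append(f"{date['day']:02d}")
    return '-'.join(parts)


print(format_date({'day': 1, 'month': 9, 'year': 2018}))  # 2018-09-01
print(format_date({'day': None, 'month': 3, 'year': 2020}))  # 2020-03
```

ISO-style strings sort correctly as text and are recognized as dates by Excel, which suits the spreadsheet output HKUST asked for.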
Do you have an enrichment task?
Hey, I would love to help you out. Let me know if you have a task at hand that requires bulk profile data scraping. You can shoot us an email at [email protected].
Want to hear more stories?
And if you love reading weekly anecdotes about how we are solving business problems with our data tools, subscribe to our email list :)