Professional network profiles have become a powerful place to gather data on individuals. Different parties have their own uses for this data, such as data mining, profile research, or lead generation.
There are many methods out there to procure the information we need. In this tutorial, we'll use one of the simpler ones: using Python to simulate a Google search and collect the URLs returned in the search results. Further processing can then be done, either manually or via automation, as we discuss in our other tutorial, How to Build the platform Automation Tools with Python With a Code Example.
The main reason for this approach is to get familiar with one of the most common methods of web scraping: browser-based scraping. For those unfamiliar, this is where we simulate human behaviour on the web by driving a real browser. We will use a tool called Selenium, an open-source testing framework for web applications that lets us start a browser from a script. Its usage goes far beyond testing, however, as we will soon demonstrate.
Setup:
Before we start, we need to set up the project and install its dependencies:
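A minimal setup, assuming Python 3 is already installed, might look like the following (the virtual environment is optional but keeps the project isolated; Selenium is the only third-party dependency this tutorial assumes):

```shell
# Create and activate an isolated environment (optional but recommended)
python3 -m venv venv
source venv/bin/activate

# Install Selenium; recent versions bundle Selenium Manager, which can
# fetch a matching browser driver automatically
pip install selenium
```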
Step 1
Let’s simulate a Google search. We will make use of one of Google's lesser-known features: search operators, also referred to as advanced operators. These are special characters and commands that apply stricter criteria to our search term, narrowing down the search results. A comprehensive list of operators is available online.
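As a quick illustration of operators before we build the helper function, the sketch below combines two of them, exact match (`"..."`) and `site:`, into a single search URL. The domain and search terms here are placeholders, not values from the original tutorial:

```python
from urllib.parse import quote_plus

# Hypothetical example values; the domain is a placeholder
title = "data scientist"
location = "London"

# site: restricts results to one domain; "..." forces an exact match
query = f'site:example-network.com/in "{title}" "{location}"'

# Google expects the query URL-encoded after the q= parameter
url = "https://www.google.com/search?q=" + quote_plus(query)
print(url)
# → https://www.google.com/search?q=site%3Aexample-network.com%2Fin+%22data+scientist%22+%22London%22
```

Note that `quote_plus` turns the double quotes into `%22` and spaces into `+`, which is exactly the encoding Google's own result URLs use.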
```python
from urllib.parse import quote_plus

def create_search_url(title, location, *include):
    base_url = "https://www.google.com/search?q="
    # %22 is the URL-encoded double quote, i.e. Google's exact-match operator;
    # quote_plus also encodes any spaces inside the term itself
    quote = lambda x: "%22" + quote_plus(x) + "%22"
    result = ""
    result += base_url
    result += quote(title) + "+" + quote(location)
    # The original snippet is truncated here; appending the optional
    # *include terms the same way is an assumption based on the signature
    for term in include:
        result += "+" + quote(term)
    return result
```