Not long ago I started expanding my online network with professionals relevant to my field. This was not because I was seeking a job. I have always been curious to know what kind of business problems people are trying to solve with data, what the data architecture of a well-known company looks like, or even what the work-life balance of a specific role is like. Just as people share entertaining content from their personal lives on social media, they share valuable content from their professional lives on professional platforms. Interacting with professional content helps me get to know my field better, stay up to date with the latest industry trends, and even discover potential mentors.
The platform I use is LinkedIn, but I found it too limited for discovering professionals relevant to my field without paid services. So I put together a method to discover professionals of a specific seniority and field and to extract their details. As an example, in this guide we will explore how to find all senior data engineers in Greece (or rather, all who identify themselves as senior data engineers) and collect their details in a CSV file.
Public LinkedIn profiles, like most public web pages, are indexed by Google. As a result, they are searchable via Google. The profiles that we are interested in are all Senior Data Engineer profiles in Greece.
There are a lot of problems with this search.
Most of these problems occur because the above query in reality translates to “Find a page with linkedin AND senior AND data AND engineer AND greece, somewhere in it”. So next we need to be more specific by using search operators (check out the official documentation).
What we improved here:
- site operator: Limits the results to a particular site or pattern. In our case, we limited the results to LinkedIn profiles from the LinkedIn Greece website (pattern: gr.linkedin.com/in/).
- intitle operator: Finds pages with particular words or a phrase in the title. We asked Google to return results with the phrase senior data engineer in the title.
- double quotes "": Search for the exact match. We asked Google to return results with the exact phrase senior data engineer in the title.
Again we can observe the following problems:
We expected to find more senior data engineers in Greece.
Finally, we can further improve our query by searching for: **site:gr.linkedin.com/in/ intitle:"data engineer" AND (lead OR senior OR head)**
What we improved here:
Now it seems that we achieved our goal since the results are closer to what we expected. The next step is to gather the results into a single CSV file.
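As a rough illustration, the final query can also be assembled from its parts and turned into a search URL. This is only a sketch: the helper names `build_query` and `search_url` are made up for this example, and the `start` parameter for paging through results is an assumption about how Google Search URLs behave rather than something the guide relies on.

```python
from urllib.parse import urlencode


def build_query(site, title_phrase, any_of=()):
    """Compose a Google query from the operators discussed above.

    site         -> value for the site: operator
    title_phrase -> exact phrase for intitle:"..."
    any_of       -> optional alternatives joined with OR
    """
    query = f'site:{site} intitle:"{title_phrase}"'
    if any_of:
        query += " AND (" + " OR ".join(any_of) + ")"
    return query


def search_url(query, start=0):
    """Return a Google Search URL for the query.

    `start` is assumed to be the offset of the first result on the page.
    """
    return "https://www.google.com/search?" + urlencode({"q": query, "start": start})


query = build_query("gr.linkedin.com/in/", "data engineer", ["lead", "senior", "head"])
print(query)
print(search_url(query))
```

Keeping the seniority keywords in a list makes it easy to try other variations (for example adding "principal") without retyping the whole query.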
To retrieve the results, we will automate the search in Python and gather everything into a CSV file. For this we will use Playwright, an open-source browser automation framework released by Microsoft in 2020.
You can get the full code here.
First, we need to install the required packages including Playwright and its dependencies.
pip install pandas
pip install playwright
python -m playwright install
Then, to retrieve the results of the query we defined earlier, site:gr.linkedin.com/in/ intitle:"data engineer" AND (lead OR senior OR head), we need to run the following script.
python google_retriever.py
A Google Chrome (Chromium) instance will open and the results from each page will incrementally be gathered into a file named profiles.csv. The fields that we will keep are the following.
― Not every field will be available for every result.
After all results have been retrieved, the Google Chrome (Chromium) instance will close and we will get the following output.
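To make the retrieval step more concrete, here is a minimal sketch of what such a script might look like. It is not the actual google_retriever.py. Several things are assumptions made for illustration: the field names in `FIELDS`, the "Name - Headline - Company | LinkedIn" layout of result titles, the `a:has(h3)` selector for result links, and the `start` paging parameter. It also uses the standard `csv` module instead of pandas to keep the example dependency-free.

```python
import csv
from pathlib import Path
from urllib.parse import urlencode

# Hypothetical field names; the real script may keep different ones.
FIELDS = ["name", "headline", "company", "url"]


def parse_title(title, url):
    """Split a result title into fields.

    Assumes titles look like "Name - Headline - Company | LinkedIn".
    Missing parts are left empty.
    """
    text = title.split(" | ")[0]
    parts = [part.strip() for part in text.split(" - ")]
    return {
        "name": parts[0],
        "headline": parts[1] if len(parts) > 1 else "",
        "company": parts[2] if len(parts) > 2 else "",
        "url": url,
    }


def append_rows(path, rows):
    """Append rows to the CSV file, writing the header only once."""
    file = Path(path)
    write_header = not file.exists() or file.stat().st_size == 0
    with file.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)


def collect(query, out_path="profiles.csv", max_pages=30):
    """Page through the search results and save them incrementally."""
    # Imported here so the helpers above work without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        for page_no in range(max_pages):
            params = {"q": query, "start": page_no * 10}
            page.goto("https://www.google.com/search?" + urlencode(params))
            rows = []
            # Assumed markup: each result is a link that wraps an <h3> title.
            for link in page.locator("a:has(h3)").all():
                url = link.get_attribute("href") or ""
                if "linkedin.com/in/" in url:
                    title = link.locator("h3").first.inner_text()
                    rows.append(parse_title(title, url))
            if not rows:
                break  # no more profile results
            append_rows(out_path, rows)
        browser.close()
```

Calling `collect('site:gr.linkedin.com/in/ intitle:"data engineer" AND (lead OR senior OR head)')` would then open a browser window and fill profiles.csv page by page. Appending after every page means that partial results survive if the run is interrupted.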
By now, every profile that Google returned for our query should be in profiles.csv.
― At times Google will block us; we then have to solve the CAPTCHA manually or use a third-party service.
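One way to handle the manual case is to pause the script until the CAPTCHA has been solved in the open browser window. The sketch below assumes that Google redirects blocked requests to a URL whose path starts with `/sorry`; that detail, as well as the helper names `looks_blocked` and `goto_with_manual_captcha`, is an assumption for illustration. The `page` argument is expected to be a Playwright page, or any object with a `goto` method and a `url` attribute.

```python
from urllib.parse import urlparse


def looks_blocked(url):
    """Return True if the URL looks like Google's CAPTCHA interstitial.

    Assumption: blocked requests land on a path starting with "/sorry".
    """
    return urlparse(url).path.startswith("/sorry")


def goto_with_manual_captcha(page, url, pause=input):
    """Open the URL and wait for the user whenever a CAPTCHA shows up.

    `pause` defaults to input(), so the script stops until Enter is pressed.
    """
    page.goto(url)
    while looks_blocked(page.url):
        pause("Google is asking for a CAPTCHA. Solve it in the browser, then press Enter...")
```

Using this helper in place of a plain `page.goto(...)` keeps the run alive through a block, as long as the browser is launched in non-headless mode so that the CAPTCHA is visible.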
In this guide, we showed a way to collect the details of specific professional profiles so we can expand our network. Specifically, we explored how to improve the accuracy of our query in Google Search using operators and how to use a web automation framework in Python to collect the query results.