Analysis of the June 2021 data dump containing 700 million LinkedIn accounts

Introduction

On June 22, 2021, an aggregate of data concerning 700 million LinkedIn accounts was offered for sale on a forum. While this aggregate does not contain passwords, it contains a large amount of public data available on LinkedIn user profiles.

A data aggregate very similar to this one had already been made public in April 2021, containing 500 million accounts. At the time, LinkedIn published a statement indicating that this aggregate contained only public data accessible on its users' profiles.

The data aggregate of June 22, 2021 contains information very similar to that of the April 2021 aggregate, except that it now covers 700 million accounts. It was posted on a forum, where it is sold for $5,000. LinkedIn published a press release about the June 22, 2021 aggregate, stating that it contains only public data accessible on its users' profiles.

We will now dive into the data contained in this aggregate. The data presented below has been anonymized, but it reproduces exactly the structure and fields found in the data aggregate (see this example file, containing the anonymized data of one user account from the aggregate).

Analysis of the data present in the aggregate

The data aggregate is a list of JSON objects, one per line (a layout commonly called JSON Lines or NDJSON):

{"id":"ok50x4jDJULSRUyHWx3DzV_0000","full_name":"john doe","first_name":"john",...}
{"id":"Do4v+vBouxldLUfBrLAFPj_0000","full_name":"john doe","first_name":"john",...}
{"id":"YRLmYhSnfJtjEZ5bXxKANC_0000","full_name":"john doe","first_name":"john",...}

Each line's JSON object corresponds to one user and contains a large amount of information, which we explore in detail in the following sections.
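Because every line is a standalone JSON object, the file can be processed as a stream, one record at a time, without loading hundreds of millions of accounts into memory. The following is a minimal Python sketch, assuming UTF-8 text lines; the function name `iter_records` is illustrative and not part of any tool used in this analysis.

```python
import json
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of JSON Lines input.

    `lines` can be any iterable of strings, such as an open file handle.
    """
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline at end of file
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Point at the offending line instead of failing with a bare decode error.
            raise ValueError(f"line {line_number}: invalid JSON ({exc.msg})") from exc
```

Typical use is to open the file with `open(path, encoding="utf-8")` and pass the handle directly to `iter_records`; since the function is a generator, only one record is held in memory at a time.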

General information

At the very beginning of each JSON object we find general information about the user's LinkedIn account. These are personal data: the user's full name, first, middle and last names, gender, and date of birth:

{
  "id": "pQrFO2uEETAb0WxOT9mdLh_0000",
  "full_name": "john doe",
  "first_name": "john",
  "middle_initial": "j",
  "middle_name": "peter",
  "last_name": "doe",
  "gender": "male",
  "birth_year": "1985",
  "birth_date": "1985-07-13",
  ...
}

Social networks

Next come the fields relating to the social networks linked to this profile. They contain the full URL of the user's profile and their username on each platform, plus a numeric identifier for LinkedIn and Facebook. The social networks found in the data aggregate are LinkedIn, Facebook, Twitter, and GitHub:

{
  ...
  "linkedin_url": "linkedin.com/in/john-doe-b93431ea",
  "linkedin_username": "john-doe-b93431ea",
  "linkedin_id": "36950119",
  "facebook_url": "facebook.com/john.doe",
  "facebook_username": "john.doe",
  "facebook_id": "100002605726632",
  "twitter_url": "twitter.com/john.doe",
  "twitter_username": "john.doe",
  "github_url": "github.com/john.doe",
  "github_username": "john.doe",
  ...
}
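These fields follow a regular `<network>_url`, `<network>_username`, `<network>_id` naming pattern, so they can be regrouped per platform. A minimal sketch, assuming the four networks listed above are the only prefixes in use (any other prefix would simply be ignored); `social_accounts` is a hypothetical helper name.

```python
# Networks observed in the data aggregate; extend this tuple if other prefixes appear.
SOCIAL_NETWORKS = ("linkedin", "facebook", "twitter", "github")
FIELD_SUFFIXES = ("url", "username", "id")


def social_accounts(record: dict) -> dict:
    """Regroup flat fields such as "twitter_username" into {"twitter": {"username": ...}}."""
    accounts = {}
    for network in SOCIAL_NETWORKS:
        fields = {
            suffix: record[f"{network}_{suffix}"]
            for suffix in FIELD_SUFFIXES
            if record.get(f"{network}_{suffix}")  # skip missing or empty values
        }
        if fields:
            accounts[network] = fields
    return accounts
```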

Industry and company information

After the social networks associated with the user profile, we find all the information relating to the user's current position: in particular their job title, role and seniority levels, their professional email address and phone number, and detailed information on their current company (size, founding year, social networks, geographical location):

{
  ...
  "industry": "computer software",
  "job_title": "security engineer",
  "job_title_role": "engineering",
  "job_title_sub_role": "software",
  "job_title_levels": ["manager", "senior"],
  "job_company_id": "evilcorp",
  "job_company_name": "evilcorp",
  "job_company_website": "evilcorp.us",
  "job_company_size": "1001-5000",
  "job_company_founded": "1995",
  "job_company_industry": "government administration",
  "job_company_linkedin_url": "linkedin.com/company/evilcorp",
  "job_company_linkedin_id": "123456",
  "job_company_facebook_url": "facebook.com/evilcorp",
  "job_company_twitter_url": "twitter.com/evilcorp",
  "job_company_location_name": "washington, district of columbia, united states",
  "job_company_location_locality": "washington",
  "job_company_location_metro": "washington",
  "job_company_location_region": "district of columbia",
  "job_company_location_geo": "46.7,-74.76",
  "job_company_location_street_address": "46 St Peter's road",
  "job_company_location_address_line_2": "3rd floor",
  "job_company_location_postal_code": "dh3 1rb",
  "job_company_location_country": "united states",
  "job_company_location_continent": "north america",
  "job_last_updated": "2019-07-01",
  "job_start_date": "2012-07-03",
  "job_summary": "Long text describing my job",
  ...
}
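Coordinates such as `job_company_location_geo` are stored as a single `"latitude,longitude"` string rather than as two numbers, so they must be split before use. A minimal sketch, assuming latitude always comes first (consistent with `46.7` being a valid latitude); `parse_geo` is a hypothetical name.

```python
from typing import Optional, Tuple


def parse_geo(value: Optional[str]) -> Optional[Tuple[float, float]]:
    """Split a "lat,lon" string such as "46.7,-74.76" into a (lat, lon) float pair."""
    if not value:
        return None  # field absent or empty
    lat_text, lon_text = value.split(",")  # ValueError unless there is exactly one comma
    lat, lon = float(lat_text), float(lon_text)
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"coordinates out of range: {value!r}")
    return lat, lon
```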

Company geolocation data

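Following the information on the user's current company, we find a second set of location fields, without the job_company_ prefix; in this example, their values repeat the company's geolocation information: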

{
  ...
  "location_name": "washington, district of columbia, united states",
  "location_locality": "washington",
  "location_metro": "washington, district of columbia",
  "location_region": "district of columbia",
  "location_country": "united states",
  "location_continent": "north america",
  "location_street_address": "46 St Peter's road",
  "location_address_line_2": "3rd floor",
  "location_postal_code": "dh3 1rb",
  "location_geo": "46.7,-74.76",
  "location_last_updated": "2020-10-01",
  ...
}

Number of LinkedIn connections and salary

Then we find the user's number of LinkedIn connections, as well as their inferred (estimated) salary range:

{
  ...
  "linkedin_connections": 768,
  "inferred_salary": "55,000-70,000",
  ...
}
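`inferred_salary` is a text range with thousands separators rather than a number. A minimal sketch that converts the `"low-high"` form shown here into integers; it deliberately rejects any other format, since this example does not show how other kinds of ranges, if any, are written. `parse_salary_range` is a hypothetical name.

```python
from typing import Optional, Tuple


def parse_salary_range(value: Optional[str]) -> Optional[Tuple[int, int]]:
    """Convert a range such as "55,000-70,000" into (55000, 70000)."""
    if not value:
        return None
    low_text, sep, high_text = value.partition("-")
    if not sep:
        raise ValueError(f"unsupported salary format: {value!r}")
    # Drop thousands separators and surrounding spaces before converting.
    low = int(low_text.replace(",", "").strip())
    high = int(high_text.replace(",", "").strip())
    if low > high:
        raise ValueError(f"inverted salary range: {value!r}")
    return low, high
```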

Number of years of experience and profile description

Then we find the user's inferred number of years of experience, as well as the general description of their profile:

{
  ...
  "inferred_years_experience": 15,
  "summary": "Long text describing my profile",
  ...
}

Email addresses and phone numbers

The following fields contain all the phone numbers and email addresses associated with the user's account, each email address being tagged with a type:

{
  ...
  "phone_numbers": [
    "+14839950305",
    "+13549658422",
    "+14131851328",
    "+14676644426",
    "+14855411518"
  ],
  "emails": [
    {
      "address": "john.doe@gmail.com",
      "type": "personal"
    },
    {
      "address": "johndoe@hotmail.com",
      "type": "personal"
    }
  ],
  ...
}
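Since each entry in `emails` carries a `type`, a record's addresses can be grouped by that type, for instance to check which of one's own personal or professional addresses were exposed. A short sketch; filing entries without a type under `"unknown"` is an assumption, and `emails_by_type` is a hypothetical name.

```python
from collections import defaultdict


def emails_by_type(record: dict) -> dict:
    """Group the addresses of record["emails"] by their "type" field."""
    grouped = defaultdict(list)
    for entry in record.get("emails") or []:
        address = entry.get("address")
        if address:
            # Entries without a type are filed under "unknown" (an assumption).
            grouped[entry.get("type") or "unknown"].append(address)
    return dict(grouped)
```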

Interests and skills

After that we find the user's list of interests, as well as their list of skills:

{
  ...
  "interests": [
    "sport",
    "science",
    "photography",
    "running",
    "nature"
  ],
  "skills": [
    "microsoft office",
    "python",
    "scrapping",
    "security",
    "pentesting"
  ],
  ...
}

Geolocation data

We then find the user’s geolocation data:

{
  ...
  "location_names": [
    "washington, district of columbia, united states"
  ],
  "regions": [
    "district of columbia, united states"
  ],
  "countries": [
    "united states"
  ],
  "street_addresses": [],
  ...
}

Professional experiences

After the profile location data, we find the list of the user’s professional experiences:

{
  ...
  "experience": [
    {
      "company": {
        "name": "evilcorp",
        "size": "1001-5000",
        "id": "evilcorp",
        "founded": "1995",
        "industry": "government administration",
        "location": {
          "name": "washington, district of columbia, united states",
          "locality": "washington",
          "region": "district of columbia",
          "metro": "washington",
          "country": "united states",
          "continent": "north america",
          "street_address": "46 St Peter's road",
          "address_line_2": "3rd floor",
          "postal_code": "dh3 1rb",
          "geo": "46.7,-74.76"
        },
        "linkedin_url": "linkedin.com/company/evilcorp",
        "linkedin_id": "123456",
        "facebook_url": "facebook.com/evilcorp",
        "twitter_url": "twitter.com/evilcorp",
        "website": "evilcorp.us"
      },
      "location_names": [],
      "end_date": "2015-01",
      "start_date": "1996-01",
      "title": {
        "name": "chief executive officer",
        "role": null,
        "sub_role": null,
        "levels": []
      },
      "is_primary": true,
      "summary": "Long description of my job"
    }
  ],
  ...
}
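Dates in the aggregate come at different precisions: `"1985-07-13"` for the birth date, `"2015-01"` for experiences and `"1986"` for education. A sketch that normalizes all three forms to a `datetime.date`, filling a missing month or day with 1 (an assumption: the data does not say which day a bare year or month refers to), and computes the length of an experience in whole months. Both helper names are hypothetical.

```python
from datetime import date
from typing import Optional


def parse_partial_date(value: Optional[str]) -> Optional[date]:
    """Parse "YYYY", "YYYY-MM" or "YYYY-MM-DD"; a missing month or day defaults to 1."""
    if not value:
        return None
    parts = [int(part) for part in value.split("-")]
    if len(parts) > 3:
        raise ValueError(f"unexpected date format: {value!r}")
    parts += [1] * (3 - len(parts))  # pad month and day
    return date(*parts)


def months_between(start: Optional[str], end: Optional[str]) -> Optional[int]:
    """Whole months from start to end, or None if either date is missing."""
    start_date, end_date = parse_partial_date(start), parse_partial_date(end)
    if start_date is None or end_date is None:
        return None
    return (end_date.year - start_date.year) * 12 + (end_date.month - start_date.month)
```

With the experience above, `months_between("1996-01", "2015-01")` gives 228 months, i.e. 19 years.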

Education

After the professional experiences, we find the list of the user's education (schools attended, dates, and fields of study):

{
  ...
  "education": [
    {
      "school": {
        "name": "washington university",
        "type": "post-secondary institution",
        "id": "9ud0aPNCMhSRuPHi7mg3p9_0",
        "location": {
          "name": "canada",
          "locality": "washington",
          "region": "washington",
          "country": "united states",
          "continent": "north america"
        },
        "linkedin_url": "linkedin.com/school/washington-university",
        "facebook_url": "facebook.com/washington-university",
        "twitter_url": "twitter.com/washington-university",
        "linkedin_id": "123456",
        "website": "washington-university.us",
        "domain": "washington-university.us"
      },
      "end_date": "1986",
      "start_date": "1983",
      "gpa": null,
      "degrees": [],
      "majors": [
        "computer security"
      ],
      "minors": [],
      "summary": null
    }
  ],
  ...
}

Social media profiles

This part contains the list of social media profiles attached to the user. It includes their LinkedIn profile, but also profiles on other social networks:

{
  ...
  "profiles": [
    {
      "network": "facebook",
      "id": "100002605726632",
      "url": "facebook.com/john.doe",
      "username": "john.doe"
    },
    {
      "network": "linkedin",
      "id": "36950119",
      "url": "linkedin.com/in/john-doe-b93431ea",
      "username": "john-doe-b93431ea"
    }
  ],
  ...
}
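In this example the `profiles` list repeats the values of the flat `facebook_*` and `linkedin_*` fields seen earlier. A sketch, using hypothetical helper names, that indexes the list by network and reports networks whose flat `<network>_url` field does not match any URL listed for that network in `profiles`.

```python
def profiles_by_network(record: dict) -> dict:
    """Index record["profiles"] by network name (a network may have several profiles)."""
    index = {}
    for profile in record.get("profiles") or []:
        index.setdefault(profile.get("network"), []).append(profile)
    return index


def inconsistent_networks(record: dict) -> list:
    """Networks present in "profiles" whose flat "<network>_url" value is set but not listed there."""
    mismatches = []
    for network, profiles in profiles_by_network(record).items():
        flat_url = record.get(f"{network}_url")
        if flat_url and flat_url not in {p.get("url") for p in profiles}:
            mismatches.append(network)
    return mismatches
```

Networks that appear only in the flat fields, with no entry in `profiles`, are not checked by this sketch.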

Certifications

The data also contains the list of certifications obtained by the user, with the certification name, the issuing organization, and the certification (start) and expiration (end) dates:

{
  ...
  "certifications": [
    {
      "organization": "offensive security",
      "start_date": null,
      "end_date": null,
      "name": "OSCE"
    },
    {
      "organization": "offensive security",
      "start_date": null,
      "end_date": null,
      "name": "OSCP"
    }
  ],
  ...
}

Languages

Finally, we find the list of languages the user knows, along with their proficiency level for each:

{
  ...
  "languages": [
    {"name": "french", "proficiency": 5},
    {"name": "dutch", "proficiency": 5},
    {"name": "english", "proficiency": 3},
    {"name": "german", "proficiency": 3}
  ],
  ...
}

Conclusion

Although this data aggregate contains a lot of information on a very large number of users, it does not appear to contain any information that was not already publicly visible on the profiles, with the exception of email addresses.

Information present in this data aggregate:

  • Full name
  • Gender
  • Date of birth
  • Profile description
  • Education
  • Professional experiences
  • Inferred salary range
  • Interests
  • Skills
  • Certifications
  • Languages
  • Company location data
  • Email addresses
    • Personal
    • Professional
  • Telephone numbers
  • Usernames on social networks
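The categories listed above can be checked mechanically for any single record, for example to see what the aggregate reveals about one account. A minimal sketch; the mapping from categories to JSON keys is an assumption based on the fields shown in this article, and `exposed_categories` is a hypothetical name.

```python
# Category label -> JSON keys that carry it (based on the fields shown above).
EXPOSED_FIELDS = {
    "Full name": ("full_name",),
    "Gender": ("gender",),
    "Date of birth": ("birth_date", "birth_year"),
    "Profile description": ("summary",),
    "Education": ("education",),
    "Professional experiences": ("experience",),
    "Inferred salary range": ("inferred_salary",),
    "Interests": ("interests",),
    "Skills": ("skills",),
    "Certifications": ("certifications",),
    "Languages": ("languages",),
    "Company location data": ("job_company_location_name",),
    "Email addresses": ("emails",),
    "Telephone numbers": ("phone_numbers",),
    "Usernames on social networks": (
        "linkedin_username", "facebook_username", "twitter_username", "github_username",
    ),
}


def exposed_categories(record: dict) -> list:
    """Return, in the order above, the categories with at least one non-empty field."""
    return [
        label
        for label, keys in EXPOSED_FIELDS.items()
        if any(record.get(key) for key in keys)
    ]
```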

Information that is not present in this aggregate of data:

  • Passwords for LinkedIn accounts

The potential impact of this data aggregate is mainly linked to the exposure of email addresses: the personal and professional email addresses of the users in this aggregate may be targeted by phishing campaigns.

Although passwords are not present in this data aggregate, it is highly recommended that you change your password on LinkedIn and on any other platform where you use the same password. Using the information in the data aggregate, an attacker could build custom password lists to guess users' passwords, then attempt to brute-force the LinkedIn account or other services used by the user. The data could also be used to answer security questions and request a password reset on a user's account on some sites.

References