Task A1 of NSA Codebreaker 2022

09 Dec 2022

Task A1 - Initial access - (Log analysis)

We believe that the attacker may have gained access to the victim’s network by phishing a legitimate user’s credentials and connecting over the company’s VPN. The FBI has obtained a copy of the company’s VPN server log for the week in which the attack took place. Do any of the user accounts show unusual behavior which might indicate their credentials have been compromised?

Note that all IP addresses have been anonymized.

Downloads:

Access log from the company’s VPN server for the week in question. (vpn.log)

Prompt:

Enter the username which shows signs of a possible compromise.

Reading the Log

After downloading the file and opening it in a text editor, we notice several important features:

The log contains comma-separated values
The log comes from an OpenVPN server
The log includes users, times, durations, IP addresses, ports, bytes, and errors for every connection

raw log

Let’s start by taking advantage of the first feature: comma-separated values. We can open the file as a spreadsheet by changing the extension from *.log to *.csv.

csv log

Next, I convert the CSV into a table, save it in the *.xlsx format, and look at the unique values in each column. The errors appear to be only credential and user issues, so they are unlikely to be of much use. We are looking for an account compromise, so we’ll ignore the errors and focus on successful access attempts.

errors filtered
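The same loading and filtering step can be sketched in plain Python. This is a minimal sketch that makes two assumptions. First, the header row uses the field names listed in the next section (only a subset is shown here). Second, a successful connection has an empty Error field. The two sample rows, including the "Invalid password" message, are made up for illustration and are not taken from vpn.log.

```python
import csv
import io

# Made-up rows using an assumed subset of the vpn.log columns; NOT real log data.
SAMPLE_LOG = """Username,Start Time,Duration,Real Ip,Error
alice.a,2022-02-02 08:00:00,3600,172.16.0.10,
bob.b,2022-02-02 08:10:00,0,172.16.0.11,Invalid password
"""


def successful_connections(text):
    """Return the rows whose Error field is empty (assumed to mean a successful login)."""
    reader = csv.DictReader(io.StringIO(text))
    return [row for row in reader if not row["Error"].strip()]


rows = successful_connections(SAMPLE_LOG)
print([row["Username"] for row in rows])  # ['alice.a']
```

To read the real file instead, pass the contents of vpn.log (opened with `newline=""`) in place of the sample string.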

Parsing New Data

Our biggest hint is from the first sentence of the prompt:

We believe that the attacker may have gained access to the victim’s network by phishing a legitimate user’s credentials and connecting over the company’s VPN.

A phishing attack would typically happen while a user is logged in and reading email. Once the malicious link is clicked, sensitive information such as the user’s credentials would likely be sent to the attacker immediately. Since we don’t have network traffic or intrusion detection/prevention system (IDS/IPS) logs, we have to assume that the attack is detectable with only the information in the VPN log.

Let’s assess which of the data fields are most viable to look at first:

Username - Useful for filtering logins but not useful on its own.
Start Time - Useful for identifying when access occurred.
Duration - Useful for identifying when access ended.
Services - Not useful.
Active - Not useful.
Auth - Only useful for identifying login failure.
Real Ip - Could be useful for geolocation and for finding similar or dissimilar locations.
Vpn Ip - Could be useful for finding IP reuse.
Port - Not useful.
Bytes Total - Could be useful for finding data exfiltration.
Error - Only useful for determining the cause of a failed login

If we try to filter the Start Time field, we notice that the timestamp is stored as plain text. We’ll need to parse it into a numeric date/time value that the spreadsheet can sort. The formula below uses Excel’s structured table references; it should be similar in most spreadsheet applications.

=DATE(LEFT([@[Start Time]],4),MID([@[Start Time]],6,2),MID([@[Start Time]],9,2)) + TIME(MID([@[Start Time]],12,2),MID([@[Start Time]],15,2),MID([@[Start Time]],18,2))

I’ve added a new column named Start, which can now be sorted by time.

Sort by Time
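The formula reads fixed character positions. That implies Start Time values look like `YYYY-MM-DD HH:MM:SS`, which is an inference from the offsets rather than something confirmed from the log. Here is a sketch of the same parsing in Python, with the example timestamp chosen hypothetically:

```python
from datetime import datetime


def parse_start_time(text):
    """Mirror the spreadsheet formula with fixed-position slices of 'YYYY-MM-DD HH:MM:SS'."""
    # Python slices are 0-based and end-exclusive, so MID(text, 6, 2) becomes text[5:7].
    return datetime(
        int(text[0:4]),    # LEFT(..., 4)    -> year
        int(text[5:7]),    # MID(..., 6, 2)  -> month
        int(text[8:10]),   # MID(..., 9, 2)  -> day
        int(text[11:13]),  # MID(..., 12, 2) -> hour
        int(text[14:16]),  # MID(..., 15, 2) -> minute
        int(text[17:19]),  # MID(..., 18, 2) -> second
    )


stamp = "2022-02-02 08:05:00"  # hypothetical value in the assumed format
parsed = parse_start_time(stamp)
# strptime gives the same result, assuming this exact format string.
assert parsed == datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
print(parsed.isoformat())  # 2022-02-02T08:05:00
```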

Next, let’s calculate when each user’s connection ended, using the Duration field. The value appears to be recorded in seconds, so it needs converting before we can add it to the Start field. Spreadsheets store dates as a number of days, so adding 1 adds a whole day. Dividing Duration by 86400 (the number of seconds in a day) converts seconds into days. For example:

seconds = 832
days = seconds / 86400
days ≈ 0.00962962963

Let’s create a formula that converts seconds to days and adds the value to the start time:

=[@Start]+([@Duration]/86400)

If we insert this formula right after the Start column, this is what it looks like:

Making the End
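Outside a spreadsheet, there is no need for the divide-by-86400 trick, because a `timedelta` can be built from seconds directly. This sketch shows both: the day fraction from the example above, and the end time for a hypothetical session starting at 08:05:00 (the date is an assumption).

```python
from datetime import datetime, timedelta

SECONDS_PER_DAY = 86400


def end_time(start, duration_seconds):
    """Add a duration given in seconds to a start datetime."""
    return start + timedelta(seconds=duration_seconds)


# Spreadsheet-style arithmetic: express the duration as a fraction of a day.
fraction_of_day = 832 / SECONDS_PER_DAY
print(round(fraction_of_day, 11))  # 0.00962962963

start = datetime(2022, 2, 2, 8, 5, 0)  # hypothetical start time
print(end_time(start, 832))  # 2022-02-02 08:18:52
```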

Finding the Intrusion

With these two fields computed, let’s hide every column we don’t need and sort by Username. We want to check whether any user was logged in twice at the same time. An attacker using stolen credentials may well connect while the legitimate user is still logged in, which shows up as two overlapping sessions for one account.

Search for Overlaps
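The visual check above amounts to this: group sessions by user, sort each user’s sessions by start time, and flag the user if a session starts before the previous one ends. Comparing adjacent sessions is enough. If any later session starts before an earlier one ends, the session immediately after that earlier one must start before it ends too. This is a sketch; the usernames and times are made up, and sessions that merely touch (one ends exactly when the next starts) are deliberately not flagged.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def users_with_overlaps(sessions):
    """Return usernames that have two sessions open at the same time.

    sessions: iterable of (username, start, end) tuples with datetime values.
    """
    by_user = defaultdict(list)
    for user, start, end in sessions:
        by_user[user].append((start, end))

    flagged = []
    for user, spans in by_user.items():
        spans.sort()  # order each user's sessions by start time
        # Flag the user if any session starts before the previous one has ended.
        if any(nxt_start < cur_end
               for (_, cur_end), (nxt_start, _) in zip(spans, spans[1:])):
            flagged.append(user)
    return flagged


day = datetime(2022, 2, 2)  # hypothetical date
sample = [
    ("alice.a", day + timedelta(hours=8), day + timedelta(hours=12)),
    ("alice.a", day + timedelta(hours=13), day + timedelta(hours=17)),
    ("bob.b", day + timedelta(hours=8), day + timedelta(hours=10)),
    ("bob.b", day + timedelta(hours=9), day + timedelta(hours=9, minutes=30)),
]
print(users_with_overlaps(sample))  # ['bob.b']
```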

Only one user had two simultaneous sessions. Ryan.X was logged in from 08:05 to 13:34 on February 2nd, but also logged in from 09:31 to 09:55 the same day. The two sessions came from different IP addresses: 172.18.34.65 for the first and 172.27.235.116 for the second. A concurrent second login from a different address is exactly the anomaly we were looking for.
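The overlap between Ryan.X’s two sessions can be measured with a standard interval check: the later of the two starts subtracted from the earlier of the two ends. The clock times come from the paragraph above. The year 2022 is an assumption and does not affect the result.

```python
from datetime import datetime


def overlap_minutes(a_start, a_end, b_start, b_end):
    """Minutes during which two intervals are both open (0 if they are disjoint)."""
    overlap = min(a_end, b_end) - max(a_start, b_start)
    return max(overlap.total_seconds() / 60, 0)


# Session times from the write-up; the year is assumed.
first = (datetime(2022, 2, 2, 8, 5), datetime(2022, 2, 2, 13, 34))
second = (datetime(2022, 2, 2, 9, 31), datetime(2022, 2, 2, 9, 55))
print(overlap_minutes(*first, *second))  # 24.0
```

The second session lies entirely inside the first, so the overlap equals its full 24-minute length.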

When we enter Ryan.X into the NSA Codebreaker site, we get a green banner, indicating Ryan.X is the correct answer!

Task A1 Success

Badge:

Badge A1

BONUS: Automated Script

Here is a quick script that automates the whole analysis:

#!/usr/bin/env python3

import pandas as pd

# Import VPN Log
logfile = "vpn.log"
# Read the log into a pandas DataFrame as if it were a CSV
log = pd.read_csv(logfile)
# Remove all records of unsuccessful connections
log = log[log["Duration"] > 0]
# Keep only the fields that we need (.copy() avoids pandas' SettingWithCopyWarning)
log = log[["Username", "Start Time", "Duration", "Real Ip"]].copy()
# Convert "Start Time" from a string to a datetime (pandas infers the format)
log["Start Time"] = pd.to_datetime(log["Start Time"])
# Calculate "End Time" by adding "Duration" in seconds to "Start Time"
log["End Time"] = log["Start Time"] + pd.to_timedelta(log["Duration"], unit="s")

# Enumerate through the unique Usernames
for user in log["Username"].unique():
    # Filter the log by the current user and sort the entries by Start Time
    # (sort_values returns a new DataFrame, so the result must be assigned)
    df = log[log["Username"] == user].sort_values(by="Start Time")
    # Compare each VPN session with the next one
    for i in range(len(df) - 1):
        # Check if the user logged in again before logging out
        if df.iloc[i + 1]["Start Time"] < df.iloc[i]["End Time"]:
            # Print each user with simultaneous logins once, then move on
            print(user)
            break