Task A1 of NSA Codebreaker 2022
Task A1 - Initial access - (Log analysis)
We believe that the attacker may have gained access to the victim’s network by phishing a legitimate user’s credentials and connecting over the company’s VPN. The FBI has obtained a copy of the company’s VPN server log for the week in which the attack took place. Do any of the user accounts show unusual behavior which might indicate their credentials have been compromised?
Note that all IP addresses have been anonymized.
Prompt:
- Enter the username which shows signs of a possible compromise.
Reading the Log
After downloading the file and opening it in a text editor, we notice several important features:
- The log contains comma-separated values
- The log comes from an openvpn-server
- The log includes users, times, durations, IP addresses, ports, bytes, and errors for every connection
Let’s start by taking advantage of the first feature: comma-separated values. We can open the file as a spreadsheet by changing the extension from *.log to *.csv.
I start by converting the CSV into a table, saving it in the *.xlsx format, and looking for unique values in each column. The errors all appear to be credential and user issues, so I don’t think they will be of much use. We are looking for an account compromise, so we’ll ignore the errors and focus on successful access attempts.
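The same "unique values" check can be sketched outside the spreadsheet. The snippet below is a minimal sketch rather than the method used above: it counts the distinct values in one column using only the standard library. The sample rows are trimmed to a few columns, and their values (usernames, timestamps, error strings) are made up for illustration; only the column names come from the log.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows: the values below are invented for illustration,
# and the real log has more columns than are shown here.
SAMPLE_LOG = """Username,Start Time,Duration,Auth,Error
Alice.A,2022-02-01 08:00:00,3600,OK,
Bob.B,2022-02-01 08:05:00,0,FAIL,Invalid credentials
Bob.B,2022-02-01 08:06:00,1800,OK,
Carol.C,2022-02-01 09:00:00,0,FAIL,Unknown user
"""


def count_values(rows, column):
    """Count how often each non-empty value appears in the given column."""
    return Counter(row[column] for row in rows if row[column])


rows = list(csv.DictReader(io.StringIO(SAMPLE_LOG)))
error_counts = count_values(rows, "Error")

# List every distinct error with the number of times it occurs
for error, count in error_counts.most_common():
    print(f"{count:>3}  {error}")
```

To run it against the real file, replace the `io.StringIO(SAMPLE_LOG)` argument with an open file handle for the log.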
Parsing New Data
Our biggest hint is from the first sentence of the prompt:
We believe that the attacker may have gained access to the victim’s network by phishing a legitimate user’s credentials and connecting over the company’s VPN.
A phishing attack would only happen while a user is logged in and clicking on emails. Once the malicious link is clicked, the account would likely transmit sensitive information immediately. Since we don’t have access to network or intrusion detection / prevention system logs, we can assume that the attack is detectable with just the information we have.
Let’s assess which of the data fields are most viable to look at first:
- Username - Useful for filtering logins but not useful on its own.
- Start Time - Useful for identifying when access occurred.
- Duration - Useful for identifying when access ended.
- Services - Not useful.
- Active - Not useful.
- Auth - Only useful for identifying login failure.
- Real Ip - Could be useful for geolocation and finding similar / dissimilar locations.
- Vpn Ip - Could be useful to find IP re-use.
- Port - Not useful.
- Bytes Total - Could be useful to find data exfiltration.
- Error - Only useful for determining the cause of a failed login
If we try to filter the Start Time field, we notice that the date is merely a string. We’ll need to parse it into a number-based date field in our spreadsheet. The formula should be similar across most spreadsheet apps:
=DATE(LEFT([@[Start Time]],4),MID([@[Start Time]],6,2),MID([@[Start Time]],9,2)) + TIME(MID([@[Start Time]],12,2),MID([@[Start Time]],15,2),MID([@[Start Time]],18,2))
I’ve created a new column named Start, which is now sortable by time.
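For readers who prefer code to spreadsheet formulas, here is a rough Python equivalent of the formula above. It is only a sketch: the function name `parse_start_time` is hypothetical, the sample timestamp is made up, and the character positions are the ones the formula uses (1-based in the spreadsheet, 0-based slices in Python).

```python
from datetime import datetime


def parse_start_time(text):
    """Build a datetime from fixed character positions, like the LEFT/MID formula."""
    return datetime(
        int(text[0:4]),    # LEFT(text, 4)    -> year
        int(text[5:7]),    # MID(text, 6, 2)  -> month
        int(text[8:10]),   # MID(text, 9, 2)  -> day
        int(text[11:13]),  # MID(text, 12, 2) -> hour
        int(text[14:16]),  # MID(text, 15, 2) -> minute
        int(text[17:19]),  # MID(text, 18, 2) -> second
    )


# Hypothetical timestamp in the layout the formula expects
start = parse_start_time("2022-02-02 08:05:00")
print(start)
```

Because the separator characters are skipped rather than checked, the same slices work whether the date and time are separated by a space or by a `T`.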
Next, let’s calculate the End of each user’s connection using the Duration field. It looks like the value is recorded in seconds, so we’ll need to convert it before we can add it to the Start field. Spreadsheet dates count in days, so adding 1 would add a whole day. We therefore divide the Duration field by 86400 to convert the seconds into days. For example:
seconds = 832
day = seconds / 86400
day = 0.00962962963
Let’s create a formula that converts seconds to days and adds the value to the start time:
=[@Start]+([@Duration]/86400)
If we insert this formula in a new End column right after the Start column, every connection now has a sortable start and end time.
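The seconds-to-days arithmetic and the End calculation can be double-checked with a short Python sketch. The helper names `seconds_to_days` and `end_time` are hypothetical, and the start timestamp is made up. The 832-second duration is the example value used above.

```python
from datetime import datetime, timedelta

SECONDS_PER_DAY = 86400


def seconds_to_days(seconds):
    """Convert a duration in seconds to the fraction of a day a spreadsheet expects."""
    return seconds / SECONDS_PER_DAY


def end_time(start, duration_seconds):
    """Add a duration in seconds to a start time."""
    return start + timedelta(seconds=duration_seconds)


# Hypothetical start time combined with the 832-second example duration
start = datetime(2022, 2, 2, 8, 5, 0)
end = end_time(start, 832)

print(round(seconds_to_days(832), 11))
print(end)
```

In Python the division by 86400 is not needed to compute the end time, since `timedelta` accepts seconds directly. It is only shown here to mirror the spreadsheet math.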
Finding the Intrusion
With these two fields parsed, let’s hide everything we don’t need and sort by Username. We want to analyze the user traffic to see if someone logged in twice at the same time. Stolen credentials often show up as a second connection that starts while the legitimate user is still logged in.
We discover that only one user logged on twice simultaneously. Ryan.X first logged in from 08:05 to 13:34 on February 2nd, but he also logged in from 09:31 to 09:55 the same day. The IP addresses are also different, with the first session coming from 172.18.34.65 and the second from 172.27.235.116. The second session starts and ends while the first is still active, which is exactly the overlapping login we were looking for.
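The overlap test itself is simple enough to sketch in a few lines of Python. This is an illustration under assumptions, not the exact spreadsheet procedure: `find_overlaps` is a hypothetical helper, and the year in the timestamps is assumed, since only the day and times are quoted above.

```python
from datetime import datetime


def find_overlaps(sessions):
    """Return consecutive (start, end) sessions, sorted by start, that overlap."""
    ordered = sorted(sessions, key=lambda session: session[0])
    # A session overlaps the previous one if it starts before the previous one ends
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b[0] < a[1]]


# Ryan.X's two sessions on February 2nd; the year is an assumption
day = (2022, 2, 2)
ryan_sessions = [
    (datetime(*day, 9, 31), datetime(*day, 9, 55)),   # from 172.27.235.116
    (datetime(*day, 8, 5), datetime(*day, 13, 34)),   # from 172.18.34.65
]

overlaps = find_overlaps(ryan_sessions)
print(len(overlaps))
```

Comparing only neighboring sessions is enough to flag a user: once the sessions are sorted by start time, any overlap implies that at least one session starts before its predecessor has ended.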
When we enter Ryan.X into the NSA Codebreaker site, we get a green banner, indicating Ryan.X is the correct answer!
BONUS: Automated Script
Here is a much faster way to solve this task:
#!/usr/bin/env python3
import pandas as pd
# Path to the VPN log
logfile = "vpn.log"
# Read the log to a Pandas dataframe as if it were a CSV
log = pd.read_csv(logfile)
# Remove all records of unsuccessful connections
log = log[log["Duration"] > 0]
# Keep only the fields that we need (copy so we can safely add columns)
log = log[["Username", "Start Time", "Duration", "Real Ip"]].copy()
# Convert the "Start Time" from string to datetime format
log["Start Time"] = pd.to_datetime(log["Start Time"])
# Calculate the "End Time" by adding "Duration" in seconds to the "Start Time"
log["End Time"] = log["Start Time"] + pd.to_timedelta(log["Duration"], unit="s")
# Create a list of unique Usernames
users = log["Username"].unique()
# Enumerate through the Usernames
for user in users:
    # Filter the log by the current user
    df = log[log["Username"] == user]
    # Sort the entries by Start Time (sort_values returns a new dataframe)
    df = df.sort_values(by=["Start Time"])
    # For each pair of consecutive VPN entries
    for i in range(len(df) - 1):
        # Check if the user logged in again before logging out
        if df.iloc[i + 1]["Start Time"] < df.iloc[i]["End Time"]:
            # Print the user that logged in twice simultaneously
            print(user)
            # One overlap is enough to flag this user
            break