A Study of Machete Cyber Espionage Operations in Latin America

by Veronica Valeros, Maria Rigaki, Sebastian Garcia, Kamila Babayeva


Reports on cyber espionage operations have been on the rise in the last decade. However, operations in Latin America are heavily under-researched and potentially underestimated. In this research we analyse and dissect a cyber espionage tool known as Machete. The results presented in this work are based on the collection, reversing and analysis of Machete samples from 2011 to 2019. The large collection of samples allowed us to analyse the malware’s evolution in detail and track changes in its functionality and structure, including modifications introduced as late as January 2019.

Our research shows that Machete is operated by a highly coordinated and organized group that focuses on Latin American targets. We describe the five phases of the APT operations, from delivery to exfiltration of information, and we show why Machete is considered a cyber espionage tool. Furthermore, our analysis indicates that the targeted victims belong to the military, political or diplomatic sectors. A review of almost eight years of Machete operations shows that it is probably operated by a single group whose activities may be state-sponsored. Machete is still active and operational to this day.

Contents

  • Introduction

  • Previous Work

  • Methodology

  • Malware Operations

  • Malware Capabilities

  • Malware Infrastructure

  • Malware Evolution

  • Analysis of Targets

  • Conclusions

  • Acknowledgements


Introduction

Cyber espionage is understood as the act of obtaining restricted information without permission using software tools, such as malware. While traditional espionage [1] activities are difficult to detect and study, as they are typically covert operations, cyber espionage operations have been disclosed and studied more often. There is, however, a lack of research in this area in Latin America: very few cyber espionage campaigns have been discovered and studied in this region in the last decade.

Nowadays, cyber espionage is conducted by groups often referred to as Advanced Persistent Threats (APTs). APT is the technical term used to identify economically or politically motivated groups that conduct cyber attacks persistently and effectively against a specific target. What distinguishes APTs from traditional attacks are their clear goals, specific targets, and long-term, highly organized campaigns [2].

In this research we present an in-depth analysis of the espionage activities of an APT group in Latin America through the analysis of one of its cyber espionage tools, known as Machete or Ragua. Reports about Machete have been published previously in [3] and [4]; however, their results are based on a subset of Machete samples, leaving open questions about its long-term operations, the detailed functionality of the malware, and the attackers' capabilities and operations as a whole. We aim to provide an extensive overview of the malware and its operators by analyzing their operations from the beginning until today.

Our research is based on a large corpus of Machete malware binaries that span eight years of operation. The malware corpus was processed, reverse engineered, and dissected in order to obtain the malware configurations, command-and-control servers, and decoy documents used in the campaigns. This information was used to identify the profile of the targets, the regions affected by the malware, and details of the malware infrastructure.

We show that the group behind Machete fits the description of an APT, as it runs a long-term operation, attacks specific targets and seeks strategic benefits. Based on the information extracted from the malware binaries, we found that Machete targets victims related to the political and military sectors. The victims appear to be located in Central and South America and are primarily Spanish speakers. Furthermore, our research indicates that a single group is likely operating the malware, given the shared encryption keys and overlapping network infrastructure.

The main contributions of this research include:

  • An in-depth analysis of Machete that complements and goes beyond previous reports. The analysis is based on the largest collection of Machete binaries to date, three times as large as that reported in previous work. The time span of the analyzed samples is also broader, which allowed us to study how the attackers' methods changed over time.

  • The most comprehensive collection to date of Machete hashes and decoy documents spanning eight years of operations.

  • Discovery of new functionality based on reverse engineering analysis of Machete samples. Until now it was unknown that Machete was able to perform lateral movement within an infected organization. Our analysis showed that Machete can propagate by infecting USB drives. The use of Dropbox as an exfiltration method is also first reported in this research.

  • A qualitative analysis of the decoy documents based on language, topics, and countries that sheds light on potential victims, and helps highlight the interests of the attackers when choosing their targets.

  • A case study on the operations of an APT group that is active in a largely understudied part of the world, namely Latin America.

Previous Work

This paper focuses on the cyber espionage activities of an APT group in Latin America. We study this group through the analysis of one of its tools, known as Machete or Ragua.

Machete was first reported in 2014 [3], and subsequently in 2017 [4]. These reports give a general overview of the malware functionality; however, both focus on a small corpus of malware samples. Both reports provide the number of victims and affected countries, but no information is provided to verify these claims: there is no supporting information on where, when, and how data about the victims was collected. Additionally, while many of the sample hashes provided in these reports are publicly available, not all of those samples can be accessed to verify the analysis.

Machete was not the first cyber espionage campaign in Latin America. In 2012, a report [5] described a targeted attack dubbed Operation Medre, which attempted to steal AutoCAD files from victim computers. The victims were primarily located in Peru, but other countries in the region were also targeted. The type of espionage conducted by this APT group differs considerably from that of the group behind Machete.

In 2015, a report [6] uncovered an APT group dubbed Packrat targeting Latin American politicians, journalists, and others. This group is believed to have been operating since 2008. There are similarities in how this actor operates, the type of infrastructure used, and the use and themes of decoy documents; however, there are strong differences in the tools used for espionage, which in this case are known, existing malware. There is no evidence that these two actors are the same. In 2018, a new report disclosed the use of remote access tools for espionage in Latin America [7]. While these attacks present some similarities with the Packrat group, the targeted audience and the tactics used are substantially different.

In early 2019, researchers uncovered a new APT group dubbed APT-C-36 [8] targeting companies and government agencies in Colombia. The tactics and techniques used differ significantly from those used by Machete.

Several studies focus on the delivery mechanisms used by APT groups. In [9], researchers present a study of "decoy documents", showing how documents were socially engineered to match the native language and the regional and thematic interests of the targets. This appears to be a common factor in all the APTs targeting Latin America. Other studies on targeted attacks against NGOs [10] or individuals [11] also show the attackers' preference for socially engineered suspicious links and documents.

Methodology

For this research we used malware binaries, or samples, that contain Machete. In the first stage of the research we aimed to obtain valid Machete samples to analyse. These samples were reverse engineered and studied to determine the malware characteristics, how they targeted their victims, and how the malware evolved over time.

This research is based on a corpus of 105 Machete binaries and 63 decoy documents. The following steps were carried out in order to obtain and create this large corpus of validated samples: first, we searched for and collected all possible Machete samples from public and private repositories; second, we manually verified that these samples were Machete samples; third, we identified the structure and nesting of files to differentiate between first-stage and second-stage binaries, their parents, and the individual modules; fourth, we reverse engineered each file to obtain the source code of the malware written in Python, its configuration files, encryption keys, and the decoy documents used, when available.

In order to obtain samples of Machete, we first relied on hashes from previous work [3, 4]. A total of eight hashes were available from the first report, and 27 initial decoys from the second report. However, not all of these were publicly available. An initial analysis of the files led to the identification of specific characteristics in the binaries, such as the number of Portable Executable (PE) sections, PE comments, file structure, and the final modules’ source code. These characteristics, combined with specific anti-virus signatures, IP addresses and domain names, were used to expand the search and identify more potential Machete samples in public and private repositories. Every binary matching our indicators was downloaded for further manual classification by a human analyst. At the end of this stage, the corpus of samples at our disposal was three times the number of files reported in previous work.
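Header characteristics such as the number of PE sections can be read directly from a binary. The sketch below is a minimal, standard-library-only illustration based on the published PE/COFF header layout; it is not the tooling used in this research, and the function name `pe_summary` is our own. It also checks the Machine field for 0x014C (x86), which corresponds to the 32-bit Windows target of Machete.

```python
import struct

IMAGE_FILE_MACHINE_I386 = 0x014C  # x86 (32-bit), per the PE/COFF specification


def pe_summary(data: bytes) -> dict:
    """Return the machine type and section names of a PE image held in memory."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew, the offset of the "PE\0\0" signature, is stored at offset 0x3C
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The 20-byte COFF header follows the signature: Machine (u16) at offset 0,
    # NumberOfSections (u16) at offset 2, SizeOfOptionalHeader (u16) at offset 16
    machine, n_sections = struct.unpack_from("<HH", data, pe_offset + 4)
    (opt_size,) = struct.unpack_from("<H", data, pe_offset + 4 + 16)
    # The section table (40 bytes per entry, 8-byte name first) follows the optional header
    table = pe_offset + 4 + 20 + opt_size
    names = [
        data[table + 40 * i: table + 40 * i + 8].rstrip(b"\x00").decode("ascii", "replace")
        for i in range(n_sections)
    ]
    return {"is_x86": machine == IMAGE_FILE_MACHINE_I386, "sections": n_sections, "names": names}
```

In practice, analysts would combine such header features with other indicators (strings, hashes, network artifacts) rather than rely on a single field.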

The samples positively classified as Machete were further processed. This workflow is illustrated in Figure 1.

  1. The first step in this process consisted of identifying the parent. The malware is typically distributed as an email attachment; therefore, the attachment is considered the parent.

  2. Next, the parent was uncompressed to identify the first stage of Machete.

  3. In the third step, the first-stage malware was uncompressed to extract the second-stage malware and the second-stage decoy document.

  4. The second-stage malware was further uncompressed to identify the malware libraries, modules and configuration files.

  5. Next, the modules were reverse engineered from PE files to Python compiled code.

  6. Finally, the Python compiled modules were decompiled to obtain the source code of the malware, in Python.


Figure 1: Machete has a nested structure. The parent is what the victim receives (1). This file contains a first executable file (2). The first executable file contains the payload and the decoy document (3). The payload consists of six to eight modules (4), which can be further reverse engineered to obtain their source code (6).
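The nesting shown in Figure 1 lends itself to a recursive walk that records each layer together with its SHA256 hash. A minimal sketch, assuming ZIP layers only: the parents may also be RAR archives, which Python's standard library cannot open, and layers that are not ZIP archives are reported but not descended into. The function name `walk_nested` and the depth limit are hypothetical.

```python
import hashlib
import io
import zipfile


def walk_nested(data: bytes, name: str = "parent", depth: int = 0, max_depth: int = 4):
    """Yield (depth, name, sha256) for a file and for every file nested in ZIP layers."""
    yield depth, name, hashlib.sha256(data).hexdigest()
    # Stop at the depth limit or when this layer is not a ZIP archive
    if depth >= max_depth or not zipfile.is_zipfile(io.BytesIO(data)):
        return
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            yield from walk_nested(zf.read(info), info.filename, depth + 1, max_depth)
```

The depth limit guards against deliberately deep or recursive archives when the walk is run over untrusted samples.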

The reverse engineering of the modules consisted of obtaining the Python compiled code from the PE module using a tool called unpy2exe [12]. Once the Python compiled code (.pyc) was obtained, we used a tool called uncompyle6 [13] to obtain the Python source code. This process is known as decompilation.

In most cases the Python source code was obfuscated using a tool such as pyobfuscate [14], which makes the source code difficult to read by renaming variables to nonsensical names and adding dummy clauses to increase ‘noise’. Reversing the obfuscation is a process known as deobfuscation. In this case, deobfuscation was still possible, given that external library calls were not obfuscated and the code was still able to run. No other anti-analysis or anti-debugging techniques were used among the examined samples.
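Because this kind of obfuscation mainly renames identifiers and inserts dead code, part of the cleanup can be automated once an analyst has decided what the names mean. A sketch using Python's `ast` module (Python 3.9+ for `ast.unparse`), assuming the renaming map is supplied by the analyst; the class and function names are hypothetical, and this is not the procedure used in the research. Note that `ast.unparse` does not preserve comments or the original formatting.

```python
import ast


class Deobfuscator(ast.NodeTransformer):
    """Apply analyst-chosen names and drop branches guarded by a false constant."""

    def __init__(self, renames: dict):
        self.renames = renames

    def visit_Name(self, node: ast.Name) -> ast.Name:
        node.id = self.renames.get(node.id, node.id)
        return node

    def visit_arg(self, node: ast.arg) -> ast.arg:
        node.arg = self.renames.get(node.arg, node.arg)
        return node

    def visit_FunctionDef(self, node):
        node.name = self.renames.get(node.name, node.name)
        self.generic_visit(node)
        return node

    visit_ClassDef = visit_FunctionDef

    def visit_If(self, node: ast.If):
        self.generic_visit(node)
        if isinstance(node.test, ast.Constant) and not node.test.value:
            # Dead branch such as `if 0:`; keep only the else part, if any
            return node.orelse or None
        return node


def deobfuscate(source: str, renames: dict) -> str:
    tree = Deobfuscator(renames).visit(ast.parse(source))
    # Removing a dead branch can leave a block empty; refill it with `pass`
    for node in ast.walk(tree):
        body = getattr(node, "body", None)
        if isinstance(body, list) and not body and not isinstance(node, ast.Module):
            node.body = [ast.Pass()]
    return ast.unparse(ast.fix_missing_locations(tree))
```

For example, `deobfuscate(src, {"a1": "value"})` renames every local `a1` to `value`; attribute names such as library calls are left untouched, which matches the observation that external library calls were not obfuscated.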

At the end of this process, an exhaustive corpus of 105 stage 2 Machete samples, along with 63 decoy documents and the source code of all modules for every sample, was available to continue the analysis and research. The list of SHA256 hashes of decoy documents is shown in Appendix A; the list of SHA256 hashes of stage 2 Machete samples is shown in Appendix B.

Malware Operations

Machete is a piece of malicious software designed for 32-bit Windows operating systems. It is distributed as a Portable Executable file compressed in a ZIP or RAR archive. Machete is written in Python.

APT operations are highly coordinated and organized. They typically follow a common structure, often known as a kill chain [15]. Machete APT operations are no exception. The five phases of the operation are illustrated in Figure 2, namely: delivery, installation, action on objectives, lateral movement and exfiltration. This section describes these phases, the encryption used to protect the stolen information, and finally, why we consider Machete to be a cyber espionage tool.

Figure 2: Machete operations are structured in five phases: delivery, installation, action on objectives, lateral movement and exfiltration.


Delivery

There are four known methods for distributing Machete: (i) as a malicious attachment in a phishing email, (ii) as a linked file (URL) in a phishing email, (iii) as an executable file in an infected USB drive, and (iv) via web injections. The first two methods are the most likely to be used for the initial compromise according to previous work [3, 4].

The third method of delivery, discovered during this investigation, is commonly used by attackers to reach air-gapped systems and to move laterally within an already compromised organization. No exploits or zero-day vulnerabilities are needed to deliver this malware.

Installation

Targeted victims are lured into downloading and opening the Machete malware via well-crafted social engineering techniques. As previously mentioned, no vulnerabilities are exploited in the operating system in order to execute the malware. Once the victim clicks to open the decoy file, the malware is executed and the decoy document is displayed to the victim.

Action on objectives

Machete is an espionage tool designed to steal information such as keystrokes, clipboard content, screenshots, web camera captures, audio from the computer’s microphone, system information and geolocation of the target. These functionalities are described in full in the next section.

Lateral movement

Compromised victims can be used to spread the malware further within the same organization. Through reverse engineering of the samples, we discovered that Machete has specific instructions on how to spread automatically via USB drives. When an external drive is present, Machete copies itself to the drive and then copies any documents of interest from the drive to the computer in order to steal them. This spreading allows the attackers to strengthen their foothold in an organization and maximize their effectiveness.

Exfiltration

We use the term ‘exfiltration’ to refer to the act of ‘unauthorized copying and transmission of information by any means’, as defined in [16]. Machete has three main methods of exfiltrating the stolen information from its victims. First, it uploads the collected data to a designated File Transfer Protocol (FTP) server. While the documents are encrypted, the FTP communication is not. The FTP server is protected with a username and password; however, the credentials can be found either in the source code of the malware or, in newer versions, in a configuration file.
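Because the FTP channel is not encrypted, defenders with network captures can recover the same credentials and upload paths from the traffic. A minimal sketch, assuming the control-channel conversation has already been extracted as text lines (for example with a packet analyzer's stream-following feature); the function name is hypothetical, and the command names come from the FTP specification (RFC 959).

```python
import re

# Client commands defined in RFC 959; command names are case-insensitive
COMMAND = re.compile(r"^(USER|PASS|CWD|STOR)\s+(.+)$", re.IGNORECASE)


def summarize_ftp_session(lines):
    """Pull credentials, directories and uploaded file names out of a
    cleartext FTP control-channel transcript (one command or reply per line)."""
    summary = {"user": None, "password": None, "dirs": [], "uploads": []}
    for line in lines:
        match = COMMAND.match(line.strip())
        if match is None:
            continue  # server replies such as "230 Login successful." are skipped
        command, argument = match.group(1).upper(), match.group(2)
        if command == "USER":
            summary["user"] = argument
        elif command == "PASS":
            summary["password"] = argument
        elif command == "CWD":
            summary["dirs"].append(argument)
        else:  # STOR: a file uploaded to the server
            summary["uploads"].append(argument)
    return summary
```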

The second method for exfiltrating the information from the victim is via USB devices. Machete is able to recognize special USB devices, and if they are present the malware will copy the collected information to the USB drive. This latter feature suggests that the attackers may have physical access to some of the victim computers.

The third method for data exfiltration was observed only in a few cases. In those cases, Machete was using Dropbox [17] as an exfiltration server.

Encryption

Machete encrypts the files using symmetric encryption. In particular, it uses the Advanced Encryption Standard (AES) algorithm, and the encryption key is embedded in the source code of the malware. (Encryption keys are discussed further in the ‘Malware Infrastructure’ section.)

Machete cyber espionage capabilities

Machete shares some capabilities with other tools used by APT groups, such as remote access tools and information-stealing malware. This raises some questions, such as ‘Why is Machete different?’ and ‘What makes it an espionage tool?’

Machete differentiates itself from other malware through its combination of capabilities. The recording of audio, capture of web camera photographs and collection of documents from the victim’s computer over long periods of time can be directly linked to possible extortion, surveillance or espionage activities.

The intent and specific data-stealing functionality is what defines Machete as a cyber espionage tool. The information stolen does not appear to have monetary value for the threat actors. The tool is not designed to steal credit card information, usernames and passwords – the sort of information that anyone can easily profit from. The information stolen by Machete seems valuable only to actors that would use this information themselves, along with their own information, to influence decisions and gain strategic advantage at a government or political level.

Malware Capabilities

The purpose of Machete is to steal information about its victims, specifically documents or data they may possess and information about their current behaviour. During the analysis of Machete samples over a period of eight years we observed the APT group adding and removing functionality. Considering the complete malware corpus, we found functions designed to steal the following information:

  • System information: who the target is and information about the computer being used.

  • Geolocation: where the target is located.

  • Keystrokes: what the target writes.

  • Clipboard content: what the target copies and pastes.

  • Screen captures: what the target is seeing on the screen.

  • Web camera captures: who or what is in front of the computer’s web camera field of view.

  • Audio: what the victim is saying, or conversations from the surrounding environment.

  • Documents: specific documents in the target’s computer.

In the following subsections each function is described in detail.

System information

The type and amount of information collected from the targets varies, as does the collection method. In early versions of Machete, the information was extracted field by field using the platform [21] library, which is part of the Python standard library. As illustrated in Figure 3, the information collected consisted of the public IP, operating system name, operating system release, computer network name (node), system version, architecture and processor. The public IP is retrieved after contacting the C&C server. Information about local network cards, local IPs and MAC addresses is obtained using the Windows command ipconfig. The collected information is stored in a text file for later exfiltration.

Figure 3: Gathering system information using the platform library.


This Python library was later replaced with the Windows systeminfo command. In later versions of Machete, the information collected in this step was reduced considerably.
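For readers unfamiliar with the platform library, the fields listed above map naturally onto the following standard-library calls. This is our own mapping for illustration, not code recovered from the samples (Figure 3 shows the original); the function name is hypothetical, and the public IP and the ipconfig output are omitted.

```python
import platform


def platform_fields() -> dict:
    """Standard-library calls that return each host field named in the text."""
    return {
        "os_name": platform.system(),                # e.g. "Windows"
        "os_release": platform.release(),            # e.g. "7" or "10"
        "node": platform.node(),                     # computer network name
        "version": platform.version(),               # detailed system version string
        "architecture": platform.architecture()[0],  # bitness of the Python interpreter, e.g. "32bit"
        "processor": platform.processor(),           # may be an empty string
    }
```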

Geolocation

Fewer than a dozen Machete samples incorporate the functionality to geographically locate the target using Wi-Fi MAC addresses. The oldest sample to implement this functionality is from 2012, and the latest one is from 2019. The malware has two modes of retrieving the geolocation: the first uses only the MAC address, while the second uses the MAC address, channel, and signal strength. In both cases, the malware first collects the information from the infected device using the Windows netsh command. It then makes several attempts to determine the geolocation from this information using a Google API. The information retrieved is the accuracy, latitude, longitude, and a link to Google Maps.

Figure 4: Gathering geolocation information.


It is important to note here that the link to Google Maps is regionalized to Argentina (com.ar). The link is created as follows:

http://maps.google.com.ar/maps?f=q&source=s_q&hl=en&geocode=&q=%s+%s
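When such links are recovered from logs or traffic, the coordinates can be read back out of them. A minimal sketch, assuming the two `%s` placeholders hold latitude and longitude in that order (the text does not state this explicitly); the helper names are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# Link template as recovered from the samples (assumed order: latitude, longitude)
MAPS_TEMPLATE = "http://maps.google.com.ar/maps?f=q&source=s_q&hl=en&geocode=&q=%s+%s"


def build_link(lat: float, lon: float) -> str:
    """Reproduce a link of this form, e.g. to test a parser."""
    return MAPS_TEMPLATE % (lat, lon)


def coords_from_link(url: str) -> tuple:
    """Recover (lat, lon) from a link of this form."""
    # parse_qs decodes the '+' separator in the q parameter to a space
    query = parse_qs(urlsplit(url).query)["q"][0]
    lat, lon = query.split()
    return float(lat), float(lon)
```

For example, `coords_from_link(build_link(4.6, -74.08))` returns `(4.6, -74.08)`.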

Keystrokes

To steal keystrokes, one of the malware's modules implements keylogger functionality. The malware defines a series of keys in which it is interested, and defines what to do when one of these keys is pressed. Machete creates a log file in HyperText Markup Language (HTML) format, with the .htm extension. The log contains the date and time of the keystrokes, the name of the open application, and the keystrokes typed by the user. In this manner, the attackers know not only what the victim was typing, but also when and in which application. This provides context and adds value to the stolen information.

This module has undergone only minor changes since it was first developed. One notable change occurred in mid-2012, when the names of the key IDs were rewritten. This change introduced a typo that has been present in all subsequent Machete samples: the keys 160 and 161 were renamed from lshift and rshift to Shitf(Izq) and Shitf(Dcha).
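The persistent misspelling is a convenient hunting indicator. ‘Izq’ and ‘Dcha’ are Spanish abbreviations for left (izquierda) and right (derecha), and 160 and 161 are the Windows virtual-key codes VK_LSHIFT and VK_RSHIFT. A minimal sketch that flags decompiled sources containing both labels; the helper names are hypothetical, and in practice such an indicator would more likely live in a scanner rule than in a standalone script.

```python
from pathlib import Path

# Windows virtual-key codes: 160 = VK_LSHIFT (0xA0), 161 = VK_RSHIFT (0xA1)
TYPO_LABELS = {160: "Shitf(Izq)", 161: "Shitf(Dcha)"}


def has_shift_typo(source: str) -> bool:
    """True if the source text contains both misspelled Shift labels."""
    return all(label in source for label in TYPO_LABELS.values())


def scan_directory(root: str) -> list:
    """Return decompiled .py files under `root` that carry the indicator."""
    return sorted(
        str(path)
        for path in Path(root).rglob("*.py")
        if has_shift_typo(path.read_text(encoding="utf-8", errors="replace"))
    )
```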

Clipboard content

Similarly, the malware has a clipboard monitor, which logs everything the victim copies to a special log file named ‘Clip.html’. The malware logs the clipboard content, the date and time, and the name of the open window.

Screen captures

Another vital functionality of Machete is screen capture. Machete is able to take screenshots from the victim computer. These images are indexed by date and time, and provide high value to the attackers in their intelligence analysis due to their rich and detailed content.

Web camera captures

Another of the functions is web camera capture, in which the malware routinely takes pictures using the web camera of the computer (if such a device exists). The web camera capture can tell the attackers who is using the computer, what the victim looks like, or reveal the identity of other individuals. Not only that, it also shows when the computer is unattended, which can be vital if the attackers have physical access to the infected machine.

Audio

Machete is also able to record sound using the microphone of the computer. This functionality has been added and removed multiple times. The Python library pyaudio is used to record the audio in .WAV files. The malware then uses the LAME MP3 audio encoder to convert the files to MP3.

Documents

The last core functionality is document stealing. The malware is able to find, encrypt, and steal interesting documents found on the victim computer. Attackers define files as interesting based on the type of document. In the oldest sample of Machete from 2011, the malware was looking for a small collection of file extensions: .doc, .docx, .xls, .xlsx, .ppt, .pptx, .jpg, .pgp, .skr and .asc. The attackers’ interest in documents grew over time, and the list was expanded with the following extensions: .db, .mdb, .pkr, .gpg, .drw, .lpt, .shp, .rte, .sda, .odp, .sxi, .odt, .sxw, .ods, .sxc, .odg, .sxd, .odb, .odf, .sxm, .txt, and specific files such as key3.db and signons.sqlite, which Firefox uses to store user credentials. The files are encrypted before they are exfiltrated.

Apart from well-known document formats, it is worth noting the attackers’ interest in databases (.db, .mdb, .odb), encryption files (.gpg, .pgp, .asc, .skr, .pkr), maps and design files (.drw, .lpt, .shp, .sda, .sxd), and business applications’ source code (.os).
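Incident responders can reuse these categories to prioritize which files on a compromised host may have been of interest to the attackers. A minimal sketch; the category sets are transcribed from the paragraph above, the Firefox file names from the preceding list, and the function name is hypothetical. `PureWindowsPath` is used because the targets are Windows hosts, so both `\` and `/` separators are handled.

```python
from pathlib import PureWindowsPath

# Categories transcribed from the text; anything else is reported as "other"
CATEGORIES = {
    "databases": {".db", ".mdb", ".odb"},
    "encryption": {".gpg", ".pgp", ".asc", ".skr", ".pkr"},
    "maps and design": {".drw", ".lpt", ".shp", ".sda", ".sxd"},
}
# Firefox credential stores named in the list of targeted files
CREDENTIAL_FILES = {"key3.db", "signons.sqlite"}


def categorize(path: str) -> str:
    """Classify a Windows path by the category of interest it falls into."""
    p = PureWindowsPath(path)
    # Check exact file names first: key3.db would otherwise count as a database
    if p.name.lower() in CREDENTIAL_FILES:
        return "browser credentials"
    suffix = p.suffix.lower()
    for category, extensions in CATEGORIES.items():
        if suffix in extensions:
            return category
    return "other"
```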

Malware Infrastructure

The analysis of Machete and its functionality also provided some insights into the attackers’ own operational capabilities. With the information obtained from the samples we tried to answer the following questions:

  • What are the attackers’ capabilities in terms of infrastructure?

  • Are there multiple attackers that use Machete?

  • How did the malware evolve over time?

Infrastructure

Machete exfiltrates information primarily via FTP servers. The details of these servers are embedded in the Machete samples, namely: domain name, username and password, routes, version, and encryption key. Table 1 summarizes this information along with the software versions found in the malware. Passwords are not displayed for security reasons. Figure 5 shows the distribution of the FTP servers, users, routes and malware versions among the analysed samples.

Table 1: Attacker infrastructure summary.


Table 1: Attacker infrastructure summary (contd.).


Figure 5: Distribution of (a) FTP servers; (b) FTP users; (c) malware versions; and (d) FTP routes.

  • FTP servers. Machete used multiple FTP server domains for its campaigns. For the most part, the domain names do not overlap with the malware versions. A total of 14 FTP server domains were identified. In many cases, Machete used dynamic DNS services such as ddns.net or serveblog.net, which give the attackers flexibility in their operations.

  • FTP credentials. From 2013 onwards, the group used one set of usernames per FTP server, whereas in older samples the attackers used the same username across multiple FTP servers. The same password was reused across multiple accounts. Over time, the attackers also came to use multiple malware versions as a way to separate functionality and/or targets.

  • FTP routes. The most used FTP servers contain several routes. These are folders in the FTP server where the stolen information is stored. Every Machete sample has a route defined. Routes are often named after Colombian cities or areas, e.g. Buenaventura, Guajira, Huila, etc. This might be an indication of where the attack operators are stationed.

  • IP addresses. A total of 13 unique IP addresses have been found associated with Machete FTP servers since its origin. These IPs were obtained via a public passive DNS service [18]. Passive DNS is a system that stores DNS resolution data along with the time period in which each resolution was observed. Further analysis showed that different domain names shared IP addresses, even during overlapping periods of time.
    The earliest date on which an FTP domain was seen is early 2012. Before this time, the attackers relied on IP addresses.

  • Encryption keys. Machete uses symmetric encryption to encrypt the stolen data before exfiltrating it. The encryption key is embedded in the binary. Three distinct AES keys were shared among all analysed samples. This is a strong indication that there was only one group involved in the malware operation as it is very unlikely that different organizations would choose to encrypt their documents with the same keys and store them on the same servers.
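The two kinds of linkage described above, domains sharing an IP address during overlapping periods and samples sharing an AES key, are straightforward to compute once passive DNS records and per-sample configurations have been tabulated. A minimal sketch with hypothetical record and function names; the passive DNS service [18] has its own export format, which is not modeled here.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations


@dataclass(frozen=True)
class PdnsRecord:
    """One passive DNS observation: `domain` resolved to `ip` between two dates."""
    domain: str
    ip: str
    first_seen: date
    last_seen: date


def shared_ip_overlaps(records):
    """Return sorted (domain_a, domain_b, ip) triples for distinct domains that
    resolved to the same IP during overlapping time windows."""
    hits = set()
    for a, b in combinations(records, 2):
        same_ip = a.ip == b.ip and a.domain != b.domain
        # Two closed intervals overlap iff each starts before the other ends
        overlap = a.first_seen <= b.last_seen and b.first_seen <= a.last_seen
        if same_ip and overlap:
            first, second = sorted((a.domain, b.domain))
            hits.add((first, second, a.ip))
    return sorted(hits)


def group_by_key(sample_keys: dict) -> dict:
    """Map each AES key to the sorted sample hashes that embed it."""
    groups = {}
    for sample, key in sample_keys.items():
        groups.setdefault(key, []).append(sample)
    return {key: sorted(samples) for key, samples in groups.items()}
```

The pairwise comparison is quadratic, which is unproblematic for a few dozen domains and IPs such as those reported here.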

Malware Evolution

The oldest Machete binary in our corpus dates back to 2011. This is confirmed by the sample’s submission date to VirusTotal [20], which was also in the same year. The most recent sample of Machete was observed in early 2019. Through our reverse engineering and code analysis we have been able to identify three major changes in the last eight years of Machete activity. The first major change was to split Machete functionality into different smaller modules – this happened in early 2011, bringing more flexibility to the malware. The second major change was in 2014 when the obfuscation of the Python source code was added in an effort to bypass detection. The third major change was in early 2019 when the unpacking of the malware code changed considerably, also to hinder detection efforts.

Machete’s authors used multiple versions of the malware mainly to differentiate between campaigns and targets. Modules and functionality were added or removed over time. However, the underlying code structure did not change dramatically until late 2018.

The initial versions of the malware show how its authors were putting it together: building up new functionality, testing libraries and fine-tuning parameters. Modularity was added later. In early samples, the malware had hard-coded IPs and credentials for the FTP server used to exfiltrate data. The IPs were later changed to dynamic DNS domain names, but these were still in plain text in the code. The malware later evolved to obfuscate the credentials and, later still, to store them, obfuscated, in a text file. All the changes observed gave Machete modularity and flexibility, allowing new campaigns to be created with minimal modification of the code.

In terms of anti-analysis techniques, the vast majority of the samples were obfuscated at the Python source code level, probably as an attempt to slow down malware analysts. Earlier versions of Machete were not obfuscated at all.

Analysis of Targets

To understand the purpose of Machete, and the goals of the actors operating it, we analysed the targeted victims. The analysis of possible victims is performed through the analysis of the decoy documents used. Decoy documents were obtained from Machete parent samples, which contained both decoy and malware. By looking at the decoys used in the campaigns through the years we try to infer the profiles of the targets. Information about victims obtained from previous work [3, 4] was not taken into account as it was impossible to validate it.

Manual annotation of decoy documents

A total of 75 unique decoy documents were identified, each of which was used in one or more malicious campaigns. The decoy documents were embedded and compressed in the Machete malware – however, they were only used as decoys and not as an exploitation tool. In this section we present an analysis of 40 documents observed between 2013 and 2018.

We manually analysed each decoy document used by Machete, noting the language, country, dates, and theme covered. The first phase consisted of noting the type of document used (PDF, Word, images, etc.). The second phase consisted of identifying the language used in the documents, to establish a target group. The third phase focused on assigning each document to a theme category based on the content (political, economic, military, etc.). The fourth phase consisted of identifying the country or countries targeted by the document.
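The four annotation phases map onto a simple record per document, from which per-type, per-language, per-theme and per-country counts such as those in Figures 6, 7 and 9 can be tallied. A minimal sketch; the field and function names are hypothetical, and it does not reproduce any data from the corpus.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoyAnnotation:
    sha256: str
    doc_type: str          # phase 1: file type, e.g. "docx" or "pdf"
    language: str          # phase 2: language of the content
    theme: str             # phase 3: e.g. "political", "military"
    countries: tuple = ()  # phase 4: countries referenced (may be empty)


def tally(annotations) -> dict:
    """Count annotated decoys per type, language and theme, and per referenced country."""
    return {
        "type": Counter(a.doc_type for a in annotations),
        "language": Counter(a.language for a in annotations),
        "theme": Counter(a.theme for a in annotations),
        # A document that references several countries counts once for each of them
        "country": Counter(c for a in annotations for c in a.countries),
    }
```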

Type of decoy documents

The documents used were predominantly Microsoft Word documents, followed by PDF files, PowerPoint documents, and JPG images (see Figure 6). Two Word formats were used: .docx and .doc (the .docx format was introduced in Word 2007). PDF documents were the second most commonly used type. In third place, Microsoft PowerPoint documents in several formats were used: .pps, .ppt and .pptx. Lastly, one image in JPG format was used.

Figure 6: Breakdown of decoy documents used by Machete.


The majority of the documents appear to have been stolen and re-purposed for the spear-phishing attacks. They could have been manually crafted for the purpose of the attacks, but several indicators support the former hypothesis. In particular, many of the Microsoft Word documents still contained metadata about date of creation, author, organization, and last time printed, which appeared to be real. Typically, a manually crafted document would have this data removed or replaced with fake data, which does not seem to be the case in the corpus of documents we analysed. A subgroup of documents also contained information, names, stamps and references that would be hard to fake.
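For .docx files, the metadata fields mentioned above are stored in the OOXML package parts docProps/core.xml (author, creation date, last printed) and docProps/app.xml (company). A minimal standard-library sketch; the function name is hypothetical, and legacy .doc files, which use a different binary container, are not handled.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}
CORE_FIELDS = [
    ("author", "dc:creator"),
    ("last_modified_by", "cp:lastModifiedBy"),
    ("created", "dcterms:created"),
    ("modified", "dcterms:modified"),
    ("last_printed", "cp:lastPrinted"),
]


def docx_metadata(data: bytes) -> dict:
    """Read author, date, last-printed and company fields from a .docx file."""
    fields = {}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = set(zf.namelist())
        if "docProps/core.xml" in names:
            core = ET.fromstring(zf.read("docProps/core.xml"))
            for key, tag in CORE_FIELDS:
                el = core.find(tag, NS)
                fields[key] = el.text if el is not None else None
        if "docProps/app.xml" in names:
            app = ET.fromstring(zf.read("docProps/app.xml"))
            el = app.find("ep:Company", NS)
            fields["company"] = el.text if el is not None else None
    return fields
```

Missing fields are returned as `None`, which makes it easy to spot documents whose metadata was stripped, the opposite of what was observed in this corpus.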

Languages, countries, regions

Spanish speakers are the primary targets of Machete, but not the only ones as previous work suggested [4]. Of all the decoy documents analysed, one document was written in Portuguese and the rest were written in Spanish. All Spanish- and Portuguese-speaking countries could be targeted by these decoys, irrespective of the country or region. Spanish is the official language in 21 countries, the majority of which are located in Central and South America [19].

A careful reading of the documents showed that each one refers to specific countries, which allowed us to identify the countries concerned by each document. The content of each document was read and examined by native Spanish speakers in order to identify its theme and country. For instance, in a decoy document about military personnel reassignments, all the personnel mentioned were from Venezuela, so the annotated country was Venezuela. This process was repeated for all decoy documents that contained enough context. The annotated country only indicates where the document was likely stolen or crafted; it does not limit the target audience, as military officials from neighbouring countries would also be interested in this information.

In Figure 7 we show the number of decoy documents per country, and in Figure 8 we highlight the geographical regions from which the documents were stolen. While previous work mentioned targeted victims outside Latin America, we could not confirm this from the data obtained from the Machete samples.

Figure 7: Number of decoys per country used by Machete.

Figure 8: The content of the decoy documents shows the countries targeted by Machete.

Topics

The themes of the decoy documents indicate that the potential victims are heavily interested in national politics and in military information, ranging from troop movements to personnel reassignments. Other decoys played on the victim's fear, using themes such as debt collection and legal subpoenas. In a minority of cases the attackers used generic themes such as sexual content to lure victims into opening the documents; these decoys appear to target primarily male victims. Each document focuses on a specific theme that is highly enticing to its targets. The number of documents per topic used by Machete is shown in Figure 9.

Figure 9: Breakdown of topics used by Machete in decoy documents.

Decoy documents used in targeted attacks must have certain characteristics. According to Bowen et al. [16], effective decoy documents are (1) believable, (2) enticing and (3) conspicuous. The decoys used in this espionage activity are believable because of their nature: they appear to be real, existing documents rather than crafted or artificially created ones. They are enticing, as the topics covered are highly attractive and lure victims into opening them. Finally, they are conspicuous, as they attract attention and are easily noticed by the victim.

Dates

The creation date was extracted from the metadata of each document. Since metadata can be altered, this date is taken only as an initial reference for when the document was created. According to this metadata, the documents were created in 2000, 2006, 2011, 2013, 2014, 2015, 2016 and 2017.
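Because metadata can be altered, one simple sanity check is to compare each decoy's claimed creation date with the date on which the Machete sample carrying it was first observed: a document cannot have been created after it was already embedded in malware. The sketch below assumes OOXML-style dcterms:created timestamps (for example 2014-05-01T10:00:00Z) and a hypothetical, timezone-aware first_seen timestamp per sample, for instance taken from a malware repository. It can only flag inconsistent dates; it cannot prove that a plausible date is genuine.

```python
from datetime import datetime, timezone


def parse_w3cdtf(value: str) -> datetime:
    """Parse a dcterms:created value such as '2014-05-01T10:00:00Z' (assumed format)."""
    # datetime.fromisoformat() before Python 3.11 does not accept the 'Z' suffix.
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    # Treat timestamps without an offset as UTC so they compare with aware datetimes.
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def creation_years(created_values):
    """Sorted distinct creation years, skipping documents without a creation date."""
    return sorted({parse_w3cdtf(v).year for v in created_values if v})


def implausible_creation(created: str, first_seen: datetime) -> bool:
    """True if a decoy claims to be newer than the sample that embeds it.

    first_seen is a hypothetical, timezone-aware first-observation time of the sample.
    """
    return parse_w3cdtf(created) > first_seen
```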

Conclusions

This paper has presented an analysis of eight years of operations of an APT group that targets Latin America using a cyber espionage tool known as Machete. Spear phishing with real and enticing documents appears to be the group's most effective way of compromising its targets. The functionality of Machete has fluctuated considerably over the last eight years; however, its core functionality remains: keylogging, screen capture and document stealing. The oldest Machete sample was observed in early 2011, which suggests that the group's activities started even earlier. Machete is still active today.

Our analysis of the decoy documents showed that the targeted victims are mainly located in Latin America. However, we could not confirm the conclusions drawn in previous work regarding the number of victims and their countries. Additionally, while the majority of the decoy documents are written in Spanish, one is written in Portuguese, suggesting that the victims are not limited to the Spanish-speaking countries of Latin America. The documents' topics are mostly military and political in nature, which points to targets of military and political interest.

Our investigation suggests that APT sophistication is directly related to the socioeconomic conditions of the targeted regions. Machete is sophisticated relative to the region in which it operates. Compared to other APTs, however, it lacks sophistication in the programming language it uses and in its anti-analysis techniques, and it does not exploit any vulnerabilities. Nevertheless, failing to investigate threats like this one because of their apparent lack of sophistication leaves victims in the dark and unprotected.

Our research has also shown that Machete does not rely on zero-day exploits. This is consistent with previous research showing that APT groups rely more on spear-phishing techniques than on zero-day exploits.

Machete continues to evolve and new malware samples are being observed every month. As part of our future work we plan to continue monitoring it and reporting on its activities in order to help stop this threat.

Acknowledgements

The authors would like to thank Ross Gibb for his assistance with malware reversing; Reversing Labs for access to their malware repository; and Luciano Martins for providing additional Machete samples. This research was partially supported by an Avast Foundation grant for the protection of civil society.

References

[1]  Espionage. https://www.mi5.gov.uk/espionage.

[2]  Chen, P.; Desmet, L.; Huygens, C. A study on advanced persistent threats. In IFIP International Conference on Communications and Multimedia Security. Springer, 2014, pp. 63–72.

[3]  El Machete. https://securelist.com/el-machete/66108/.

[4]  El Machete malware attacks cut through LATAM. https://threatvector.cylance.com/enus/home/el-machete-malware-attacks-cut-through-latam.html.

[5]  ACAD/Medre. 10000s of AutoCAD designs leaked in suspected industrial espionage. https://www.welivesecurity.com/media_files/white-papers/ESET_ACAD_Medre_A_whitepaper.pdf.

[6]  Packrat: Seven Years of a South American Threat Actor. https://citizenlab.ca/2015/12/packrat-report/.

[7]  CannibalRAT targets Brazil. https://blog.talosintelligence.com/2018/02/cannibalrat-targets-brazil.html.

[8]  APT-C-36: Continuous Attacks Targeting Colombian Government Institutions and Corporations. https://ti.360.net/blog/articles/apt-c-36-continuous-attacks-targeting-colombian-government-institutions-and-corporations-en/.

[9]  Le Blond, S.; Gilbert, C.; Upadhyay, U.; Gomez-Rodriguez, M.; Choffnes, D. R. A broad view of the ecosystem of socially engineered exploit documents. In NDSS, 2017.

[10]  Le Blond, S.; Uritesc, A.; Gilbert, C.; Chua, Z. L.; Saxena, P.; Kirda, E. A look at targeted attacks through the lense of an NGO. In USENIX Security Symposium, 2014, pp. 543–558.

[11]  Marczak, W. R.; Scott-Railton, J.; Marquis-Boire, M.; Paxson, V. When governments hack opponents: A look at actors and technology. In USENIX Security Symposium, 2014, pp. 511–525.

[12]  unpy2exe - Extract .pyc files from executables created with py2exe. https://github.com/matiasb/unpy2exe.

[13]  uncompyle6 - A cross-version Python bytecode decompiler. https://github.com/rocky/python-uncompyle6.

[14]  pyobfuscate - Python source code obfuscator. https://github.com/astrand/pyobfuscate.

[15]  Hutchins, E. M.; Cloppert, M. J.; Amin, R. M. Intelligence-driven computer network defense informed by analysis of adversary campaigns and intrusion kill chains. Leading Issues in Information Warfare & Security Research, 1(1), 2011, p. 80.

[16]  Bowen, B. M.; Hershkop, S.; Keromytis, A. D.; Stolfo, S. J. Baiting inside attackers using decoy documents. In Security and Privacy in Communication Networks, Y. Chen, T. D. Dimitriou, and J. Zhou, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009, pp. 51–70.

[17]  Dropbox. https://www.dropbox.com.

[18]  RiskIQ. https://community.riskiq.com/.

[19]  The World Factbook. https://www.cia.gov/library/publications/the-world-factbook/fields/402.html.

[20]  VirusTotal. https://www.virustotal.com.

[21]  Platform – Access to underlying platform's identifying data. https://docs.python.org/2.7/library/platform.html.