Diffstat (limited to 'doc/sws.md')
-rw-r--r-- | doc/sws.md | 205 |
1 files changed, 205 insertions, 0 deletions
diff --git a/doc/sws.md b/doc/sws.md new file mode 100644 index 0000000..e1d7c66 --- /dev/null +++ b/doc/sws.md @@ -0,0 +1,205 @@ +# Proone Software Design Spec +This document is part of **Proone Worn Project**. For overview, refer to +[README.md](/README.md). + +* TODO structure + * TODO workers and functions + * TODO CNC TXT REC + * TODO dvault +* TODO IPv6 +* TODO classes + +## Subsystems +### Heartbeat +**Heartbeat** is a subsystem of Proone that consists of a backdoor and CNC +mechanism on infected devices. **The Heartbeat protocol** is a point-to-point +or a broadcast framing protocol that works over a transport stream such as +TCP/IP. The protocol is documented separately in **[Protocol +Spec](proto.md)**. An overview of the protocol follows. + +The Heartbeat subsystem takes up a large portion of the Proone code base. The +subsystem mainly works as a format in DNS TXT records and a TCP/IP framing +protocol. A complete heartbeat connection consists of an **authoritative end** and +a **submissive end**. + +In the Heartbeat protocol, a request is usually initiated by one initiating +frame by one end and the other end responds to it by one or more response +frames. The protocol also employs the concept of "protocol upgrade" like that of +WebSocket in which both request and response frames can be streams of frames. + +A request-response session is distinguished by a message id number. A message id +number is generated by the end which initiated the session and is used for the +duration of the session. The idea behind having a message id number is to make the +protocol pipelineable so that simple request-response pairs can be processed in +parallel. This is merely a future-proof design and does not play a significant +role. + +Unlike conventional botnets, Proone instances(aka "bots") are controlled by TXT +DNS records containing one or more request frames of an authoritative end. 
In this +scheme, a request is initiated by Proone instances acting as a submissive end +querying and reading the contents of the TXT records. Any response data produced +in the process is discarded. The heartbeat protocol binary is represented in +base64 encoding because most DNS management software does not accept binary data +for the value of TXT records although [the RFC +spec](https://datatracker.ietf.org/doc/html/rfc1035#section-3.3) does not impose +such a restriction. + +Only public DNS servers which support DNS over TLS are used to counter lawful +interception. The reason is that the DNS protocol is not encrypted and ISPs or +law enforcement can easily filter out TXT REC CNC traffic simply by doing +plain-text string search. A TLS library is used to implement the SSH attack +vector, so using the library for another purpose was an enticing choice. Proone +queries public DNS servers directly rather than using system functions. This +eliminates the chance of ISP DNS servers giving false results. Using +public DNS servers is also beneficial since law enforcement would have to take +down the domain itself as it would be difficult to convince the operators of +public DNS servers to block a recursive query to a particular name server. +Another benefit is not having to run CNC servers for simple tasks like running +shell scripts. + +There are 2 recommended applications. One typical application is having a +`PRNE_HTBT_OP_HOVER`(Hand Over Operation) request frame in TXT records so that +instances will connect to servers running authoritative htbt implementations for +further instructions. The second application is having a +`PRNE_HTBT_OP_RUN_CMD`(Run Command Operation) frame or a +`PRNE_HTBT_OP_RUN_BIN`(Run Binary Operation) containing a simple minified shell +script for instances to run. + +Using CNC TXT records to transfer a large amount of data is possible but not +recommended. 
In theory, doing `PRNE_HTBT_OP_NY_BIN`(Binary Upgrade Operation) +with CNC TXT REC is possible. However, for Proone instances, querying TXT +records, decoding base64 data and running a slave heartbeat client is a costly +operation. It's not a simple task and is prone to failure. + +### Use Cases +To stop all Proone instances, issue command `kill -9 0` or `reboot -nf` with +detach flag unset. To disable all hosts, issue command `half -nf`. + +In order to do things of complexity, it's recommended to implement an +authoritative server implementation and command Proone instances to take orders +from the servers running the implementation. Load balancing can be done at the +DNS level using techniques like round-robin DNS or GeoDNS. Once a Proone +instance connects to an authoritative server, the server can fully utilise the +heartbeat protocol to do the tasks described below. + +Shell scripts can be run on Proone hosts with `PRNE_HTBT_OP_RUN_BIN`(Run Binary +Operation) as long as the script contains a shebang line at the very start of +the script. Note that most embedded devices run lightweight shells like +Ash(BusyBox) and Toysh(Toybox)[^1]. The best strategy is targeting Bourne +shell, which has been a default shell for the majority of systems(historically). + +To make hosts run an arbitrary binary executable, `PRNE_HTBT_OP_HOST_INFO`(Host +Info Operation) can be used to query the architecture type of the host to select +a suitable binary for upload. + +To replace the Proone binary, `PRNE_HTBT_OP_NY_BIN`(Binary Upgrade Operation) +can be used. The binary format for the operation is specified in a [separate +document](proto.md). Upon successful upload, the Proone instance will attempt to +`exec()` to the new binary after **binary recombination**(explained in a +separate section) is performed. All this is done in the parent process. In the +event of failure, Proone continues to operate with the existing binary. 
The only +way to check the result of the operation is through reestablishing the +connection to the Proone instance and querying the version of the binary through +a `PRNE_HTBT_OP_HOST_INFO` request. + +The protocol leaves room for implementing M2M mechanisms. A Proone instance +checks if the target host is already infected by attempting to connect to a +**local back door**(or simply, **LBD**) on the target host. An LBD port is +served by a submissive Heartbeat client. Future versions of Proone can +utilise the LBD port to update the binary of the target instance if an old one is +encountered. **proone-htbtclient** can be used to examine and maintain the +Proone instance via this port. + +### Binary Archive and Data Vault +Proone aims to be a decentralised botnet. To spread without binary distribution +servers, Proone carries all the executables of arch types it supports. For this, +a special file structure is designed. + +The **Data Vault**("**DVault**") is a binary block containing large and +sensitive data necessary for operation of Proone. DVault is a kempt version of +the data table of Mirai. DVault also helps reduce the size of Proone. Each +executable contains the *.data* section. If there's a long string in the +program, the value of the string will end up in each *.data* section of the +executables. Compression alleviates this issue but there's a limit because the +size of data dictionary blocks can only get big. Having a custom *.data* section +for large data solves this issue at the cost of the size of code for fetching +and unmasking values from DVault. This implies that, in some cases, storing +static values in the *.data* section of an ELF is more efficient[^2]. Another +purpose of DVault is masking sensitive data like `PRNE_DATA_KEY_CNC_TXT_REC` and +`PRNE_DATA_KEY_CRED_DICT` so that they're not revealed when the `strings` command is +run on the executable or when the process is core dumped. DVault is loaded when +Proone initialises. 
The loaded contents remain masked in memory and are unmasked +only when needed. + +The contents of DVault are XORed with a 256 byte array of random numbers +generated on each compilation. This process makes it impossible to compress the +DVault binary block because of high entropy. Therefore it's not recommended to +use DVault to store exceptionally large values. This issue may be solved by +compressing the value separately at the cost of CPU time. + +The **Binary Archive**("**BA**") is a binary block containing compressed +executables and an index of the executables. + +## Requirements +### Targeting Wide Range of Devices and Kernel Configurations +A number of methods have been employed in efforts to target a wide range of Linux +devices. The assumption is that there are still devices running old images of +Linux and targeting these devices means coding up to the standard of old POSIX +specs and testing under old versions of Linux(namely 2.6.x). +The `_POSIX_C_SOURCE=200112L` macro is defined to meet this requirement. Note that +using this macro does not give you an error when you accidentally use APIs not +in the 200112L standard. The compiler will only give you a warning and your code +will compile just fine. If you happen to use a function that the kernel of the +host does not support, the syscall will fail with `ENOSYS`. If the feature +requiring the new API can be silently switched off at runtime, removal of the +macro is recommended. + +The Linux kernel is highly configurable. Pseudo file systems and the device file +system may not be present on a Linux host since they can be disabled. Disabling +any of these file systems is unusual for PCs but practical on embedded devices. +Proone does not assume that these file systems are available on the host and tries +to run without using them if not available. 
+ +### Running Lean +Proone is designed under the assumption that honouring other processes on the +system will decrease the chance of getting caught by system administrators. + +Proone is compartmentalised so that it's somewhat immune to syscall failures. This +design is to counter `ENOMEM` as it runs lean on lean embedded systems. This +implies that proone can be initialised "half-complete". For example, +it can be initialised with all the workers running except the Heartbeat worker. +In this case, proone will be able to infect other devices on the network +while unable to respond to CNC TXT REC. Another notable case would be an +instance running without the Recon worker. It will respond to the CNC TXT REC +and serve the local backdoor connections while unable to infect the other +devices on the network. Proone does not reattempt to start the workers it failed +to run on start. The assumption is that the system is already running with its +memory full to the brim and it's futile to wait for a resource it failed to claim +as it's likely that the other services on the system will claim the resource at +some point. + +Proone does cooperative multitasking by using the **Pthsem** library. This is one +of many efforts to "run lean" by restricting CPU usage to one logical +thread. This may seem like a huge missed opportunity if Proone scores infecting +itself onto a beefy multi-core system. Keep in mind that Proone is designed to +run on resource-scarce embedded devices. Most poorly-designed vulnerable devices +will be single core, anyway. The strategy is getting the most small-powered +devices infected rather than having a few infected high-performance systems. + +### Volatile Operation +TODO + +## Dependencies +The dependencies for Proone have been kept to absolute necessities. **libssh2** +is used for the SSH brute force vector. Coupled with libssh2's SSL backend +is **Mbedtls** for TLS connection to public name servers and the Heartbeat +protocol. 
**zlib** is used to implement the binary archive. All the libraries are +compiled with default configurations. **Pthsem** is used for threading. + +**libyaml** and **mariadb-connector-c-devel** are required for the **hostinfod** +build. YAML has been chosen for the configuration file format and MariaDB for the DB +backend. + + +[^1]: Maybe in the future when Toybox gains marketshare? +[^2]: i.e. representing values in code: `int value = 123;`