diff --git a/doc/sws.md b/doc/sws.md
new file mode 100644
index 0000000..e1d7c66
--- /dev/null
+++ b/doc/sws.md
@@ -0,0 +1,205 @@
+# Proone Software Design Spec
+This document is part of the **Proone Worm Project**. For an overview, refer
+to [README.md](/README.md).
+
+* TODO structure
+ * TODO workers and functions
+ * TODO CNC TXT REC
+ * TODO dvault
+* TODO IPv6
+* TODO classes
+
+## Subsystems
+### Heartbeat
+**Heartbeat** is a subsystem of Proone that consists of a backdoor and CNC
+mechanism on infected devices. **The Heartbeat protocol** is a point-to-point
+or broadcast framing protocol that works over a transport stream such as
+TCP/IP. The protocol is documented separately in the **[Protocol
+Spec](proto.md)**. An overview of the protocol follows.
+
+The Heartbeat subsystem takes up a large portion of the Proone code base. The
+subsystem mainly works as a format in DNS TXT records and a TCP/IP framing
+protocol. A complete heartbeat connection consists of an **authoritative end**
+and a **submissive end**.
+
+In the Heartbeat protocol, a request is usually initiated by one end sending a
+single initiating frame, and the other end responds with one or more response
+frames. The protocol also employs the concept of "protocol upgrade" like that of
+WebSocket, in which both request and response frames can be streams of frames.
+
+A request-response session is distinguished by a message id number. A message id
+number is generated by the end which initiated the session and is used for the
+duration of the session. The idea behind having a message id number is to make
+the protocol pipelinable so that simple request-response pairs can be processed
+in parallel. This is merely a future-proof design and does not play a
+significant role.
+
+Unlike conventional botnets, Proone instances (aka "bots") are controlled by TXT
+DNS records containing one or more request frames of an authoritative end. In
+this scheme, a request is initiated by Proone instances acting as a submissive
+end querying and reading the contents of the TXT records. Any response data
+resulting from the process is discarded. The heartbeat protocol binary is
+represented in base64 encoding because most DNS management software does not
+accept binary data for the value of TXT records although [the RFC
+spec](https://datatracker.ietf.org/doc/html/rfc1035#section-3.3) does not impose
+such a restriction.
+
+Only public DNS servers which support DNS over TLS are used to counter lawful
+interception. The reason is that the DNS protocol is not encrypted and ISPs or
+law enforcement can easily filter out TXT REC CNC traffic simply by doing a
+plain-text string search. A TLS library is used to implement the SSH attack
+vector, so using the library for another purpose was an enticing choice. Proone
+queries public DNS servers directly rather than using system functions. This
+eliminates the chance of ISP DNS servers giving false results. Using
+public DNS servers is also beneficial since law enforcement would have to take
+down the domain itself as it would be difficult to convince the operators of
+public DNS servers to block a recursive query to a particular name server.
+Another benefit is not having to run CNC servers for simple tasks like running
+shell scripts.
+
+There are 2 recommended applications. One typical application is having a
+`PRNE_HTBT_OP_HOVER`(Hand Over Operation) request frame in TXT records so that
+instances will connect to servers running authoritative htbt implementations for
+further instructions. The second application is having a
+`PRNE_HTBT_OP_RUN_CMD`(Run Command Operation) frame or a
+`PRNE_HTBT_OP_RUN_BIN`(Run Binary Operation) frame containing a simple minified
+shell script for instances to run.
+
+Using CNC TXT records to transfer a large amount of data is possible but not
+recommended. In theory, doing `PRNE_HTBT_OP_NY_BIN`(Binary Upgrade Operation)
+with CNC TXT REC is possible. However, for Proone instances, querying TXT
+records, decoding base64 data and running a slave heartbeat client is a costly
+operation. It's not a simple task and is prone to failure.
+
+### Use Cases
+To stop all Proone instances, issue command `kill -9 0` or `reboot -nf` with
+detach flag unset. To disable all hosts, issue command `half -nf`.
+
+For more complex tasks, it's recommended to write an authoritative server
+implementation and command Proone instances to take orders from the servers
+running it. Load balancing can be done at the DNS level using techniques like
+round-robin DNS or GeoDNS. Once a Proone instance connects to an authoritative
+server, the server can fully utilise the heartbeat protocol to do the tasks
+described below.
+
+Shell scripts can be run on Proone hosts with `PRNE_HTBT_OP_RUN_BIN`(Run Binary
+Operation) as long as the script contains a shebang line at the very start of
+the script. Note that most embedded devices run lightweight shells like
+Ash(BusyBox) and Toysh(Toybox)[^1]. The best strategy is targeting the Bourne
+shell, which has historically been the default shell on the majority of systems.
+
+To make hosts run an arbitrary binary executable, `PRNE_HTBT_OP_HOST_INFO`(Host
+Info Operation) can be used to query the architecture type of the host to select
+a suitable binary for upload.
+
+To replace the Proone binary, `PRNE_HTBT_OP_NY_BIN`(Binary Upgrade Operation)
+can be used. The binary format for the operation is specified in a [separate
+document](proto.md). Upon successful upload, the Proone instance will attempt to
+`exec()` to the new binary after **binary recombination**(explained in a
+separate section) is performed. All this is done in the parent process. In the
+event of failure, Proone continues to operate with the existing binary. The only
+way to check the result of the operation is by reestablishing the
+connection to the Proone instance and querying the version of the binary with a
+`PRNE_HTBT_OP_HOST_INFO` request.
+
+The protocol leaves room for implementing M2M mechanisms. A Proone instance
+checks if the target host is already infected by attempting to connect to a
+**local back door**(or simply, **LBD**) on the target host. An LBD port is
+served by a submissive Heartbeat client. Future versions of Proone can
+utilise the LBD port to update the binary of the target instance if an old one
+is encountered. **proone-htbtclient** can be used to examine and maintain the
+Proone instance via this port.
+
+### Binary Archive and Data Vault
+Proone aims to be a decentralised botnet. To spread without binary distribution
+servers, Proone carries all the executables of arch types it supports. For this,
+a special file structure is designed.
+
+The **Data Vault**("**DVault**") is a binary block containing large and
+sensitive data necessary for operation of Proone. DVault is a kempt version of
+the data table of Mirai. DVault also helps reduce the size of Proone. Each
+executable contains the *.data* section. If there's a long string in the
+program, the value of the string will end up in each *.data* section of the
+executables. Compression alleviates this issue but there's a limit because the
+size of data dictionary blocks can only get big. Having a custom *.data* section
+for large data solves this issue at the cost of the size of code for fetching
+and unmasking values from DVault. This implies that, in some cases, storing
+static values in the *.data* section of an ELF is more efficient[^2]. Another
+purpose of DVault is masking sensitive data like `PRNE_DATA_KEY_CNC_TXT_REC` and
+`PRNE_DATA_KEY_CRED_DICT` so that they're not revealed when `strings` command is
+run on the executable or when the process is core dumped. DVault is loaded when
+Proone initialises. The loaded contents remain masked in memory and are unmasked
+only when needed.
+
+The contents of DVault are XORed with a 256-byte array of random numbers
+generated on each compilation. This process makes it impossible to compress the
+DVault binary block because of high entropy. Therefore it's not recommended to
+use DVault to store exceptionally large values. This issue may be solved by
+compressing the value separately at the cost of CPU time.
+
+The **Binary Archive**("**BA**") is a binary block containing compressed
+executables and an index of the executables.
+
+## Requirements
+### Targeting a Wide Range of Devices and Kernel Configurations
+A number of methods have been employed in efforts to target a wide range of Linux
+devices. The assumption is that there are still devices running old images of
+Linux and targeting these devices means coding up to the standard of old POSIX
+specs and testing under old versions of Linux(namely 2.6.x). The
+`_POSIX_C_SOURCE=200112L` macro is defined to meet this requirement. Note that
+using this macro does not give you an error when you accidentally use APIs not
+in the 200112L standard. The compiler will only give you a warning and your code
+will compile just fine. If you happen to use a function that the kernel of the
+host does not support, the syscall will fail with `ENOSYS`. If the feature
+requiring the new API can be silently switched off at runtime, removal of the
+macro is recommended.
+
+The Linux kernel is highly configurable. Pseudo file systems and the device file
+system may not be present on a Linux host since they can be disabled. Disabling
+any of these file systems is unusual for PCs but practical on embedded devices.
+Proone does not assume that these file systems are available on the host and
+tries to run without using them if not available.
+
+### Running Lean
+Proone is designed under the assumption that honouring other processes on the
+system will decrease the chance of getting caught by system administrators.
+
+Proone is compartmentalised so that it's somewhat immune to syscall failures. This
+design is to counter `ENOMEM` as it runs lean on lean embedded systems. This
+implies that Proone can be initialised "half-complete". For example,
+it can be initialised with all the workers running except the Heartbeat worker.
+In this case, Proone will be able to infect other devices on the network
+while unable to respond to CNC TXT REC. Another notable case would be an
+instance running without the Recon worker. It will respond to the CNC TXT REC
+and serve the local backdoor connections while unable to infect the other
+devices on the network. Proone does not reattempt to start the workers it failed
+to run on start. The assumption is that the system is already running with its
+memory full to the brim and it's futile to wait for a resource it failed to claim
+as it's likely that the other services on the system will claim the resource at
+some point.
+
+Proone does cooperative multitasking by using the **Pthsem** library. This is one
+of many efforts to "run lean" by restricting CPU usage to one logical
+thread. This may seem like a huge missed opportunity if Proone manages to infect
+a beefy multi-core system. Keep in mind that Proone is designed to
+run on resource-scarce embedded devices. Most poorly-designed vulnerable devices
+will be single core, anyway. The strategy is getting the most small-powered
+devices infected rather than having a few infected high-performance systems.
+
+### Volatile Operation
+TODO
+
+## Dependencies
+The dependencies for Proone have been kept to absolute necessities. **libssh2**
+is used for the SSH brute force vector. Coupled with libssh2's SSL backend
+is **Mbedtls** for TLS connection to public name servers and the Heartbeat
+protocol. **zlib** is used to implement the binary archive. All the libraries are
+compiled with default configurations. **Pthsem** is used for threading.
+
+**libyaml** and **mariadb-connector-c-devel** are required for the **hostinfod**
+build. YAML has been chosen for the configuration file format and MariaDB for the
+DB backend.
+
+
+[^1]: Maybe in the future when Toybox gains marketshare?
+[^2]: i.e. representing values in code: `int value = 123;`