Figure 1 presents an overview of our architecture. The first component of our analysis platform is the firmware data store, which stores the unmodified firmware files that have either been retrieved by the web crawler or submitted through the public web interface. When a new file is received by the firmware data store, it is automatically scheduled to be processed by the analysis cloud. The analysis cloud consists of a master node and a number of worker and hash cracking nodes. The master node distributes unpacking jobs to the worker nodes (Figure 2), which unpack and analyze firmware images. Hash cracking nodes process password hashes that have been found during the analysis and try to find the corresponding plaintext passwords. Apart from coordinating the worker nodes, the master node also runs the correlation engine and the data enrichment system modules. These modules improve the reports with results from the cross-firmware analysis.

The analysis cloud is where the actual analysis of the firmware takes place. Each firmware image is first submitted to the master node. Subsequently, worker nodes are responsible for unpacking and analyzing the firmware and for returning the results of the analysis to the master node. At this point, the master node submits this information to the reports database. If the analyzed firmware contained any uncracked password hashes, the master node additionally submits those hashes to one of the hash cracking nodes, which tries to recover the plaintext passwords.

It is important to note that only the results of the analysis and the meta-data of the unpacked files are stored in the database. Even though we do not currently use the extracted files after the analysis, we still archive them for future work, or in case we want to review or enhance a specific set of analyzed firmware images.

The architecture contains two other components: the correlation engine and the data enrichment system.
Both of them fetch the results of the firmware analysis from the reports database and perform additional tasks. The correlation engine identifies a number of “interesting” files and tries to correlate them with any other file present in the database. The enrichment system is responsible for enhancing the information about each firmware image by performing online scans and lookup queries (e.g., detecting vendor name, device name/code and device category).

In the remainder of this section, we describe each step of the firmware analysis in more detail so that our experiments can be reproduced.
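
To make the job flow described above concrete, the following is a minimal single-process sketch of how a master node might hand an image to a worker, store the resulting report, and forward password hashes to a cracking node. It is an illustration only, not our implementation: the names `MasterNode`, `AnalysisResult`, `toy_worker` and `toy_cracker`, the line-based "firmware" format, and the use of unsalted SHA-256 with a three-word dictionary are all assumptions made for the example. The sketch also runs synchronously, whereas the platform distributes these steps across separate nodes.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class AnalysisResult:
    """What a worker returns to the master (hypothetical structure)."""
    file_metadata: List[str]    # metadata of the unpacked files (here: just paths)
    password_hashes: List[str]  # password hashes found in the image


@dataclass
class MasterNode:
    """Single-process stand-in for the master node (assumption for this sketch)."""
    worker: Callable[[bytes], AnalysisResult]  # stand-in for a worker node
    cracker: Callable[[str], Optional[str]]    # stand-in for a hash cracking node
    reports_db: Dict[str, dict] = field(default_factory=dict)

    def process(self, firmware_id: str, image: bytes) -> dict:
        # 1. Hand the image to a worker for unpacking and analysis.
        result = self.worker(image)
        # 2. Store only the results and file metadata, not the files themselves.
        report = {"files": result.file_metadata, "passwords": {}}
        self.reports_db[firmware_id] = report
        # 3. Forward any password hashes to a hash cracking node.
        for digest in result.password_hashes:
            report["passwords"][digest] = self.cracker(digest)
        return report


def toy_worker(image: bytes) -> AnalysisResult:
    """Toy 'unpacker': each line is either 'file:<path>' or 'hash:<hex digest>'."""
    files, hashes = [], []
    for line in image.decode().splitlines():
        kind, _, value = line.partition(":")
        if kind == "file":
            files.append(value)
        elif kind == "hash":
            hashes.append(value)
    return AnalysisResult(files, hashes)


def toy_cracker(digest: str, wordlist=("admin", "root", "1234")) -> Optional[str]:
    """Toy dictionary attack over unsalted SHA-256; None if no word matches."""
    for word in wordlist:
        if hashlib.sha256(word.encode()).hexdigest() == digest:
            return word
    return None
```

A call such as `MasterNode(worker=toy_worker, cracker=toy_cracker).process("fw-1", image)` returns the report and leaves a copy in `reports_db`. Hashes that the toy cracker cannot recover are recorded with the value `None`.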