The FOSSi Foundation is happy to introduce our Google Summer of Code Class of 2021 projects. This year we are grateful that we have been granted eleven slots by Google to support projects and students. We are thankful for all mentors who volunteered to supervise students, and we’re looking forward to a great summer working together on Free and Open Source Silicon projects.

These projects are our “GSoC Class of 2021”. Please give our students a warm welcome!

Virtual FPGA Lab (Bala Dhinesh)

Mentored by Kunal Ghosh, Ákos Hadnagy, and Steve Hoover

A Field-Programmable Gate Array (FPGA) is a hardware circuit that a user can program to carry out logical operations. FPGAs are beneficial for prototyping application-specific integrated circuits (ASICs) or processors. Their energy efficiency, reprogrammability, support for parallelism, and low latency have made them widely used in many applications. But the flexibility of FPGAs comes at the price of the difficulty of reprogramming the circuit. FPGAs are also somewhat costly and difficult for beginners to learn. In addition, many students have no access to physical FPGA lab classes in their curriculum during the pandemic. There is therefore strong demand for an online simulator as an alternative for FPGA curriculum development. This project aims to solve the problem by taking advantage of the VIZ visualization feature of the Makerchip platform to provide visualizations of the basic peripherals of an FPGA, thereby mimicking the physical lab experience.

Bring up CV32E40P AI accelerator on FPGA (Veronia Iskandar)

Mentored by Jeremy Bennett and William Jones

As Artificial Intelligence and Machine Learning have become ubiquitous in many fields, so too has the need for hardware that effectively supports AI operations. In the world of open source, the RISC-V instruction set architecture (ISA) has been gaining much attention in industry and academia as an open-source, open-standard way of producing custom instruction sets. Indeed, one particular advantage of RISC-V is the ability to add custom instruction extensions to the processor targeting specific applications. This project builds upon an earlier joint project between Embecosm and the University of Southampton, which recently developed an open-source ISA extension for a RISC-V core to accelerate neural network inference. A CV32E40P RISC-V core was extended with an accelerator that handles vector instructions to speed up the inference of neural networks. That project saw great success, showing a 5-fold increase in performance on the TinyMLPerf benchmark. The verification of the AI vector extensions currently exists only as a Verilator model. This limits the measurement of performance to cycle counts, and does not provide insight into any impact on clock speed in actual silicon. Furthermore, the current design has no pipeline and restricts all operations to be “single cycle”. This project aims to tackle these issues by bringing up the existing work on the Nexys A7 FPGA platform. In summary, this work includes the verification of a RISC-V-based AI accelerator and its implementation on an FPGA. This will demonstrate how well the existing work translates into real-world results, and give insights into what it takes to create hardware that optimally performs AI and Machine Learning operations.

Multi-Level TLB Support for Ariane Core (Nazerke Turtayeva)

Mentored by Nils Wistoff and Jonathan Balkind

With rapidly increasing computing speeds, CPUs may spend most of their time waiting for instructions rather than doing actual computation. In particular, a double reference to main memory, first to fetch the address translation and then to access the data, incurs a long delay. Translation Lookaside Buffers (TLBs) help by keeping frequently used address translations near the core. Hence, they reduce the average latency of memory accesses and increase performance. This proposal aims to enhance the Ariane (CVA6) TLB structure to a multi-level one. It will extend the core with an L2 TLB on top of the existing L1 TLBs. Ariane (CVA6) is an application-class, Linux-bootable RISC-V CPU core developed by researchers at ETH Zurich and the University of Bologna.
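
To make the idea concrete, here is a minimal software sketch of a two-level TLB lookup. It is not the CVA6 design: the class names, the fully associative LRU organisation, the entry counts, the 4 KiB page size, and the dictionary standing in for the page-table walk are all assumptions chosen for illustration.

```python
from collections import OrderedDict

PAGE_SHIFT = 12  # assumed 4 KiB pages


class Tlb:
    """Hypothetical fully associative TLB with LRU replacement."""

    def __init__(self, entries):
        self.entries = entries
        self.map = OrderedDict()  # virtual page number -> physical page number

    def lookup(self, vpn):
        if vpn in self.map:
            self.map.move_to_end(vpn)  # mark as most recently used
            return self.map[vpn]
        return None

    def insert(self, vpn, ppn):
        self.map[vpn] = ppn
        self.map.move_to_end(vpn)
        if len(self.map) > self.entries:
            self.map.popitem(last=False)  # evict the least recently used entry


class TwoLevelTlb:
    """Small L1 TLB backed by a larger L2 TLB, then a page-table walk."""

    def __init__(self, page_table, l1_entries=4, l2_entries=16):
        self.page_table = page_table  # dict standing in for the slow walk
        self.l1 = Tlb(l1_entries)
        self.l2 = Tlb(l2_entries)
        self.stats = {"l1_hit": 0, "l2_hit": 0, "walk": 0}

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        ppn = self.l1.lookup(vpn)
        if ppn is not None:
            self.stats["l1_hit"] += 1
        else:
            ppn = self.l2.lookup(vpn)
            if ppn is not None:
                self.stats["l2_hit"] += 1
            else:
                ppn = self.page_table[vpn]  # the expensive path
                self.stats["walk"] += 1
                self.l2.insert(vpn, ppn)
            self.l1.insert(vpn, ppn)  # refill L1 on any L1 miss
        return (ppn << PAGE_SHIFT) | offset
```

In this model, a translation evicted from the small L1 can still be found in the L2, so only a miss in both levels pays for the page-table walk. The `stats` counters make that effect visible for a given access pattern.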

WARP-V manycore in the Cloud (Vineet Jain)

Mentored by Ákos Hadnagy, Steve Hoover, and Shivam Potdar

WARP-V is a highly configurable, adaptable, and flexible open-source CPU core generator that supports various ISAs, such as MIPS and RISC-V, and can even support custom ones. It can be tuned as required, from a low-power, low-frequency 1-cycle FPGA implementation to a high-frequency 6-cycle ASIC design. It is designed using an emerging “Transaction-Level” methodology, TL-Verilog. Recently, FPGAs have gained popularity in data centers for cloud computing and for accelerating compute-intensive cloud workloads, but using them requires access to expensive tools and hardware and demands a wide range of expertise. 1st CLaaS (Custom Logic as a Service) is an emerging generic framework that helps eliminate these up-front costs by providing support for developing custom hardware accelerators and integrating them with web applications and cloud infrastructure, bringing them within reach of everyone. This project harnesses the advantages of WARP-V and 1st CLaaS by providing support for the multicore NoC design, building a custom kernel interface, and accelerating complex computation by deploying it in the cloud. It will not only lead to a highly configurable many-core hardware accelerator in the cloud but also drive the industry toward FPGA-accelerated web applications and cloud computing, while demonstrating the flexibility of TL-Verilog to motivate the industry toward a better design methodology.

M-extension support for SERV (Zeeshan Rafique)

Mentored by Olof Kindgren and Stefan Wallentowitz

SERV is a bit-serial RISC-V CPU known for its small, optimized area. It supports the RV32I instruction set with a few control and status registers, enough to run a basic RTOS; currently it can run Zephyr OS. It has a RISC-V formal interface for verification and it is formally verified with RISC-V compliance tests. Its ALU operates on one bit at a time, which means a single instruction can take 32 or more cycles to write back. This project will add M-extension support to SERV. The implementation will proceed in four steps. The first step is to extend the SERV decoder to decode the multiplication and division instructions. The second step is to create a simple ready/valid interface that allows SERV and the Multiplication and Division Unit (MDU) to communicate: the MDU takes the operands from SERV, performs the operation, and sends the result back to SERV. The third step is to implement the hardware for the M-extension, and the final step is to verify the whole system with the M-extension integrated.
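
The handshake in the second step can be sketched as a small behavioural model. This is not SERV's RTL or the project's MDU: the class `SerialMdu`, its `start`/`tick` methods, and the one-partial-product-per-cycle shift-and-add scheme are assumptions used only to show how a CPU might hand over operands and wait for a result.

```python
MASK32 = 0xFFFFFFFF


class SerialMdu:
    """Hypothetical multiply unit that adds one partial product per clock tick."""

    def __init__(self):
        self.busy = False
        self.ready = False  # raised when the result may be read back
        self.result = 0

    def start(self, rs1, rs2):
        """Models the CPU asserting 'valid' together with both operands."""
        self.a = rs1 & MASK32
        self.b = rs2 & MASK32
        self.acc = 0
        self.bit = 0
        self.busy = True
        self.ready = False

    def tick(self):
        """Advance one clock cycle of the shift-and-add multiplication."""
        if not self.busy:
            return
        if (self.b >> self.bit) & 1:
            self.acc = (self.acc + (self.a << self.bit)) & MASK32
        self.bit += 1
        if self.bit == 32:
            self.result = self.acc  # low 32 bits, as RISC-V MUL returns
            self.busy = False
            self.ready = True


def run_mul(mdu, rs1, rs2):
    """CPU side: send the operands, then stall until the MDU reports ready."""
    mdu.start(rs1, rs2)
    cycles = 0
    while not mdu.ready:
        mdu.tick()
        cycles += 1
    return mdu.result, cycles
```

Calling `run_mul(SerialMdu(), 6, 7)` returns the product together with the number of cycles the CPU waited, which in this model is always 32. A real unit would also need the upper-half multiply and the division instructions of the M extension.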

FuseSoC Integration of BaseJump STL (Adithya Sunil)

Mentored by Olof Kindgren, Dan Petrisko, and Michael Taylor

BaseJump STL is to the hardware world as C++ STL is to the software world. It is a comprehensive hardware library for SystemVerilog that seeks to contain all commonly used hardware primitives. FuseSoC is a package manager and set of build tools for reusable hardware building blocks, facilitating the sharing of designs between projects and the reuse of open IP cores. The objective of this project is to provide FuseSoC with a full-featured standard template library, so that new projects can directly reuse these hand-optimized IP cores rather than starting from scratch. FuseSoC makes use of core files that reference the provider, file sets, and default targets, allowing for the reuse of IP cores in the process of creating, building, and simulating SoC solutions. This project will involve porting the SystemVerilog modules, as well as the testing infrastructure provided by BaseJump STL, to work with FuseSoC.

TensorCore Extension for Deep Learning (Nitin Mishra)

Mentored by Steve Hoover and Theodore Omtzigt

Deep Learning continues to be a key application for a range of services, from Advanced Driver Assistance Systems (ADAS) in autonomous vehicles to data centers and embedded applications like smartphones and drones. The core operations in these deep learning architectures are fused dot products and matrix multiplication. Owing to the huge memory and compute demands of these applications, numerous hardware architectures have been proposed to handle the requirements of Deep Neural Networks (with layers like Conv2d, FC, etc.), such as GPUs, custom accelerators like the TPU, and custom ISAs or ISA extensions. This project is focused on the ISA extension technique. The objective of this project is to add a tensor core extension to a RISC-V processor like WARP-V and customize an existing spec to work as a deep learning accelerator. This project will also serve as a proof point for a Deep Learning research platform to experiment with tensor operators and custom number systems. Delivering a fused dot product operator is key in this quest. We will explore the timing and resource utilization of such designs. The design can later be used or modified to study tensor-core-based designs that accelerate deep learning workloads at greater scale.
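
To illustrate why a fused dot product is worth dedicated hardware, the sketch below compares a dot product that rounds to 32-bit floating point after every operation with one that accumulates exactly and rounds only at the end. It is a software model only and says nothing about the project's number systems or hardware: the function names are hypothetical, and the final conversion passes through a Python float, so it approximates a single rounding.

```python
import struct
from fractions import Fraction


def to_f32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot_stepwise(a, b):
    """Dot product that rounds to float32 after every multiply and every add."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_f32(acc + to_f32(x * y))
    return acc


def dot_fused(a, b):
    """Dot product accumulated exactly, rounded to float32 once at the end."""
    exact = sum(Fraction(x) * Fraction(y) for x, y in zip(a, b))
    return to_f32(float(exact))


a = [1e8, 1.0, -1e8]
b = [1.0, 1.0, 1.0]
print(dot_stepwise(a, b), dot_fused(a, b))
```

With these inputs the step-by-step version loses the small middle term, because adding 1 to 1e8 rounds back to 1e8 in single precision, while the fused version keeps it.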

Formal verification of mor1kx (Harshitha S)

Mentored by Stafford Horne and Stefan Wallentowitz

The mor1kx processor is an implementation of the OpenRISC 1000 CPU architecture. With the recent maturing of the Yosys formal verification tools, it is now possible for us to provide formal verification for OpenRISC cores like mor1kx. This project will begin formally verifying the subsystems of the mor1kx OpenRISC implementation. The goal of this project is to verify that the mor1kx implementation is correct, find and fix bugs, and make the CPU implementation more stable.

SkyParrot: Preparing BlackParrot for Open-Source Fabrication using Google-Skywater 130nm PDK (Lakshmi S)

Mentored by Dan Petrisko and Michael Taylor

BlackParrot is a tiny, modular, Linux-capable, open-source multicore that encourages external contributions and strives for infrastructure agnosticism. It has previously been fabricated successfully in the commercial TSMC 40nm and GlobalFoundries 12nm technologies. Last year, Google released the Google-Skywater 130nm Open-Source PDK, which opened new windows in the field of semiconductor design and manufacturing, enabling any individual to design their own IC and have a chance at getting it fabricated onto a chip. Successfully preparing and fabricating a reliable, Linux-capable open-source processor core like BlackParrot using the 130nm Google-Skywater technology would be a key milestone for the fully open-source technology node. The aim of this project is to achieve this: to get a single-core implementation of BlackParrot fabrication-ready on the Google-Skywater 130nm node.

Block-Based Circuit Design (Ninad Jangle)

Mentored by Steve Hoover, Gayatri Mehta, and Adam Ratzman

Block-Based Circuit Design is a solution to counter the complexity of Circuit Design. We aim to develop and deploy a Block-Based TL-Verilog solution for developers and new entrants to the sphere. Powered by Blockly from Google, it will deliver a simple, concise, and intuitive gateway to Circuit Design.

Parallelising Verilog RTL Simulations Using MPI (Guillem Lopez Paradis)

Mentored by Jonathan Balkind and Stefan Wallentowitz

This project will focus on a current problem in the hardware community: the speed of RTL simulations. These simulations are a necessary step in any hardware design. Simulating RTL quickly is an intrinsically hard problem, and even commercial simulators run slowly compared to the real hardware, which can be several orders of magnitude faster. On the open-source side, Verilator is one of the best-known players and offers simulation speed competitive with the closed-source ones. We will base our work on this tool. This project builds on the fact that modern processors rely on a Network-on-Chip (NoC) to connect several cores on the same chip. The idea is to exploit this hardware partitioning by running a stand-alone simulation per core and communicating between the simulations through OpenMPI. This could potentially enable the simulation of a 1000-core processor, something currently unmanageable.