For EIP-4844, Ethereum clients need the ability to compute and verify KZG commitments. Rather than each client rolling its own cryptography, researchers and developers came together to write c-kzg-4844, a small C library with bindings for other languages. The goal was a robust, efficient cryptographic library that all clients could use. The Protocol Security Research team at the Ethereum Foundation had the opportunity to review and improve this library. This blog post covers some of the things we do to make C projects more secure.
Fuzz
Fuzzing is a dynamic testing technique that feeds random inputs to a program in order to discover bugs. LibFuzzer and afl++ are two popular fuzzing frameworks for C projects. Both are in-process, coverage-guided, evolutionary fuzzing engines. For c-kzg-4844, we used LibFuzzer since we were already well integrated with the LLVM project's other offerings.
Here's the fuzzer for verify_kzg_proof, one of c-kzg-4844's functions:
    #include "../base_fuzz.h"

    /* Byte offsets of each argument within the fuzzer-provided input. */
    static const size_t COMMITMENT_OFFSET = 0;
    static const size_t Z_OFFSET = COMMITMENT_OFFSET + BYTES_PER_COMMITMENT;
    static const size_t Y_OFFSET = Z_OFFSET + BYTES_PER_FIELD_ELEMENT;
    static const size_t PROOF_OFFSET = Y_OFFSET + BYTES_PER_FIELD_ELEMENT;
    static const size_t INPUT_SIZE = PROOF_OFFSET + BYTES_PER_PROOF;

    int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
        initialize();
        /* Only inputs of exactly the expected size are passed on. */
        if (size == INPUT_SIZE) {
            bool ok;
            verify_kzg_proof(
                &ok,
                (const Bytes48 *)(data + COMMITMENT_OFFSET),
                (const Bytes32 *)(data + Z_OFFSET),
                (const Bytes32 *)(data + Y_OFFSET),
                (const Bytes48 *)(data + PROOF_OFFSET),
                &s
            );
        }
        return 0;
    }
When executed, this is what the output looks like. If there were a problem, the fuzzer would stop executing. Ideally, you should then be able to reproduce the problem.
There's also differential fuzzing, a technique that fuzzes two or more implementations of the same interface and compares their outputs. If the outputs differ for a given input and you expected them to be the same, you know something is wrong. This technique is very popular in Ethereum because we like to have several implementations of the same thing. This diversification provides an extra layer of safety: if one implementation is flawed, the others may not have the same problem.
For KZG libraries, we developed kzg-fuzz, which differentially fuzzes c-kzg-4844 (through its Golang bindings) and go-kzg-4844. So far, there haven't been any differences.
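The comparison step of differential fuzzing can be sketched as a LibFuzzer harness. This is a minimal illustration, not the kzg-fuzz code: popcount_loop and popcount_kernighan are hypothetical stand-ins for two independent implementations of the same interface, and the harness aborts as soon as they disagree so that the fuzzer treats the input as a crash.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Implementation A (hypothetical): test each of the 32 bits in turn. */
static unsigned popcount_loop(uint32_t x) {
    unsigned count = 0;
    for (int i = 0; i < 32; i++) {
        count += (x >> i) & 1u;
    }
    return count;
}

/* Implementation B (hypothetical): clear the lowest set bit until none remain. */
static unsigned popcount_kernighan(uint32_t x) {
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1;
        count++;
    }
    return count;
}

/* LibFuzzer entry point: feed the same input to both and compare. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    uint32_t x;
    if (size < sizeof(x)) {
        return 0; /* Not enough bytes to build an input; ignore it. */
    }
    memcpy(&x, data, sizeof(x));
    if (popcount_loop(x) != popcount_kernighan(x)) {
        abort(); /* A mismatch is reported by the fuzzer as a crash. */
    }
    return 0;
}
```

With Clang, a harness like this is typically built with -fsanitize=fuzzer, which supplies the main function.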
Coverage
Next, we used llvm-profdata and llvm-cov to generate a coverage report from running the tests. This is a great way to verify that code is executed ("covered") and tested. See the coverage target in c-kzg-4844's Makefile for an example of how to generate this report.
When this target is run (i.e., make coverage), it produces a table that gives a high-level overview of how much of each function is executed. The exported functions are shown at the top of the table, while the non-exported (static) functions are at the bottom.
The table shows some yellow and some red. To determine what is and isn't being executed, refer to the generated HTML file (coverage.html). This webpage shows the entire source file and highlights non-executed code in red. In this project, most of the non-executed code deals with error cases that are difficult to test, such as memory allocation failures. For example, here's some non-executed code:
At the beginning of this function, it checks that the trusted setup is large enough to perform a pairing check. No test case provides an invalid trusted setup, so this code is never executed. Also, because we only test with the correct trusted setup, the result of is_monomial_form is always the same and the error value is never returned.
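To illustrate why allocation-failure paths tend to show up red, here is a small sketch of one common way to exercise them: routing allocations through a wrapper that a test can force to fail. This is an assumption for illustration, not something c-kzg-4844 is stated to do; fail_next_alloc, test_malloc, ret_t, and new_zeroed_array are all hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical test hook: when set, the next allocation fails once. */
static bool fail_next_alloc = false;

/* Thin wrapper around malloc that honors the test hook. */
static void *test_malloc(size_t size) {
    if (fail_next_alloc) {
        fail_next_alloc = false;
        return NULL;
    }
    return malloc(size);
}

typedef enum { RET_OK = 0, RET_MALLOC = 1 } ret_t;

/* Allocates an n-element array of zeros (overflow of n * sizeof(int) is
 * not handled in this sketch). */
ret_t new_zeroed_array(int **out, size_t n) {
    int *arr = test_malloc(n * sizeof(int));
    if (arr == NULL) {
        /* Without the hook, tests almost never reach this branch. */
        *out = NULL;
        return RET_MALLOC;
    }
    for (size_t i = 0; i < n; i++) {
        arr[i] = 0;
    }
    *out = arr;
    return RET_OK;
}
```

A test would call new_zeroed_array once normally and once with fail_next_alloc set, so that both branches appear as covered in the report.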
Profile
We don't recommend this for all projects, but since c-kzg-4844 is a performance-critical library, we think it's important to profile its exported functions and measure how long they take to execute. This can help identify inefficiencies that could potentially be used to DoS nodes. For this, we used gperftools (Google Performance Tools) instead of llvm-xray because we found it more feature-rich and easier to use.
The following is a simple example that profiles my_function. Profiling works by checking which instruction is being executed at regular intervals. If a function is fast enough, the profiler may not notice it. To reduce the chance of this, you may need to call your function multiple times. In this example, we call my_function 1000 times.
    #include <gperftools/profiler.h>

    int task_a(int n) {
        if (n <= 1) return 1;
        return task_a(n - 1) * n;
    }

    int task_b(int n) {
        if (n <= 1) return 1;
        return task_b(n - 2) + n;
    }

    /* Alternates between the two tasks so both appear in the profile. */
    void my_function(void) {
        for (int i = 0; i < 500; i++) {
            if (i % 2 == 0) {
                task_a(i);
            } else {
                task_b(i);
            }
        }
    }

    int main(void) {
        ProfilerStart("example.prof");
        /* Repeat the call so the sampling profiler has a chance to see it. */
        for (int i = 0; i < 1000; i++) {
            my_function();
        }
        ProfilerStop();
        return 0;
    }
Use ProfilerStart("<filename>") and ProfilerStop() to mark which parts of your program to profile. When recompiled and executed, the program will write a file to disk with profiling data. You can then use pprof to visualize this data.
Here is the graph generated by pprof:
Here's a bigger example from one of c-kzg-4844's functions. The following is the profiling graph for compute_blob_kzg_proof. As you can see, 80% of this function's time is spent performing Montgomery multiplications. This is expected.
Reverse
Next, view your binary in a reverse engineering tool such as Ghidra or IDA. These tools can help you understand how high-level constructs are translated into low-level machine code. We think it helps to review your code this way, much like reading a paper in a different font forces your brain to interpret sentences differently. It's also useful to see what kinds of optimizations your compiler makes. It's rare, but sometimes the compiler will optimize out something that it deemed unnecessary. Keep an eye out for this; something like it actually happened in c-kzg-4844, where some of the tests were being optimized out.
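As a general illustration of the compiler removing code it considers unnecessary (this is not the c-kzg-4844 case mentioned above), consider clearing a buffer that is never read again. The sketch below uses hypothetical function names; checksum_then_memset ends with a memset that an optimizer is allowed to drop, while wipe writes through a volatile pointer, a widely used way to keep such stores in the binary.

```c
#include <stddef.h>
#include <string.h>

/* Copies input into a scratch buffer, sums it, then tries to clear the
 * buffer. Because buf is never read after the memset, an optimizer may
 * remove that call entirely (dead-store elimination). */
int checksum_then_memset(const unsigned char *in, size_t len) {
    unsigned char buf[32];
    size_t n = len < sizeof(buf) ? len : sizeof(buf);
    int sum = 0;
    memcpy(buf, in, n);
    for (size_t i = 0; i < n; i++) {
        sum += buf[i];
    }
    memset(buf, 0, sizeof(buf)); /* candidate for removal */
    return sum;
}

/* Stores through a volatile pointer count as observable side effects,
 * so compilers keep them. */
void wipe(void *ptr, size_t len) {
    volatile unsigned char *p = ptr;
    while (len--) {
        *p++ = 0;
    }
}
```

Comparing the disassembly of the two functions at a high optimization level is a quick way to see whether the memset call survived.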
When you view a decompiled function, it will not have variable names, complex types, or comments. This information isn't included in the binary when it is compiled, so it is up to you to reverse engineer it. You'll often see that functions are inlined, that multiple variables declared in the code are consolidated into a single buffer, and that the order of checks is different. These are just compiler optimizations and are generally fine. It may help to build your binary with DWARF debugging information, which can improve the results.
For example, this is what blob_to_kzg_commitment initially looks like in Ghidra:
With a little work, you can rename variables and add comments to make it easier to read. Here's what it could look like after a few minutes:
Static Analysis
Clang comes with the Clang Static Analyzer, an excellent static analysis tool that can identify many problems the compiler will miss. As the name "static" suggests, it examines code without executing it. This is slower than the compiler, but a lot faster than "dynamic" analysis tools, which execute code.
Here's a simple example that forgets to free arr (it also has a second problem, covered in the AddressSanitizer section). The compiler will not identify the missing free, even with all warnings enabled, because as far as the compiler is concerned this is valid code.
    #include <stdlib.h>

    int main(void) {
        int* arr = malloc(5 * sizeof(int));
        arr[5] = 42;
        return 0; /* arr is never freed */
    }
The unix.Malloc checker will identify that arr wasn't freed. The line in the warning message is a bit misleading, but it makes sense if you think about it: the analyzer reached the return statement and noticed that the memory hadn't been freed.
Not all findings are that simple, though. Here's a finding that the Clang Static Analyzer reported in c-kzg-4844 when it was first introduced to the project:
Given an unexpected input, it was possible to shift this value by 32 bits, which is undefined behavior. The solution was to restrict the input with CHECK(log2_pow2(n) != 0) so that this was impossible. Good job, Clang Static Analyzer!
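To make the shift problem concrete, here is a self-contained sketch under stated assumptions. It is not the c-kzg-4844 source: the helper bodies and the boolean return convention are hypothetical, and only the name log2_pow2 and the idea of rejecting a zero result come from the text above. Shifting a 32-bit value by 32 is undefined in C, so the guard returns an error before the shift can happen.

```c
#include <stdbool.h>
#include <stdint.h>

/* log2 of a power of two (hypothetical body; returns 0 for n <= 1). */
static uint32_t log2_pow2(uint32_t n) {
    uint32_t position = 0;
    while (n >>= 1) {
        position++;
    }
    return position;
}

/* Reverse the order of all 32 bits. */
static uint32_t reverse_bits(uint32_t x) {
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

/* Reverse only the low log2(n) bits of value. Returns false instead of
 * performing a shift by 32 when log2(n) is 0. */
bool reverse_bits_limited(uint32_t n, uint32_t value, uint32_t *out) {
    uint32_t bits = log2_pow2(n);
    if (bits == 0) {
        return false; /* 32 - 0 == 32: the shift below would be undefined */
    }
    *out = reverse_bits(value) >> (32 - bits);
    return true;
}
```

For n = 8 the function reverses three bits, so the value 1 (binary 001) becomes 4 (binary 100), while n = 1 is rejected.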
Sanitize
Sanitizers are dynamic analysis tools that instrument (add instructions to) programs so that issues can be pointed out during execution. They are particularly useful for finding common mistakes associated with memory handling. Clang has several built-in sanitizers; here are the four that we find most useful and easiest to use.
Address
AddressSanitizer (ASan) is a memory error detector that can identify out-of-bounds accesses, use-after-free, double-free, and memory leaks.
Here is the same example from earlier. It forgets to free arr, and it sets the 6th element of a 5-element array. This is a simple example of a heap-buffer-overflow:
    #include <stdlib.h>

    int main(void) {
        int* arr = malloc(5 * sizeof(int));
        arr[5] = 42; /* valid indices are 0 through 4 */
        return 0;
    }
When compiled with -fsanitize=address and executed, it will output the following error message. This points you in a good direction (a 4-byte write in main). The binary could be viewed in a disassembler to figure out exactly which instruction (at main+0x84) is causing the problem.
Similarly, here's another example, where it finds a heap-use-after-free:
    #include <stdlib.h>

    int main(void) {
        int *arr = malloc(5 * sizeof(int));
        free(arr);
        return arr[2]; /* reads memory that was already freed */
    }
It tells you that there's a 4-byte read of freed memory at main+0x8c.
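For contrast with the two faulty examples above, here is a minimal sketch of the same pattern written so that ASan has nothing to report. The function name last_element and its error convention are hypothetical; the point is that every access stays within bounds, the value is read before the memory is freed, and every allocation is released.

```c
#include <stddef.h>
#include <stdlib.h>

/* Fills an n-element heap array with 0..n-1 and returns its last element,
 * or -1 if n is zero or the allocation fails. */
int last_element(size_t n) {
    if (n == 0) {
        return -1;
    }
    int *arr = malloc(n * sizeof(int));
    if (arr == NULL) {
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        arr[i] = (int)i;
    }
    int result = arr[n - 1]; /* last valid index is n - 1; read before free */
    free(arr);
    return result;
}
```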
Memory
MemorySanitizer (MSan) is a detector of uninitialized reads. Here's a simple example that reads (and returns) an uninitialized value:
    int main(void) {
        int data[2];
        return data[0];
    }
When compiled with -fsanitize=memory and executed, it will output the following error message:
Undefined Behavior
UndefinedBehaviorSanitizer (UBSan) detects undefined behavior, which refers to situations where a program's behavior is unpredictable and not specified by the language standard. Common examples include accessing out-of-bounds memory, dereferencing an invalid pointer, reading uninitialized variables, and overflowing a signed integer. For example, here we increment INT_MAX, which is undefined behavior.
    #include <limits.h>

    int main(void) {
        int a = INT_MAX;
        return a + 1; /* signed overflow */
    }
When compiled with -fsanitize=undefined and executed, it will output the following error message, which tells us exactly where the problem is and what the conditions are.
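One way to avoid the overflow that UBSan reports is to test the operand before doing the arithmetic. The sketch below is only an illustration; checked_increment is a hypothetical helper, not part of any library discussed here.

```c
#include <limits.h>
#include <stdbool.h>

/* Stores a + 1 in *out and returns true, or returns false and leaves
 * *out untouched when the increment would overflow. */
bool checked_increment(int a, int *out) {
    if (a == INT_MAX) {
        return false; /* a + 1 is not representable as an int */
    }
    *out = a + 1;
    return true;
}
```

The check compares against INT_MAX before adding; a test such as a + 1 < a would itself rely on the overflow it is trying to detect.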
Thread
ThreadSanitizer (TSan) detects data races, which can occur in multi-threaded programs when two or more threads access a shared memory location at the same time. This introduces unpredictability and can lead to undefined behavior. Here's an example in which two threads increment a global counter variable. There aren't any locks or semaphores, so it's entirely possible that the two threads will increment the variable at the same time.
    #include <pthread.h>

    int counter = 0;

    /* Increments the shared counter without any synchronization. */
    void *increment(void *arg) {
        (void)arg;
        for (int i = 0; i < 1000000; i++)
            counter++;
        return NULL;
    }

    int main(void) {
        pthread_t thread1, thread2;
        pthread_create(&thread1, NULL, increment, NULL);
        pthread_create(&thread2, NULL, increment, NULL);
        pthread_join(thread1, NULL);
        pthread_join(thread2, NULL);
        return 0;
    }
When compiled with -fsanitize=thread and executed, it will output the following error message:
This error message tells us that there's a data race. In two threads, the increment function is writing to the same 4 bytes at the same time. It even tells us that the memory is counter.
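A minimal sketch of one way to remove the race is to guard the counter with a mutex. This is an illustration rather than a recommendation from the original example; counter_lock, INCREMENTS_PER_THREAD, and run_two_threads are hypothetical names, and C11 atomics would be another reasonable choice.

```c
#include <pthread.h>
#include <stddef.h>

#define INCREMENTS_PER_THREAD 1000000

static int counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each increment happens while holding the lock, so the two threads
 * never write to counter at the same time. */
static void *increment(void *arg) {
    (void)arg;
    for (int i = 0; i < INCREMENTS_PER_THREAD; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs two incrementing threads and returns the final counter value,
 * or -1 if a thread could not be created. */
int run_two_threads(void) {
    pthread_t thread1, thread2;
    counter = 0;
    if (pthread_create(&thread1, NULL, increment, NULL) != 0) {
        return -1;
    }
    if (pthread_create(&thread2, NULL, increment, NULL) != 0) {
        pthread_join(thread1, NULL);
        return -1;
    }
    pthread_join(thread1, NULL);
    pthread_join(thread2, NULL);
    return counter;
}
```

With the lock in place, the final value is always twice INCREMENTS_PER_THREAD, whereas the unsynchronized version can lose updates.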
Valgrind
Valgrind is an instrumentation framework for building dynamic analysis tools, but it's best known for identifying memory errors and leaks with its built-in Memcheck tool.
The following image shows the output from running c-kzg-4844's tests with Valgrind. The red box marks a valid finding for a "conditional jump or move [that] depends on uninitialized value(s)."
This identified an edge case in expand_root_of_unity. If the wrong root of unity or width were provided, the loop could break before out[width] was initialized. In that situation, the final check would depend on an uninitialized value.
    static C_KZG_RET expand_root_of_unity(
        fr_t *out, const fr_t *root, uint64_t width
    ) {
        out[0] = FR_ONE;
        out[1] = *root;

        /* Stops as soon as a power of root equals one, possibly before width. */
        for (uint64_t i = 2; !fr_is_one(&out[i - 1]); i++) {
            CHECK(i <= width);
            blst_fr_mul(&out[i], &out[i - 1], root);
        }
        /* out[width] may be uninitialized if the loop stopped early. */
        CHECK(fr_is_one(&out[width]));

        return C_KZG_OK;
    }
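One possible way to close this edge case is to record where the loop stopped and only read out[width] when the loop is known to have written it. The sketch below is not the fix that was applied in c-kzg-4844; it is a self-contained toy version over the integers modulo 17, and TOY_MODULUS and expand_root_of_unity_toy are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_MODULUS 17u /* small prime standing in for the real field */

/* Fills out[0..width] with successive powers of root modulo TOY_MODULUS.
 * Returns true only if the powers first return to 1 at index width.
 * The caller must provide room for width + 1 elements. */
bool expand_root_of_unity_toy(uint64_t *out, uint64_t root, uint64_t width) {
    uint64_t i;
    if (width < 2) {
        return false;
    }
    out[0] = 1;
    out[1] = root % TOY_MODULUS;
    if (out[1] == 1) {
        return false; /* 1 cannot have order width when width >= 2 */
    }
    for (i = 2; i <= width; i++) {
        out[i] = (out[i - 1] * out[1]) % TOY_MODULUS;
        if (out[i] == 1) {
            break;
        }
    }
    /* i == width only if the loop stopped right after writing out[width],
     * so the second operand never reads an uninitialized element. */
    return i == width && out[width] == 1;
}
```

For example, 4 has order 4 modulo 17, so a width of 4 succeeds, while a width of 8 is rejected without ever touching out[8].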
Security Review
After development stabilizes, the codebase has been thoroughly tested, and your team has manually reviewed it multiple times, it's time to get a security review from a reputable security group. This won't be a stamp of approval, but it shows that your project is at least somewhat secure. Keep in mind that there is no such thing as perfect security. There will always be the risk of vulnerabilities.
For c-kzg-4844 and go-kzg-4844, the Ethereum Foundation contracted Sigma Prime to conduct a security review. They produced this report with 8 findings. One of them is a vulnerability in go-kzg-4844. The BLS12-381 library that go-kzg-4844 uses, gnark-crypto, had a bug that allowed invalid G1 and G2 points to be decoded successfully. Had this not been fixed, it could have resulted in a consensus bug (a disagreement between implementations) in Ethereum.
Bug Bounty
If a vulnerability in your project could be exploited for profit, as is the case for Ethereum, consider setting up a bug bounty program. This allows security researchers, or anyone really, to submit vulnerability reports in exchange for money. Generally, this applies specifically to findings that can prove an exploit is possible. If the bug bounty payouts are reasonable, bug finders are more likely to notify you of a bug than to exploit it or sell it to another party. We recommend starting your bug bounty program after the first security review is completed; ideally, the security review would cost less than the bug bounty payouts.
Conclusion
Developing robust C projects, especially in the critical domain of blockchains and cryptocurrencies, requires a multi-faceted approach. Given the inherent weaknesses of the C language, a combination of best practices and tools is essential for producing resilient software. We hope our experiences and findings from our work with c-kzg-4844 provide valuable insights and best practices for others embarking on similar projects.
Source: blog.ethereum.org