Diving into Jetson Nano
I recently purchased an NVIDIA Jetson Nano developer kit. I wanted to buy it for two specific reasons:
- It's been 2 years of working as an MLE, and I work with GPUs on a daily basis. It's time I touched one for real. It feels good to hold it in my hands. Surprisingly, most of the volume of the board is occupied by the heatsink (the part that cools the GPU; you can see its folded metal sheets, probably to increase the surface area so that heat dissipates faster, and there is also space to attach a fan on top of it).
- I wanted to write custom GPU kernels. I have always seen this in the job descriptions of really amazing MLE positions and was very curious about it. I still don't know in which situations I would want to write my own custom GPU kernel, or what speedup I could expect from it. I also wanted to learn this so that I can contribute to tinygrad. Owning a GPU is not strictly necessary for this, but I can be a very compulsive buyer sometimes. Also, money spent on learning is not an expenditure but an investment. I'll definitely get 100x returns from this one.
For these reasons, I decided to buy the tiniest commercially available GPU known to mankind.
I am in Bangalore, so I had to look for NVIDIA distributors here. This is when I came across Fab To Lab. They also sell other hobbyist robotics hardware. However, NVIDIA has decided to discontinue the Jetson Nano dev kit for some reason, so what I have is probably one of the last of its species. If you want, you can look for other small GPUs (in the range of 30k). Mine cost me roughly 22.5k (+2k for accessories).
Note the difference between a Jetson module, a carrier board and a dev kit.
The module is just the board where the processing units are integrated, analogous to what we like to call the motherboard of our PCs.
The carrier board is where the module is mounted; it also provides connections for displays, USB ports and power. The dev kit is the combination of the two above. One key difference is storage: the production module has onboard eMMC flash, whereas the dev kit boots from an externally provided SD card. For this reason, do not use the dev kit in production environments and applications; it is meant for learning purposes.
After unboxing, there were 3 things I had to figure out:
- Power supply
- Burning the SD card with the NVIDIA Jetson image and booting the device.
- Keyboard, mouse and display connections.
Preparing the SD card:
I purchased a 128 GB SanDisk microSD card separately (1,600 rupees) as suggested in this video. As the SD card was new, I did not have to format it. The SD card is used as the hard disk and storage space for the Jetson, and to prepare it for use by the Jetson module, you have to burn the operating system onto the card so that it can be used as a “bootable disk”.
The instructions given on the getting started page are sufficient for this, so I won't write them out separately.
What is booting a system or a bootable device?
"Boot" refers to the process by which a computer system starts up and initializes its hardware components, loads its operating system (OS), and prepares itself to execute user applications. It's short for "bootstrap," which comes from the phrase "pulling oneself up by one's bootstraps."
When a device is said to be "bootable," it means that it is capable of starting up or "booting" from the storage media connected to it, such as an SD card, a hard drive, or a USB flash drive. In the context of an SD card image, "bootable" means that the contents of the image have been configured in such a way that when the device is powered on, it can read and execute the necessary instructions from the SD card to start up and run the operating system or utility contained within the image.
For example, if you have an SD card with a bootable Linux distribution image written to it, you can insert that SD card into a compatible device (like a Raspberry Pi or a laptop with an SD card reader), power on the device, and it will boot into the Linux operating system contained on the SD card. Similarly, an SD card with a bootable utility image might allow you to boot into a diagnostic or recovery environment.
In summary, "boot" refers to the startup process of a computer system, and "bootable" describes a storage device or medium that contains the necessary components to enable a computer system to start up and run an operating system or utility.
What is burning an image onto an SD card or any other disk/storage device?
"Burning" an SD card with an image typically refers to the process of writing a disk image file onto an SD card in such a way that the SD card becomes bootable and contains the contents of the image file.
Here's a breakdown of the process:
- Disk Image: You start with a disk image file, which is essentially a complete copy of a storage device (such as an operating system installation). This image file contains all the necessary files and data required for the intended purpose, such as an operating system (like Linux distributions) or a bootable utility.
- Burning: "Burning" refers to writing this disk image file onto the SD card. It's called "burning" because, historically, CDs and DVDs were physically "burned" with a laser to encode data onto them. With SD cards and other solid-state storage devices, the term has stuck around, although there's no actual burning involved. Instead, the image file is written bit by bit onto the SD card.
- Bootable SD Card: Once the image file has been successfully written onto the SD card, the SD card becomes bootable. This means that when you insert the SD card into a compatible device (such as a Raspberry Pi, a single-board computer, or an embedded system like the Nvidia Jetson Nano), and you power on the device, it will boot from the contents of the SD card.
"Burning" is commonly done using specialized software or command-line tools designed for this purpose. These tools handle tasks like formatting the SD card, copying the image file, and ensuring that it's written correctly to the card. Examples of such tools include Etcher, Rufus, dd (a command-line utility available on Unix-like operating systems), and others.
Once the burning process is complete, you have a bootable SD card that you can use to boot your device and access the contents of the image file stored on it.
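For instance, on a Linux machine you could write the image with dd. This is just a generic sketch, not the official Jetson instructions, and /dev/sdX is a placeholder you must replace with your actual SD card device (double-check it with lsblk, because dd will happily overwrite the wrong disk):
  lsblk                                                        # identify the SD card device first
  sudo dd if=jetson-nano-sd-card-image.img of=/dev/sdX bs=4M status=progress conv=fsync
The official getting-started guide uses Etcher, which does the same thing with a GUI and an extra verification pass.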

After preparing the SD card and inserting it into my device, I plugged in the power supply and connected the display. It extracted all the files and then got stuck for a while on a step which stated:
A start job is running for End-user configuration after initial OEM installation
Surprisingly, disconnecting the power supply and plugging it back in did the trick for me, after foolishly waiting for 1.5 hours.
Power supply:
The documentation mentions that the device needs a 5 V, ~2 A power supply. The mains supply in common Indian households is roughly 230 V, 50 Hz AC, but our device needs a much lower-voltage DC supply. Surprisingly, your phone charger (the adapter, to be more specific) can do the job for you; its output rating is written on it if you look closely. Don't forget to use a Micro-B USB cable for the device.
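A related note, based on the Jetson documentation as I understand it (so verify on your own board): with a 2 A Micro-USB supply the Nano should be kept in its 5 W power mode; the 10 W / MAXN mode really wants the 4 A barrel-jack supply. Once the board is booted you can query and switch modes with nvpmodel:
  sudo nvpmodel -q        # show the current power mode
  sudo nvpmodel -m 1      # switch to the 5 W mode (mode 0 is 10 W / MAXN on the Nano)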
External devices:
I had to purchase a mouse and a keyboard that connect through the USB ports (I only had a Bluetooth keyboard and mice). For the display, I used an HDMI cable connected to my Ultra HD monitor.
That’s all
After these steps, it feels good to see the NVIDIA logo pop up on the purple Ubuntu desktop. Now, on to building projects on this device.
Next steps:
Check whether the following software and packages are installed on the device.
NVIDIA driver installation
Below is a basic CUDA program that you can use to verify that you can write and run custom GPU kernels on your Jetson Nano:
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// CUDA kernel to add two arrays element-wise
__global__ void addArrays(float *a, float *b, float *c, int n) {
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < n) {
        c[index] = a[index] + b[index];
    }
}

int main() {
    int n = 10000;                     // size of the arrays
    size_t bytes = n * sizeof(float);

    // Allocate memory for the arrays on the host
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);

    // Initialize the arrays on the host
    for (int i = 0; i < n; i++) {
        h_a[i] = i;
        h_b[i] = i;
    }

    // Allocate memory for the arrays on the device
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);

    // Copy the input arrays from host to device
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Define grid and block dimensions
    int blockSize = 256;
    int gridSize = (n + blockSize - 1) / blockSize;

    // Launch the kernel
    addArrays<<<gridSize, blockSize>>>(d_a, d_b, d_c, n);

    // Copy the result from device to host
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);

    // Verify the result
    int ok = 1;
    for (int i = 0; i < n; i++) {
        if (h_c[i] != h_a[i] + h_b[i]) {
            printf("Error: Incorrect result at index %d\n", i);
            ok = 0;
            break;
        }
    }
    if (ok) {
        printf("All elements added successfully!\n");
    }

    // Free memory on the device
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);

    // Free memory on the host
    free(h_a);
    free(h_b);
    free(h_c);

    return 0;
}
This CUDA program adds two arrays a and b element-wise and stores the result in array c. It then verifies the result by comparing it with the expected values. If all elements are added correctly, it prints "All elements added successfully!".
To compile and run this CUDA program on your Jetson Nano, you will need the CUDA Toolkit installed. You can compile the program using nvcc (the NVIDIA CUDA compiler) and then execute the compiled binary:
nvcc -o addArrays addArrays.cu
./addArrays
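One gotcha worth flagging (an assumption based on how the stock Jetson image is usually set up, so verify on your own board): nvcc is installed with JetPack but may not be on the default PATH. It normally lives under /usr/local/cuda/bin, so if the compile step complains that nvcc is not found, try:
  export PATH=/usr/local/cuda/bin:$PATH
  nvcc --version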
This program is enough to verify that you can write and execute custom GPU kernels on the Jetson Nano.
To verify whether the device drivers are installed on your Jetson Nano, you can use alternative methods. One common method is to check for the presence of certain files or directories associated with the NVIDIA drivers. Here are a few steps you can take:
- Check for CUDA Toolkit Installation: If you have installed the CUDA Toolkit, it often comes bundled with NVIDIA GPU drivers. You can check if CUDA is installed and configured correctly by running:
nvcc --version
This command will display the version of the CUDA Toolkit installed.
- Check the NVIDIA Driver Version: You can check the version of the installed NVIDIA driver by examining the contents of /proc/driver/nvidia/version:
cat /proc/driver/nvidia/version
This will display information about the installed NVIDIA driver version.
- Check for NVIDIA Driver Files: You can also check for the presence of NVIDIA driver files on the system. NVIDIA drivers are typically installed in /usr/lib, /usr/lib64, or similar directories. You can look for NVIDIA-related files using commands like:
ls /usr/lib | grep -i nvidia
ls /usr/lib64 | grep -i nvidia
If NVIDIA drivers are installed, you should see files and directories related to NVIDIA.
- Check for NVIDIA Kernel Modules: You can check if the NVIDIA kernel modules are loaded by running the following command:
lsmod | grep nvidia
If the NVIDIA kernel modules are installed and loaded, you should see output indicating their presence.
These methods should help you verify whether the NVIDIA device drivers are installed on your Jetson Nano.
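A Jetson-specific caveat worth adding (my own observation, so double-check on your board): on Tegra devices the integrated GPU's kernel driver is called nvgpu rather than nvidia, so lsmod | grep nvidia can come back empty even when everything is healthy. These two checks are usually more telling on the Nano:
  lsmod | grep nvgpu               # Tegra GPU kernel module
  cat /etc/nv_tegra_release        # L4T (Jetson Linux) release the drivers belong to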
Note that nvidia-smi is not available on Jetson Nano devices, since they use a different GPU architecture compared to NVIDIA's desktop GPUs.
To reinstall NVIDIA drivers on your Jetson Nano, you can follow these steps:
- Download the Latest NVIDIA Drivers: Visit the NVIDIA website or use the NVIDIA SDK Manager to download the latest version of the NVIDIA drivers compatible with your Jetson Nano.
- Prepare your Jetson Nano: Make sure your Jetson Nano is powered on and connected to the internet.
- Remove Existing NVIDIA Drivers: You can remove existing NVIDIA drivers by running the following commands in the terminal:
sudo apt-get purge nvidia*
sudo apt-get autoremove
This will remove all NVIDIA-related packages from your system.
- Install the New NVIDIA Drivers: Navigate to the directory where you downloaded the new NVIDIA driver package and install it using the following commands:
chmod +x <driver_package_name>.run
sudo ./<driver_package_name>.run
Replace <driver_package_name> with the actual name of the NVIDIA driver package you downloaded. This will make the driver installer executable and then run it with root privileges.
- Follow the Installation Wizard: The NVIDIA driver installer will launch a graphical or text-based installation wizard. Follow the prompts to install the drivers. Make sure to review any license agreements and choose appropriate installation options.
- Reboot your System: After the installation is complete, reboot your Jetson Nano to apply the changes and ensure that the new NVIDIA drivers are loaded correctly.
- Verify Installation: Once your system has rebooted, you can verify that the NVIDIA drivers are installed correctly by running nvidia-smi in the terminal (on the Jetson, where nvidia-smi is not available, use the checks described above instead). If everything is set up correctly, you should see information about your NVIDIA GPU.
By following these steps, you should be able to reinstall the NVIDIA drivers on your Jetson Nano. Make sure to use the latest drivers compatible with your hardware and software setup to ensure optimal performance and compatibility.
If you're not getting any output when running nvidia-smi, it's likely because the NVIDIA System Management Interface (nvidia-smi) utility is not installed or not functioning properly on your Jetson Nano. This could happen if the NVIDIA drivers are not installed correctly or if there are issues with the installation.
Here are some steps you can take to troubleshoot and potentially resolve the issue:
- Check NVIDIA Driver Installation: Ensure that the NVIDIA drivers are installed correctly on your Jetson Nano. If not, you may need to reinstall them using the appropriate method for your setup.
- Reboot the System: Sometimes, a simple reboot can help resolve issues with driver initialization. Try rebooting your Jetson Nano and see if nvidia-smi works after the reboot.
- Check for Errors: If there were any errors during the NVIDIA driver installation process, address those errors. Look for any error messages or warnings that might indicate issues with the driver installation.
- Verify GPU Detection: Confirm that your Jetson Nano is detecting the NVIDIA GPU by running lspci | grep -i nvidia and checking for output (note that on the Jetson the GPU is integrated into the Tegra SoC, so it may not show up under lspci at all). If the GPU is not detected, it could indicate a hardware issue or a problem with the driver installation.
- Check for Compatibility: Ensure that the version of the NVIDIA drivers you have installed is compatible with your Jetson Nano hardware and software setup.
- Update NVIDIA Drivers: If you're using an older version of the NVIDIA drivers, consider updating to the latest version to see if it resolves the issue.
- Check System Logs: Look for any relevant error messages or logs that might provide more information about why nvidia-smi is not working. Check system logs such as /var/log/syslog or /var/log/messages for any NVIDIA-related errors.
- Reinstall NVIDIA Drivers: If none of the above steps resolve the issue, you may need to reinstall the NVIDIA drivers from scratch to ensure a clean installation.
By following these steps, you should be able to diagnose and potentially resolve the issue with nvidia-smi not providing any output on your Jetson Nano.
See also: forum thread on nvidia-smi not being available on the Jetson.
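The practical substitute on Jetson, as I understand it from the documentation (so treat this as a pointer rather than a definitive answer), is tegrastats, which ships with L4T and prints live CPU, memory and GPU (GR3D) utilisation, covering much of what nvidia-smi is normally used for:
  sudo tegrastats
There is also the community jetson-stats package, whose jtop tool gives a friendlier dashboard; it is typically installed with sudo pip3 install jetson-stats.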
This difference in capabilities between the GPU and the CPU exists because they are designed with different goals in mind. While the CPU is designed to excel at executing a sequence of operations, called a thread, as fast as possible and can execute a few tens of these threads in parallel, the GPU is designed to excel at executing thousands of them in parallel (amortizing the slower single-thread performance to achieve greater throughput).
The GPU is specialized for highly parallel computations and therefore designed such that more transistors are devoted to data processing rather than data caching and flow control. The schematic Figure 1 shows an example distribution of chip resources for a CPU versus a GPU.
Figure 1: The GPU Devotes More Transistors to Data Processing
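To make the "thousands of threads" point concrete with the addArrays example above: with n = 10000 and blockSize = 256, gridSize = (10000 + 255) / 256 = 40, so the launch creates 40 × 256 = 10240 threads, each responsible for one array element (the last 240 simply fail the index < n check). Even this toy kernel exposes far more parallelism than the few tens of threads a CPU runs at once.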
