In an effort to rein in illicit fishing, researchers have unveiled a new open-source AI model that can accurately identify what virtually all of the world’s seafaring vessels are doing, including whether a boat is potentially fishing illegally. Seattle-based Ai2 (the Allen Institute for AI) recently released a lightweight model named Atlantes to analyze more than five billion GPS signals a…
Large language models (LLMs) have permeated every industry and changed the potential of technology. However, due to their massive size, they are often impractical under the resource constraints many companies face. The rise of small language models (SLMs) bridges quality and cost by creating models with a smaller resource footprint. SLMs are a subset of language models that tend to…
Vision language models (VLMs) are evolving at a breakneck speed. In 2020, the first VLMs revolutionized the generative AI landscape by bringing visual understanding to large language models (LLMs) through the use of a vision encoder. These initial VLMs were limited in their abilities, only able to understand text and single image inputs. Fast-forward a few years and VLMs are now capable of…
The release of NVIDIA Video Codec SDK 13.0 marks a significant upgrade, adding support for the latest-generation NVIDIA Blackwell GPUs. This version brings a wealth of improvements aimed at elevating both video encoding and decoding capabilities. From enhanced compression efficiency to better throughput and encoding quality, SDK 13.0 addresses the ever-evolving demands of the video ecosystem.
NVIDIA announces the implementation of a Multi-View High Efficiency Video Coding (MV-HEVC) encoder in the latest NVIDIA Video Codec SDK release, version 13.0. This update marks a major leap forward in hardware-accelerated, multi-view video compression. It offers enhanced compression efficiency and quality for stereoscopic and 3D video applications as compared to simulcast encoding.
From mitigating climate change to improving disaster response and environmental monitoring, AI is reshaping how we tackle critical global challenges. Advancements in fast, high-resolution climate forecasting, real-time monitoring, and digital twins are equipping scientists, policy-makers, and industry leaders with data-driven tools to understand, plan for, and respond to a warming planet.
Explore visually perceptive AI agents, the latest vision AI technologies, hands-on training, and inspiring deployments.
Master prompt engineering, fine-tuning, and customization to build video analytics AI agents.
Experience high-performance inference, usability, intuitive APIs, easy debugging with eager mode, clear error messages, and more.
Explore the latest advancements in academia, including advanced research, innovative teaching methods, and the future of learning and technology.
Researchers studying cancer unveiled a new AI model that provides cellular-level mapping and visualizations of cancer cells, which scientists hope can shed light on how—and why—certain intercellular relationships trigger cancers to grow. BioTuring, a San Diego-based startup, announced an AI model that can quickly create detailed visualizations of cancerous tumors—at single-cell resolution.
A new study and AI model from researchers at Stanford University are streamlining cancer diagnostics, treatment planning, and prognosis prediction. Named MUSK (Multimodal transformer with Unified maSKed modeling), the research aims to advance precision oncology, tailoring treatment plans to each patient based on their unique medical data. “Multimodal foundation models are a new frontier in…
Rare diseases are difficult to diagnose due to limitations in traditional genomic sequencing. Wolfgang Pernice, assistant professor at Columbia University, is using AI-powered cellular profiling to bridge these gaps and advance personalized medicine. At NVIDIA GTC 2024, Pernice shared insights from his lab’s work with diseases like Charcot-Marie-Tooth (CMT) and mitochondrial disorders.
An advanced deep-learning model that automates X-ray analysis for faster and more accurate assessments could transform spinal health diagnostics. Capable of handling even complex cases, the research promises to help doctors save time, reduce diagnostic errors, and improve treatment plans for patients with spinal conditions like scoliosis and kyphosis. “Although spinopelvic alignment analysis…
With as many as 800,000 forgotten oil and gas wells scattered across the US, researchers from Lawrence Berkeley National Laboratory (LBNL) have developed an AI model capable of accurately locating, at scale, wells that may be leaking toxic chemicals and greenhouse gases, like methane, into the environment. The model is designed to identify many of the roughly 3.7M oil and gas wells dug in…
This post was originally published July 29, 2024 but has been extensively revised with NVIDIA AI Blueprint information. Traditional video analytics applications and their development workflow are typically built on fixed-function, limited models that are designed to detect and identify only a select set of predefined objects. With generative AI, NVIDIA NIM microservices…
Each year, the world recycles only around 13% of its two billion-plus tons of municipal waste. By 2050, the world’s annual municipal waste will reach 3.88B tons. But the global recycling industry is far from efficient. Annually, as much as $120B of potentially recoverable plastic—let alone paper or metals—ends up in landfills rather than within new products made with recycled materials.
Researchers from Weill Cornell Medicine have developed an AI-powered model that could help couples undergoing in vitro fertilization (IVF) and guide embryologists in selecting healthy embryos for implantation. Recently published in Nature Communications, the study presents the Blastocyst Evaluation Learning Algorithm (BELA). This state-of-the-art deep learning model evaluates embryo quality and…
Now available in preview, NVIDIA VILA is an advanced multimodal VLM that provides visual understanding of multi-image and video inputs.
As MONAI celebrates its fifth anniversary, we’re witnessing the convergence of our vision for open medical AI with production-ready enterprise solutions. This announcement brings two exciting developments: the release of MONAI Core v1.4, expanding open-source capabilities, and the general availability of VISTA-3D and MAISI as NVIDIA NIM microservices. This dual release reflects our…
Action recognition models such as PoseClassificationNet have been around for some time, helping systems identify and classify human actions like walking, waving, or picking up objects. While the concept is well-established, the challenge lies in building a robust computer vision model that can accurately recognize the range of actions across different scenarios that are domain- or use case…
Building a question-answering chatbot with large language models (LLMs) is now a common workflow for text-based interactions. What about creating an AI system that can answer questions about video and image content? This presents a far more complex task. Traditional video analytics tools struggle due to their limited functionality and a narrow focus on predefined objects.
The new release introduces Python support in Service Maker to accelerate real-time multimedia and AI inference applications with a powerful GStreamer abstraction layer.
NVIDIA JetPack has continuously evolved to offer cutting-edge software tailored to the growing needs of edge AI and robotic developers. With each release, JetPack has enhanced its performance, introduced new features, and optimized existing tools to deliver increased value to its users. This means that your existing Jetson Orin-based products experience performance optimizations by upgrading to…
According to a groundbreaking AI study, your eyes could hold the key to early detection of Alzheimer’s and dementia. Called Eye-AD, the deep learning framework analyzes high-resolution retinal images, identifying small changes in vascular layers linked to dementia that are often too subtle for human detection. The approach offers a rapid, non-invasive screening for cognitive decline…
A new deep learning model could reduce the need for surgery when diagnosing whether cancer cells are spreading, including to nearby lymph nodes—also known as metastasis. Developed by researchers from the University of Texas Southwestern Medical Center, the AI tool analyzes time-series MRIs and clinical data to identify metastasis, providing crucial, noninvasive support for doctors in treatment…
A new cell-phone-sized device—which can be deployed in vast, remote areas—is using AI to identify and geolocate wildlife to help conservationists track endangered species, including wolves around Yellowstone National Park. The battery-powered devices—dubbed GrizCams—are designed by a small Montana startup, Grizzly Systems. Together with biologists, they’re deploying a constellation of the…
Federated learning is revolutionizing the development of autonomous vehicles (AVs), particularly in cross-country scenarios where diverse data sources and conditions are crucial. Unlike traditional machine learning methods that require centralized data storage, federated learning enables AVs to collaboratively train algorithms using locally collected data while keeping the data decentralized.
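To make the aggregation idea concrete, here is a minimal sketch of federated averaging (FedAvg), the weight-averaging step behind many federated learning systems. It uses NumPy with purely hypothetical client weight arrays and sample counts; it is not the specific method used for the AV work described above.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Weighted average of per-client model parameters (FedAvg aggregation)."""
    total = sum(client_sizes)
    avg = [np.zeros_like(w) for w in client_weights[0]]
    for weights, size in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            avg[i] += w * (size / total)
    return avg

# Three simulated vehicles, each with a tiny two-parameter "model."
# Only the weights are shared; the locally collected data never leaves the vehicle.
clients = [[np.random.randn(4, 4), np.random.randn(4)] for _ in range(3)]
sizes = [1200, 800, 500]  # hypothetical number of local samples per vehicle
global_model = federated_average(clients, sizes)
print([p.shape for p in global_model])
```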
In the field of automotive software development, more large-scale AI models are being integrated into autonomous vehicles. The models range from vision AI models to end-to-end AI models for autonomous driving. The demand for computing power is now sharply increasing, leading to higher system loads that can negatively impact system stability and latency.
Reality capture creates highly accurate, detailed, and immersive digital representations of environments. Innovations in site scanning and accelerated data processing, along with emerging technologies like neural radiance fields (NeRFs) and Gaussian splatting, are significantly enhancing the capabilities of reality capture. These technologies are revolutionizing interactions with and analyses of the…
Microsoft Bing Visual Search enables people around the world to find content using photographs as queries. The heart of this capability is Microsoft’s TuringMM visual embedding model that maps images and text into a shared high-dimensional space. Because the pipeline operates on billions of images across the web, performance is critical. This post details efforts to optimize the TuringMM pipeline using NVIDIA…
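As a rough illustration of what a shared image-text embedding space enables, the sketch below ranks candidate embeddings by cosine similarity against a query image embedding. The vectors are random placeholders, not TuringMM outputs, and the dimensionality is an arbitrary example.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

dim = 512  # hypothetical embedding size
query_image_embedding = np.random.randn(dim)
candidate_text_embeddings = np.random.randn(1000, dim)

# Rank candidates by similarity to the query image, highest first.
scores = [cosine_similarity(query_image_embedding, t) for t in candidate_text_embeddings]
top5 = np.argsort(scores)[::-1][:5]
print("Top matching candidates:", top5)
```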
NV-CLIP, a cutting-edge multimodal embeddings model for image and text, is now generally available.
Developers in the fields of image-guided surgery and surgical vision face unique challenges in creating systems and applications that can significantly improve surgical workflows. One such challenge is efficiently combining multi-modal imaging data, such as preoperative 3D patient images with intra-operative video. This is key to providing surgeons with real-time…
Some of Africa’s most resource-constrained farmers are gaining access to on-demand, AI-powered advice through a multimodal chatbot that gives detailed recommendations about how to increase yields or fight common pests and crop diseases. Since February, farmers in the East African nation of Malawi have had access to the chatbot, named UlangiziAI, through WhatsApp on mobile phones.
By 2030, John Deere aims for fully autonomous farming, addressing global challenges like labor shortages, sustainability, and food security. Their AI and robotics solutions make farming more efficient and profitable, reduce environmental impact, lower carbon footprints, and promote biodiversity. In this session, Chris Padwick, director of Machine Learning and Computer Vision at John Deere…
Data loading is a critical aspect of deep learning workflows, whether you’re focused on training or inference. However, it often presents a paradox: the need for a highly convenient solution that is simultaneously customizable. These two goals are notoriously difficult to reconcile. One of the traditional solutions to this problem is to scale out the processing and parallelize the user…
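One common way to scale out per-sample processing is to parallelize it across worker processes, as in this generic PyTorch DataLoader sketch with a synthetic dataset. It is only a baseline illustration of the parallelization idea, not the specific data-loading solution the post discusses.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SyntheticImages(Dataset):
    """Stand-in dataset: a real pipeline would decode and augment files here."""
    def __init__(self, n=1024):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        image = torch.rand(3, 224, 224)  # per-sample processing runs in a worker
        label = idx % 10
        return image, label

if __name__ == "__main__":
    # num_workers > 0 parallelizes the user-defined __getitem__ across processes.
    loader = DataLoader(SyntheticImages(), batch_size=32, shuffle=True, num_workers=4)
    images, labels = next(iter(loader))
    print(images.shape, labels.shape)
```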
Today, over 80% of internet traffic is video. This content is generated by and consumed across various devices, including IoT gadgets, smartphones, computers, and TVs. As pixel density and the number of connected devices grow, continued investment in fast, efficient, high-quality video encoding and decoding is essential. The latest NVIDIA data center GPUs, such as the NVIDIA L40S and NVIDIA…
Machine learning algorithms are beginning to revolutionize modern agriculture. Enabling farmers to combat pests and diseases in real time, the technology is improving crop production and profits, while reducing waste, greenhouse gas emissions, and pesticide use. Around 6% of the world’s CO2 emissions come from farming. And every year, up to 40% of crops are lost due to pests and disease.
An AI-powered remote sensing study offers a dynamic new tool for global ocean cleanup efforts. Detailed in the ISPRS Journal of Photogrammetry and Remote Sensing, the breakthrough unveils MariNeXt, a deep-learning framework that detects and identifies marine pollution using high-resolution Sentinel-2 imagery. MariNeXt could revolutionize how resource managers and agencies globally monitor and…
A recent study introduced a cutting-edge AI-powered pathology platform that can help doctors diagnose and evaluate lung cancer in patients quickly and accurately. Developed by a team of researchers at the University of Cologne’s Faculty of Medicine and University Hospital Cologne, the tool provides fully automated and in-depth analysis of benign and cancerous tissues, for faster and more…
Text-to-image diffusion models can generate diverse, high-fidelity images based on user-provided text prompts. They operate by mapping a random sample from a high-dimensional space, conditioned on a user-provided text prompt, through a series of denoising steps. This results in a representation of the corresponding image. These models can also be used for more complex tasks such as image…
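For readers new to the idea, here is a schematic Python sketch of the iterative denoising loop described above. The `Denoiser` module and the prompt embedding are stand-ins, not a real trained network or any particular library's API.

```python
import torch

class Denoiser(torch.nn.Module):
    """Placeholder denoiser: a real model predicts the noise to remove at step t,
    conditioned on the text-prompt embedding."""
    def forward(self, latent, t, text_embedding):
        return latent * 0.98 + 0.01 * text_embedding.mean()

denoiser = Denoiser()
text_embedding = torch.randn(77, 768)   # placeholder prompt embedding
latent = torch.randn(1, 4, 64, 64)      # random sample in a high-dimensional space

# Iteratively denoise, conditioned on the prompt, to obtain the image representation.
num_steps = 50
for t in reversed(range(num_steps)):
    latent = denoiser(latent, t, text_embedding)

print("Denoised latent shape:", latent.shape)
```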
NVIDIA TAO is a framework designed to simplify and accelerate the development and deployment of AI models. It enables you to use pretrained models, fine-tune them with your own data, and optimize the models for specific use cases without needing deep AI expertise. TAO integrates seamlessly with the NVIDIA hardware and software ecosystem, providing tools for efficient AI model training…
This post is the third in a series on building multi-camera tracking vision AI applications. The first and second parts introduced the overall end-to-end workflow and the fine-tuning process used to enhance system accuracy. NVIDIA Metropolis is an application framework and set of developer tools that leverages AI for visual data analysis across industries. Its multi-camera tracking reference…
Learn how to build high-performance solutions with NVIDIA visual AI agents that help streamline operations across a range of industries.
New research aims to revolutionize video accessibility for blind or low-vision (BLV) viewers with an AI-powered system that gives users the ability to explore content interactively. The innovative system, detailed in a recent paper, addresses significant gaps in conventional audio descriptions (AD), offering an enriched and immersive video viewing experience. “Although videos have become an…
California beaches are becoming safer with a new AI-powered shark detection system. Known as SharkEye, the technology identifies sharks near shorelines in real time and sends text alerts to public safety officials, lifeguards, and the community. This innovative AI-driven system, developed by the Benioff Ocean Science Laboratory (BOSL) at the University of California, Santa Barbara…
Over 300M computed tomography (CT) scans are performed globally, 85M in the US alone. Radiologists are looking for ways to speed up their workflow and generate accurate reports, so having a foundation model to segment all organs and diseases would be helpful. Ideally, you’d have an optimized way to run this model in production at scale. NVIDIA Research has created a new foundation model to…
An exciting breakthrough in AI technology—Vision Language Models (VLMs)—offers a more dynamic and flexible method for video analysis. VLMs enable users to interact with image and video input using natural language, making the technology more accessible and adaptable. These models can run on the NVIDIA Jetson Orin edge AI platform or discrete GPUs through NIMs. This blog post explores how to build…
Large-scale, use-case-specific synthetic data has become increasingly important in real-world computer vision and AI workflows. That’s because digital twins are a powerful way to create physics-based virtual replicas of factories, retail spaces, and other assets, enabling precise simulations of real-world environments. NVIDIA Isaac Sim, built on NVIDIA Omniverse, is a fully extensible…
Full fine-tuning (FT) is commonly employed to tailor general pretrained models for specific downstream tasks. To reduce the training cost, parameter-efficient fine-tuning (PEFT) methods have been introduced to fine-tune pretrained models with a minimal number of parameters. Among these, Low-Rank Adaptation (LoRA) and its variants have gained considerable popularity because they avoid additional…
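A minimal sketch of the LoRA idea, assuming a single frozen linear layer: the pretrained weight is kept fixed and only two small low-rank matrices are trained, so the trainable parameter count stays tiny. Dimensions, rank, and scaling below are illustrative, not taken from any specific paper or library.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update B @ A."""
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad = False              # frozen pretrained weight
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        # Output = frozen path + scaled low-rank adaptation path.
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)

layer = LoRALinear(768, 768)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print("Trainable parameters:", trainable)   # only the low-rank A and B matrices
```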
NVIDIA Video Codec SDK provides a comprehensive set of APIs for hardware-accelerated video encode and decode on Windows and Linux. The 12.2 release improves video quality for high-efficiency video coding (HEVC). It offers a significant reduction in bit rates, particularly for natural video content. This post details the following new features: The lookahead level can help analyze…
SyncTwin GmbH, a company that builds software to optimize production, intralogistics, and assembly, is on a mission to unlock industrial digital twins for small and medium-sized businesses (SMBs). While SyncTwin has helped major global companies like BMW minimize costs and downtime in their factories with digital twins, they are now shifting their focus to enable manufacturing businesses…
Maritime startup Orca AI is pioneering safety at sea with its AI-powered navigation system, which provides real-time video processing to help crews make data-driven decisions in congested waters and low-visibility conditions. Every year, thousands of massive 100-million-pound vessels, ferrying $14T worth of goods, cross the world’s oceans and waterways, fighting to keep to tight deadlines.
Synthetic data in medical imaging offers numerous benefits, including the ability to augment datasets with diverse and realistic images where real data is limited. This reduces the costs and labor associated with annotating real images. Synthetic data also provides an ethical alternative to using sensitive patient data, which helps with education and training without compromising patient privacy.
As vision AI complexity increases, streamlined deployment solutions are crucial to optimizing spaces and processes. NVIDIA accelerates development, turning ideas into reality in weeks rather than months with NVIDIA Metropolis AI workflows and microservices. In this post, we explore Metropolis microservices features: Managing and automating infrastructure with AI is…
Intelligent Transportation Systems (ITS) applications are becoming increasingly valuable and prevalent in modern urban environments. The benefits of using ITS applications include: Importantly, these systems need to process information at the edge for reliable bandwidth, privacy, real-time analytics, and more. This post explains how to use the new Jetson Platform Services from…
The era of AI robots powered by physical AI has arrived. Physical AI models understand their environments and autonomously complete complex tasks in the physical world. Many of the complex tasks—like dexterous manipulation and humanoid locomotion across rough terrain—are too difficult to program and rely on generative physical AI models trained using reinforcement learning (RL) in simulation.
MediaTek is teaming with NVIDIA to integrate NVIDIA TAO training and pretrained models into its development workflow, bringing advanced AI and visual perception to billions of IoT edge devices.
NVIDIA Holoscan is a domain-agnostic, multimodal, real-time AI sensor processing platform that provides the foundation for developers to build end-to-end sensor processing pipelines. NVIDIA Holoscan SDK features include: Holoscan SDK can be used to build streaming AI pipelines for a range of industries and use cases, including medical devices, high-performance computing at…
NVIDIA JetPack SDK powers NVIDIA Jetson modules, offering a comprehensive solution for building end-to-end accelerated AI applications. JetPack 6 expands the Jetson platform’s flexibility and scalability with microservices and a host of new features. It’s the most downloaded version of JetPack in 2024. With the JetPack 6.0 production release now generally available…
This post is the first in a series on building multi-camera tracking vision AI applications. In this part, we introduce the overall end-to-end workflow, focusing on building and deploying the multi-camera tracking system. The second part will cover fine-tuning AI models with synthetic data to enhance system accuracy. Large areas like warehouses, factories, stadiums, and airports are typically…
AI is rapidly changing industrial visual inspection. In a factory setting, visual inspection is used for many issues, including detecting defects and missing or incorrect parts during assembly. Computer vision can help identify problems with products early on, reducing the chances of them being delivered to customers. However, developing accurate and versatile object detection models remains…
Ever spotted someone in a photo wearing a cool shirt or some unique apparel and wondered where they got it? How much did it cost? Maybe you’ve even thought about buying one for yourself. This challenge inspired Snap’s ML engineering team to introduce Screenshop, a service within Snapchat’s app that uses AI to locate and recommend fashion items online that match the style seen in an image.
NVIDIA DeepStream is a powerful SDK that unlocks GPU-accelerated building blocks to build end-to-end vision AI pipelines. With more than 40 plugins available off the shelf, you can deploy fully optimized pipelines with cutting-edge AI inference, object tracking, and seamless integration with popular IoT message brokers such as Redis, Kafka, and MQTT. DeepStream offers intuitive REST APIs to…
When it comes to perception for Intelligent Video Analytics (IVA) applications such as traffic monitoring, warehouse safety, and retail shopper analytics, one of the biggest challenges is occlusions. People may move behind structural obstacles, retail shoppers may not be fully visible due to shelving units, and cars may be hidden behind large trucks, for example. This post explains how the…
Note: As of January 6, 2025, VILA is now part of the Cosmos Nemotron VLM family. NVIDIA is proud to announce the release of NVIDIA Cosmos Nemotron, a family of state-of-the-art vision language models (VLMs) designed to query and summarize images and videos from physical or virtual environments. Cosmos Nemotron builds upon NVIDIA’s groundbreaking visual understanding research including VILA…
Note: As of January 6, 2025, VILA is now part of the new Cosmos Nemotron vision language models. Visual language models have evolved significantly in recent years. However, existing technology typically supports only a single image: such models cannot reason across multiple images, support in-context learning, or understand videos, and they are not optimized for inference speed. We developed VILA…
Due to the adoption of multicamera inputs and deep convolutional backbone networks, the GPU memory footprint for training autonomous driving perception models is large. Existing methods for reducing memory usage often result in additional computational overheads or imbalanced workloads. This post describes joint research between NVIDIA and NIO, a developer of smart electric vehicles.
Genomics researchers use different sequencing techniques to better understand biological systems, including single-cell and spatial omics. Unlike single-cell, which looks at data at the cellular level, spatial omics considers where that data is located and takes into account the spatial context for analysis. As genomics researchers look to model biological systems across multiple omics at…
This post delves into the capabilities of decoding DICOM medical images within AWS HealthImaging using the nvJPEG2000 library. We’ll guide you through the intricacies of image decoding, introduce you to AWS HealthImaging, and explore the advancements enabled by GPU-accelerated decoding solutions. Embarking on a journey to enhance throughput and reduce costs in deciphering medical images…
A convolutional neural network is a type of deep learning network used primarily to identify and classify images and to recognize objects within images.
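As a minimal illustration of that definition, here is a tiny PyTorch CNN for image classification: stacked convolution and pooling layers followed by a classifier head. The input size (32x32 RGB) and class count are arbitrary examples.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Minimal CNN: convolution + pooling feature extractor, then a linear classifier."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)  # 32x32 input -> 8x8 after pooling

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))

model = TinyCNN()
logits = model(torch.rand(1, 3, 32, 32))
print(logits.shape)  # torch.Size([1, 10])
```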
Computer vision defines the field that enables devices to acquire, process, understand, and analyze digital images and videos and extract useful information.
Edge AI developers are building AI applications and products for safety-critical and regulated use cases. With NVIDIA Holoscan 1.0, these applications can incorporate real-time insights and processing in milliseconds. With the recent release of NVIDIA Holoscan 1.0, developers can more easily build production-ready applications for multimodal, real-time sensor processing.
Driving the future of healthcare imaging, NVIDIA MONAI microservices are creating unique state-of-the-art models and expanded modalities to meet the demands of the healthcare and biopharma industry. The latest update introduces a suite of new features designed to further enhance the capabilities and efficiency of medical imaging workflows. This post explores the following new features…
Video quality metrics are used to evaluate the fidelity of video content. They provide a consistent quantitative measurement to assess the performance of the encoder. VMAF combines human vision modeling with machine learning techniques that are continuously evolving, enabling it to adapt to new content. VMAF excels in aligning with human visual perception by combining detailed analysis…
While part 1 focused on the usage of the new NVIDIA cuTENSOR 2.0 CUDA math library, this post introduces a variety of usage modes beyond that, specifically usage from Python and Julia. We also demonstrate the performance of cuTENSOR based on benchmarks in a number of application domains. This post explores applications and performance benchmarks for cuTENSOR 2.0. For more information…
NVIDIA cuTENSOR is a CUDA math library that provides optimized implementations of tensor operations where tensors are dense, multi-dimensional arrays or array slices. The release of cuTENSOR 2.0 represents a major update—in both functionality and performance—over its predecessor. This version reimagines its APIs to be more expressive, including advanced just-in-time compilation capabilities all…
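To show the kind of operation involved, here is a dense tensor contraction expressed with NumPy einsum. This only illustrates the math that cuTENSOR accelerates on GPUs; it is not the cuTENSOR API itself, and the shapes are arbitrary examples.

```python
import numpy as np

# Contraction C[m, n] = sum over k, l of A[m, k, l] * B[l, k, n],
# i.e. two modes (k and l) are summed away to produce a matrix.
A = np.random.rand(32, 16, 8)
B = np.random.rand(8, 16, 64)
C = np.einsum("mkl,lkn->mn", A, B)
print(C.shape)  # (32, 64)
```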
Diffusion models are transforming creative workflows across industries. These models generate stunning images based on simple text or image inputs by iteratively shaping random noise into AI-generated art through denoising diffusion techniques. This can be applied to many enterprise use cases such as creating personalized content for marketing, generating imaginative backgrounds for objects in…
From cities and airports to Olympic Stadiums, AI is transforming public spaces into safer, smarter, and more sustainable environments.
For over a decade, traditional industrial process modeling and simulation approaches have struggled to fully leverage multicore CPUs or acceleration devices to run simulation and optimization calculations in parallel. Multicore linear solvers used in process modeling and simulation have not achieved expected improvements, and in certain cases have underperformed optimized single-core solvers.
Learn how synthetic data is supercharging 3D simulation and computer vision workflows, from visual inspection to autonomous machines.
The past few decades have witnessed a surge in rates of waste generation, closely linked to economic development and urbanization. This escalation in waste production poses substantial challenges for governments worldwide in terms of efficient processing and management. Despite the implementation of waste classification systems in developed countries, a significant portion of waste still ends up…
Discover the transformative power of computer vision and video analytics at GTC. Dive into cutting-edge techniques such as vision transformers, AI agents, multi-modal foundation models, 3D technology, large language models (LLMs), vision language models (VLMs), generative AI, and more.
On March 5, 8am PT, learn how NVIDIA Metropolis microservices for Jetson Orin helps you modernize your app stack, streamline development and deployment, and future-proof your apps with the ability to bring the latest generative AI capabilities to any customer through simple API calls.
Visual generative AI is the process of creating images from text prompts. The technology is based on vision-language foundation models that are pretrained on web-scale data. These foundation models are used in many applications by providing a multimodal representation. Examples include image captioning and video retrieval, creative 3D and 2D image synthesis, and robotic manipulation.
The past decade has seen a remarkable surge in the adoption of deep learning techniques for computer vision (CV) tasks. Convolutional neural networks (CNNs) have been the cornerstone of this revolution, exhibiting exceptional performance and enabling significant advancements in visual perception. By employing localized filters and hierarchical architectures, CNNs have proven adept at…
NVIDIA Metropolis Microservices for Jetson has been renamed to Jetson Platform Services, and is now part of NVIDIA JetPack SDK 6.0. Building vision AI applications for the edge often comes with notoriously long and costly development cycles. At the same time, quickly developing edge AI applications that are cloud-native, flexible, and secure has never been more important. Now…
As industrial automation increases, safety becomes a greater challenge and top priority for enterprises. Safety encompasses multiple aspects: The same technological solution that’s driving automation can be used to also address safety: artificial intelligence. AI-powered stationary outside-in safety platforms, which monitor activity across many distributed machines or robots…
NVIDIA Metropolis Microservices for Jetson has been renamed to Jetson Platform Services, and is now part of NVIDIA JetPack SDK 6.0. NVIDIA Metropolis Microservices for Jetson provides a suite of easy-to-deploy services that enable you to quickly build production-quality vision AI applications while using the latest AI approaches. This post explains how to develop and deploy generative AI…
NVIDIA Metropolis Microservices for Jetson has been renamed to Jetson Platform Services, and is now part of NVIDIA JetPack SDK 6.0. NVIDIA Metropolis microservices provide powerful, customizable, cloud-native APIs and microservices to develop vision AI applications and solutions. The framework now includes NVIDIA Jetson, enabling developers to quickly build and productize performant and…
Robots are typically equipped with cameras. When designing a digital twin simulation, it’s important to accurately replicate the camera’s performance in the simulated environment. However, to make sure the simulation runs smoothly, it’s crucial to check the performance of the workstation that is running the simulation. In this blog post, we explore the steps for setting up and running a camera benchmark…
For robotic agents to interact with objects in their environment, they must know the position and orientation of objects around them. This information describes the six degrees of freedom (DOF) pose of a rigid body in 3D space, detailing the translational and rotational state. Accurate pose estimation is necessary to determine how to orient a robotic arm to grasp or place objects in a…
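As a small worked example, a 6-DOF pose (3-DOF rotation plus 3-DOF translation) can be packed into a 4x4 homogeneous transform. The NumPy sketch below builds one from arbitrary example Euler angles and a translation; it is only an illustration of the pose representation, not a pose estimation method.

```python
import numpy as np

def pose_matrix(roll, pitch, yaw, tx, ty, tz):
    """4x4 homogeneous transform from Euler angles (radians) and a translation."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    T = np.eye(4)
    T[:3, :3] = Rz @ Ry @ Rx       # rotational state (3 DOF)
    T[:3, 3] = [tx, ty, tz]        # translational state (3 DOF)
    return T

T = pose_matrix(0.1, 0.2, 0.3, 0.5, 0.0, 1.2)
point_object = np.array([0.1, 0.0, 0.0, 1.0])   # point in the object frame
print(T @ point_object)                          # same point expressed in the reference frame
```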
The NVIDIA PyG container, now generally available, packages PyTorch Geometric with accelerations for GNN models, dataloading, and pre-processing using cuGraph-Ops, cuGraph, and cuDF from NVIDIA RAPIDS, all with an effortless out-of-the-box experience.
Railroad simulation is important in modern transportation and logistics, providing a virtual testing ground for the intricate interplay of tracks, switches, and rolling stock. It serves as a crucial tool for engineers and developers to fine-tune and optimize railway systems, ensuring efficiency, safety, and cost-effectiveness. Physically realistic simulations enable comprehensive scenario…
In this post, we delve deeper into the inference optimization process to improve the performance and efficiency of our machine learning models during the inference stage. We discuss the techniques employed, such as inference computation graph simplification, quantization, and lowering precision. We also showcase the benchmarking results of our scene text detection and recognition models…
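As generic illustrations of two of the techniques named above (lowering precision and quantization), the PyTorch sketch below applies FP16 inference and dynamic INT8 quantization to a toy model. The scene text models in the post would follow their own export and optimization path, and exact quantization backend support can vary by platform.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(256, 512), nn.ReLU(), nn.Linear(512, 128)).eval()

# 1) Lowering precision: run inference in FP16, typically on the GPU.
if torch.cuda.is_available():
    model_fp16 = model.half().cuda()
    out_fp16 = model_fp16(torch.rand(1, 256, device="cuda").half())
    print("FP16 output:", out_fp16.shape)

# 2) Dynamic INT8 quantization of the linear layers for CPU inference.
model_int8 = torch.quantization.quantize_dynamic(
    nn.Sequential(nn.Linear(256, 512), nn.ReLU(), nn.Linear(512, 128)).eval(),
    {nn.Linear},
    dtype=torch.qint8,
)
out_int8 = model_int8(torch.rand(1, 256))
print("INT8 output:", out_int8.shape)
```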
To make scene text detection and recognition work on irregular text or for specific use cases, you must have full control of your model so that you can do incremental learning or fine-tuning as per your use cases and datasets. Keep in mind that this pipeline is the main building block of scene understanding, AI-based inspection, and document processing platforms. It should be accurate and have low…
Identification and recognition of text from natural scenes and images become important for use cases like video caption text recognition, detecting signboards from vehicle-mounted cameras, information retrieval, scene understanding, vehicle number plate recognition, and recognizing text on products. Most of these use cases require near real-time performance. The common technique for text…
Capturing video footage and playing games at 8K resolution with 60 frames per second (FPS) is now possible, thanks to advances in camera and display technologies. Leading multimedia companies, including RED Digital Cinema, Nikon, and Canon, have already introduced 8K60 cameras for both the consumer and professional markets. On the display side, with the newest HDMI 2.1 standard…
With the latest NVIDIA TAO 5.2, you can now run zero-shot inference for panoptic segmentation with ODISE, create custom 3D object pose models, and boost inference throughput for vision transformers using FasterViT. Download now.
As we approach the end of another exciting year at NVIDIA, it’s time to look back at the most popular stories from the NVIDIA Technical Blog in 2023. Groundbreaking research and developments in fields such as generative AI, large language models (LLMs), high-performance computing (HPC), and robotics are leading the way in transformative AI solutions and capturing the interest of our readers.