NODE IK: Solving Inverse Kinematics with Neural Ordinary Differential Equations
for Path Planning

Suhan Park Mathew Schwartz Jaeheung Park

Abstract

This paper proposes a novel inverse kinematics (IK) solver of articulated robotic systems for path planning. IK is a traditional but essential problem for robot manipulation. Recently, data-driven methods have been proposed to quickly solve the IK for path planning. These methods can handle a large amount of IK requests at once with the advantage of GPUs. However, the accuracy is still low, and the model requires considerable time for training. Therefore, we propose an IK solver that improves accuracy and memory efficiency by utilizing the continuous hidden dynamics of Neural ODE. The performance is compared using multiple robots.

\sidecaptionvpos

figurec

\keywords

kinematics, neural networks, robotics, trajectory, path planning

1 Introduction

An Inverse kinematics (IK) problem is to find the joint space solution that satisfies the given Cartesian pose constraint. This process is important as most real-world tasks, such as furniture assembly [16] and roboscuplt [19], have Cartesian targets and provide a meaningful method of determining a necessary joint configuration to achieve the desired end-goal. By extending a single IK solution into a series of multiple IK solutions, one can construct a method for moving along arbitrary locations in Cartesian space – creating a path of motion for a robot. For this reason, IK has been a topic of robotics research for decades, and the improved computational speed, configuration options, and accuracy have been the subject of numerous open-source initiatives. These include e.g., an analytic inverse kinematics solver [7] for six degree-of-freedom (DoF) robots and a solver using numerical optimization [3]. Although the analytical IK solution is fast, it can only be applied to a limited number of robots. In particular, the complexity of the IK solver drastically increases when redundant degrees of freedom are present, as weighting and other priorities must be implemented to reduce the solution space. On the other hand, numerical optimization methods can be used in all robots, but suffer from computational complexity. Therefore, there is a significant time overhead for solving multiple IK requests, such as the IK for a Cartesian space path.

Finding joint configuration for the given Cartesian space path is one of the constrained motion planning problems [14], which is known as a complex problem because it has an implicit constraint on the path. The end-effector pose constraint is nonlinear and implicit for most robots. Therefore, searching path considering this constraint costs considerable time. Prior studies have shown that the constrained motion planning problem can be solved faster using IK [4] [11]. Therefore, batch IK predictions can enhance searching path in the end-effector pose constrained space and provide various paths that can be used as a warm start (initial) path.

Recently, data-driven methods [6, 18] have been proposed to quickly find the multiple solutions required for path planning. By utilizing generative models, such as conditional variational autoencoder (CVAE) and conditional generative adversarial network (CGAN), a prior work [1] improved the sample drawing for constrained motion planning. However, CVAE and CGAN have known issues that manifold mismatch and mode collapse (CGAN), which eventually reduce available space and diversity. Kim and Perez [13] introduced an autoencoder style inverse kinematics network and density estimator using normalizing flows (NF). Similar to our work, IKFlow [2] directly utilizes the conditional NF to improve the accuracy of the generated inverse kinematics. However, IKFlow needs extensive model parameters for highly accurate prediction because the model complexity depends on the layer width and depths.

Figure 1: Cartesian pose constrained path from single batch IK prediction of NODE IK

On the other hand, continuous layer models are known for their capacity to represent complex distribution with a relatively simple model. The continuous model can be structured using an ordinary differential equation (ODE) with hidden neural dynamics, which is called Neural ODE (NODE), and normalizing flows with NODE are called continuous normalizing flows (CNF) [5]. This paper proposes a novel IK method using the Neural ODE for lightweight and accurate inverse kinematics generation. We refer to this process as NODE IK (Neural Ordinary Difference Equations for Inverse Kinematics). NODE IK is demonstrated on a redundant manipulator that has more than 6 DoF, as illustrated in Fig. 1. The objective of NODE IK is to find multiple IK solutions quickly from given target poses and latent variables that can be sampled in tractable normal distribution as shown in Fig. 2. In addition, the proposed model deals with multi-target IK as well as single-target IK using the expressive NODE model. As an application of the batch-IK, finding a joint path for the given Cartesian end-effector path is demonstrated to show the efficiency of the batch-IK.

The contribution of this paper is to reduce the number of model parameters while achieving better accuracy compared to the state-of-the-art method. In addition, multi-target IK has been demonstrated whose joint configuration comprises multi-branch kinematic chains, not single chains. For example, a joint configuration for dual arm tasks of humanoid robots includes common joints, such as waist joints, connecting the pelvis frame to each end-effector frame. Finally, as the objective of the proposed method is a large number of IK resolutions, we demonstrate the path generation results of the Cartesian pose constrained planning problem using the proposed method. Technically, integrating cutting-edge open source software, NODE IK is provided with GPU acceleration for both model training and forward kinematics. This enables online dataset generation in the training process.

2 Background

Figure 2: Objective of the proposed method: $k$ -IK requests are solved at once, batch-IK. Utilizing GPU and neural ODE structure, multiple accurate IK solutions can be quickly acquired from given target poses, such as the end-effector path. Latent variables are used to describe redundancy.

2.1 Inverse Kinematics

Inverse kinematics is, as the name implies, the inverse of forward kinematics defined by a function $f : R^{n} \to S E (3)$ , mapping joint configurations $q \in R^{n}$ to an end-effector pose $x \in S E (3)$ , where $n$ is the number of joints. The methods for the IK problem can be divided into analytic and numerical approaches. An analytic approach directly solves the joint configurations using geometric relationships. Unfortunately, a geometry analysis for inverse kinematics of a robot with more than 6 DoF is challenging and often not feasible. On the other hand, numerical methods can be trivially implemented and used for all single-chain robot configurations. The Newton-Rhapson method [9] and sequential quadratic programming [3] are well-known algorithms for solving the inverse kinematics problem. However, these methods take a considerable time for a large number of solutions – a requirement for a smooth joint path through a Cartesian target path.

There can be infinite solutions for a single target pose for a redundant robot (one in which a typical arm has more than six DoF). Thus, introducing a latent variable $z \in R^{n}$ , the unique solution to the inverse kinematics can be determined as $q = f^{- 1} (x, z)$ . Note that the latent variable has the same dimension as the configuration because NF require the same dimension for the invertibility of the function. In this paper and prior work [2], NF is utilized to transform between latent variable $z$ and joint configuration $q$ .

Figure 3: Structure of Node IK. Left figure is sampling process in the complex joint configuration distribution (IK). Right figure is structure of hidden dynamics.

2.2 Normalizing Flows

The normalizing flows (NF) [17], which are used in prior work [2], are generative models with a tractable distribution for efficient density estimation and sampling in the complex distribution. Using a tractable probability density function, such as normal distribution, NF are used to estimate the complex target distribution. When a sample is drawn from a tractable distribution, for example, $x_{t} \sim N (0, I)$ , a model with learnable parameters $f_{θ} : R^{D} \to R^{D}$ transforms the sample $x_{t}$ to a desired state $x_{c}$ in the complex distribution:

x_{c} = f_{θ} (x_{t}) .

(1)

Conversely, the training process proceeds through the inverse function of the model $g_{θ} = f_{θ}^{- 1}$ , maximizing log-likelihood. Let $x_{t}, x_{c}$ be the random variables in tractable and complex distribution and $g_{θ} : R^{D} \to R^{D}$ be an invertible function where $x_{t} = g_{θ} (x_{c})$ . Then, $p (x_{t})$ is known by tractable distribution, and the change of variable theorem enables the computation of exact changes in probability:

p (x_{c}) = p (x_{t}) {∣ ∣ ∣ det \frac{\partial f}{\partial x_{t}} ∣ ∣ ∣}^{- 1} .

(2)

And the log probability is

log p (x_{c}) = log p (x_{t}) - log ∣ ∣ ∣ det \frac{\partial f}{\partial x_{t}} ∣ ∣ ∣ .

(3)

The model parameter $θ$ is updated to maximize $log p (x_{c})$ .

The prior work IKFlow [2] modeled the transform functions by a composition of the invertible functions:

f_{θ} = f_{1} \circ \dots \circ f_{N - 1} \circ f_{N} .

(4)

The model depth $N$ needs to be large enough for an expressive model. However, the number of parameters increases when the model’s depth $N$ increases.

2.3 Neural ODE and Continuous Normalizing Flows

A neural network for hidden dynamics $h_{θ}$ representing ODE:

\frac{d z (t)}{d t} = h_{θ} (z (t), t),

(5)

which is called Neural ODE [5], is known for its ability that estimates complex distribution using a relatively simple model. Starting from the initial value, which is used as an input layer, $z (t_{0})$ , the desired value $z (t_{1})$ is the solution to the initial value problem at some time $t_{1}$ :

z (t_{1}) = z (t_{0}) + \int_{t_{0}}^{t_{1}} h_{θ} (z (t), t) d t .

(6)

A black-box differential equation solver can compute the desired value, automatically evaluating the hidden dynamics $h_{θ}$ as needed with the desired accuracy. This structure lets the hidden dynamics $h_{θ}$ form continuous vector fields for transforming from the initial state to the desired state. The gradients for training are computed using the adjoint sensitivity method. Let $L (z (t_{1}))$ be a loss function for the output of ODE solver:

L (z (t_{1})) = L (z (t_{0}) + \int_{t_{0}}^{t_{1}} h_{θ} (z (t), t) d t) .

(7)

Utilizing the adjoint $a (t) = \partial L / z (t)$ , the gradient with regard to $θ$ for optimizing the hidden neural dynamics:

\frac{d L}{d θ} = - \int_{t_{1}}^{t_{0}} a (t)^{T} \frac{\partial h_{θ} (z (t), t)}{\partial θ} d t .

(8)

This neural ODE form can be used for NF structures, which is called continuous normalizing flows (CNF) [5]. Let $z (t_{0}), z (t_{1})$ be the random variables in tractable and complex distribution as in Sec. 2.2 . Then (6) can be used as a transform function like (1). This form is easily invertible by reverting $t$ . Thus, $h_{θ}$ does not need to be bijective for NF. In addition, this structure can be seen as having infinitely deep recurrent layers, enabling the expressive model. Note that time $t$ does not indicate robot trajectory time but the degree of transformation from $z$ to $q$ . We will describe $z (t = 0)$ as $z$ and $z (t = 1)$ as $q$ in the rest of the paper.

CNF do not require determinant computation of the Jacobian because the change in log-density follows the instantaneous change of variables formula [5]:

\frac{\partial log p (z (t))}{\partial t} = - T r (\frac{\partial h_{θ}}{\partial z (t)}) .

(9)

Therefore, the target log-density is

log p (z (t_{1})) = log p (z (t_{0})) - \int_{t_{0}}^{t_{1}} T r (\frac{\partial h_{θ}}{\partial z (t)}) d t,

(10)

which is solved by ODE solver. To maximize the log-likelihood, the loss function is

L = - log p (z (t_{1})) .

(11)

Additionally, free-form Jacobian of reversible dynamics (FFJORD) [8] was proposed to reduce computation costs of the Jacobian trace of (9). This method utilizes Hutchinson’s trace estimator [10], which allows a scalable unbiased estimation of the log-density. While computing Tr costs $O (n^{2})$ , FFJORD can approximate the log-density at cost $O (n)$ .

3 Methods

3.1 Overview

The proposed model exploits the aforementioned property of the NF and Neural ODE for lightweight and accurate IK prediction. The proposed method uses conditional CNF to learn the configuration distribution for a given Cartesian pose $x$ . The proposed method is implemented on top of SoftFlow [12] architecture for conditioning and FFJORD [8] for fast training. NVIDIA Warp [15] is utilized to compute the forward kinematics using GPU resources.

3.2 Architecture

The proposed model utilizes conditional CNF using neural ODE:

\frac{d z (t)}{d t} = h_{θ} (z (t), x, t) .

(12)

The hidden dynamics model are conditioned by $t$ and $x$ for the dynamics corresponding to the end-effector pose. During a training procedure, instantly generated dataset is utilized.

Joint configurations $q$ are sampled in uniform distribution within the joint range. And, the forward kinematics for the joints is computed online:

	$q$	$\sim U (q -, ¯ q),$		(13)
	$x$	$= f (q),$		(14)

where $q -$ and $¯ q$ are the lower and upper bounds of the joints. Then, the trace of Jacobian (9) and the target value $z (0)$ of the current network model is computed as following relationship:

	$z (1)$	$= q,$		(15)
	$z ≜ z (0)$	$= z (1) - \int_{0}^{1} h_{θ} (z (t), x, t) d t .$		(16)

The proposed model uses a normal distribution as a tractable base distribution: $z \sim N (0, I)$ to easily sample and calculate the log density $log p (z (0))$ . The model is directly updated by (8) with the loss function (11).

The structure of neural dynamics model $h_{θ}$ is shown in Fig. 3. Because CNF do not require Jacobian determinant, the structure of hidden dynamics can be simply designed. Every hidden layer is conditioned in affine form.

3.3 Batch IK and Path generation

When acquiring the batch IK for $k$ targets, a set of latent variable $Z = [z_{1}, \dots, z_{k - 1}, z_{k}]^{T} \in R^{n \times k}$ are randomly sampled in normal distribution and target poses $X = [x_{1}, \dots, x_{k - 1}, x_{k}]^{T} \in S E (3)^{k m}$ are set for the given path, where $m$ is the number of IK targets, e.g., dual end-effector targets for humanoid robots. The output of the NODE IK is the corresponding joint configurations $Q = [q_{1}, \dots, q_{k - 1}, q_{k}]^{T} \in R^{n \times k}$ . Fig. 2 illustrates this relationship. This process is a one-time calculation taking advantage of GPUs.

When generating IK solutions for the given Cartesian path, $z$ should change continuously. The same or continuous change of $z$ is required for the continuous joint solution path. However, because the NF can learn multi-modal distribution, the path can be discontinuous even with the continuous change of the input variables. Therefore, after batch generation, every path is checked whether the path is continuous.

4 Results

4.1 Performance

The proposed method and IKFlow were compared in accuracy, prediction time, and the number of model parameters.

The accuracy was measured using 7 DoF Franka Emika Panda robot. Model hyperparameters for IKFlow [2] were set to 1024 wide network width, 12 coupling layers as they claimed the best performance with these parameters with the robot. The hidden dynamics of NODE IK comprised four 1024 width layers. The accuracy was evaluated from 1000 sampled Cartesian target poses $x$ and the predicted 250 joint solutions for each target pose as in the experiments of previous work [2]. By passing the joint solutions to the forward kinematics function, the end-effector pose of predicted joint solutions $^x$ was computed.

NODE IK was able to learn faster and more accurately than IKFlow with the same number of data passed to the network during the training process. Fig. 4 illustrates the position error of the end-effector poses $| | x_{p} - {^x}_{p} | |_{2}$ . The position error of NODE IK was approximately half compared to that of IKFlow. Fig. 5 shows the orientation errors, which is quaternion geodesic distance between two poses. The orientation error of NODE IK was much lower than IKFlow. Outlier data with a large error existed among the generated solutions by IKFlow, which adversely affected the average value.

Although NODE IK had high expressive power, it required a few parameters. The number of model parameters of NODE IK was 3,316,320 for achieving the sub-centimeter error, while that of IKFlow was 50,934,364. NODE IK had 6.5 % parameters compared to IKFlow.

Figure 5: Orientation error while training

However, NODE IK took more time for prediction than IKFlow because multiple prediction steps of the hidden dynamics are necessary for the ODE solver. System setup for the evaluation was with Intel i7-10700K CPU and NVIDIA RTX 2080 Ti GPU. NODE IK and IKFlow took 85.93 ms and 14.97 ms, respectively, to find 1000 IK solutions. Still, the IK generation speed was faster than that of TRAC-IK [3], which is a numerical IK solver. Numerical solvers, TRAC-IK (C++) and Klampt IK (Python) took 1 s and 2.6 s, respectively, for 1000 IK solutions.

Figure 6: First row illustrates generated IK solutions following dual circle paths. Green circles are desired end-effector paths. Second row shows position error of the dual target path. Third and fourth rows show orientation error of the dual target path.

4.2 IK for path generation

Path generation utilizing the advantages of batch IK was performed to evaluate the performance of the proposed model. Franka Emika Panda and TOCABI were used for the path generation. TOCABI is a 33 DoF humanoid robot, but, only upper body joints (arm and waist have eight and three joints, respectively) are used for IK. Target paths are given arbitrarily, which are circular paths. Orientation of the end-effector was fixed along the path. $z$ was sampled in $N (0, I)$ once and all IK solutions were acquired using each target pose and the same $z$ . The discontinuous paths were rejected. Path errors are the average of position and orientation errors at each target pose. Especially, NODE IK was able to learn dual target IK for humanoid robots. 19 joints are used for dual target IK, which includes three shared waist joints.

The average path errors are listed in Tab. LABEL:tab:path_errors . Overall, NODE IK showed less error than IKFlow. As the IK complexity increased, the errors increased. A path generated from NODE IK is shown in Fig. 6 as a qualitative result. The position and orientation errors differed according to the targets, but the relative change was small. Especially, the orientation error was barely changed. NODE IK had better performance for learning orientation conditions than IKFlow.

Robots	Position (mm)		Orientation (deg)
Robots	F	N	F	N
Panda	8.15	4.39	0.77	0.32
TOCABI (S)	27.11	14.31	6.64	0.82
TOCABI (D)	-	48.09	-	8.71

Table 1: Average path errors. F: IKFlow, N: NODE IK. S: Single target, D: Dual target.

5 Conclusion

The proposed method has 93% fewer parameters than the state-of-the-art method IKFlow, allowing the highly accurate learning-based IK for the low memory system. In addition, from the increased complexity, NODE IK could demonstrate dual arm tasks. However, the prediction took longer time than the state-of-the-art method because the ODE solver requires multiple predictions of the hidden dynamics network. We expect that the performance of NODE IK can be improved with future work and contributions from the machine learning community by implementing optimized solvers.

References

[1] C. Acar and K. P. Tee (2021) Approximating constraint manifolds using generative models for sampling-based constrained motion planning. In 2021 IEEE International Conference on Robotics and Automation (ICRA), Vol. , pp. 8451–8457. External Links: Document Cited by: §1.
[2] B. Ames and J. Morgan (2022) IKFlow: generating diverse inverse kinematics solutions. IEEE Robotics and Automation Letters. Cited by: §1, §2.1, §2.2, §2.2, §4.1.
[3] P. Beeson and B. Ames (2015) TRAC-IK: an open-source library for improved solving of generic inverse kinematics. In 2015 IEEE-RAS 15th International Conference on Humanoid Robots (Humanoids), pp. 928–935. Cited by: §1, §2.1, §4.1.
[4] D. Bertram, J. Kuffner, R. Dillmann, and T. Asfour (2006) An integrated approach to inverse kinematics and path planning for redundant manipulators. In Proceedings 2006 IEEE International Conference on Robotics and Automation, 2006. ICRA 2006., Vol. , pp. 1874–1879. External Links: Document Cited by: §1.
[5] R. T. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud (2018) Neural ordinary differential equations. Advances in neural information processing systems 31. Cited by: §1, §2.3, §2.3, §2.3.
[6] A. D’Souza, S. Vijayakumar, and S. Schaal (2001) Learning inverse kinematics. In Proceedings 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems. Expanding the Societal Role of Robotics in the the Next Millennium (Cat. No. 01CH37180), Vol. 1, pp. 298–303. Cited by: §1.
[7] R. Diankov (2010) Automated construction of robotic manipulation programs. Cited by: §1.
[8] W. Grathwohl, R. T. Chen, J. Bettencourt, I. Sutskever, and D. Duvenaud (2018) FFJORD: free-form continuous dynamics for scalable reversible generative models. In International Conference on Learning Representations, Cited by: §2.3, §3.1.
[9] K. Gupta and K. Kazerounian (1985) Improved numerical solutions of inverse kinematics of robots. In Proceedings. 1985 IEEE International Conference on Robotics and Automation, Vol. 2, pp. 743–748. Cited by: §2.1.
[10] M. F. Hutchinson (1989) A stochastic estimator of the trace of the influence matrix for laplacian smoothing splines. Communications in Statistics-Simulation and Computation 18 (3), pp. 1059–1076. Cited by: §2.3.
[11] K. Jang, J. Baek, S. Park, and J. Park (2022) Motion planning for closed-chain constraints based on probabilistic roadmap with improved connectivity. IEEE/ASME Transactions on Mechatronics (), pp. 1–9. External Links: Document Cited by: §1.
[12] H. Kim, H. Lee, W. H. Kang, J. Y. Lee, and N. S. Kim (2020) Softflow: probabilistic framework for normalizing flow on manifolds. Advances in Neural Information Processing Systems 33, pp. 16388–16397. Cited by: §3.1.
[13] S. Kim and J. Perez (2021) Learning reachable manifold and inverse mapping for a redundant robot manipulator. In 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 4731–4737. Cited by: §1.
[14] Z. Kingston, M. Moll, and L. E. Kavraki (2019) Exploring implicit spaces for constrained sampling-based planning. The International Journal of Robotics Research 38 (10-11), pp. 1151–1178. Cited by: §1.
[15] M. Macklin (2022-03) Warp: a high-performance python framework for gpu simulation and graphics. Note: NVIDIA GPU Technology Conference (GTC)https://github.com/nvidia/warp Cited by: §3.1.
[16] S. Park, H. Lee, S. Kim, J. Baek, K. Jang, H. C. Kim, M. Kim, and J. Park (2022) Robotic furniture assembly: task abstraction, motion planning, and control. Intelligent Service Robotics, pp. 1–17. Cited by: §1.
[17] D. Rezende and S. Mohamed (2015) Variational inference with normalizing flows. In International conference on machine learning, pp. 1530–1538. Cited by: §2.2.
[18] M. Rolf, J. J. Steil, and M. Gienger (2010) Goal babbling permits direct learning of inverse kinematics. IEEE Transactions on Autonomous Mental Development 2 (3), pp. 216–229. Cited by: §1.
[19] M. Schwartz and J. Prasad (2013) RoboSculpt. In Rob| Arch 2012, pp. 230–237. Cited by: §1.

NODE IK: Solving Inverse Kinematics with Neural Ordinary Differential Equations for Path Planning