Fiducial Marker Detection in Multi-Viewpoint Point Cloud

Yibo Liu, Hunter Schofield, Jinjun Shan This work was supported in part by NSERC Alliance Program under Grant ALLRP 555847-20, and in part by Mitacs Accelerate Program under Grant IT26108. The authors are with Department of Earth and Space Science and Engineering, York University, Toronto, Ontario M3J 1P3, Canada {yorklyb,hunterls,jjshan}@yorku.ca

Abstract

The existing LiDAR fiducial marker systems have usage restrictions. Especially, LiDARTag requires a specific marker placement and Intensity Image-based LiDAR Fiducial Marker demands that the point cloud is sampled from one viewpoint. As a result, with point cloud sampled from multiple viewpoints, fiducial marker detection remains an unsolved problem. In this letter, we develop a novel algorithm to detect the fiducial markers in the multi-viewpoint point cloud. The proposed algorithm includes two stages. First, Regions of Interest (ROIs) detection finds point clusters that could contain fiducial markers. Specifically, a method extracting the ROIs from the intensity perspective is introduced on account of the fact that from the spatial perspective, the markers, which are sheets of paper or thin boards, are non-distinguishable from the planes to which they are attached. Second, marker detection verifies if the candidate ROIs contain fiducial markers and outputs the ID numbers and vertices locations of the markers in the valid ROIs. In particular, the ROIs are transmitted to a predefined intermediate plane for the purpose of adopting a spherical projection to generate the intensity image, and then, marker detection is completed through the intensity image. Qualitative and quantitative experimental results are provided to validate the proposed algorithm. The codes and results are available at: https://github.com/York-SDCNLab/Marker-Detection-General

I Introduction

Fiducial markers are artificial objects that have been widely utilized in robotic applications, including Simultaneous Localization and Mapping (SLAM) [18, 27], Structure from Motion (SFM) [5], multi-sensor calibration [8], autonomous navigation [4, 14], and Augmented Reality (AR) [1]. The features provided by the fiducial markers, also known as fiducials, are easy and robust to detect, which makes the feature detection and matching simple and reliable. Feature detection and matching are also prerequisites for LiDAR applications, and thus, fiducials are also favorable to the LiDAR. Despite this, owing to the fact that the fiducial marker systems were originally developed for cameras, the marker detection methods focus on 2D images. As a result, the LiDAR cannot adopt visual fiducial marker (VFM) directly considering that its data format is a 3D point cloud.

Fig. 1: An illustration of the limitations of the previous LiDAR fiducial marker systems. (a): The LiDARTag system [9] requires that there is adequate clearance around the marker’s object due to its clustering method [10], which makes the marker an extra 3D object added to the environment. (b): The IILFM system adopts spherical projection to transfer the 3D point cloud to a 2D image and then carry out marker detection. As a result, in cases where the spherical projection is not applicable, the IILFM is not applicable.

Some recent studies, such as LiDARTag [9] and Intensity Image-based LiDAR Fiduical Marker (IILFM) [15], introduce fiducial marker systems to LiDAR sensors. Nevertheless, as shown in Fig. 1, previous approaches have limitations. In particular, LiDARTag [9] has restrictions on marker placement (See Fig. 1(a)). IILFM [15] does not have any marker placement restrictions; however, as shown in Fig. 1(b), it has a prerequisite that the spherical projection [15, 6, 2] is applicable, which indicates that all points in the point cloud are sampled from one viewpoint [6]. This one-viewpoint prerequisite implies that the point cloud is just one LiDAR scan or that the LiDAR is stationary. For a low-cost LiDAR, such as Livox MID-40 which has a limited Field of View (FoV), the point cloud sampled in one scan is extremely sparse [13]. Thus, if the employed LiDAR is Livox MID-40, the one-viewpoint prerequisite in the IILFM system [15] implies that the LiDAR has to be stationary. Suppose that the point cloud is sampled when the LiDAR is in motion, that is, the point cloud is a stack of multiple LiDAR scans sampled from multiple viewpoints, then IILFM [15] is not applicable as the one-viewpoint prerequisite is not satisfied.

In real-world applications, the multi-viewpoint point cloud can be a 3D point cloud map built by the SLAM framework [26, 13, 24] as shown in Fig. 2, a 3D point cloud model reconstructed by the SFM framework [28] or just a stitch of multiple one-viewpoint point clouds. Note that point clouds that satisfy the requirements of LiDARTag [9] (Fig. 1(a)) or IILFM [15] (Fig. 1(b)) are essentially special cases of the multi-viewpoint point cloud. Supposing that the proposed algorithm can resolve the marker detection problem in the multi-viewpoint point cloud, then it definitely can be applied to these special cases. This work aims to remove the limitations of the existing methods and extend fiducial marker detection to the multi-viewpoint point cloud.

In this letter, we propose a generic LiDAR fiducial marker detection algorithm that has no restrictions on marker placement (the limitation of LiDARTag [9]) and does not require that the spherical projection be applicable to the inputted point cloud (the limitation of IILFM [15]). The pipeline of the proposed algorithm is illustrated in Fig. 3. Considering that the markers are not spatially distinguishable from the planes to which they are attached, we design an intensity-feature-based clustering algorithm to find the point clusters that could contain the fiducial markers, which are called Regions of Interest (ROIs). With ROIs detected, we then introduce a method to detect the locations of the 3D fiducials with respect to the global coordinate system ${G}$ . Specifically, one after another, the valid ROIs are projected onto a predefined intermediate plane, which makes the spherical projection feasible there. Hereafter, marker detection is carried out on the projected clusters using the method proposed in our previous research, IILFM [15]. Thereafter, by simple geometry transmission, the locations of the fiducials are expressed in ${G}$ .

An example of the multi-viewpoint point cloud. (a): A schematic of the
generation of a general point cloud. In contrast with the case shown in Fig. — Fig. 2: An example of the multi-viewpoint point cloud. (a): A schematic of the generation of a general point cloud. In contrast with the case shown in Fig. 1(b) where there is only one viewpoint, this point cloud is a stack of multiple scans from multiple viewpoints. (b): The 3D point cloud generated by the modified livox_mapping SLAM framework¹¹1https://github.com/York-SDCNLab/m_livox_mapping. The original livox_mapping ²²2https://github.com/Livox-SDK/livox_mappingdoes not save intensity values in the 3D map so we modified it. There are two ArUco markers [19] (family: original; ID: 0 and 2) in this scene. The employed LiDAR is Livox MID-40. This LiDAR only has a front-facing and conical-shaped FoV spanning 38.4 degrees [13]. Hence, to scan this scene, the LiDAR has to sample in motion.

Fig. 3: The pipelines of the proposed algorithm.

The letter is organized as follows. Section II presents the related work. Section III describes the procedures for ROIs detection. Section IV introduces the marker detection pipeline. Section V provides the qualitative and quantitative experimental evaluations. Section VI discusses the conclusions.

Ii Related Work

To the best of our knowledge, LiDARTag [9] and IILFM [15] are the only two existing fiducial marker systems for LiDAR sensor so far. They follow two different 3D object detection ideas to design the marker detection methods. This section introduces these two ideas employed by LiDARTag and IILFM, and thus, explains the source of their limitations.

Utilizing the spatial features. The conventional idea in 3D object detection/segmentation is to utilize spatial features, such as edge points and plane points. Thus, from a spatial perspective, a detectable object should be distinguishable from other objects in the environment. LiDARTag [9, 10] follows this concept to develop its marker detection algorithm. In particular, to guarantee that the fiducial marker is detectable, there must be $t \sqrt{2} / 4$ clearance around the marker’s object, where $t$ represents the marker’s size. Moreover, $τ > t \sqrt{2} / 4$ has to be satisfied when the marker is attached to another object, where $τ$ is the thickness of the marker’s 3D object. These prerequisites define the marker placement restrictions of LiDARTag [9] (Refer to Fig. 1(a)). To satisfy this requirement, a tripod is needed to support the marker. Consequently, the marker of LiDARTag is a huge extra 3D object added to the environment.

Utilizing the range/intensity image(s). Another research trend in 3D object detection research [6, 16] is to adopt neural networks that were originally designed for 2D object detection to recognize objects in the intensity/range image(s) and then project the results back into 3D space. The development of IILFM [15] is inspired by this springing trend. However, according to the theory of spherical projection [15, 6, 2], the 2D projection of a 3D point is specified by the inclination and azimuth of the point with respect to the global coordinate system ${G}$ . Hence, if multiple points share the same inclination and azimuth, they will correspond to the same projection, therefore ambiguity occurs. Suppose that there is only one viewpoint in the point cloud, the azimuth and inclination of each point are unique [6]. Hence, the points that overlap will not appear. However, in the multi-viewpoint point cloud, the points may share the same inclination and azimuth since they are sampled from multiple viewpoints. This time, the overlapping of points is fatal since one projection could correspond to multiple intensity/range values. As a result, the theory behind the spherical projection is not established in the multi-viewpoint point cloud.

In summary, to get rid of the marker placement restrictions, the idea of utilizing spatial features is not appropriate since the spatial features imply the spatial restrictions [9]. Utilizing the range/intensity image(s) does not lead to any spatial restrictions [15] while the vital bridge from 3D space to 2D space, the spherical projection, is not applicable to the multi-viewpoint point cloud [6].

Iii ROIs Detection

The ROIs in this work refer to the point clusters that could contain the fiducial markers. However, as mentioned previously, the fiducial marker is inherently a thin sheet of paper attached to a plane (See Fig. 4(a)). Hence, the marker is indistinguishable from the plane where it is placed in view of ranging (See Fig. 4(b)). This indicates that the conventional clustering methods, including edge-based [22], region-based [11], attributes-based [3], model-based [7], graph-based [10] (the method adopted in LiDARTag [9]), as well as current deep learning-based methods [6, 16, 25] are not able to find the marker cluster since there are no available spatial features in this case. In contrast, as shown in Fig. 4(c), the marker is distinctly seen in the point cloud rendered by the intensity values. Following this inspiration, IILFM [15] utilizes the intensity image and left the marker detection to the 2D VFM detector. However, when the spherical projection is not applicable, neither is the intensity image. Hence, we propose a novel algorithm to solve this tricky clustering problem. In this Section and Section IV, we adopt the point cloud shown in Fig. 4(c) to visualize the result of each step of the proposed algorithm. Note that it is not a multi-viewpoint point cloud since there is only one viewpoint (See Fig. 4(a) where the LiDAR is stationary). However, through this simple case, we can explicitly express the marker detection procedures. The marker detection results in the multi-viewpoint point cloud are given in the qualitative evaluation part of Section V.

Fig. 4: (a): A Livox MID-40 LiDAR is scanning an AprilTag (family: tag36h11; ID: 0) placed on the desktop. LiDARTag [9] is not applicable in this case since there is no clearance around the marker. (b): The point cloud after plane segmentation through random sample consensus [7]. Each plane is rendered with a random but unique color. The fiducial marker is completely merged into the dark blue plane, which is the cluster of the desktop. (c) The point cloud rendered with the intensity values.

Iii-a Intensity Feature Extraction

Despite the fact that a spherical projection is not feasible, Fig. 4(c) still illustrates that the intensity values and their gradients are appropriate references for the detection of markers. Thus, the first step of the proposed ROIs detection pipeline is to extract the intensity features. Inspired by the edge detection algorithm employed in the VFM system [12] which picks up the regions with high grayscale-contrast, the proposed feature extraction method finds the 3D points that have high intensity-contrast with their neighbors. Specifically, the $3 \times 1$ intensity gradient vector, $\nabla I$ , is employed to quantitatively evaluate the intensity-contrast. For a given point, $p_{0} = [x_{0}, y_{0}, z_{0}, I_{0}]^{T}$ , where $[x_{0}, y_{0}, z_{0}]^{T}$ are the Cartesian coordinates and $I_{0}$ is the intensity value. Following [20], the definition of $\nabla I$ of a given point $p_{0}$ is as follows. Define the point cloud composed of the nearest $n$ points around $p_{0}$ as $P_{I} = {p_{1}, p_{2}, \dots, p_{n}}$ . In particular, $P_{I}$ is obtained using k-dimensional (k-d) tree [17] searching with a preset search radius $r_{f}$ . For the point cloud sampled by the Livox MID-40, $r_{f}$ is selected as 0.01 m. Then, a $3 \times 3$ covariance matrix $A$ and a $3 \times 1$ vector $b$ are formulated using the points in $P_{I}$ :

		$A = ⎡ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣ \begin{matrix} n \sum i = 1 Δ^{2} x_{i} & n \sum i = 1 Δ x_{i} Δ y_{i} & n \sum i = 1 Δ x_{i} Δ z_{i} n \sum i = 1 Δ x_{i} Δ y_{i} & n \sum i = 1 Δ^{2} y_{i} & n \sum i = 1 Δ y_{i} Δ z_{i} n \sum i = 1 Δ x_{i} Δ z_{i} & n \sum i = 1 Δ y_{i} Δ z_{i} & n \sum i = 1 Δ^{2} z_{i} \end{matrix} ⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦$		(1)
		$b = {[\begin{matrix} n \sum i = 1 Δ x_{i} Δ I_{i} n \sum i = 1 Δ y_{i} Δ I_{i} n \sum i = 1 Δ z_{i} Δ I_{i} \end{matrix}]}^{T}$		(1)

where $Δ x_{i} = (x_{i} - x_{0})$ , $Δ y_{i} = (y_{i} - y_{0})$ , $Δ z_{i} = (z_{i} - z_{0})$ , and $Δ I_{i} = (I_{i} - ¯ I)$ . $[x_{i}, y_{i}, z_{i}]^{T}$ are the Cartesian coordinates of the $i$ th point in $P_{I}$ with $I_{i}$ being the intensity value. $¯ I$ is the mean of the $n$ intensity values. Hereafter, the spectral decomposition of $A$ is:

A = [\begin{matrix} v_{1} v_{2} v_{3} \end{matrix}] ⎡ ⎢ ⎣ \begin{matrix} λ_{1} & 0 & 0 0 & λ_{2} & 0 0 & 0 & λ_{3} \end{matrix} ⎤ ⎥ ⎦ ⎡ ⎢ ⎣ \begin{matrix} {v_{1}}^{T} {v_{2}}^{T} {v_{3}}^{T} \end{matrix} ⎤ ⎥ ⎦

(2)

where $v_{1}$ , $v_{2}$ , and $v_{3}$ are the $3 \times 1$ eigenvectors of $A$ with the corresponding eigenvalues $λ_{1}$ , $λ_{2}$ , and $λ_{3}$ . Then, project $b$ on the eigenvectors:

(3)

Next, normalize $b^{'}$ using the eigenvalues:

b^{''} = ⎡ ⎢ ⎢ ⎣ \begin{matrix} ({v_{1}}^{T} b) / λ_{1} ({v_{2}}^{T} b) / λ_{2} ({v_{3}}^{T} b) / λ_{3} \end{matrix} ⎤ ⎥ ⎥ ⎦

(4)

Note that if the above equation is infeasible due to zero eigenvalue(s), we set the corresponding element(s) in $b^{''}$ as zero(s). The intensity gradient vector $\nabla I$ is given by:

\nabla I = [\begin{matrix} v_{1} v_{2} v_{3} \end{matrix}] b^{''}

(5)

The modulo of the intensity gradient vector, $| \nabla I |$ , is utilized to extract the features. Specifically, a point is identified as a feature if its $| \nabla I |$ satisfies the following:

| \nabla I | \geq t_{I}

(6)

where $t_{I}$ is the threshold. In practice, $t_{I}$ is selected to preserve the points with top 2% intensity gradients.

Iii-B Feature Clustering

Fig. 5(a) shows the feature extraction result of Fig. 4(c). Namely, only the points with high-intensity gradients are preserved in the point cloud. As illustrated in Fig. 5(a), the point cloud composed of the intensity features is very sparse compared to the raw point cloud shown in Fig. 4(c). Moreover, although the features are regarding intensity, the spatial distribution of the intensity features of the fiducial marker is also notable. To explain this spatial distribution, we need to review the VFM design shown in Fig. 6.

Fig. 5: (a): The point cloud after intensity feature extraction on the raw point cloud shown in Fig. 4(c). (b): The clustering result of the point cloud shown in (a).

Fig. 6: A diagram to illustrate the design of a typical square fiducial marker [12]. A square fiducial marker is a combination of the prototype marker (a black frame inside a white frame) and the encoding area.

The aim of adopting the prototype marker (the black-white protector frame) is to isolate the marker from the environment and to construct a square shape that is easy to detect. Similarly, thanks to the prototype marker, from the spatial perspective, the intensity features inside the marker are isolated from the other features in the environment. That is, ”spatial features” are also obtained after the intensity feature extraction. Hereafter, clustering of the features is straightforward. The simple but effective Euclidean clustering method introduced in [21] is employed to extract the clusters in the point cloud of intensity features. The clustering result is given in Fig. 5(b). Each cluster is labeled with a bounding box.

Iii-C Cluster Filtering and ROIs Extraction

As can be seen from Fig. 5(b), many of the clusters are too large or too small to be the cluster of the fiducial marker. Hence, a filtering algorithm is developed to filter out any impractical candidates. In particular, a bounding box needs to satisfy two criteria to be verified as a valid candidate. The first criterion, as shown in Eq. (9), is subject to the marker size(s). The second criterion, as shown in Eq. (10), is based on the fact that the shape of the marker is square. Note that prior knowledge of the marker size(s) is not mandatory. However, if it is not applicable, all candidates that satisfy the second criterion have to be processed as the valid candidates, though the invalid ones will be filtered out at the marker detection stage (See Section IV).

To illustrate the two criteria, the adopted bounding box needs to be introduced first. The bounding box in this work indicates the oriented bounding box (OBB) which is obtained through principal component analysis (PCA) [23]. In particular, the OBB is the minimum 3D box that encloses the cluster while its sides that correspond to the length, width, and height are parallel to the first, second, and third primary components (PCs) of the feature point cluster. Moreover, the three PCs are perpendicular to each other [23]. Ideally, each cluster is a set of coplanar points and thus the third primary component (PC) is the normal of the plane. Hence, the height of the OBB is zero or the OBB is a 2D box instead of a 3D one. However, in the real world, the points in a cluster are not perfectly coplanar due to the LiDAR ranging measurement noise. As a result, the OBB that contains the point cloud of the fiducial marker is still a 3D box but it is flat since the height is trivial compared to the length and width.

Let us neglect the height for now and consider the possible size range of the OBB in 2D space. The possible size of the OBB for a given marker is shown in Fig. 7. In light of the definition of the OBB [23], we know that AC $⊥$ AE and AE $⊥$ AF. Suppose that the side length of the marker is $a$ , it is illustrated in Fig. 7 that BC = $a c o s θ$ and DC = $a s i n θ$ . Thus, AB = DC, such that AC = $a (c o s θ + s i n θ)$ . Similarly, AE = EF = CF = $a (c o s θ + s i n θ)$ . At this point, we conclude that: 1) the OBB in this case is a square; 2) the side length of the OBB is $a (c o s θ + s i n θ)$ . Given that $θ \in [0, 2 π]$ , we have $a (c o s θ + s i n θ) \in [a, \sqrt{2} a]$ . In practice, the orientations of the first and second PCs are subject to the feature points distribution, which is determined by the unique black-and-white pattern of the marker (the encoding area as shown in Fig. 6). Although the pattern cannot be known in advance, the cases where the OBB reaches the smallest and largest sizes (See Fig. 8) are already found according to the proof above.

Fig. 7: A diagram of the possible OBB size for a given marker. $O_{M} - X_{M} - Y_{M}$ is the marker coordinate system. $θ$ is the angle between the first PC and the x-axis. On account of the fact that the first PC and second PC are perpendicular to each other, the angle between the second PC and the y-axis is $θ$ and $∠ D B C = θ$ as well.

Fig. 8: A rough schematic of the smallest (a) and the largest (b) OBBs. (a): 1st PC $∥$ $s_{1}$ and 2nd PC $∥$ $s_{2}$ where $s_{1}$ and $s_{2}$ are the horizontal side and vertical sides of the marker, respectively. (b): 1st PC $∥$ $d_{1}$ and 2nd PC $∥$ $d_{2}$ where $d_{1}$ and $d_{2}$ being the two diagonals of the marker.

In summary, if the side length of the marker is $a$ , the area of the 2D OBB, $S_{O B B}$ , is in the following range:

a^{2} \leq S_{O B B} \leq 2 a^{2}

(7)

Now let us return to 3D space. Define the height of the OBB as $δ$ . The OBB volume, $V_{O B B}$ , belongs to the following range:

a^{2} δ \leq V_{O B B} \leq 2 a^{2} δ

(8)

Note that $δ$ is trifling such that if we employ Eq. (8) as the criterion, many OBBs with tiny volumes will be taken as valid candidates. To avoid this, we adopt the following equation as the first criterion:

\sqrt{2 a^{2} + δ^{2}} \leq L_{O B B} \leq \sqrt{4 a^{2} + δ^{2}}

(9)

where $L_{O B B} = \sqrt{l^{2} + w^{2} + h^{2}}$ is the cuboid diagonal of the OBB with $l$ , $w$ , and $h$ ( $h \leq δ$ ) being the length, width, and height, respectively. In addition, as mentioned previously, the OBB projection on the plane of length and width is square. Thus, Eq. (10) is adopted as the second criterion:

\frac{1}{t_{V}} \leq \frac{l}{w} \leq t_{V}

(10)

where $t_{V}$ is a predefined threshold. Ideally, $t_{V} \approx 1$ since $l \approx w$ . Whereas in the real world, the marker cannot be perfectly scanned by LiDAR, and LiDAR also has the ranging noise. As a result, the shape of OBB on the length and width plane could have distortion. Hence, $t_{V}$ is set as 1.5 in this work.

The last step of ROIs detection is to extract the points that fall into the OBBs of the valid candidates from the raw point cloud (See Fig. 4(c)). Note that a buffer is adopted when extracting these points. The explanation of the buffer is as follows. As mentioned above, the length $l$ , width $w$ , and height $δ$ of OBB is obtained. Furthermore, the 6-DOF pose of the OBB, $T_{O B B}$ , is also acquired [23]. The OBB frame denotes a coordinate system whose X, Y, and Z axes are parallel to the length, width, and height of the bounding box. The origin of the OBB frame locates at the center of the bounding box. Thereafter, $T_{O B B}$ denotes the transmission from the global coordinate system ${G}$ to the OBB frame. The adoption of the buffer implies that the extracted points are those that fall into the extended OBB rather than the original OBB given by the Euclidean clustering [21]. In particular, the extended OBB utilizes the same $T_{O B B}$ as the original OBB; however, the length, width, and height of the extended OBB are $l \times t_{b}$ , $w \times t_{b}$ , and $δ \times t_{b}$ where $t_{b}$ is an amplification factor. Note that the aim of adopting the buffer is to preserve more regions around the original OBB in case it does not cover the fiducial marker completely. Fig. 9(b) shows the ROIs detection result.

Fig. 9: (a): The clustering result after filtering out the impractical candidates of Fig. 5(b). (b): ROIs extracted from the raw point cloud. The area of ROI is larger than the original OBB which is due to the adoption of the buffer.

Iv Marker Detection

Marker detection is carried out in the ROIs detected in the previous Section. Although the spherical projection [15] is not applicable to the multi-viewpoint point cloud, by virtue of the algorithm introduced in this Section, the marker detection is realized through a boosted spherical projection.

Define the point set of the $j$ -th ROI as $P_{j}$ (See Fig. 9(b)). Suppose that $p_{j} \in R^{3}$ is a point of $P_{j}$ . $p_{j}$ can be transmitted to the origin of the global coordinate system ${G}$ through the inverse of $T_{O B B}$ :

	$p_{G}$	$= T_{O B B}^{- 1} \cdot p_{j}$		(11)
		$= R^{- 1} \cdot p_{j} - R^{- 1} t$		(11)

where $p_{G}$ is the point transmitted to the origin of ${G}$ . $T_{O B B} = [R | t]$ , where $R$ is the $3 \times 3$ orthogonal rotation matrix and $t$ is the $3 \times 1$ translation vector. After transmitting all the points in $P_{j}$ using Eq. (11), we obtain a new point set $P_{j}^{G}$ . Then, define an intermediate plane $P$ in ${G} \in R^{3}$ . The plane equation of $P$ is $x = 1$ (the unit is in meter). Fig. 10(a) shows a schematic of the intermediate plane. All points belonging to $P_{j}^{G}$ are then transferred to the intermediate plane:

	$p_{j}^{'}$	$= T_{i n} \cdot p_{G}$		(12)
		$= R_{i n} \cdot p_{G} + t_{i n}$		(12)

where $p_{j}^{'}$ is the point transmitted to the intermediate plane. $T_{i n} = [R_{i n} | t_{i n}]$ where $R_{i n} = I_{3 \times 3}$ is an identity matrix and $t_{i n} = [1, 0, 0]^{T}$ (the unit is in meters) as the plane equation is $x = 1$ . Define the transmitted point set on the intermediate plane as $P_{j}^{'}$ . Fig. 10(b) shows the comparison of the raw ROI, $P_{j}$ , and the transmitted ROI, $P_{j}^{'}$ .

Fig. 10: (a): A schematic of the intermediate plane. Every ROI is transmitted to this plane to conduct marker detection. (b): Comparison of the raw ROI and the ROI transmitted to the intermediate plane.

Note that thus far the assumption that the points in the raw point cloud are sampled from one viewpoint has never been used. All the above processes are applicable to the multi-viewpoint point cloud. No matter how many viewpoints exist in the raw point cloud, after transferring the ROIs to the intermediate plane, there will only be one viewpoint. This viewpoint is from the perspective of the LiDAR sensor placed at the origin of ${G}$ , with all axes of the sensor frame aligned to those of ${G}$ , and with all points in ROIs placed one meter in front of the LiDAR sensor. Hence, the spherical projection is applicable to the ROIs transferred to the intermediate plane. Given that the spherical projection is applicable on the intermediate plane, the method proposed in the IILFM system [15] is utilized to detect the marker through the intensity image. Suppose that the Cartesian coordinates and the spherical coordinates of $p_{j}^{'}$ with respect to ${G}$ are $[x, y, z]^{T}$ and $[θ, ϕ, r]^{T}$ , respectively. $θ$ denotes the azimuth and $ϕ$ denotes the inclination. $r$ represents the range from $p_{j}^{'}$ to the origin of ${G}$ . Define the projection of $p_{j}^{'}$ on the image plane as u and $[u, v]^{T}$ are the image coordinates of u. The process of projecting $p_{j}^{'}$ onto the image plane can be summarized as transferring the Cartesian coordinates into the spherical coordinates, and then into the image coordinates:

		$θ = arctan (\frac{y}{x})$		(13)
		$ϕ = arctan (\frac{z}{\sqrt{x^{2} + y^{2}}})$
		$r = \sqrt{x^{2} + y^{2} + z^{2}}$
		$u = ⌈ \frac{θ}{Θ_{a}} ⌋ + u_{o}, v = ⌈ \frac{ϕ}{Θ_{i}} ⌋ + v_{o}$

where $⌈ ⌋$ represents the rounding of a value to the nearest integer. $Θ_{a}$ and $Θ_{i}$ are the angular resolutions in $u$ (azimuth) and $v$ (inclination) directions, respectively. They are attributes determined by the employed LiDAR model. $u_{o}$ and $v_{o}$ are the offsets. Then, the corresponding pixel is rendered by the intensity value of $p_{j}^{'}$ . After projecting all points in $P_{j}$ onto the image plane, the intensity image is generated. Hereafter, the raw intensity image is transformed to a binarized image via preprocessing. The binarized image is inputted into the embedded VFM system, which outputs the detected 2D features. Afterwards, the 2D features are projected back to 3D space through the reversal of Eq. (13). The brief outline of the above-introduced marker detection process is shown in Fig. 11. Please refer to [15] to see the detailed implementation.

Fig. 11: The pipeline of the IILFM system [15].

The IILFM system [15] outputs the 3D fiducials, however, as shown in Fig. 11, they are still in the intermediate plane since the inputted point cloud is the transmitted ROI. For the $j$ th ROI, $P_{j}$ , define the point set composed of the 3D fiducials outputted by the IILFM system as $F_{j}$ . Note that if $F_{j}$ exists, it implies that $P_{j}$ is a point set that contains the fiducial marker. Otherwise, $P_{j}$ will be rejected at this step. This is the reason why the first criterion, which subjects to the marker size(s), in Section III-C is not mandatory. Suppose that $f^{'} \in R^{3}$ is a fiducial belonging to $F_{j}$ , $f^{'}$ can be transmitted from the intermediate plane back to the original location by the reverse processes of Eqs. (11-12):

	f	$= T_{O B B} T_{i n}^{- 1} \cdot f^{'}$		(14)
		$= R R_{i n}^{- 1} \cdot f^{'} - R R_{i n}^{- 1} t_{i n} + t$		(14)

where f is a 3D fiducial in the raw point cloud. After applying Eq. (14) to all the points belonging to $F_{j}$ , we obtain the locations of the 3D fiducials with respect to ${G}$ . Note that each fiducial is labeled with an ID number and a vertex index. So far marker detection of the $j$ -th candidate ROI, $P_{j}$ , is accomplished. Hereafter, the above marker detection process is conducted on every valid candidate ROI until all ROIs are checked.

V Experimental Results

V-a Qualitative Evaluation

The qualitative evaluation of the proposed algorithm is first conducted on the point cloud as shown in Fig. 2. Fig. 12 illustrates the results of the crucial procedures and marker detection. As shown in this Figure, the two ArUco markers are detected and the vertice locations of the markers are visually precise. This demonstrates that the proposed algorithm is competent detecting the fiducial markers in the multi-viewpoint point cloud.

Fig. 12: The evaluation of marker detection on a multi-viewpoint point cloud. A detailed introduction to the construction of this multi-viewpoint point cloud is given in the caption of Fig. 2.

Fig. 13: The evaluation of marker detection on a multi-viewpoint point cloud, which is a stitch of multiple one-viewpoint point clouds.

The second benchmark, as shown in Fig. 13, is a stitch of two sub-point-clouds and each of them is sampled from a specific viewpoint by the Livox MID-40 LiDAR. As illustrated in the top view, observing along the X-axis of the global coordinate system ${G}$ , the back sub-point-cloud is totally blocked by the front one. Again, the two AprilTags are detected and the vertices are labeled in the point cloud.

V-B Quantitative Evaluation

Fig. 14: The experimental setup for quantitative evaluation. A Letter size AprilTag is put on the wall and the distance between the LiDAR and the marker is 2 meters. The OptiTrack system provides the ground truth of the 6-DOF LiDAR pose. The sampled point cloud is shown on the right side.

As shown in Figs. 12 and 13, the locations of the vertices are evidently marked. Despite that, a quantitative evaluation is necessary to illustrate the accuracy of the proposed algorithm. However, it is unfair to compare the proposed algorithm with IIFLM on the multi-viewpoint point cloud since the latter one is incompetent for operating on the multi-viewpoint point cloud. For this reason, we adopt the scenario in which IIFLM also works to carry out the comparison in terms of accuracy and efficiency between both methods. In particular, the dataset collected in [15] is utilized. The detailed setup is shown in Fig. 14.

System	Vertex	x (m)	y (m)	z (m)	Error (m)
Ground Truth	1	2.001	0.170	0.130	–
	2	2.002	0.005	0.129	–
	3	2.001	0.014	0.293	–
[15]	4	2.002	0.178	0.294	–
IILFM	1	1.992	0.169	0.122	0.012
	2	1.970	0.003	0.122	0.033
	3	1.998	0.011	0.290	0.005
[15]	4	1.965	0.170	0.283	0.039
Proposed	1	1.992	0.171	0.123	0.011
	2	1.998	0.003	0.123	0.033
	3	1.998	0.013	0.294	0.017
Algorithm	4	1.962	0.171	0.284	0.042

TABLE I: Comparison of the IILFM system and the proposed algorithm

Table I shows the comparison of the IILFM system [15] and the proposed system regarding vertices estimation. As illustrated in the table, IILFM and the proposed algorithm output similar vertices estimation results and the accuracy is similar. It is not complicated to explain the similarity as the marker detection of the proposed algorithm is still executed through IILFM [15] (See Section IV) with the preprocessing introduced in Section III and IV and the post-process given in Eq. (14) added. Moreover, the preprocessing (ROIs detection) only involves point cloud extraction, which does not affect the points’ physical locations and intensity values; The post-process (Eq. (14)) is a simple Euclidean transmission. Hence, the trivial difference between the results of IILFM and the proposed algorithm comes from the transmission of the ROIs to the intermediate plane (See Fig. 10). Eqs. (11-12) demonstrate that the azimuth and inclination of a point will be changed after it is transmitted to the intermediate plane. Moreover, as indicated in Eq. (13), the change of inclination and azimuth has a direct influence on the intensity image. That is, the intensity images obtained through IILFM and the proposed algorithm are different. However, as demonstrated in table I, the influence on the estimation of vertices due to the difference of intensity images is trivial.

V-C Computational Time Analysis

The computational time analysis is conducted on a desktop with an Intel Xeon W-1290P CPU. The IILFM system takes around 115 ms to detect the marker in the point cloud shown in Fig. 14 while the newly proposed algorithm requires 965 ms. Compared to IILFM [15], new procedures, such as ROIs detection and transmission (See Sections III and IV), are added in the proposed algorithm. However, the proposed algorithm is designed as a back-end process to detect the fiducial markers in the multi-viewpoint point cloud, for instance, the large-scale 3D map built by SLAM.

Vi Conclusions

A novel algorithm to detect the fiducial markers in the multi-viewpoint point cloud is developed in this letter. The proposed algorithm gets rid of the limitations of the existing LiDAR fiducial marker detection methods, such as marker placement restrictions [9] and the requirement for the establishment of a spherical projection [15]. Despite the fact that the fiducial markers are not spatially distinguishable from their attached planes, the point clusters that could include the fiducial markers, called ROIs, are detected through the intensity feature extraction and clustering. The problem that the spherical projection is not applicable to the multi-viewpoint point cloud is resolved by introducing an intermediate plane, where the ROIs are projected and the marker detection is conducted. The proposed algorithm allows fiducial marker detection when LiDAR is in motion.

Acknowledgment

The authors would like to thank Han Wang, Brian Lynch, Shuo Zhang, Marc Savoie, and Shiyuan Jia for insightful discussions.

References

[1] D. Avola, L. Cinque, G. L. Foresti, C. Mercuri, and D. Pannone (2016) A practical framework for the development of augmented reality applications by using aruco markers. In Proc. International Conference on Pattern Recognition Applications and Methods, Vol. 2, pp. 645–654. Cited by: §I.
[2] T. D. Barfoot (2017) State estimation for robotics. Cambridge University Press. Cited by: §I, §II.
[3] J. M. Biosca and J. L. Lerma (2008) Unsupervised robust planar segmentation of terrestrial laser scanner point clouds based on fuzzy clustering methods. ISPRS Journal of Photogrammetry and Remote Sensing 63 (1), pp. 84–98. Cited by: §III.
[4] A. Borowczyk, D. Nguyen, A. P. Nguyen, D. Q. Nguyen, D. Saussié, and J. Le Ny (2017) Autonomous landing of a quadcopter on a high-speed ground vehicle. Journal of Guidance, Control, and Dynamics 40 (9), pp. 2378–2385. Cited by: §I.
[5] J. DeGol, T. Bretl, and D. Hoiem (2018) Improved structure from motion using fiducial marker matching. In Proc. of the European Conference on Computer Vision, pp. 273–288. Cited by: §I.
[6] L. Fan, X. Xiong, F. Wang, N. Wang, and Z. Zhang (2021) RangeDet: in defense of range view for lidar-based 3d object detection. In Proc. IEEE/CVF International Conference on Computer Vision, Cited by: §I, §II, §II, §III.
[7] M. A. Fischler and R. C. Bolles (1981) Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM 24 (6), pp. 381–395. Cited by: Fig. 4, §III.
[8] J. Huang and J. W. Grizzle (2020) Improvements to target-based 3d lidar to camera calibration. IEEE Access 8 (), pp. 134101–134110. External Links: Document Cited by: §I.
[9] J. Huang, S. Wang, M. Ghaffari, and J. W. Grizzle (2021) LiDARTag: a real-time fiducial tag system for point clouds. IEEE Robotics and Automation Letters 6 (3), pp. 4875–4882. External Links: Document Cited by: Fig. 1, §I, §I, §I, §II, §II, §II, Fig. 4, §III, §VI.
[10] S. C. Johnson (1967) Hierarchical clustering schemes. Psychometrika 32 (3), pp. 241–254. Cited by: Fig. 1, §II, §III.
[11] K. Koster and M. Spann (2000) MIR: an approach to robust clustering-application to range image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (5), pp. 430–444. Cited by: §III.
[12] M. Krogius, A. Haggenmiller, and E. Olson (2019) Flexible layouts for fiducial tags. In Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1898–1903. External Links: Document Cited by: Fig. 6, §III-A.
[13] J. Lin and F. Zhang (2020) Loam livox: a fast, robust, high-precision lidar odometry and mapping package for lidars of small fov. In Proc. IEEE International Conference on Robotics and Automation, pp. 3126–3131. Cited by: Fig. 2, §I, §I.
[14] Y. Liu, H. Schofield, and J. Shan (2021) Navigation of a self-driving vehicle using one fiducial marker. In Proc. IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, Vol. , pp. 1–6. External Links: Document Cited by: §I.
[15] Y. Liu, H. Schofield, and J. Shan (2022) Intensity image-based lidar fiducial marker system. IEEE Robotics and Automation Letters 7 (3), pp. 6542–6549. External Links: Document Cited by: §I, §I, §I, §II, §II, §II, §III, Fig. 11, §IV, §IV, §IV, §V-B, §V-B, §V-C, TABLE I, §VI.
[16] G. P. Meyer, A. Laddha, E. Kee, C. Vallespi-Gonzalez, and C. K. Wellington (2019) Lasernet: an efficient probabilistic 3d object detector for autonomous driving. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12677–12686. Cited by: §II, §III.
[17] M. Muja and D. G. Lowe (2009) Fast approximate nearest neighbors with automatic algorithm configuration.. VISAPP (1) 2 (331-340), pp. 2. Cited by: §III-A.
[18] R. Muñoz-Salinas, M. J. Marin-Jimenez, and R. Medina-Carnicer (2019) SPM-slam: simultaneous localization and mapping with squared planar markers. Pattern Recognition 86, pp. 156–171. Cited by: §I.
[19] F. J. Romero-Ramirez, R. Muñoz-Salinas, and R. Medina-Carnicer (2018) Speeded up detection of squared fiducial markers. Image and vision Computing 76, pp. 38–47. Cited by: Fig. 2.
[20] R. B. Rusu and S. Cousins (2011) 3d is here: point cloud library (pcl). In Proc. IEEE international conference on robotics and automation, pp. 1–4. Cited by: §III-A.
[21] R. B. Rusu (2010) Semantic 3d object maps for everyday manipulation in human living environments. KI-Künstliche Intelligenz 24 (4), pp. 345–348. Cited by: §III-B, §III-C.
[22] A. D. Sappa and M. Devy (2001) Fast range image segmentation by an edge detection strategy. In Proc. International Conference on 3-D Digital Imaging and Modeling, pp. 292–299. Cited by: §III.
[23] P. Schneider and D. H. Eberly (2002) Geometric tools for computer graphics. Elsevier. Cited by: §III-C, §III-C, §III-C.
[24] H. Wang, C. Wang, and L. Xie (2021) Lightweight 3-d localization and mapping for solid-state lidar. IEEE Robotics and Automation Letters 6 (2), pp. 1801–1807. External Links: Document Cited by: §I.
[25] B. Yang, W. Luo, and R. Urtasun (2018) Pixor: real-time 3d object detection from point clouds. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp. 7652–7660. Cited by: §III.
[26] J. Zhang and S. Singh (2014) LOAM: lidar odometry and mapping in real-time.. In Robotics: Science and Systems, Vol. 2, pp. 1–9. Cited by: §I.
[27] S. Zhang, J. Shan, and Y. Liu (2022) Variational bayesian estimator for mobile robot localization with unknown noise covariance. IEEE/ASME Transactions on Mechatronics (), pp. 1–9. External Links: Document Cited by: §I.
[28] W. Zhen, Y. Hu, H. Yu, and S. Scherer (2020) LiDAR-enhanced structure-from-motion. In Proc. IEEE International Conference on Robotics and Automation, pp. 6773–6779. Cited by: §I.