Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TF timestamp problem #717

Open
Renbago opened this issue Jul 3, 2024 · 4 comments
Open

TF timestamp problem #717

Renbago opened this issue Jul 3, 2024 · 4 comments

Comments

@Renbago
Copy link

Renbago commented Jul 3, 2024

Required Info:
Tested on SAHA-ROBOTIK Speedy

  • Operating System:
    Ubuntu 20.04
  • Installation type:
    Source
  • ROS Version
    Ros2 Humble
  • Version or commit hash:
    Latest Humble Branch
  • Laser unit:
    RPlidar s2

Problem description

With large amount(the total amount of edges and nodes in current graph is 2190) with pose_graph data the

msg.header.stamp = scan_timestamp + transform_timeout_;
time difference between time.now and the last data coming from scan_timestamp increases (6-8 seconds) while publishing the info. It's synchronized with time.now when reached goal or pose.
The LOG:

[INFO] [1719387024.208723337] [SR64S1.global_costmap.global_costmap]: Message Filter dropping message: frame 'main_lidar_link' at time 1719387023.839 for reason 'the timestamp on the message is earlier than all the data in the transform cache'
[INFO] [1719387024.227065683] [SR64S1.global_costmap.global_costmap]: Message Filter dropping message: frame 'obstacle_lidar_link' at time 1719387023.926 for reason 'the timestamp on the message is earlier than all the data in the transform cache'
[INFO] [1719387024.272666888] [SR64S1.global_costmap.global_costmap]: Message Filter dropping message: frame 'base_link' at time 1719387023.802 for reason 'the timestamp on the message is earlier than all the data in the transform cache' 
[ERROR] [1720007812.847318626] [transformPoseInTargetFrame]: Extrapolation Error looking up target frame: Lookup would require extrapolation into the past.  Requested time 1720007802.415600 but the earliest data is at time 1720007802.817890, when looking up transform from frame [base_footprint] to frame [map]

After ERROR robot stops, then matching pose_graph with blue link and robot creating path again.
By using same map with less pose_graph, I get less INFO and ERROR,(if its per seconds with less maps like 10 sec)

From past issues i found this merge: #377 , also after searched ros.org I found the same issue: https://answers.ros.org/question/393581/for-nav2-lidar-timestamp-on-the-message-is-earlier-than-all-the-data-in-the-transform-cache/

His comment:

ok, this issue and my [other one] 
([https://answers.ros.org/question/395470/slam_toolbox-map-odom-transform-stuck-at-time-0200/)]) are somehow linked. 
Under Galactic, the slam_toolbox timestamps the TF map-odom with the timestamp of the last 
laserscan + transform_timeout offset ([see here](1.).
Under Foxy, it was timestamped with now() + transform_timeout offset ([see here](2.)).

I replaced in Galactic the laser timestamp by now(), and my two issues got resolved...
 definitively something to look here.

the links in comment are :

  1. msg.header.stamp = scan_timestamped + transform_timeout_;
  2. msg.header.stamp = this->now() + transform_timeout_;

After changing the
msg.header.stamp = scan_timestamp + transform_timeout_; with msg.header.stamp = this->now() + transform_timeout_;

Not getting INFO and ERROR. But this is not solving the problem, because of the matching blue link robot losing localisation.

The large pose_graph map large_map

With scan_timestamp :

Screencast.from.07-03-2024.03.39.08.PM.webm

With this->now():

Screencast.from.07-03-2024.03.59.13.PM.webm
Screencast.from.07-03-2024.04.02.21.PM.webm

Additional information

tf2_monitor outputs:

RESULTS: for all Frames

Frames:
Frame: base_footprint, published by <no authority available>, Average Delay: 0.00701269, Max Delay: 0.0276206
Frame: base_link, published by <no authority available>, Average Delay: 881.54, Max Delay: 881.54
Frame: bottom_shelf_link, published by <no authority available>, Average Delay: 881.54, Max Delay: 881.54
Frame: charging_block_bottom_link, published by <no authority available>, Average Delay: 881.54, Max Delay: 881.54
Frame: charging_block_top_link, published by <no authority available>, Average Delay: 881.54, Max Delay: 881.54
Frame: chassis_link, published by <no authority available>, Average Delay: 881.54, Max Delay: 881.54
Frame: cliff_sensor_link, published by <no authority available>, Average Delay: 881.54, Max Delay: 881.54
Frame: docking_virtual_link, published by <no authority available>, Average Delay: 881.54, Max Delay: 881.54
Frame: eyes_cyl_link, published by <no authority available>, Average Delay: 881.54, Max Delay: 881.54
Frame: eyes_link, published by <no authority available>, Average Delay: 881.54, Max Delay: 881.54
Frame: front_camera_bottom_screw_frame, published by <no authority available>, Average Delay: 881.54, Max Delay: 881.54
Frame: front_camera_color_frame, published by <no authority available>, Average Delay: 880.685, Max Delay: 880.685
Frame: front_camera_color_optical_frame, published by <no authority available>, Average Delay: 880.685, Max Delay: 880.685
Frame: front_camera_depth_frame, published by <no authority available>, Average Delay: 880.714, Max Delay: 880.714
Frame: front_camera_depth_optical_frame, published by <no authority available>, Average Delay: 880.714, Max Delay: 880.714
Frame: front_camera_link, published by <no authority available>, Average Delay: 881.54, Max Delay: 881.54
Frame: imu_link, published by <no authority available>, Average Delay: 881.54, Max Delay: 881.54
Frame: imu_link_transform, published by <no authority available>, Average Delay: 881.54, Max Delay: 881.54
Frame: left_mast_link, published by <no authority available>, Average Delay: 881.54, Max Delay: 881.54
Frame: left_wheel_link, published by <no authority available>, Average Delay: 0.00311589, Max Delay: 0.135685
Frame: logo_link, published by <no authority available>, Average Delay: 881.54, Max Delay: 881.54
Frame: main_lidar_dummy_cloud_link, published by <no authority available>, Average Delay: 881.54, Max Delay: 881.54
Frame: main_lidar_link, published by <no authority available>, Average Delay: 881.54, Max Delay: 881.54
Frame: middle_shelf_link, published by <no authority available>, Average Delay: 881.54, Max Delay: 881.54
Frame: obstacle_lidar_link, published by <no authority available>, Average Delay: 881.541, Max Delay: 881.541
Frame: odom, published by <no authority available>, Average Delay: -0.898396, Max Delay: 0
Frame: right_mast_link, published by <no authority available>, Average Delay: 881.541, Max Delay: 881.541
Frame: right_wheel_link, published by <no authority available>, Average Delay: 0.00311771, Max Delay: 0.135687
Frame: scu_v3_left_bottom, published by <no authority available>, Average Delay: 881.541, Max Delay: 881.541
Frame: supp_wheel_bl_link, published by <no authority available>, Average Delay: 881.541, Max Delay: 881.541
Frame: supp_wheel_br_link, published by <no authority available>, Average Delay: 881.541, Max Delay: 881.541
Frame: supp_wheel_fl_link, published by <no authority available>, Average Delay: 881.541, Max Delay: 881.541
Frame: supp_wheel_fr_link, published by <no authority available>, Average Delay: 881.541, Max Delay: 881.541
Frame: trays_head_link, published by <no authority available>, Average Delay: 881.541, Max Delay: 881.541

All Broadcasters:
Node: <no authority available> 117.914 Hz, Average Delay: -0.377333 Max Delay: 0.0276206
^C[INFO] [1720008636.915834597] [rclcpp]: signal_handler(signum=2)
@SteveMacenski
Copy link
Owner

SteveMacenski commented Jul 3, 2024

time difference between time.now and the last data coming from scan_timestamp increases (6-8 seconds) while publishing the info

Is that in the sync mode? What about async mode? As the documentation describes, async is meant to take data as it comes rather than sync which buffer up all data and makes sure that you process every possible scan. If you have a small compute and a sufficiently large space, you can find the bounds of what you can do in real-time before your computer can't keep up with the process. async was developed specifically with that in mind so that it didn't fall behind - it only looks for new scans to process after the last scan was processed.

That's not a particularly large map, but I also suspect that you don't have a very powerful computer on there.

Another option is to live use async to get a good but not best map for use then-and-there and then use cloud services with bagged data with sync to get the best map with all the data for later use offline. For one reason or another, one way or another, most companies end up having some level of duality of cloud services for getting 'improved' maps.

But, this is all described in the docs and in the ROSCon talk.

After changing the
msg.header.stamp = scan_timestamp + transform_timeout_; with msg.header.stamp = this->now() + transform_timeout_;
Not getting INFO and ERROR. But this is not solving the problem, because of the matching blue link robot losing localisation.

That is mis-stamping your laser scans, that's not a real solution and will cause issues in time synchronization of your product.

@Renbago
Copy link
Author

Renbago commented Jul 5, 2024

Hi, sorry for late reply. In issue we wre using async mode. The hardware is nuc11tnki5

For more to understand problem:

Screencast.from.07-05-2024.03.10.27.PM.webm
/*****************************************************************************/
void SlamToolbox::publishTransformLoop(
  const double & transform_publish_period)
/*****************************************************************************/
{
  if (transform_publish_period == 0) {
    return;
  }

  rclcpp::Rate r(1.0 / transform_publish_period);
  while (rclcpp::ok()) {
    {
      boost::mutex::scoped_lock lock(map_to_odom_mutex_);
      auto current_time = this->now();
      rclcpp::Time scan_timestamp = scan_header.stamp;
      RCLCPP_INFO(get_logger(), "pasted_time_last_recieved_scan_data = %f", (current_time - scan_timestamp).seconds());
      // Avoid publishing tf with initial 0.0 scan timestamp
      if (scan_timestamp.seconds() > 0.0 && !scan_header.frame_id.empty()) {
        geometry_msgs::msg::TransformStamped msg;
        msg.transform = tf2::toMsg(map_to_odom_);
        msg.child_frame_id = odom_frame_;
        msg.header.frame_id = map_frame_;
        msg.header.stamp = scan_timestamp + transform_timeout_;
        tfB_->sendTransform(msg);
      }
    }
    r.sleep();
  }
}

@SteveMacenski
Copy link
Owner

I'm not entirely sure what you're showing or explaining, please provide some context for that video / code.

Are you sure that your laser scanner node is giving you properly stamped times? Is there any major latency in the system? Id be interested in seeing the time difference between the node->now() time and the scan message header in the main laser callback in async mode.

@CihatAltiparmak
Copy link
Contributor

CihatAltiparmak commented Jul 8, 2024

When i saw this ticket related to my problem, i wanted to reply this ticket.

Required Info:

  • Operating System:
    Ubuntu 22.04 jammy
  • Installation type:
    from source
  • ROS Version
    • ros2 iron
  • Version or commit hash:
    a108d26
  • Laser unit:
    • Default Lidar Plugin In turtlebot3_gazebo
  • Testted on:
    • Turtlebot3 Gazebo Simulation

Steps to reproduce issue

just clone navigation2 and slam_toolbox into your workspace

mkdir slam_ws/src -p
cd slam_ws/src
git clone https://github.com/SteveMacenski/slam_toolbox
git clone https://github.com/ros-navigation/navigation2
cd ..
colcon build

And then download this pose graph from google drive and extract to your ws.

While running localization launch of slam_toolbox, use this config file.

mapper_params_localization.yaml

slam_toolbox:
  ros__parameters:
    solver_plugin: solver_plugins::CeresSolver
    ceres_linear_solver: SPARSE_NORMAL_CHOLESKY
    ceres_preconditioner: SCHUR_JACOBI
    ceres_trust_strategy: LEVENBERG_MARQUARDT
    ceres_dogleg_type: TRADITIONAL_DOGLEG
    ceres_loss_function: None

    # ROS Parameters
    odom_frame: odom
    map_frame: map
    base_frame: base_footprint
    scan_topic: /scan
    mode: localization

    # if you'd like to start localizing on bringup in a map and pose
    #map_file_name: test_steve
    #map_start_pose: [5.0, 1.0, 0.0]
    map_file_name: /home/<your_username>/slam_ws/big_pose_graph/my_pose
    map_start_pose: [-2.0, -0.5, 0.0]

    debug_logging: false
    throttle_scans: 1
    transform_publish_period: 0.02 #if 0 never publishes odometry
    map_update_interval: 5.0
    resolution: 0.05
    min_laser_range: 0.0 #for rastering images
    max_laser_range: 20.0 #for rastering images
    minimum_time_interval: 0.5
    transform_timeout: 0.2
    tf_buffer_duration: 30.
    stack_size_to_use: 40000000 #// program needs a larger stack size to serialize large maps

    # General Parameters
    use_scan_matching: true
    use_scan_barycenter: true
    minimum_travel_distance: 0.5
    minimum_travel_heading: 0.5
    scan_buffer_size: 3
    scan_buffer_maximum_scan_distance: 10.0
    link_match_minimum_response_fine: 0.1  
    link_scan_maximum_distance: 1.5
    do_loop_closing: true 
    loop_match_minimum_chain_size: 3
    loop_match_maximum_variance_coarse: 3.0  
    loop_match_minimum_response_coarse: 0.35    
    loop_match_minimum_response_fine: 0.45

    # Correlation Parameters - Correlation Parameters
    correlation_search_space_dimension: 0.5
    correlation_search_space_resolution: 0.01
    correlation_search_space_smear_deviation: 0.1 

    # Correlation Parameters - Loop Closure Parameters
    loop_search_space_dimension: 8.0
    loop_search_space_resolution: 0.05
    loop_search_space_smear_deviation: 0.03
    loop_search_maximum_distance: 3.0

    # Scan Matcher Parameters
    distance_variance_penalty: 0.5      
    angle_variance_penalty: 1.0    

    fine_search_angle_offset: 0.00349     
    coarse_search_angle_offset: 0.349   
    coarse_angle_resolution: 0.0349        
    minimum_angle_penalty: 0.9
    minimum_distance_penalty: 0.5
    use_response_expansion: true

and then just run this 4 launch in separate terminals (firstly go to workspace per each terminal)

ros2 launch turtlebot3_gazebo turtlebot3_world.launch.py use_sim_time:=True
ros2 launch nav2_bringup navigation_launch.py use_sim_time:=True
ros2 launch slam_toolbox localization_launch.py use_sim_time:=True
rviz2 -d src/navigation2/nav2_bringup/rviz/nav2_default_view.rviz

Finally send goal pose to upper-right corners of map like below.

Screenshot from 2024-07-08 16-12-14

Expected behavior

Localization should work well even with big pose graph

Actual behavior

ProcessLocalization() -> m_pGraph->AddEdges(pScan, cov) -> LinkNearChains(pScan, means, covariances) function chain in karto/Mapper.cpp takes 0.9 seconds when bigger pose is used for localization. This time consuming execution breaks navigation2 trajectory execution lifecycle because of time delay between map and odom frame.

Additional information

When i change link_scan_maximum_distance parameter as 0.5, it works. But with default configuration, below parts of code becomes mostly time-consuming.

if (m_pUseScanMatching->GetValue()) {
// add to graph
scan_vertex = m_pGraph->AddVertex(pScan);
m_pGraph->AddEdges(pScan, cov);

LinkNearChains(pScan, means, covariances);

const std::vector<LocalizedRangeScanVector> nearChains = FindNearChains(pScan);
const_forEach(std::vector<LocalizedRangeScanVector>, &nearChains)
{
if (iter->size() < m_pMapper->m_pLoopMatchMinimumChainSize->GetValue()) {
continue;
}
Pose2 mean;
Matrix3 covariance;
// match scan against "near" chain
kt_double response = m_pMapper->m_pSequentialScanMatcher->MatchScan(pScan, *iter, mean,
covariance, false);
if (response > m_pMapper->m_pLinkMatchMinimumResponseFine->GetValue() - KT_TOLERANCE) {
rMeans.push_back(mean);
rCovariances.push_back(covariance);
LinkChainToScan(*iter, pScan, mean, covariance);
}
}

In my case, FindNearChains gives 70 sample. ScanMacher does its own job in approximately 0.01~0.02 seconds. Total time approximately equals 0.8 seconds.

It would be great that you give advice for this problem. Thank you in advance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants