SpringBoot scheduled task executes abnormally

I wrapped a TCP client based on Netty. Its purpose is to connect to a server, fetch data, and decode it. The main workflow is built on Spring Boot's scheduled task API: every 5 seconds I query the database for the latest full list of devices, check each device's online status, and re-initiate a connection for any device that is not online. To avoid blocking the main thread of the scheduled task, I use Spring Boot's asynchronous API combined with CompletableFuture to pull the device information from the database. The main implementation code is as follows.

TCP client manager class:

    @Scheduled(cron = "0/5 * * * * ?")
    public void checkConnection() {
        log.info("check connection at {}", LocalDateTime.now());
        unitService.getAll().whenCompleteAsync((res,ex)-> {
            if (ex != null) {
                log.error("get all units error", ex);
                return;
            }
            for (Unit unit : res) {
                Client client = getById(unit.getId());
                if (client == null) {
                    var newClient=createClient(unit);
                    if (newClient!=null){
                        CLIENT_MAP.put(unit.getId(), newClient);
                    }
                }
                else {
                    if (!client.isOnline()) {
                        log.info("client {} is offline, reconnecting...", unit.getId());
                        client.connectAsync();
                    }
                }
            }
        });
    }


    private Client createClient(Unit unit) {
        switch (unit.getUnitType()) {
            case  ELEVATOR,TWIN->{
                return new CanTCPClient( unit,propProcessor,canProtocolLoader);
            }
            case ESCALATOR,TRAVELATOR ->{
                return new ModbusClient(unit,propProcessor);
            }
            default->{
                log.error("Unsupported device type: {}", unit);
            }
        }
        return null;
    }
    public Client getById(Long id) {
        return  CLIENT_MAP.get(id);
    }

TCP client class:

    @Slf4j
    public class CanTCPClient extends Client{
        private static final EventLoopGroup eventLoopGroup= new NioEventLoopGroup();
        private final PropProcessor propProcessor;
        private final CanProtocolLoader canProtocolLoader;
        @Setter
        private Channel channel;
        public CanTCPClient(Unit unit, PropProcessor propProcessor, CanProtocolLoader canProtocolLoader) {
            super(unit);
            this.propProcessor = propProcessor;
            this.canProtocolLoader = canProtocolLoader;
        }

        @Override
        public boolean isOnline() {
            return gatewayOnline&&deviceOnline;
        }

        @Override
        public CompletableFuture<Void> close() {
            if (channel == null) {
                return CompletableFuture.completedFuture(null);
            }
            CompletableFuture<Void> completableFuture = new CompletableFuture<>();
            ChannelFuture channelFuture = channel.close();
            channelFuture.addListener(future -> {
                if (future.isSuccess()) {
                    completableFuture.complete(null);
                } else {
                    completableFuture.completeExceptionally(future.cause());
                }
            });
            return completableFuture;
        }

        @Override
        public void connectAsync(){
            String ipAddress = unit.getIpAddress();
            int port = unit.getPort();
            Bootstrap bootstrap = new Bootstrap();
            bootstrap.group(eventLoopGroup)
                    .channel(NioSocketChannel.class)
                    .option(ChannelOption.SO_KEEPALIVE, true)
                    .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 3000)
                    .option(ChannelOption.TCP_NODELAY, true);
            bootstrap.handler(new ChannelInitializer<SocketChannel>() {
                @Override
                protected void initChannel(SocketChannel socketChannel) {
                    socketChannel.pipeline()
                            .addLast(new CanCodec())
                            .addLast(new DeviceOnlineHandler())
                            .addLast(new CanFrameDecoder(canProtocolLoader,new BasicDecoder()))
                            .addLast(propProcessor)
                            .addLast(new ExceptionHandler());
                    socketChannel.attr(CLIENT_ATTRIBUTE_KEY).set(CanTCPClient.this);
                    setChannel(socketChannel);
                }
            });
            ChannelFuture channelFuture= bootstrap.connect(ipAddress, port);
            channelFuture.addListener(future -> {
                if (future.isSuccess()) {
                    this.setGatewayOnline(true);
                    log.info("connect to [{}] can server success,ip={},port={}",unit.getName(), ipAddress, port);
                } else {
                    this.setGatewayOnline(false);
                    log.error("connect to[{}]CAN server fail,ip={},port={}",unit.getName(), ipAddress, port);
                }
            });
        }
    }

Spring Boot starts up normally, but the task then behaves abnormally at runtime. Only on the first run after the application starts do I get the following logs:

2024-09-13 13:17:13.821 ERROR [][] [nioEventLoopGroup-5-1] com.rms.session.CanTCPClient:97 - connect to [L1 shaft upper car] CAN gateway fail,ip=192.168.103.113,port=4001
2024-09-13 13:17:13.822 ERROR [][] [nioEventLoopGroup-5-3] com.rms.session.CanTCPClient:97 -  connect to [L3 shaft upper car] CAN gateway fail,ip=192.168.200.103,port=4001

The programme seems to be stalled, but not completely blocked: the logs appear at irregular moments rather than once every 5 seconds as I expected.

I expect the programme to check the connections once every five seconds and to re-initiate the connection for any offline client.

It sounds like you have a solid plan for managing your TCP connections with Netty in a Spring Boot application, but you’re encountering issues with connection attempts failing and the application appearing to stall or not behave as expected. Let’s go through a few areas where things might be going awry and some suggestions for improving your implementation.

Potential Issues and Suggestions

  1. Concurrency with EventLoopGroup: You are using a static EventLoopGroup shared by every CanTCPClient. Sharing one group across many channels is the usual Netty pattern and is not a problem in itself, but its threads stall if any handler in the pipeline does blocking work (database calls, long computations, waiting on futures), and then connection attempts and their listeners are delayed for every client. Suggestion: Check that handlers such as propProcessor never block the event loop, and move any blocking work to a separate executor. Creating a separate EventLoopGroup for each client is rarely worthwhile, because each group starts its own set of threads.
  2. Channel Initialization: The ChannelFuture returned from bootstrap.connect(ipAddress, port) is handled, but the failure branch does not log future.cause(), so the log cannot show whether an attempt timed out, was refused, or hit an unreachable host. The failures may simply come from network or configuration problems. Suggestion: Make sure the IP address and port are correct and reachable from your application, and log the exception cause to see why the connection fails.
  3. Asynchronous Execution in checkConnection: Using whenCompleteAsync is generally fine, but without an executor argument the callback runs on CompletableFuture's default executor (normally ForkJoinPool.commonPool()), not on a Spring-managed thread pool. The thread that completes unitService.getAll() depends on how Spring's asynchronous executor is configured. Suggestion: Verify the thread pool configuration used by Spring Boot for asynchronous tasks and make sure it is adequate for your load. If you see contention, pass an explicit executor to whenCompleteAsync.
  4. Logging and Debugging: The logs indicate connection failures but not the reasons for them. Suggestion: Add more granular logging in connectAsync and in the exception-handling paths, and pass the exception as the last logging argument so that the complete stack trace is recorded:
    channelFuture.addListener(future -> {
        if (future.isSuccess()) {
            this.setGatewayOnline(true);
            log.info("Connect to [{}] CAN server success, ip={}, port={}", unit.getName(), ipAddress, port);
        } else {
            this.setGatewayOnline(false);
            log.error("Connect to [{}] CAN server failed, ip={}, port={}", unit.getName(), ipAddress, port, future.cause());
        }
    });
  5. Network Issues: Connection failures can come from outside the application. Suggestion: Check that no firewall rule or routing problem blocks the gateway addresses and ports shown in the error logs.
  6. Handling Client Reconnection: checkConnection calls connectAsync for every offline client on every run, so a device that stays unreachable is retried every 5 seconds indefinitely. Make sure the reconnect logic for one client does not interfere with the connections of other clients. Suggestion: Introduce exponential backoff or retry limits to avoid flooding the server with reconnection attempts when failures persist.
  7. Scheduled Task Execution: Logs appearing at irregular moments may mean the task is not running at the expected intervals. The cron expression 0/5 * * * * ? does fire every 5 seconds, but Spring's default task scheduler runs on a single thread, so a run that blocks or takes too long delays every later run. Suggestion: Verify that the Spring scheduler is correctly configured, and log at the start and end of the scheduled method to confirm when it actually executes and how long it takes.
  8. Client Map Management: CLIENT_MAP is read and written from the asynchronous callback, so it must be thread-safe. Suggestion: Check how CLIENT_MAP is declared. If it is a plain HashMap, switch to ConcurrentHashMap or another concurrent collection, and make the "look up, then create" step atomic.
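
To illustrate the point about asynchronous execution in checkConnection, here is a minimal JDK-only sketch of passing an explicit executor to whenCompleteAsync. The class UnitChecks, the method loadUnits (a stand-in for unitService.getAll()), and the thread name prefix "unit-check-" are all hypothetical; in a Spring application the executor would more likely be a configured bean than a pool built by hand.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the scheduled check; not the poster's actual classes.
final class UnitChecks {

    // Builds a dedicated pool whose threads carry a recognizable name prefix.
    static ExecutorService newCheckPool(int threads) {
        AtomicInteger counter = new AtomicInteger();
        return Executors.newFixedThreadPool(threads, runnable -> {
            Thread thread = new Thread(runnable, "unit-check-" + counter.incrementAndGet());
            thread.setDaemon(true); // do not keep the JVM alive in this sketch
            return thread;
        });
    }

    // Stand-in for unitService.getAll(): completes asynchronously with a fixed list.
    static CompletableFuture<List<String>> loadUnits() {
        return CompletableFuture.supplyAsync(() -> List.of("unit-1", "unit-2"));
    }

    // Runs the callback on the given pool and returns the name of the thread that ran it.
    static String runCheck(ExecutorService pool) {
        CompletableFuture<String> callbackThread = new CompletableFuture<>();
        loadUnits().whenCompleteAsync((units, ex) -> {
            // In the real task this is where offline clients would be reconnected.
            callbackThread.complete(Thread.currentThread().getName());
        }, pool);
        return callbackThread.orTimeout(5, TimeUnit.SECONDS).join();
    }
}
```

Because the callback always runs on the supplied pool, its thread name shows up in the logs, which makes it easier to tell whether the reconnect logic is being starved by other work.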
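
The exponential backoff suggested for client reconnection could look roughly like the sketch below. ReconnectBackoff and its method names are hypothetical, and the base and maximum delays are illustrative values rather than recommendations. Time is passed in as a parameter so the logic can be checked without sleeping.

```java
// Hypothetical helper; names and constants are illustrative only.
final class ReconnectBackoff {
    private final long baseMillis;
    private final long maxMillis;
    private int failures;        // consecutive failed attempts
    private long nextAttemptAt;  // earliest time (ms) at which a retry is allowed

    ReconnectBackoff(long baseMillis, long maxMillis) {
        if (baseMillis <= 0 || maxMillis < baseMillis) {
            throw new IllegalArgumentException("require 0 < baseMillis <= maxMillis");
        }
        this.baseMillis = baseMillis;
        this.maxMillis = maxMillis;
    }

    // Delay after n consecutive failures: base * 2^(n-1), capped at max.
    static long delayFor(int failures, long baseMillis, long maxMillis) {
        if (failures <= 0) {
            return 0;
        }
        long delay = baseMillis;
        for (int i = 1; i < failures && delay < maxMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMillis);
    }

    // True if enough time has passed since the last failure to try again.
    synchronized boolean shouldAttempt(long nowMillis) {
        return nowMillis >= nextAttemptAt;
    }

    // Records a failed attempt and pushes the next allowed attempt further out.
    synchronized void onFailure(long nowMillis) {
        failures++;
        nextAttemptAt = nowMillis + delayFor(failures, baseMillis, maxMillis);
    }

    // Resets the backoff after a successful connection.
    synchronized void onSuccess() {
        failures = 0;
        nextAttemptAt = 0;
    }
}
```

One way to wire this in, as an assumption about the poster's code, is to keep one instance per client, call shouldAttempt(System.currentTimeMillis()) in checkConnection before client.connectAsync(), and call onFailure or onSuccess from the connect listener.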
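
For the client map, the "get, check for null, then put" sequence in checkConnection is not atomic even on a concurrent map, so two overlapping runs could each create a client for the same unit. A minimal sketch using ConcurrentHashMap.computeIfAbsent follows; ClientRegistry is a hypothetical wrapper standing in for CLIENT_MAP, and the type parameter V stands in for the poster's Client type.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical registry standing in for CLIENT_MAP; V would be the Client type.
final class ClientRegistry<V> {
    private final Map<Long, V> clients = new ConcurrentHashMap<>();

    // Atomically returns the existing client or creates and stores a new one.
    // If the factory returns null (for example, an unsupported unit type),
    // no mapping is stored and null is returned.
    V getOrCreate(Long id, Function<Long, V> factory) {
        return clients.computeIfAbsent(id, factory);
    }

    // Plain lookup, equivalent to the poster's getById.
    V getById(Long id) {
        return clients.get(id);
    }

    int size() {
        return clients.size();
    }
}
```

The factory passed to computeIfAbsent should stay short and must not touch the same map, so it suits constructing the client object but not waiting for the connection to complete.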