MoreSMIRK

Summary
The MoreSMIRK dataset extends the existing SMIRK dataset from its single-pedestrian-only design to multi-pedestrian crossing scenarios. It contains 104 sequences that systematically construct a dictionary of multi-pedestrian crossing situations. Each sequence represents a specific crossing configuration, defined by pedestrian grouping and crossing direction. The dataset supports the development of PCICF, an end-to-end framework to identify and classify pedestrian crossings in real-world traffic.
Construction Principle
Each sequence in the MoreSMIRK dataset represents a unique pedestrian crossing event characterized by two properties: i) initial location and ii) grouping configuration, as shown in the following figure.

The initial location indicates where pedestrians start to cross, so they move either from left to right or from right to left. The grouping configuration varies up to three individuals, clearly separated and following each other when crossing the road. The limit is three because three people following each other with some spacing in between would occupy almost half of the street, and if two such groups cross from both sides, the entire area in front of the ego vehicle is nearly fully covered.
An important feature of the MoreSMIRK dataset is its Regions of Interest (RoI), shown as red grids in the figure. Based on the activation of the RoI grid, an offset of up to five is used to delay the start of the pedestrians X on the left. For example, offset=1 means that when pedestrian X on the left reaches RoI grid 0, pedestrian Y on the right is already located at RoI grid 4.
Thus, a MoreSMIRK sequence such as ‘_ _ X; 0; Y _ _’ represents a crossing event in which one pedestrian crosses from the left and one pedestrian crosses from the right with zero offset. Please refer to here for the full configurations of all sequences.
More details on the construction principle can be found in Section 3.1 of this paper.
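The sequence notation above can be read mechanically. The following is a minimal sketch, not part of the dataset tooling: it assumes that every label has the form "left; offset; right", that each `X` on the left and each `Y` on the right stands for one pedestrian, and that `_` marks an empty slot. The names `CrossingLabel` and `parse_label` are hypothetical, and only the single label shown above is confirmed by this description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrossingLabel:
    left_count: int   # pedestrians starting on the left (marked X)
    offset: int       # start delay, expressed in RoI grid steps
    right_count: int  # pedestrians starting on the right (marked Y)


def parse_label(label: str, max_group: int = 3, max_offset: int = 5) -> CrossingLabel:
    """Parse a label such as '_ _ X; 0; Y _ _' (assumed format, see lead-in)."""
    parts = [part.strip() for part in label.split(";")]
    if len(parts) != 3:
        raise ValueError(f"expected 'left; offset; right', got {label!r}")

    left_slots, right_slots = parts[0].split(), parts[2].split()
    if any(slot not in ("_", "X") for slot in left_slots):
        raise ValueError(f"unexpected symbol in left group: {parts[0]!r}")
    if any(slot not in ("_", "Y") for slot in right_slots):
        raise ValueError(f"unexpected symbol in right group: {parts[2]!r}")

    offset = int(parts[1])  # raises ValueError if the offset is not an integer
    if not 0 <= offset <= max_offset:
        raise ValueError(f"offset {offset} is outside 0..{max_offset}")

    left_count = left_slots.count("X")
    right_count = right_slots.count("Y")
    if left_count > max_group or right_count > max_group:
        raise ValueError(f"a group may contain at most {max_group} pedestrians")

    return CrossingLabel(left_count, offset, right_count)
```

Under these assumptions, `parse_label("_ _ X; 0; Y _ _")` returns `CrossingLabel(left_count=1, offset=0, right_count=1)`, matching the description of one pedestrian from each side with zero offset.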
Technical Details
- The dataset contains a total of 104 sequences, and each sequence contains 100 RGB frames.
- Each RGB frame has a semantic ground truth annotation. Both are PNG images of size 640x480.
- The sequences are organized in folders ‘event_0’ to ‘event_103’. Inside each event folder, the RGB and annotation PNG images are numbered from 000 to 099.
- Each RGB image is ~250KB, and each annotation image is ~5B. The whole MoreSMIRK archive file is ~2.8GB.
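The layout described above can be checked after extracting the archive. The sketch below is an illustration under stated assumptions, not an official loader: the helper names are hypothetical, the extraction root is whatever path you pass in, and because the exact file names and any subfolders inside an event folder are not specified here, the code only counts PNG files instead of guessing their paths.

```python
from pathlib import Path

NUM_EVENTS = 104         # folders event_0 .. event_103
FRAMES_PER_EVENT = 100   # frames numbered 000 .. 099


def event_dirs(root):
    """Expected event folders below the extraction root."""
    return [Path(root) / f"event_{i}" for i in range(NUM_EVENTS)]


def frame_ids():
    """Zero-padded frame numbers as used inside each event folder."""
    return [f"{i:03d}" for i in range(FRAMES_PER_EVENT)]


def missing_events(root):
    """Names of expected event folders that do not exist under root."""
    return [d.name for d in event_dirs(root) if not d.is_dir()]


def png_counts(root):
    """Number of PNG files found (recursively) in each existing event folder."""
    return {
        d.name: sum(1 for _ in d.rglob("*.png"))
        for d in event_dirs(root)
        if d.is_dir()
    }
```

With 100 RGB frames and one annotation per frame, a complete event folder should contain 200 PNG files, and the whole dataset 104 x 100 = 10,400 RGB frames. At ~250KB each, the RGB images alone amount to roughly 2.6GB, which is in line with the stated archive size.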
Terms and Conditions
- The MoreSMIRK dataset is owned by the University of Gothenburg, Sweden.
- The dataset is licensed under CC BY-NC 4.0.
- The dataset may be downloaded from the Data Factory.
- Any public use, distribution, or display of this dataset must contain proper attribution as set out in the license, including the following reference to the creators of the licensed material:
Gu, J., Cabrero-Daniel, B., Nouri, A., Armini, L., & Berger, C. (2025). PCICF: A Pedestrian Crossing Identification and Classification Framework. arXiv preprint arXiv:2509.24386.
Supplemental Materials
- PCICF arXiv preprint paper
- PCICF GitHub repository