Querying moving events in wireless sensor networks
The detection and tracking of composite events in Wireless Sensor Networks often employ ad-hoc solutions that aim at detecting and tracking only specific types of events, or use generic query languages that are not specifically built to manage events. We propose a new query language and an in-network query processing solution that enable the definition of queries to track and gather information from events using wireless sensor networks. The proposed language provides clauses aimed at defining dynamic tracking tasks and the autonomous migration of the queries on the network, depending on the event mobility. We describe the query model and the language, discuss its implementation, and present the results of a comparison with a TinyDB-like approach. We show that our approach scales with event mobility and that it is more energy efficient than TinyDB-like approaches.
VISIONE Feature Repository for VBS: Multi-Modal Features and Detected Objects from MVK Dataset
<p>This repository contains a diverse set of features extracted from the marine video (underwater) dataset (MVK). These features were utilized in the VISIONE system [Amato et al. 2023, Amato et al. 2022] during the latest editions of the Video Browser Showdown (VBS) competition (<a href="https://www.videobrowsershowdown.org/">https://www.videobrowsershowdown.org/</a>).</p>
<p>We used a snapshot of the MVK dataset from 2023, which can be downloaded by following the instructions provided at <a href="https://download-dbis.dmi.unibas.ch/mvk/">https://download-dbis.dmi.unibas.ch/mvk/</a>. It comprises 1,372 video files. We divided each video into 1-second segments.</p>
<p>This repository is released under a Creative Commons Attribution license. If you use it in any form for your work, please cite the following paper:</p>
<blockquote>
<pre>@inproceedings{amato2023visione,
title={VISIONE at Video Browser Showdown 2023},
author={Amato, Giuseppe and Bolettieri, Paolo and Carrara, Fabio and Falchi, Fabrizio and Gennaro, Claudio and Messina, Nicola and Vadicamo, Lucia and Vairo, Claudio},
booktitle={International Conference on Multimedia Modeling},
pages={615--621},
year={2023},
organization={Springer}
}</pre>
</blockquote>
<p> </p>
<p>This repository comprises the following files:</p>
<ul>
<li><strong><em>msb.tar.gz </em></strong> contains tab-separated files (.tsv) for each video. Each tsv file reports, for each video segment, the timestamp and frame number marking the start/end of the video segment, along with the timestamp of the extracted middle frame and the associated identifier ("id_visione"). </li>
<li><em><strong>extract-keyframes-from-msb.tar.gz</strong></em> contains a Python script designed to extract the middle frame of each video segment from the MSB files. To run the script successfully, please ensure that you have the original MVK videos available.</li>
<li><strong><em>features-aladin.tar.gz<sup>†</sup></em> </strong>contains <a href="https://github.com/mesnico/ALADIN">ALADIN</a> [Messina et al. 2022] features extracted for the middle frame of every segment.</li>
<li><em><strong>features-clip-laion.tar.gz<sup>†</sup></strong></em> contains <a href="https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K">CLIP ViT-H/14 - LAION-2B</a> [Schuhmann et al. 2022] features extracted for the middle frame of every segment.</li>
<li><em><strong>features-clip-openai.tar.gz<sup>†</sup> </strong></em>contains <a href="https://huggingface.co/openai/clip-vit-large-patch14">CLIP ViT-L/14</a> [Radford et al. 2021] features extracted for the middle frame of every segment.</li>
<li><em><strong>features-clip2video.tar.gz<sup>†</sup> </strong></em>contains <a href="https://github.com/CryhanFang/CLIP2Video">CLIP2Video</a> [Fang H. et al. 2021] features extracted for all the 1-second video segments.</li>
<li><em><strong>objects-frcnn-oiv4.tar.gz<sup>*</sup> </strong></em>contains the objects detected using <a href="http://tfhub.dev/google/faster_rcnn/openimages_v4/inception_resnet_v2/1">Faster R-CNN+Inception ResNet</a> (trained on the Open Images V4 [Kuznetsova et al. 2020]). </li>
<li><em><strong>objects-mrcnn-lvis.tar.gz<sup>*</sup></strong></em> contains the objects detected using Mask R-CNN [He et al. 2017] (trained on LVIS).</li>
<li><em><strong>objects-vfnet64-coco.tar.gz<sup>*</sup></strong></em> contains the objects detected using VfNet [Zhang et al. 2021] (trained on COCO dataset).</li>
</ul>
<p>*Please be sure to use the <strong>v2 version</strong> of this repository, since v1 feature files may contain inconsistencies that have now been corrected.</p>
<p><em><strong>*Note on the object annotations:</strong></em> Within an object archive, there is a jsonl file for each video, where each row contains a record of a video segment (the <em>"_id"</em> corresponds to the <em>"id_visione"</em> used in the msb.tar.gz). Each record also contains three arrays representing the detected objects, the corresponding scores, and the bounding boxes. The format of these arrays is as follows:</p>
<ul>
<li><em>"object_class_names"</em>: vector with the class name of each detected object.</li>
<li><em>"object_scores"</em>: scores corresponding to each detected object.</li>
<li><em>"object_boxes_yxyx"</em>: bounding boxes of the detected objects in the format <em>(ymin, xmin, ymax, xmax).</em></li>
</ul>
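<p>As an illustration of the record layout described above, the following is a minimal Python sketch that reads one jsonl row and pairs each class name with its score and box. The sample record, its identifier, its values, and the <em>min_score</em> threshold are invented for the example; only the four field names come from the description above, and whether box coordinates are normalized is not stated here, so the code does not rely on it.</p>

```python
import json

# Hypothetical row of a per-video jsonl file; the id and all values are invented.
sample_line = json.dumps({
    "_id": "example_video-segment_3",
    "object_class_names": ["fish", "coral", "fish"],
    "object_scores": [0.91, 0.42, 0.77],
    "object_boxes_yxyx": [
        [0.10, 0.20, 0.40, 0.60],
        [0.50, 0.05, 0.95, 0.45],
        [0.30, 0.55, 0.70, 0.90],
    ],
})


def parse_detections(line, min_score=0.5):
    """Return the detections of one segment whose score is >= min_score.

    The three arrays are parallel: index i of each array describes object i.
    Boxes are stored as (ymin, xmin, ymax, xmax) and are reordered here to
    the (xmin, ymin, xmax, ymax) layout that many plotting tools expect.
    """
    record = json.loads(line)
    detections = []
    for name, score, box in zip(
        record["object_class_names"],
        record["object_scores"],
        record["object_boxes_yxyx"],
    ):
        if score < min_score:
            continue
        ymin, xmin, ymax, xmax = box
        detections.append({
            "segment_id": record["_id"],
            "class_name": name,
            "score": score,
            "box_xyxy": (xmin, ymin, xmax, ymax),
        })
    return detections


detections = parse_detections(sample_line)
```

<p>In practice the function would be applied to every line of a jsonl file extracted from one of the object archives.</p>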
<p> </p>
<p><em><strong><sup>†</sup>Note on the cross-modal features: </strong></em>The extracted multi-modal features (ALADIN, CLIPs, CLIP2Video) enable internal searches within the MVK dataset using the query-by-image approach (features can be compared with the dot product). However, to perform searches based on free text, the text needs to be transformed into the joint embedding space according to the specific network being used (see links above). Please be aware that <strong>the service for transforming text into features is not provided within this repository and should be developed independently using the original feature repositories linked above.</strong></p>
<p>We have plans to release the code in the future, allowing the reproduction of the VISIONE system, including the instantiation of all the services to transform text into cross-modal features. However, this work is still in progress, and the code is not currently available.</p>
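<p>The query-by-image comparison mentioned in the note on cross-modal features can be sketched as follows. This is a toy illustration only: the segment identifiers and three-dimensional vectors are invented, the storage format of the released feature files is not assumed, and real features have many more dimensions.</p>

```python
def dot(u, v):
    """Dot product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_by_dot_product(query, features, top_k=3):
    """Rank segments by dot-product similarity to the query vector.

    `features` maps a segment identifier to its feature vector.
    Returns the top_k (identifier, score) pairs, best first.
    """
    scored = [(seg_id, dot(query, vec)) for seg_id, vec in features.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Invented toy vectors standing in for features of three segments.
features = {
    "seg_a": [1.0, 0.0, 0.0],
    "seg_b": [0.6, 0.8, 0.0],
    "seg_c": [0.0, 0.0, 1.0],
}

# Query by image: use the feature of one segment as the query.
results = rank_by_dot_product(features["seg_b"], features, top_k=2)
```

<p>A text query would follow the same ranking step once the text has been embedded with the corresponding network, which is the part not provided in this repository.</p>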
<p> </p>
<p><strong>References:</strong></p>
<p>[Amato et al. 2023] Amato, G. et al., 2023, January. VISIONE at Video Browser Showdown 2023. In International Conference on Multimedia Modeling (pp. 615-621). Cham: Springer International Publishing.</p>
<p>[Amato et al. 2022] Amato, G. et al., 2022. VISIONE at Video Browser Showdown 2022. In MultiMedia Modeling, MMM 2022. Lecture Notes in Computer Science, vol 13142. Springer, Cham.</p>
<p>[Fang H. et al. 2021] Fang H. et al., 2021. Clip2video: Mastering video-text retrieval via image clip. arXiv preprint arXiv:2106.11097.</p>
<p>[He et al. 2017] He, K., Gkioxari, G., Dollár, P. and Girshick, R., 2017. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969).</p>
<p>[Kuznetsova et al. 2020] Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Kolesnikov, A. and Duerig, T., 2020. The open images dataset v4. International Journal of Computer Vision, 128(7), pp.1956-1981.</p>
<p>[Lin et al. 2014] Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P. and Zitnick, C.L., 2014, September. Microsoft coco: Common objects in context. In European conference on computer vision (pp. 740-755). Springer, Cham.</p>
<p>[Messina et al. 2022] Messina N. et al., 2022, September. Aladin: distilling fine-grained alignment scores for efficient image-text matching and retrieval. In Proceedings of the 19th International Conference on Content-based Multimedia Indexing (pp. 64-70).</p>
<p>[Radford et al. 2021] Radford A. et al., 2021, July. Learning transferable visual models from natural language supervision. In International conference on machine learning (pp. 8748-8763). PMLR.</p>
<p>[Schuhmann et al. 2022] Schuhmann C. et al., 2022. Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems, 35, pp.25278-25294.</p>
<p>[Zhang et al. 2021] Zhang, H., Wang, Y., Dayoub, F. and Sunderhauf, N., 2021. Varifocalnet: An iou-aware dense object detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8514-8523).</p>
VISIONE Feature Repository for VBS: Multi-Modal Features and Detected Objects from V3C1+V3C2 Dataset
<p>This repository contains a diverse set of features extracted from the V3C1+V3C2 dataset, sourced from the Vimeo Creative Commons Collection. These features were utilized in the VISIONE system [Amato et al. 2023, Amato et al. 2022] during the latest editions of the Video Browser Showdown (VBS) competition (<a href="https://www.videobrowsershowdown.org/">https://www.videobrowsershowdown.org/</a>).</p>
<p>The original V3C1+V3C2 dataset, provided by NIST, can be downloaded using the instructions provided at <a href="https://videobrowsershowdown.org/about-vbs/existing-data-and-tools/">https://videobrowsershowdown.org/about-vbs/existing-data-and-tools/</a>.</p>
<p>It comprises 7,235 video files, amounting to 2,300 hours of video content and encompassing 2,508,113 predefined video segments.</p>
<p>We subdivided the predefined video segments longer than 10 seconds into multiple segments, with each segment spanning no longer than 16 seconds. As a result, we obtained a total of 2,648,219 segments. For each segment, we extracted one frame, specifically the middle one, and computed several features, which are described in detail below.</p>
<p>This repository is released under a Creative Commons Attribution license. If you use it in any form for your work, please cite the following paper:</p>
<blockquote>
<pre>@inproceedings{amato2023visione,
title={VISIONE at Video Browser Showdown 2023},
author={Amato, Giuseppe and Bolettieri, Paolo and Carrara, Fabio and Falchi, Fabrizio and Gennaro, Claudio and Messina, Nicola and Vadicamo, Lucia and Vairo, Claudio},
booktitle={International Conference on Multimedia Modeling},
pages={615--621},
year={2023},
organization={Springer}
}</pre>
</blockquote>
<p> </p>
<p>This repository comprises the following files:</p>
<ul>
<li><strong><em>msb.tar.gz </em></strong> contains tab-separated files (.tsv) for each video. Each tsv file reports, for each video segment, the timestamp and frame number marking the start/end of the video segment, along with the timestamp of the extracted middle frame and the associated identifier ("id_visione"). </li>
<li><em><strong>extract-keyframes-from-msb.tar.gz</strong></em> contains a Python script designed to extract the middle frame of each video segment from the MSB files. To run the script successfully, please ensure that you have the original V3C videos available.</li>
<li><strong><em>features-aladin.tar.gz<sup>†</sup></em> </strong>contains <a href="https://github.com/mesnico/ALADIN">ALADIN</a> [Messina et al. 2022] features extracted for the middle frame of every segment.</li>
<li><em><strong>features-clip-laion.tar.gz<sup>†</sup></strong></em> contains <a href="https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K">CLIP ViT-H/14 - LAION-2B</a> [Schuhmann et al. 2022] features extracted for the middle frame of every segment.</li>
<li><em><strong>features-clip-openai.tar.gz<sup>†</sup> </strong></em>contains <a href="https://huggingface.co/openai/clip-vit-large-patch14">CLIP ViT-L/14</a> [Radford et al. 2021] features extracted for the middle frame of every segment.</li>
<li><em><strong>features-clip2video.tar.gz<sup>†</sup> </strong></em>contains <a href="https://github.com/CryhanFang/CLIP2Video">CLIP2Video</a> [Fang H. et al. 2021] features extracted for all the video segments. In particular: 1) we concatenate consecutive short segments so as to create segments at least 3 seconds long; 2) we downsample the resulting segments to 2.5 fps; 3) we feed the network with the first min(36, n) frames, where n is the number of frames in the segment. Note that at least 7 frames are always processed, since no segment is shorter than 3 seconds.</li>
<li><em><strong>objects-frcnn-oiv4.tar.gz<sup>*</sup> </strong></em>contains the objects detected using <a href="http://tfhub.dev/google/faster_rcnn/openimages_v4/inception_resnet_v2/1">Faster R-CNN+Inception ResNet</a> (trained on the Open Images V4 [Kuznetsova et al. 2020]). </li>
<li><em><strong>objects-mrcnn-lvis.tar.gz<sup>*</sup></strong></em> contains the objects detected using Mask R-CNN [He et al. 2017] (trained on LVIS).</li>
<li><em><strong>objects-vfnet64-coco.tar.gz<sup>*</sup></strong></em> contains the objects detected using VfNet [Zhang et al. 2021] (trained on COCO dataset).</li>
</ul>
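<p>The three CLIP2Video preprocessing steps listed above can be sketched in Python as follows. This is a sketch under stated assumptions, not the code used to build the archive: the greedy merging strategy, the handling of a short trailing segment, and the use of rounding down when converting a duration to a frame count are all assumptions (the last one chosen because it reproduces the 7-frame minimum for a 3-second segment). Only the constants 3 s, 2.5 fps, and 36 frames come from the description.</p>

```python
import math

MIN_DURATION_S = 3.0   # step 1: merged segments are at least 3 seconds long
TARGET_FPS = 2.5       # step 2: segments are downsampled to 2.5 fps
MAX_FRAMES = 36        # step 3: at most the first 36 frames are used


def merge_short_segments(segments, min_duration=MIN_DURATION_S):
    """Greedily merge consecutive (start_s, end_s) segments (assumed strategy).

    Segments are accumulated until the merged span reaches min_duration.
    A short leftover at the end is attached to the previous merged segment.
    """
    merged = []
    current_start = None
    for start, end in segments:
        if current_start is None:
            current_start = start
        if end - current_start >= min_duration:
            merged.append((current_start, end))
            current_start = None
    if current_start is not None:
        last_end = segments[-1][1]
        if merged:
            merged[-1] = (merged[-1][0], last_end)
        else:
            merged.append((current_start, last_end))
    return merged


def frames_fed_to_network(duration_s, fps=TARGET_FPS, max_frames=MAX_FRAMES):
    """Number of frames passed to the network: min(36, n).

    n is the frame count after downsampling; rounding down is an assumption.
    """
    n = math.floor(duration_s * fps)
    return min(max_frames, n)


# Invented, contiguous example segments in seconds.
example = [(0.0, 1.0), (1.0, 2.5), (2.5, 4.0), (4.0, 9.0), (9.0, 10.0)]
merged = merge_short_segments(example)
frame_counts = [frames_fed_to_network(end - start) for start, end in merged]
```

<p>With these assumptions, a 3-second segment yields 7 frames and any segment of 14.4 seconds or longer is capped at 36 frames.</p>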
<p>*Please be sure to use the <strong>v2 version</strong> of this repository, since v1 feature files may contain inconsistencies that have now been corrected.</p>
<p><em><strong>*Note on the object annotations:</strong></em> Within an object archive, there is a jsonl file for each video, where each row contains a record of a video segment (the <em>"_id"</em> corresponds to the <em>"id_visione"</em> used in the msb.tar.gz). Each record also contains three arrays representing the detected objects, the corresponding scores, and the bounding boxes. The format of these arrays is as follows:</p>
<ul>
<li><em>"object_class_names"</em>: vector with the class name of each detected object.</li>
<li><em>"object_scores"</em>: scores corresponding to each detected object.</li>
<li><em>"object_boxes_yxyx"</em>: bounding boxes of the detected objects in the format <em>(ymin, xmin, ymax, xmax).</em></li>
</ul>
<p> </p>
<p><em><strong><sup>†</sup>Note on the cross-modal features: </strong></em>The extracted multi-modal features (ALADIN, CLIPs, CLIP2Video) enable internal searches within the V3C1+V3C2 dataset using the query-by-image approach (features can be compared with the dot product). However, to perform searches based on free text, the text needs to be transformed into the joint embedding space according to the specific network being used. Please be aware that <strong>the service for transforming text into features is not provided within this repository and should be developed independently using the original feature repositories linked above.</strong></p>
<p>We have plans to release the code in the future, allowing the reproduction of the VISIONE system, including the instantiation of all the services to transform text into cross-modal features. However, this work is still in progress, and the code is not currently available.</p>
<p> </p>
<p><strong>References:</strong></p>
<p>[Amato et al. 2023] Amato, G. et al., 2023, January. VISIONE at Video Browser Showdown 2023. In International Conference on Multimedia Modeling (pp. 615-621). Cham: Springer International Publishing.</p>
<p>[Amato et al. 2022] Amato, G. et al., 2022. VISIONE at Video Browser Showdown 2022. In MultiMedia Modeling, MMM 2022. Lecture Notes in Computer Science, vol 13142. Springer, Cham.</p>
<p>[Fang H. et al. 2021] Fang H. et al., 2021. Clip2video: Mastering video-text retrieval via image clip. arXiv preprint arXiv:2106.11097.</p>
<p>[He et al. 2017] He, K., Gkioxari, G., Dollár, P. and Girshick, R., 2017. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969).</p>
<p>[Kuznetsova et al. 2020] Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Kolesnikov, A. and Duerig, T., 2020. The open images dataset v4. International Journal of Computer Vision, 128(7), pp.1956-1981.</p>
<p>[Lin et al. 2014] Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P. and Zitnick, C.L., 2014, September. Microsoft coco: Common objects in context. In European conference on computer vision (pp. 740-755). Springer, Cham.</p>
<p>[Messina et al. 2022] Messina N. et al., 2022, September. Aladin: distilling fine-grained alignment scores for efficient image-text matching and retrieval. In Proceedings of the 19th International Conference on Content-based Multimedia Indexing (pp. 64-70).</p>
<p>[Radford et al. 2021] Radford A. et al., 2021, July. Learning transferable visual models from natural language supervision. In International conference on machine learning (pp. 8748-8763). PMLR.</p>
<p>[Schuhmann et al. 2022] Schuhmann C. et al., 2022. Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems, 35, pp.25278-25294.</p>
<p>[Zhang et al. 2021] Zhang, H., Wang, Y., Dayoub, F. and Sunderhauf, N., 2021. Varifocalnet: An iou-aware dense object detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8514-8523).</p>
VISIONE Feature Repository for VBS: Multi-Modal Features and Detected Objects from VBSLHE Dataset
<p>This repository contains a diverse set of features extracted from the VBSLHE dataset (laparoscopic gynecology). These features will be utilized in the VISIONE system [Amato et al. 2023, Amato et al. 2022] in the next editions of the Video Browser Showdown (VBS) competition (<a href="https://www.videobrowsershowdown.org/">https://www.videobrowsershowdown.org/</a>).</p>
<p>We used a snapshot of the dataset provided by the Medical University of Vienna and Toronto that can be downloaded using the instructions provided at <a href="https://download-dbis.dmi.unibas.ch/mvk/">https://download-dbis.dmi.unibas.ch/mvk/</a>. It comprises 75 video files. We divided each video into video shots with a maximum duration of 5 seconds.</p>
<p>This repository is released under a Creative Commons Attribution license. If you use it in any form for your work, please cite the following paper:</p>
<blockquote>
<pre>@inproceedings{amato2023visione,
title={VISIONE at Video Browser Showdown 2023},
author={Amato, Giuseppe and Bolettieri, Paolo and Carrara, Fabio and Falchi, Fabrizio and Gennaro, Claudio and Messina, Nicola and Vadicamo, Lucia and Vairo, Claudio},
booktitle={International Conference on Multimedia Modeling},
pages={615--621},
year={2023},
organization={Springer}
}</pre>
</blockquote>
<p> </p>
<p>This repository (v2) comprises the following files:</p>
<ul>
<li><em><strong>msb.tar.gz </strong></em> contains tab-separated files (.tsv) for each video. Each tsv file reports, for each video segment, the timestamp and frame number marking the start/end of the video segment, along with the timestamp of the extracted middle frame and the associated identifier ("id_visione").</li>
<li><em><strong>extract-keyframes-from-msb.tar.gz</strong></em> contains a Python script designed to extract the middle frame of each video segment from the MSB files. To run the script successfully, please ensure that you have the original VBSLHE videos available.</li>
<li><em><strong>features-aladin.tar.gz†</strong></em><strong> </strong>contains <a href="https://github.com/mesnico/ALADIN">ALADIN</a> [Messina et al. 2022] features extracted for the middle frame of every segment.</li>
<li><em><strong>features-clip-laion.tar.gz†</strong></em> contains <a href="https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K">CLIP ViT-H/14 - LAION-2B</a> [Schuhmann et al. 2022] features extracted for the middle frame of every segment.</li>
<li><em><strong>features-clip-openai.tar.gz† </strong></em>contains <a href="https://huggingface.co/openai/clip-vit-large-patch14">CLIP ViT-L/14</a> [Radford et al. 2021] features extracted for the middle frame of every segment.</li>
<li><em><strong>features-clip2video.tar.gz† </strong></em>contains <a href="https://github.com/CryhanFang/CLIP2Video">CLIP2Video</a> [Fang H. et al. 2021] features extracted for all the video segments.</li>
<li><em><strong>objects-frcnn-oiv4.tar.gz* </strong></em>contains the objects detected using <a href="http://tfhub.dev/google/faster_rcnn/openimages_v4/inception_resnet_v2/1">Faster R-CNN+Inception ResNet</a> (trained on the Open Images V4 [Kuznetsova et al. 2020]).</li>
<li><em><strong>objects-mrcnn-lvis.tar.gz*</strong></em> contains the objects detected using Mask R-CNN [He et al. 2017] (trained on LVIS).</li>
<li><em><strong>objects-vfnet64-coco.tar.gz*</strong></em> contains the objects detected using VfNet [Zhang et al. 2021] (trained on COCO dataset).</li>
</ul>
<p>*Please be sure to use the <strong>v2 version</strong> of this repository, since v1 feature files may contain inconsistencies that have now been corrected.</p>
<p><em><strong>*Note on the object annotations:</strong></em> Within an object archive, there is a jsonl file for each video, where each row contains a record of a video segment (the <em>"_id"</em> corresponds to the <em>"id_visione"</em> used in the msb.tar.gz). Each record also contains three arrays representing the detected objects, the corresponding scores, and the bounding boxes. The format of these arrays is as follows:</p>
<ul>
<li><em>"object_class_names"</em>: vector with the class name of each detected object.</li>
<li><em>"object_scores"</em>: scores corresponding to each detected object.</li>
<li><em>"object_boxes_yxyx"</em>: bounding boxes of the detected objects in the format <em>(ymin, xmin, ymax, xmax).</em></li>
</ul>
<p> </p>
<p><em><strong>†Note on the cross-modal features: </strong></em>The extracted multi-modal features (ALADIN, CLIPs, CLIP2Video) enable internal searches within the VBSLHE dataset using the query-by-image approach (features can be compared with the dot product). However, to perform searches based on free text, the text needs to be transformed into the joint embedding space according to the specific network being used (see links above). Please be aware that <strong>the service for transforming text into features is not provided within this repository and should be developed independently using the original feature repositories linked above.</strong></p>
<p>We have plans to release the code in the future, allowing the reproduction of the VISIONE system, including the instantiation of all the services to transform text into cross-modal features. However, this work is still in progress, and the code is not currently available.</p>
<p> </p>
<p><strong>References:</strong></p>
<p>[Amato et al. 2023] Amato, G. et al., 2023, January. VISIONE at Video Browser Showdown 2023. In International Conference on Multimedia Modeling (pp. 615-621). Cham: Springer International Publishing.</p>
<p>[Amato et al. 2022] Amato, G. et al., 2022. VISIONE at Video Browser Showdown 2022. In MultiMedia Modeling, MMM 2022. Lecture Notes in Computer Science, vol 13142. Springer, Cham.</p>
<p>[Fang H. et al. 2021] Fang H. et al., 2021. Clip2video: Mastering video-text retrieval via image clip. arXiv preprint arXiv:2106.11097.</p>
<p>[He et al. 2017] He, K., Gkioxari, G., Dollár, P. and Girshick, R., 2017. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969).</p>
<p>[Kuznetsova et al. 2020] Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Kolesnikov, A. and Duerig, T., 2020. The open images dataset v4. International Journal of Computer Vision, 128(7), pp.1956-1981.</p>
<p>[Lin et al. 2014] Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P. and Zitnick, C.L., 2014, September. Microsoft coco: Common objects in context. In European conference on computer vision (pp. 740-755). Springer, Cham.</p>
<p>[Messina et al. 2022] Messina N. et al., 2022, September. Aladin: distilling fine-grained alignment scores for efficient image-text matching and retrieval. In Proceedings of the 19th International Conference on Content-based Multimedia Indexing (pp. 64-70).</p>
<p>[Radford et al. 2021] Radford A. et al., 2021, July. Learning transferable visual models from natural language supervision. In International conference on machine learning (pp. 8748-8763). PMLR.</p>
<p>[Schuhmann et al. 2022] Schuhmann C. et al., 2022. Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems, 35, pp.25278-25294.</p>
<p>[Zhang et al. 2021] Zhang, H., Wang, Y., Dayoub, F. and Sunderhauf, N., 2021. Varifocalnet: An iou-aware dense object detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8514-8523).</p>
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
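The difference between the two counting schemes can be illustrated with a small Python sketch. It is not the study's procedure: the data are invented, author names are placeholders, and counting each author pair at most once per citing paper is an assumption made for the example.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations over a collection of citing papers.

    Each citing paper is a list of cited works, and each cited work is an
    ordered list of author names. Only the first `max_authors` authors of a
    cited work are considered: 1 gives first-author counting, 5 gives the
    first-five-authors variant. A pair is counted once per citing paper.
    """
    counts = Counter()
    for reference_list in citing_papers:
        cited_authors = set()
        for authors in reference_list:
            cited_authors.update(authors[:max_authors])
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts


# Invented example: two citing papers, placeholder author names.
citing_papers = [
    [["A", "B"], ["C"]],   # paper 1 cites a work by A and B, and a work by C
    [["A"], ["D", "C"]],   # paper 2 cites a work by A, and a work by D and C
]

first_author_counts = cocitation_counts(citing_papers, max_authors=1)
first_five_counts = cocitation_counts(citing_papers, max_authors=5)
```

In this toy data, authors A and C are co-cited once under first-author counting but twice when the first five authors are considered, because C is a second author of one cited work.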
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.
Appropriate Similarity Measures for Author Cocitation Analysis
We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance. Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
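As a small illustration of how the Pearson correlation and the cosine can behave differently on cocitation profiles, the following Python sketch computes both for two invented profiles and then again after appending authors with whom neither is cocited. The profiles are made up, and the sketch shows one mathematical property of the two measures; it does not reproduce the argument or the example of the paper.

```python
import math


def cosine(x, y):
    """Cosine similarity between two equal-length profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation, computed as the cosine of mean-centered profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    centered_x = [a - mean_x for a in x]
    centered_y = [b - mean_y for b in y]
    return cosine(centered_x, centered_y)


# Invented cocitation profiles of two authors with four other authors.
profile_1 = [3, 1, 0, 2]
profile_2 = [1, 3, 2, 0]

# The same profiles after adding four authors with whom neither is cocited.
padded_1 = profile_1 + [0, 0, 0, 0]
padded_2 = profile_2 + [0, 0, 0, 0]

cosine_before = cosine(profile_1, profile_2)
cosine_after = cosine(padded_1, padded_2)
pearson_before = pearson(profile_1, profile_2)
pearson_after = pearson(padded_1, padded_2)
```

In this toy case the cosine is unchanged by the added zeros, while the Pearson correlation moves from negative to positive, since the shared zeros shift both means.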
Dispelling the Myths Behind First-author Citation Counts
We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore just how different from each other author rankings resulting from different citation counting methods actually are, and to demonstrate the capability of emerging data and tools on the Web in supporting more realistic citation counting methods. Our results contest some common arguments for the continued use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and other citation counting methods, and high costs of using more realistic citation counting methods that are not well-supported by the ISI databases. It is argued that increasingly available digital full text research papers make it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more sophisticated methods.
