Towards Automatic Photo-Identification of Cetaceans: A Fine-Grained, Few-Shot Problem in Marine Ecology

My PhD thesis was submitted to the Department of Electrical and Electronic Engineering, Newcastle University. Awarded June 2023. It can be viewed here.


In recent years there has been a concerted effort to apply computer vision techniques to areas which can have a positive societal impact. A highly important area where computer vision can help is conservation. One of the main goals of conservation research is to monitor animal populations in their distribution area, undertaking abundance estimates to inform policy change. This is most commonly performed using capture-recapture surveys where researchers identify the presence and abundance of individual animals in an area over time to produce population estimates. These surveys can be classified as invasive where animals are physically trapped, tagged, and released, or non-invasive where monitoring is performed passively such as via image collection.

Photo-id is one of the main non-invasive capture-recapture methods utilised by cetacean researchers. Surveys are usually undertaken from vessels at sea, although monitoring from coastlines or aircraft may also be utilised. Photo-id has been employed for the monitoring of multiple cetacean species, with proven use cases in a range of studies. Outside of cetaceans, photo-id has further found use studying other marine life and terrestrial species.

All capture-recapture methodologies rely on the target species having some form of individually identifiable markings. Depending on the species, different parts of the body are the primary identifying feature; for dolphins this is usually the dorsal fin, as this body part is most likely to be visible above the waterline. During photo-id surveys, researchers often focus on long-lasting, stable markers such as dorsal fin shape, notches, scarring, and pigmentation. These markings can be difficult to capture in detail because the animals roam freely, causing high variance in angle of approach, direction of travel, distance from camera, and surfacing elevation. This is exacerbated when dealing with cetacean species which travel in pods, making it difficult to distinguish the individuals present.

Marine photo-id can be extremely labour and cost intensive compared to on-land surveys, which rely on camera traps placed in stationary locations to capture images when they detect movement. This setup is not possible at sea: there are no stationary objects to attach devices to, and the constant movement of waves in the observed scene would repeatedly trigger the camera, producing a high false positive rate.

Upon survey completion, photo-id data must be analysed and individuals identified to produce a catalogue. Images collected during surveys are large in size and contain significant amounts of background noise. Historically, curation of these data has been a manual process that often takes longer than the entire data collection period, further increasing labour and costs. As such, any technique that speeds up the curation process would be welcomed by both researchers and their funding bodies. As photo-id surveys are not guaranteed to capture all individuals in a given geographic area, naive approaches such as training a simple image classifier on existing catalogue examples do not suffice: such a classifier can only assign an image to one of the individuals it was trained on, so it is incapable of flagging previously uncatalogued individuals.


To address the above issues, my PhD research focussed on the development of a framework for fully automatic catalogue matching based on unprocessed photo-id imagery. This is achieved by a pipeline of trained computer vision models and robust post-processing techniques capable of automatic fin detection and most likely catalogue matching based on latent space similarity.

An overview of the developed framework for automatic photo-id

Images are passed through a Mask R-CNN dorsal fin detector, removing the need for manual data pre-processing. Detections are then post-processed ready for fine-grained, few-shot catalogue matching, which uses a Siamese Neural Network trained with triplet loss to create a latent space based on the provided catalogue. Catalogue matches are obtained using the Euclidean distances between an input and class prototypes stored in the latent space, allowing potentially uncatalogued individuals to be flagged to the researcher. As a result, the proposed system vastly reduces the workload of researchers, affording them more time to work on the application of their data, for example to inform mitigation and policy change, rather than on curation.
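The matching step described above can be illustrated with a minimal sketch in plain Python. It is not the thesis implementation, and it makes several assumptions that are not stated in the text: embeddings are short lists of floats, a class prototype is the mean of an individual's catalogue embeddings, an input is flagged as potentially uncatalogued when its nearest prototype is further away than a hand-picked threshold, and the triplet loss uses non-squared Euclidean distance with a margin. All function and parameter names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet loss for one (anchor, positive, negative) triple.

    Assumption: non-squared distances and a margin of 1.0. The loss is zero
    once the negative is at least `margin` further from the anchor than the
    positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def build_prototypes(catalogue):
    """Average each individual's embeddings into a single prototype.

    `catalogue` maps an individual's ID to a non-empty list of embeddings.
    Assumption: the prototype is the element-wise mean.
    """
    prototypes = {}
    for individual, embeddings in catalogue.items():
        dim = len(embeddings[0])
        count = len(embeddings)
        prototypes[individual] = [
            sum(e[i] for e in embeddings) / count for i in range(dim)
        ]
    return prototypes


def match(embedding, prototypes, threshold):
    """Rank catalogue individuals by distance to `embedding`.

    Returns (best_id, ranked), where ranked is a list of (distance, id)
    pairs sorted from nearest to furthest. best_id is None when even the
    nearest prototype is further than `threshold` (a hypothetical cut-off),
    which flags the input as potentially uncatalogued.
    """
    ranked = sorted(
        (euclidean(embedding, proto), individual)
        for individual, proto in prototypes.items()
    )
    best_distance, best_id = ranked[0]
    if best_distance > threshold:
        return None, ranked
    return best_id, ranked
```

Because the decision is made on distances rather than on a fixed set of output classes, an input far from every prototype can be returned as unmatched, and new individuals can be added to the catalogue by storing a new prototype rather than retraining a classifier head. In practice the threshold would need tuning on held-out data.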

The Northumberland Dolphin Dataset

Very few open-source datasets exist for use within a conservation or ecological space; those that do often focus on object detection of animals in a scene or on species-level identification. Some large-scale datasets showing animals in natural environments do exist, although these often only provide labels at a species level. This is not fine-grained enough for population estimation, which requires the identification of individuals. Of the datasets which currently allow for species identification, most focus on the development of land-based camera trap systems, although work has also been undertaken on marine life species detection systems.

In order to aid the development of automatic photo-id systems, including my own, part of my PhD focussed on the development of the Northumberland Dolphin Dataset, a challenging image dataset annotated for both coarse and fine-grained instance segmentation and categorisation.

The dataset has also received interest from outside the computational ecology space, and has been utilised as a zero-shot evaluation dataset for works such as Meta’s Segment Anything.

The Northumberland Dolphin Dataset is publicly available, and can be downloaded here.