Tactile sensing in manipulation
GelSight, DIGIT, BioTac, e-skins — the sensors that finally give robots hands with feeling. What they measure, what to do with the data, and the workflow that turns 90% grasps into 99%.
Vision tells you where an object is. Tactile sensing tells you what's happening at the contact. For decades robots had vision and force/torque at the wrist — coarse compared to what a human fingertip provides. The 2018+ wave of vision-based tactile sensors (GelSight, DIGIT) changed that. In 2026, tactile sensing is starting to be standard equipment on serious manipulation platforms.
What tactile sensors actually measure
Different sensor types capture different aspects of contact:
- Force/torque (wrist-mounted): six axes (three force, three torque) at the wrist. Coarse contact info.
- Vision-based fingertip (GelSight, DIGIT): a small camera looks at the back of a soft elastomer pad. Objects pressing into the gel deform it; the camera images the deformation. Sub-millimeter spatial resolution; frame rate 30–120 Hz.
- Multimodal fingertips (BioTac): a fluid-filled fingertip sensed by impedance electrodes, a pressure transducer, and a thermistor. Detects texture, slip, vibration, and temperature.
- Resistive e-skins: pressure-sensitive resistive elements on a flexible substrate. Cover larger areas (palm, arm); lower resolution.
- Piezoelectric sensors: detect vibration and slip at high frequencies (1–10 kHz). Fast, but little spatial information.
Each modality answers different questions. Vision-based tactile (GelSight/DIGIT) currently dominates research because the data is image-shaped — easy to feed into convnets and VLAs.
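To see how image-shaped the data is: a vision-based tactile sensor typically enumerates as an ordinary USB camera, so a few lines of OpenCV suffice to grab frames. A minimal sketch; the device index 0 is an assumption for your setup:

import cv2

cap = cv2.VideoCapture(0)            # tactile sensor enumerates as a webcam
cap.set(cv2.CAP_PROP_FPS, 60)        # request 60 Hz if the device supports it
ok, frame = cap.read()               # frame: HxWx3 uint8 array, like any image
if ok:
    # From here it's ordinary image processing: difference consecutive
    # frames for slip cues, or feed the frame straight into a CNN.
    print(frame.shape)
cap.release()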
What you can do with tactile data
Slip detection
The classical first use case. As an object starts to slip, characteristic vibrations appear in the high-frequency band. Detect them, increase grip force, prevent the drop.
Hand-engineered detectors work; learned classifiers (CNN on a window of tactile frames) work better. Either way, sub-100 ms response prevents most slips.
Grasp verification
After closing the gripper, the tactile pattern tells you what you're holding (or whether you're holding anything at all). A textbook "no contact" reading after a grasp attempt means retry. This check alone is the biggest lift from 90% to 95% pick-and-place success.
Object pose estimation
The tactile imprint of a known object encodes its orientation in the gripper. With a learned model, you can recover sub-millimeter relative pose from a single tactile reading — useful for precise placement after pickup.
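A minimal sketch of such a model: a small CNN regressing a planar relative pose (x, y, theta) from one tactile image. The architecture and names are illustrative, not from any particular paper:

import torch
import torch.nn as nn

class TactilePoseNet(nn.Module):
    """Regress (x, y, theta) of a known object from one tactile image."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, 3)  # x (mm), y (mm), theta (rad)

    def forward(self, tactile_image):
        return self.head(self.backbone(tactile_image))

# Train with MSE against ground-truth poses from a calibration rig
model = TactilePoseNet()
pose = model(torch.randn(1, 3, 240, 320))  # batch of one tactile frame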
Texture and material recognition
Different materials produce different tactile signatures. A cloth feels different from a plastic cup. Learned classifiers achieve high accuracy on small material sets.
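A minimal sketch, assuming frames and labels are your collected tactile images and material names; simple per-frame statistics plus a random forest go surprisingly far on small material sets:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def texture_features(frame):
    # Per-frame statistics; gradient magnitudes capture surface roughness
    gy, gx = np.gradient(frame.astype(float).mean(axis=-1))
    return [frame.mean(), frame.std(), np.abs(gx).mean(), np.abs(gy).mean()]

# frames: list of HxWx3 tactile images; labels: material name per frame
X = np.array([texture_features(f) for f in frames])
clf = RandomForestClassifier(n_estimators=100).fit(X, labels)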
In-hand reorientation
Tactile data tells you where the object is and how it's moving inside your hand. Combined with policy learning, this closes the perception-action loop for dexterous manipulation. State-of-the-art papers (2024+) use tactile-conditioned policies that don't need vision after the initial grasp.
Shape exploration
Touch an unknown object many times; piece together its shape. Slow but works without vision; useful for occluded scenarios.
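A minimal sketch of the reconstruction step, assuming contact_points is the set of touch locations you've accumulated, already expressed in the world frame (frame composition is sketched later, after the gotchas list):

import numpy as np
from scipy.spatial import ConvexHull

# contact_points: (N, 3) touch locations in the world frame, N >= 4
cloud = np.asarray(contact_points)
hull = ConvexHull(cloud)          # coarse convex shape estimate
print(hull.volume, hull.area)     # quick sanity check on the estimate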
The 2026 sensor catalog
| Sensor | Type | Cost | Status |
|---|---|---|---|
| GelSight Mini | Vision-based | ~$1k | Research workhorse |
| DIGIT (Meta) | Vision-based | ~$300 | Open hardware |
| DIGIT 360 (Meta, 2024) | Vision + multimodal | ~$5k | Higher-end research |
| BioTac (SynTouch) | Multimodal (fluid-based) | ~$10k | Niche; bio-inspired |
| Robotous F/T | Wrist 6-axis F/T | ~$2k | Industrial standard |
| Sensapex / iniLabs e-skin | Resistive arrays | ~$5k | Emerging for arms |
The minimum hardware setup
To start: get a Robotiq 2F-85 gripper + 2× GelSight Mini fingertips. ~$5k, plug-and-play with most cobots. You'll have:
- Real-time tactile imaging at 60 Hz.
- Slip detection out of the box (existing libraries).
- Grasp success/failure binary signal.
- Path to learned manipulation policies.
That's the production-realistic entry point.
Software pipelines
Hand-engineered slip detection
import numpy as np

SLIP_THRESHOLD = 0.5  # tune per sensor and gripper (see below)

def detect_slip(tactile_history):
    # Frame-to-frame difference; high-frequency content indicates slip
    diffs = [np.linalg.norm(tactile_history[i + 1] - tactile_history[i])
             for i in range(len(tactile_history) - 1)]
    # Std of the recent diffs approximates high-frequency energy
    high_freq_energy = np.std(diffs[-10:])
    return high_freq_energy > SLIP_THRESHOLD
Tune the threshold for your sensor and gripper. Works well enough for most production tasks.
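One way to set it, assuming you log the frame-to-frame diff norms (stable_diffs below) during a few known-stable grasps: put the threshold a few standard deviations above that baseline.

import numpy as np

# stable_diffs: diff norms logged during known-stable (non-slipping) grasps
baseline, spread = np.mean(stable_diffs), np.std(stable_diffs)
SLIP_THRESHOLD = baseline + 4 * spread   # 4-sigma margin; tune the multiplier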
Learned grasp success classifier
# sensor, grasp_cnn, classifier, retry_grasp are placeholders for your
# capture, feature-extraction, and recovery code
tactile_image = sensor.capture()        # one tactile frame after closing
features = grasp_cnn(tactile_image)     # learned feature extractor
success_prob = classifier(features)     # probability the grasp succeeded
if success_prob < 0.7:                  # threshold tuned on a validation set
    retry_grasp()
Train on a dataset of successful + failed grasps with tactile readings labeled. Hundreds of examples are sufficient; thousands give better generalization.
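A minimal training sketch for such a classifier, assuming images is an (N, 3, H, W) tensor of tactile frames and labels an (N,) tensor of 0/1 outcomes; small enough at this scale to fit in memory:

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 1),                      # one logit: grasp success
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()
for epoch in range(20):
    opt.zero_grad()
    loss = loss_fn(model(images).squeeze(1), labels.float())
    loss.backward()
    opt.step()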
Tactile-conditioned policies
The 2024+ pattern: feed tactile + vision + proprioception into a single policy network. Output joint actions. Treats tactile as just another sensor modality.
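A minimal sketch of that pattern: per-modality encoders, concatenated features, one action head. All names and sizes are illustrative:

import torch
import torch.nn as nn

class TactilePolicy(nn.Module):
    """Vision + tactile + proprioception in, joint actions out."""
    def __init__(self, proprio_dim=7, action_dim=7):
        super().__init__()
        def img_encoder():
            return nn.Sequential(
                nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
        self.vision = img_encoder()
        self.tactile = img_encoder()        # tactile is just another image
        self.head = nn.Sequential(
            nn.Linear(16 + 16 + proprio_dim, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, rgb, tactile, proprio):
        z = torch.cat([self.vision(rgb), self.tactile(tactile), proprio], dim=-1)
        return self.head(z)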
Frameworks: LeRobot supports tactile observations in its dataset format. The Tactile-VLM literature is the cutting edge.
Common gotchas
- Calibration drift: GelSight gels deform permanently after thousands of contacts. Replace gels every few thousand grasps.
- Lighting changes: vision-based tactile data can be sensitive to internal LED brightness drift. Re-calibrate periodically.
- Coordinate frames: tactile imprints come in the fingertip frame; you usually want them in the world frame. Compose with TF (see the sketch after this list).
- Frame rate matters: slip detection needs > 60 Hz. Cheap webcam-based DIGITs hit this; lower-frame-rate sensors don't.
- Compute load: processing two 60 Hz tactile streams + vision + control runs out of CPU on an embedded controller. Offload tactile inference to a separate machine.
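The frame composition from the gotchas list, as a minimal sketch with scipy; in practice the fingertip pose comes from your TF tree or forward kinematics (the values below are placeholders):

import numpy as np
from scipy.spatial.transform import Rotation

# Pose of the fingertip frame in the world frame (from TF / kinematics)
R_world_tip = Rotation.from_quat([0, 0, 0, 1])    # placeholder orientation
t_world_tip = np.array([0.4, 0.0, 0.3])           # placeholder position (m)

p_tip = np.array([0.002, -0.001, 0.0])            # contact point, tip frame
p_world = R_world_tip.apply(p_tip) + t_world_tip  # same point, world frame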
What tactile is starting to enable
- Robust pick-and-place: tactile verification + slip recovery. Production use.
- Cable / cord manipulation: combining vision + tactile to follow flexible objects.
- Precision insertion: tactile-guided assembly with sub-millimeter accuracy.
- Texture-based sorting: separating items by feel as well as look.
- Dexterous manipulation: in-hand rotation and reorientation, increasingly viable.
Where it's still research
- Generalist tactile foundation models (analogous to vision-language models).
- Tactile + language: "this feels rough" understood by VLAs.
- High-bandwidth tactile (1+ kHz) for very fast manipulation.
- Whole-body e-skins for full-arm contact awareness.
Exercise
If you have access to a DIGIT (the cheapest entry point), mount it on a parallel-jaw gripper. Collect 50 grasps with various objects: 25 successful, 25 failed. Train a binary classifier on the tactile images. Deploy: every grasp goes through the classifier; failed grasps trigger a retry. The 5–10% bump in end-to-end success rate is what tactile delivers.
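A minimal end-to-end sketch of the exercise, assuming images is a (50, H, W, 3) array of tactile frames and labels the 0/1 outcomes; at 50 examples, logistic regression on downsampled pixels is plenty:

import numpy as np
from sklearn.linear_model import LogisticRegression

X = images[:, ::8, ::8].reshape(len(images), -1) / 255.0  # downsample, flatten
clf = LogisticRegression(max_iter=1000).fit(X, labels)

def grasp_succeeded(tactile_frame):
    x = tactile_frame[::8, ::8].reshape(1, -1) / 255.0
    return clf.predict(x)[0] == 1
# Deploy: run every grasp through grasp_succeeded(); on False, retry.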
That's the Manipulation track done
You've covered the full progression: pick-and-place pipeline → grasp analysis → deep grasping → MoveIt for manipulation → trajectory generation → impedance for assembly → non-prehensile → dexterous → mobile manipulation → tactile. Together they trace the field from 1980s theory to 2026 production. Read papers in any of these subfields and the math will be familiar.